To elaborate on your external clock idea: of course it's possible to use an external input as a clock, it just makes very little sense for audio unless you can use this clock for the whole design. If you implement a digital circuit with two independent clocks, you will need a block that transfers data between the clock domains. Such transfers have an inherent problem known as metastability: if an edge from one domain arrives too close to the sampling clock edge of the other, it violates the setup/hold times of the receiving flip-flops, which can then output an indeterminate logic level for an unpredictable time, possibly until the next clock edge.
As a result, your circuit will never be 100% reliable, and the inter-domain data transfer will have an MTBF (mean time between failures) associated with it. To get a working design, people push this MTBF to a high enough value: 1 failure per billion years sounds acceptable, doesn't it? MTBF is increased by chaining several flip-flop synchronization stages in a sort of FIFO, so it has a price: the resources used to implement the flip-flops, and added latency in the data transfers.
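For a rough feel of how the chain length affects MTBF, here is a small Python sketch using the usual textbook estimate MTBF = exp(t_resolve / tau) / (T_w * f_clk * f_data). The function name and all the numbers (`tau`, `t_w`, the two frequencies) are made up for illustration; real values come from your device's datasheet or the vendor's timing tools. It also assumes each extra stage buys one full clock period of settling time, which is a simplification.

```python
import math

def synchronizer_mtbf(stages, f_clk, f_data, tau, t_w):
    """Estimate MTBF in seconds for a chain of `stages` flip-flops.

    Simplifying assumption: each flip-flop after the first gives a
    metastable value one more full clock period to resolve.
    """
    if stages < 1:
        raise ValueError("need at least one flip-flop")
    t_resolve = (stages - 1) / f_clk
    return math.exp(t_resolve / tau) / (t_w * f_clk * f_data)

# Hypothetical numbers, for illustration only.
F_CLK = 50e6    # sampling (system) clock, Hz
F_DATA = 48e3   # rate of asynchronous input transitions, Hz
TAU = 0.5e-9    # flip-flop resolution time constant, s
T_W = 100e-12   # metastability window, s

SECONDS_PER_YEAR = 365.25 * 24 * 3600

for n in (1, 2, 3):
    years = synchronizer_mtbf(n, F_CLK, F_DATA, TAU, T_W) / SECONDS_PER_YEAR
    print(f"{n} stage(s): MTBF ~ {years:.3g} years")
```

The point is the exponential: every added stage multiplies the MTBF by a large factor, while the cost in flip-flops and latency only grows linearly.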
Sampling the clock pin (as @alex suggests) has the same metastability problem: if the edge arrives at an unfortunate time, you can end up missing your single-cycle pulse or generating two pulses instead of one. However, the problem will be much less visible in the first place, since only 1 bit is affected, and the cost of dealing with it (if needed) will be proportionally smaller: in your case, about a quarter of what synchronizing the 4 data inputs would take.
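To make the pin-sampling approach concrete, here is a behavioural Python model of a two-flip-flop synchronizer followed by a rising-edge detector. This is only a sketch: the function and variable names are mine, the pin is assumed to be sampled ideally once per system clock cycle (so metastability itself is not modelled), and in a real design you would write the equivalent in your HDL.

```python
def sample_clock_pin(pin_samples):
    """Return one value per system clock cycle: 1 on a detected rising edge."""
    sync1 = sync2 = prev = 0  # register contents, all reset to 0
    pulses = []
    for pin in pin_samples:
        # The output is combinational: it uses the values the registers
        # held before this clock edge.
        pulses.append(1 if (sync2 and not prev) else 0)
        # All three registers update together on the clock edge.
        sync1, sync2, prev = pin, sync1, sync2
    return pulses

# External clock pin as seen at each system clock cycle (two rising edges).
pin = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
print(sample_clock_pin(pin))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```

Each rising edge on the pin yields exactly one single-cycle pulse, two system clock cycles later. That delay is the latency price of the synchronizer, and it only costs three flip-flops for the one sampled bit.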