Oliver, you seem to be conflating a number of fundamental concepts. This is why you are getting pushback: your question lacks focus, so giving a meaningful answer would be difficult.
First up, 'synchronous microarchitecture' are fluff words. So let's concentrate on cache miss and pipeline stall.
A cache miss won't necessarily cause a pipeline stall - these are two separate mechanisms.
For the pipeline, think of a car production line. For whatever reason, incoming parts (instructions) are not available. What happens to the production line? It stalls. What does this entail? Nothing moves on the production line. Time is wasted. Obviously this is something we want to avoid due to inefficiency.
The idea of the cache is to anticipate what memory is going to be accessed and keep a fast, local copy. For our car production line example, the cache is the local store of parts (instructions). When that local store is depleted, the next request is going to be a miss. Put simply, we have to wait until we get another shipment from main memory. In this example, that means our production line stalls.
Note that a pipelined processor doesn't necessarily involve a cache and a cache doesn't necessarily require a pipelined processor. This is why you want to simplify your scope.
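To put rough numbers on the "local store of parts" idea, here is a minimal sketch of a direct-mapped cache that only counts cycles. Everything in it is an assumption for illustration: the names `ToyCache` and `fetch_cycles`, the 1-cycle hit, the 20-cycle miss and the cache geometry are made up and do not describe any real processor.

```python
# Toy model only: the latencies and cache geometry below are made-up numbers.
HIT_CYCLES = 1      # assumed cost of a cache hit
MISS_CYCLES = 20    # assumed cost of going out to main memory


class ToyCache:
    """Direct-mapped cache that tracks only which block lives in each line."""

    def __init__(self, num_lines=4, block_size=4):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # None means the line is empty

    def access(self, address):
        """Return the number of cycles this access costs."""
        block = address // self.block_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag  # fill the line from "main memory"
        return MISS_CYCLES


def fetch_cycles(addresses, cache):
    """Total cycles spent waiting for parts (instructions) to arrive."""
    return sum(cache.access(a) for a in addresses)


cache = ToyCache()
addresses = list(range(16))            # 16 consecutive instruction addresses
print(fetch_cycles(addresses, cache))  # cold cache: 4 misses + 12 hits = 92
print(fetch_cycles(addresses, cache))  # warm cache: 16 hits = 16
```

The second pass is cheap only because the first pass already stocked the local store. Whether a real cache hits in one clock is, as noted, implementation dependent.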
In summary:
- The cache controller can take as long as it wants. In reality, we use a cache for a performance increase, so we want it to be as fast as possible. 1 clock? Maybe, but that's implementation dependent. It should be faster than a main memory access, otherwise there would be no use for the cache.
- What does the CPU do when it stalls? Send a 'bubble' down the pipeline? Again, that's implementation dependent. The simple answer is 'the CPU does no useful work'.
- As for the PC (program counter), that doesn't necessarily update each clock. Once again, implementation dependent.
- Does the cache controller inform the CPU when the memory request is satisfied? Yes. There clearly needs to be some mechanism for this to occur. How else would the CPU know to continue? If the main memory cycle time were fixed, maybe we could assume X clock cycles and not need a handshake. Implementation dependent.
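The summary points can be sketched as a toy fetch loop: the PC only advances when the memory side signals "ready", and every cycle spent waiting is recorded as a bubble. This is one possible behaviour under stated assumptions, not how any particular CPU does it. The names `ToyMemoryPort` and `run`, the per-address latencies and the single `ready` return value are all hypothetical.

```python
# Toy model only: names, latencies and the 'ready' handshake are illustrative assumptions.

class ToyMemoryPort:
    """Services one fetch at a time and reports completion through a ready flag."""

    def __init__(self, latency_for):
        self.latency_for = latency_for  # function: address -> cycles needed
        self.pending = None             # address currently being serviced
        self.remaining = 0

    def tick(self, address):
        """Advance one clock; return True (ready) once the address is available."""
        if self.pending != address:
            # New request: start counting down its latency.
            self.pending = address
            self.remaining = self.latency_for(address)
        self.remaining -= 1
        if self.remaining <= 0:
            self.pending = None
            return True
        return False


def run(program_length, port):
    """Fetch program_length instructions and return a per-cycle trace."""
    pc = 0
    trace = []
    while pc < program_length:
        if port.tick(pc):
            trace.append(f"fetch {pc}")
            pc += 1                 # PC advances only when the handshake completes
        else:
            trace.append("bubble")  # stall: PC is held, no useful work this cycle
    return trace


# Assume address 2 misses (3 cycles) and everything else hits (1 cycle).
port = ToyMemoryPort(lambda address: 3 if address == 2 else 1)
print(run(4, port))
# ['fetch 0', 'fetch 1', 'bubble', 'bubble', 'fetch 2', 'fetch 3']
```

If the memory latency were a fixed, known number of cycles, the `ready` flag could be replaced by a simple counter in the CPU, which is the "assume X clock cycles" case mentioned above.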