On the Atmel SAM-D21 series microcontrollers, many peripherals use a clock that is asynchronous to the main CPU clock, and accesses to these peripherals must go through synchronization logic; on peripherals whose clock is slow relative to the CPU clock, this can add very large delays. For example, if the RTC is configured to use a 1024 Hz clock (as appears to be the design intention) and the CPU is running at 48 MHz, reading the "current time" register will cause the bus logic to insert over 200,000 wait states (a minimum of five cycles of the 1024 Hz clock). Although it's possible to have the CPU issue a read request, execute some other unrelated code, and return 200,000+ cycles later to fetch the time, there doesn't seem to be any way to actually read the time any faster. What is the synchronization logic doing that would take so long (the delay is specified as at least 5·P_GCLK + 2·P_APB and at most 6·P_GCLK + 3·P_APB, where P_GCLK and P_APB are the periods of the peripheral's generic clock and of the APB bus clock)?
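To put numbers on that: five periods of a 1024 Hz GCLK at a 48 MHz CPU clock is 5 × 48,000,000 / 1,024 ≈ 234,000 CPU cycles (the extra 2-3 APB periods are negligible by comparison). Here is a minimal sketch of the blocking read path I'm describing, using CMSIS/ASF-style register and bit names from samd21.h (RTC_READREQ_RREQ, SYNCBUSY, etc.); treat the exact identifiers as assumptions, the point is only the request-then-spin structure:

```c
#include "samd21.h"  /* CMSIS device header; register/bit names assumed from the Atmel DFP */

/* Blocking read of the RTC calendar register (MODE2).
 * Issuing a read request sets SYNCBUSY, which stays set for the
 * 5-6 GCLK periods quoted above -- roughly 234,000 CPU cycles,
 * or about 4.9 ms, at 48 MHz / 1024 Hz. */
static uint32_t rtc_read_clock_blocking(void)
{
    RTC->MODE2.READREQ.reg = RTC_READREQ_RREQ;   /* request synchronization of CLOCK */
    while (RTC->MODE2.STATUS.bit.SYNCBUSY) {
        /* ~234k CPU cycles of dead time spent waiting here */
    }
    return RTC->MODE2.CLOCK.reg;                 /* now holds a coherent snapshot */
}
```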
By my understanding of synchronization, a single-bit synchronizer will delay a signal by 2-3 cycles of the destination clock; synchronizing a multi-bit quantity is a little harder, but there are a variety of approaches that can guarantee reliable behavior within five cycles of the destination clock if it is faster than the source clock, and only a few cycles more if it is not. What would the Atmel SAM-D21 be doing that requires six cycles in the source clock domain for synchronization, and what factors would favor a design whose synchronization delays are long enough to necessitate a "synchronization done" interrupt, versus one that keeps synchronization delays short enough to render such interrupts unnecessary?
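For reference, the interrupt-driven alternative I'm contrasting against would look roughly like the sketch below: issue the read request, return to other work, and collect the value in the handler once synchronization completes. Again the identifiers (RTC_MODE2_INTENSET_SYNCRDY, RTC_MODE2_INTFLAG_SYNCRDY, RTC_IRQn) are CMSIS-style names assumed from samd21.h, and the sketch only illustrates the structure:

```c
#include "samd21.h"  /* register/bit names assumed from the CMSIS device header */

static volatile uint32_t last_clock;  /* most recently synchronized CLOCK value */

/* One-time setup elsewhere: NVIC_EnableIRQ(RTC_IRQn); */

/* Start a synchronization pass and return immediately; the SYNCRDY
 * interrupt fires ~5 ms later instead of the CPU spinning for ~234k cycles. */
static void rtc_request_clock(void)
{
    RTC->MODE2.INTENSET.reg = RTC_MODE2_INTENSET_SYNCRDY;  /* enable "sync done" interrupt */
    RTC->MODE2.READREQ.reg  = RTC_READREQ_RREQ;            /* begin synchronizing CLOCK */
}

void RTC_Handler(void)
{
    if (RTC->MODE2.INTFLAG.bit.SYNCRDY) {
        RTC->MODE2.INTFLAG.reg = RTC_MODE2_INTFLAG_SYNCRDY; /* clear the flag (write-1-to-clear) */
        last_clock = RTC->MODE2.CLOCK.reg;                  /* fetch the now-coherent value */
    }
}
```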