Substantial edit: note that David Kessner's answer was written in response to the original posting; view the edit history to see what he was responding to.
From what I've read of digital design, there is a very strong tendency toward strictly synchronous circuits in which the only 'sequential' subsystems are flip-flops sharing a common clock. Signals that cross between clock domains almost always require double synchronizers.
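To be clear about what I mean by a double synchronizer, here is a minimal Python behavioral model (the function name and reset-to-zero assumption are my own; real metastability obviously can't be captured at this level of abstraction):

```python
def two_ff_synchronizer(samples):
    """Model of a two-flop synchronizer: the destination clock domain
    samples an asynchronous signal through two flip-flops in series,
    giving a (possibly metastable) first stage a full clock period to
    settle before downstream logic sees it.  samples[i] is the input
    level at destination clock edge i; the returned list is what the
    downstream logic sees during each cycle."""
    ff1 = ff2 = 0                    # both stages assumed to reset to 0
    seen = []
    for s in samples:
        seen.append(ff2)             # value visible during this cycle
        ff1, ff2 = s, ff1            # both flops clock on the same edge
    return seen

# Functionally this is just the input delayed by two destination-clock
# cycles; the point of the second flop is resolution time, not logic.
print(two_ff_synchronizer([1, 1, 0, 0, 1]))  # [0, 0, 1, 1, 0]
```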
I've seen a number of articles suggesting that fully asynchronous designs are very hard to get right and are prone to unforeseen pitfalls. I can certainly appreciate that if the inputs to any type of latching element have no specified timing relationship, it's mathematically impossible to absolutely guarantee anything about the output, and that even making odd behaviors unlikely enough that, for practical purposes, they never happen is often difficult without a double synchronizer.
A number of blogs also talk about the evils of gated clocks, and suggest that it is much better to feed an ungated clock to a latch along with a "latch enable" signal than to gate the clock. Gated clocks not only require great care in their implementation to avoid 'runt' clock pulses, but unless extreme care is taken to balance out delays, circuits operated from separately gated clocks must be treated as being in their own clock domains.
What I haven't seen discussed much is the notion of circuits whose sequential subsystems aren't all triggered by the same clock, but are guaranteed to be stable within a certain interval of a clock edge. If one is trying to implement something like an N-bit event counter, having N flip-flops all driven by a common clock will require, at minimum, charging and discharging the gates of 2N transistors on every clock transition. If one were to instead use a 'ripple' arrangement for the first few stages, one could substantially reduce the frequency of the signals reaching the upper stages, thus reducing current consumption.
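The power argument can be put in rough numbers. The following sketch (my own idealized model, ignoring glitches and assuming each ripple stage is triggered at exactly half the rate of the stage below it) counts how many flip-flop clock events occur in each arrangement:

```python
def clock_events(n_bits, n_edges, ripple):
    """Count flip-flop triggering events for an n_bits counter over
    n_edges input clock edges.  Synchronous: every stage is clocked on
    every edge.  Ripple: stage i is clocked by stage i-1's output, so
    it sees roughly half as many triggering edges as the stage below."""
    if not ripple:
        return n_bits * n_edges
    total, edges = 0, n_edges
    for _ in range(n_bits):
        total += edges
        edges //= 2          # each stage toggles at half the previous rate
    return total

# 8-bit counter over 1024 input edges:
print(clock_events(8, 1024, ripple=False))  # 8192
print(clock_events(8, 1024, ripple=True))   # 2040
```

So the ripple arrangement clocks roughly 2·n_edges flip-flop inputs in total, independent of counter width, versus N·n_edges for the fully synchronous version.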
I've seen a few processors that feature an asynchronous prescaler stage on the input of a counter, but none of the prescalers I've seen allow the processor to read them. Further, nearly all of the chips I've seen that have such prescalers make it impossible to write the timer value without clearing the prescaler. My suspicion is that on many such devices the prescaler does not actually clock the main counter, but instead determines, on any given cycle of the system clock, whether or not the counter should be advanced. While some such systems provide a mode in which one of the counters may be set to run "fully asynchronously", allowing operation during sleep, it tends to be difficult to avoid gaining or losing counts if one needs to use the timers for anything other than a full-period overflow and have them count consistently when switching between waking and sleeping.
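My suspicion about the prescaler acting as an enable rather than a clock can be sketched as follows (the 3-bit prescale width and wrap-at-zero enable are illustrative choices of mine, not taken from any particular chip):

```python
def run_timer(cycles, prescale_bits=3):
    """Suspected scheme: the prescaler is a free-running counter on the
    system clock, and the main counter is a synchronous counter that is
    merely *enabled* on the cycle the prescaler wraps, rather than being
    clocked by the prescaler's output directly.  This keeps the main
    counter in the system clock domain (and thus safely readable), but
    explains why writing the counter without clearing the prescaler
    would leave a partial prescale period pending."""
    prescaler = counter = 0
    for _ in range(cycles):
        prescaler = (prescaler + 1) % (1 << prescale_bits)
        if prescaler == 0:       # enable pulse, synchronous to the clock
            counter += 1
    return prescaler, counter

# 20 system-clock cycles with a divide-by-8 prescaler:
print(run_timer(20))  # (4, 2) -- two counter advances, 4 cycles pending
```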
It would seem that some of these problems could be eased by the use of a Gray-code counter, and that the implementation of such a counter could in turn be eased by a "semi-synchronous" design as described above. It's possible to design a relatively compact and fast asynchronous bidirectional quadrature-input Gray-code counter that will tolerate metastability on either input as long as the other is stable (while one input is metastable, one output will be undefined; provided the metastable input stabilizes before the other input transitions, the output will resolve itself to the proper state). The outputs would not be synchronous to any particular clock, but if the inputs change on a particular clock edge, their relationship to the outputs would be predictable. Has anyone ever heard of such a circuit being used?
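To make the intended behavior concrete, here is a behavioral model of what such a counter would compute (my own sketch of the input/output relationship only; it says nothing about the asynchronous gate-level implementation or its metastability tolerance):

```python
def gray(n):
    """Binary-to-Gray conversion: adjacent counts differ in one bit."""
    return n ^ (n >> 1)

def quadrature_gray_counter(transitions, bits=4):
    """Bidirectional quadrature-input counter with Gray-coded output.
    The (A, B) inputs step through the 2-bit quadrature sequence
    00, 01, 11, 10; each valid single-input transition moves the count
    up or down by one, and the output is the count in Gray code, so
    only one output bit changes per input transition."""
    seq = [(0, 0), (0, 1), (1, 1), (1, 0)]   # forward quadrature order
    state, count = (0, 0), 0
    mask = (1 << bits) - 1
    for nxt in transitions:
        i = seq.index(state)
        if nxt == seq[(i + 1) % 4]:
            count = (count + 1) & mask       # forward step
        elif nxt == seq[(i - 1) % 4]:
            count = (count - 1) & mask       # reverse step
        # repeated states and illegal two-bit changes are ignored here
        state = nxt
    return gray(count)

# Three forward steps, then one reverse step: net count 2, Gray 0b0011
print(bin(quadrature_gray_counter([(0, 1), (1, 1), (1, 0), (1, 1)])))
```

The Gray coding is what would let a synchronous system sample the outputs safely: at most one bit is in transition at any moment, so a sampled value is always either the old count or the new one.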