
We had a very short FPGA/Verilog course at university (5 years ago), and we always used clocks everywhere.

I am now starting out with FPGAs again as a hobby, and I can't help but wonder about those clocks. Are they absolutely required, or can an FPGA-based design be completely asynchronous? Can one build a complex bunch of logic and have stuff ripple through it as fast as it can?

I realise that there are a whole lot of gotchas with this, like knowing when the signal has propagated through all parts of the circuit and the output has stabilised. That's beside the point. It's not that I want to actually build a design that's entirely asynchronous, but just to improve my understanding of the capabilities.

To my beginner eye, it appears that the only construct that absolutely requires a clock is a reg, and my understanding is that a typical FPGA (say, a Cyclone II) will have its flip-flops pre-wired to specific clock signals. Is this correct? Are there any other implicit clocks like this and can they typically be manually driven by the design?

Roman Starkov
  • I know Simon Moore at the University of Cambridge did a lot of research into asynchronous design, including getting a test chip fabricated. It requires an entirely new set of design tools, and has strange side effects: execution speed inversely proportional to temperature, for example. – pjc50 Dec 08 '10 at 15:32
  • This is actually a research field. See for example, this paper: [Dynamics of analog logic-gate networks for machine learning](https://aip.scitation.org/doi/10.1063/1.5123753?af=R&ai=1gvoi&mi=3ricys&feed=most-recent) – Julien Siebert Mar 08 '23 at 14:27

9 Answers


A short answer would be: yes; a longer answer would be: it is not worth your time.

An FPGA itself can run a completely asynchronous design with no problem. The problem is the result you get, since timing through any FPGA is not very predictable. The bigger problem is that your timing, and with it the behaviour of the design, will almost certainly vary between different place-and-route sessions. You can put constraints on individual asynchronous paths to make sure they do not take too long, but I'm not sure that you can specify a minimum delay.

In the end it means that your design will be unpredictable and potentially completely variable with even a slight design change. You'd have to look through the entire timing report every time you change anything at all, just to make sure that it would still work. On the other hand, if the design is synchronous, you just look for a pass or fail at the end of place and route (assuming your constraints are set up properly, which doesn't take long at all).

In practice, people aim for completely synchronous designs, but if you simply need to buffer or invert a signal, you don't need to go through a flip-flop as long as you constrain the path properly.
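As a sketch of that last point, a purely combinational Verilog module (no clock anywhere) that buffers one signal and inverts another could look like the following; the module and signal names here are made up for illustration:

```verilog
// Purely combinational: no clock, no flip-flops inferred.
// The signals ripple straight from input pins to output pins
// through routing and LUTs.
module buf_inv (
    input  wire a,      // signal to buffer
    input  wire b,      // signal to invert
    output wire a_buf,
    output wire b_inv
);
    assign a_buf = a;    // simple buffer
    assign b_inv = ~b;   // inverter
endmodule
```

Continuous `assign` statements never infer storage, so the tools implement this as pure combinational logic; a max-delay constraint on these paths is the "constrain it properly" part.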

Hope this clears it up a bit.

Andrey
  • I had to use some devices with asynchronous FPGA designs. They were hard to work with. Please at least use timing constraints. – Tim Williscroft Dec 07 '10 at 22:00
  • While it's true that it's possible to implement asynchronous designs with an FPGA, most FPGAs are built to support specifically synchronous designs. They have plenty of resources (PLLs, clock distribution circuits, and huge amount of flip-flops) which will be wasted in an asynchronous design. – Dmitry Grigoryev Dec 09 '15 at 11:42
  • This answer doesn't provide particularly good advice. You can create a clockless FPGA and it actually simplifies place and route, removes a ton of problems regarding timing requirements, and due to fine grained pipelining can have measurably higher throughput. The real problem comes when you try to map a clocked circuit to a clockless FPGA because they have very different timing characteristics. It can be done, it just requires a bit more front-end processing to do the conversion. http://vlsi.cornell.edu/~rajit/ps/rc_overview.pdf – Ned Bingham Mar 02 '17 at 19:06
  • You CAN do delay-insensitive design. I designed a small circuit that stores a bit in a flip-flop and raises a signal when it detects that the bit has been stored. It also detects if the bit has actually been received (as opposed to appearing as a zero because of delay sending a 1), and is immune to glitches. The circuits have to communicate via handshake and use these kinds of components to interact; the circuits themselves just wait for their output to be complete and then do all the communications. No clock. – John Moser Apr 05 '20 at 04:55

"Can one build a complex bunch of logic and have stuff ripple through it as fast as it can?" Yes. Entire CPUs have been built that are completely asynchronous -- at least one of them was the fastest CPU in the world. http://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU

It irks me that people reject asynchronous design techniques, even though they theoretically have several advantages over synchronous design techniques, merely because (as others here have said) asynchronous designs are not as well supported by the available tools.

To me, that's like recommending that all bridges be made out of wood, because more people have woodworking tools than steel-working tools.

Fortunately, some of the advantages of asynchronous design can be gained while still using mostly synchronous design techniques, by using a globally asynchronous, locally synchronous (GALS) design.

davidcary
  • I feel exactly the same way about the modern tendency to [route PCBs on a square grid](http://electronics.stackexchange.com/questions/7913/why-is-there-such-a-strong-preference-for-45-degree-angles-in-pcb-routing), although the benefits of migration are much less significant. – Roman Starkov Jan 16 '11 at 00:30
  • @romkyns - That's more down to the fact that writing PCB software that uses non-rectilinear grids is *hard*. – Connor Wolf Feb 28 '11 at 23:39
  • I just stumbled upon this answer of yours to an earlier question. GALS seems to be a term for designs which take a number of synchronous blocks and interconnect them even though they are asynchronous to each other. Is there a term for devices which are clocked by different clocks which have a known timing relationship (e.g. rising edge of clock X (X+) will no later than rising edge of Y (Y+), and will occur significantly before the falling edge of Y (Y-); X+ can be used to clock data derived from data clocked by Y+ but not vice versa; Y- clocks data derived from X+). – supercat Sep 08 '11 at 18:19
  • @supercat: I suspect you're alluding to [four-phase logic](http://en.wikipedia.org/wiki/four-phase_logic). It's one of the multi-phase [clock signals](http://en.wikipedia.org/wiki/Clock_signal) that seems to be forgotten. – davidcary Sep 11 '11 at 03:33
  • I wasn't thinking of dynamic logic. I was simply thinking of how to ensure proper causal relationships with clock signals that might be slightly skewed. If a rising edge of clock #2 is derived by combining clock #1 with some other logic, such that it will occur after a rising edge of clock #1, using a rising edge of clock #1 to latch a signal that changes on a rising edge of clock #2 would generate a race condition. Using a falling edge of clock #2 instead should be safe. – supercat Sep 11 '11 at 04:07
  • @supercat: Right. Perhaps you're thinking of systems with a [two-phase clock](http://en.wikipedia.org/wiki/Clock_signal#Two-phase_clock) or some other multi-phase clock system. Let me know if you find a better term for these systems. – davidcary Oct 12 '11 at 04:08
  • @davidcary: Sort of, except both "phases" on one wire--one phase being controlled by the rising edge, and one by the falling edge. Basically, I'd divide latch clocks into four categories: clean rising, clean falling, late rising, late falling. Latches clocked by (L/CB) a clean rising or falling edge could take data from any rising or falling edge. L/CB a late rising edge could take data from L/CB clean rising edge any falling edge. L/CB by late falling edge could take data from L/CB clean falling or any rising. – supercat Oct 12 '11 at 15:00
  • @davidcary: Provided that the fastest propagation time for any latch exceeds the longest hold time, and provided that the longest signal path from a clock edge, through clock gating logic and "late" latches triggered by that edge, to any latch triggered by the following edge, does not exceed the minimum time between clock edges, I would think such a design should be completely reliable and free of internally-generated metastability under any combination of propagation delays. – supercat Oct 12 '11 at 15:05
  • @supercat -- interestingly, I recently designed a processor that uses this kind of approach: I have a single clock input which runs multiple phases in order to (1) allow multiple register updates per cycle while using a register file design that only has a single input port and (2) have a two-stage pipeline complete from instruction read to register writeback in 1.5 cycles, so that pipeline hazards only last a single instruction length. There's an overview [here](https://hackaday.io/project/159003-c61) and a more detailed writeup in [this forum thread](https://hackaday.io/project/159003-c61). – Jules Jul 15 '18 at 13:04
  • I'm working on a PCB layout for it as and when I find time (so far it's only simulated) but I'm using an inverter with a capacitance added to its output to provide a slightly delay on the register file write clock in order to allow time for everything else to finish before the result is actually written. In simulation everything looks good... we'll see what happens when I have an actual board for it. :) – Jules Jul 15 '18 at 13:07
  • @Jules: Thank you. That CPU design looks fascinating. – davidcary Jul 17 '18 at 22:16

One factor not yet mentioned is metastability. If a latching circuit is hit with a sequence of input transitions such that the resulting state would depend upon propagation delays or other unpredictable factors, there is no guarantee that the resulting state will be a clean "high" or "low". Consider, for example, an edge-triggered flip-flop which is currently outputting a "low", and has its input change from low to high at almost the same time as a clock edge arrives. If the clock edge happens long enough before the input change, the output will simply sit low until the next clock edge. If the clock edge happens long enough after the input change, the output will quickly switch once from low to high and stay there until the next clock edge. If neither of those conditions applies, the output can do anything. It might stay low, or quickly switch once and stay high, but it might stay low for a while and then switch, or switch and then some time later switch back, or switch back and forth a few times, etc.

If a design is fully synchronous, and all the inputs are double-synchronized, it is very unlikely that a timing pulse would hit the first latch of a synchronizer in such a way as to cause it to switch at the perfect time to confuse the second latch. In general, it is safe to regard such things as "just won't happen". In an asynchronous design, however, it is often much harder to reason about such things. If a timing constraint on a latching circuit (not just flip flops, but any combination of logic that would act as a latch) is violated, there's no telling what the output will do until the next time there's a valid input condition that forces the latch to a known state. It is entirely possible that delayed outputs will cause the timing constraints of downstream inputs to be violated, leading to unexpected situations, especially if one output is used to compute two or more inputs (some may be computed as though the latch was high, others as though it were low).
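The "double-synchronized" inputs mentioned above are usually just two flip-flops in a row on the same clock. A common Verilog sketch (the module and signal names are illustrative) looks like this:

```verilog
// Two-stage synchronizer for a signal crossing into the clk domain.
// The first flop may go metastable when async_in changes near a
// clock edge; the second flop samples it a full cycle later, by
// which time it has almost certainly resolved to a clean 0 or 1.
module sync2 (
    input  wire clk,
    input  wire async_in,   // signal from outside the clock domain
    output reg  sync_out    // safe to use inside the clk domain
);
    reg meta;               // first stage: may briefly be metastable

    always @(posedge clk) begin
        meta     <= async_in;
        sync_out <= meta;
    end
endmodule
```

Note that this only makes metastability astronomically unlikely to propagate, not impossible, and it only works for a single-bit signal; multi-bit values need a handshake or a dual-clock FIFO.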

The safest way to model an asynchronous circuit would be to have almost every output circuit produce an "X" output for a little while whenever it switches between "0" and "1". Unfortunately, this approach often results in nearly all nodes showing "X", even in cases which would in reality have almost certainly resulted in stable behavior. If a system can work when simulated as having all outputs become "X" immediately after an input changes, and remain "X" until the inputs are stable, that's a good sign the circuit will work, but getting asynchronous circuits to work under such constraints is often difficult.

supercat

Yes, you can. You can ignore the flipflops completely and build it all out of LUTs. And/or you can use the state elements of most Xilinx FPGAs as (level triggered) latches instead of (edge-triggered) flipflops.
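For example, a level-triggered latch can be described in Verilog by leaving a combinational always block incomplete, so the value is held while the enable is low (names here are illustrative):

```verilog
// Transparent (level-triggered) latch: while `en` is high, `q`
// follows `d`; when `en` goes low, `q` holds its last value.
// The deliberately missing `else` branch is what makes synthesis
// infer a latch rather than plain combinational logic.
module d_latch (
    input  wire en,
    input  wire d,
    output reg  q
);
    always @* begin
        if (en)
            q = d;
        // no else: q holds its value, so a latch is inferred
    end
endmodule
```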

Martin Thompson
  • A danger with that is that unless one restricts the logic compiler, it may produce logic which has *negative* propagation time for some gates. For example, if one specifies `X=(someComplexFormula)` and `Y=X & D`, and if the compiler substitutes that formula for X and determines that `X & D` is equivalent to `A & D`, the compiler might compute Y in terms of A and D rather than in terms of X, thus allowing the computation of Y to proceed faster than that of X. Such substitutions are valid with combinatorial logic, but wreak havoc on asynchronous sequential logic. – supercat Mar 02 '15 at 15:06
  • @supercat - I've never worked with Xilinx's tools, but when I've worked with Altera FPGAs, you've always had the option of specifying any critical paths as connected gate modules rather than in RTL, at which point any such optimisations are disabled. – Jules Jul 15 '18 at 13:31
  • @Jules: All of my programmable-logic designs have used Abel, which is a somewhat goofy language, but makes it possible to specify things in ways that some CPLDs can implement, but which might pose difficulties for a VHDL or Verilog synthesis tool. For example, on one of my projects, I exploited the fact that Xilinx parts have clock, async set, and async reset, to implement an async-loadable shift register. If I need to do such things in an FPGA, having never used Verilog or VHDL, how should I learn what's needed to do that? BTW, if memory serves, I used T flops for the shifter, and... – supercat Jul 16 '18 at 17:29
  • ...the timing was such that the async write could only occur at times when the T input would be low, assuming that if a nop-clock occurred near the start of a write pulse, the async write would extend far enough beyond it as to ensure a stable value, and if the nop-clock occurred near the end, it would simply be latching a still-stable value. I'm not sure how one could efficiently handle such cases in VHDL or Verilog. – supercat Jul 16 '18 at 17:33
  • @supercat - taking a similar problem, looking at the Cyclone IV Device Handbook I see that the best approach to the same problem would be to use the "LAB-wide synchronous load" option (a "LAB" is a group of 16 logic elements, so if the size of such a register doesn't end up a multiple of 16 bits then some bits will be wasted, but this seems the most useful option anyway). I now have two options: I can write functional verilog that will require the synthesis tool to pick a way of implementing the register required (which would usually be the best option), or, if I have strict timing ... – Jules Jul 16 '18 at 23:47
  • ... requirements I can force it to hard wire this: looking through the [list of available low-level modules on the device](https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/catalogs/lpm.pdf) I find `lpm_ff` can implement a d- or t-type flip flop with synchronous load. By using this module I can be sure that these functions will be exactly mapped to the low-level features of the device without the potential that they're optimized away. – Jules Jul 16 '18 at 23:53

Really there are THREE types of designs.

  1. Combinatorial. There are no clocks and no feedback paths, and the system has no "memory". When one or more inputs change, the changes ripple through the logic. After some time the output settles into a new state, where it remains until the inputs change again.
  2. Synchronous sequential. The system is built out of registers and blocks of combinatorial logic; the registers are clocked by a small number (often 1) of clocks. If there are multiple clocks then special precautions may be needed on signals that pass from one clock domain to another.
  3. Asynchronous sequential. There are feedback paths, latches, registers or other elements that give the design memory of past events and that are not clocked by easily analysed clock lines.

In general, when synthesising/optimising combinatorial logic, the tools will assume that all that matters is the final result and the maximum time needed to settle on that result.

You can build a design that is purely combinatorial and it will get to the right result. The outputs may change in any order and may change several times before reaching their final values. Such designs are very wasteful of logic resources. Most logic elements will spend most of their time sitting idle whereas in a sequential system you could have reused those elements to process multiple data items.

In a synchronous sequential system, all that matters is that the outputs of the combinatorial block have settled to their correct state by the time they are clocked into the next flip-flop. It doesn't matter what order they change in or whether there are glitches along the way. Again, the tools can easily turn this into logic that, provided the clock is slow enough, gives the right answer (and they can tell you whether the clock you want to use is slow enough).
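That synchronous pattern, combinational logic settling between registers, looks like this in Verilog (a hypothetical one-stage pipeline around an adder; names are illustrative):

```verilog
// Synchronous sequential: combinational logic between registers.
// The adder's output bits may glitch and settle in any order during
// the cycle; all that matters is that `sum_next` is stable before
// the next rising edge of `clk`, which is exactly what the timing
// tools check against the clock constraint.
module pipelined_add (
    input  wire       clk,
    input  wire [7:0] a, b,
    output reg  [7:0] sum
);
    wire [7:0] sum_next = a + b;   // combinatorial block

    always @(posedge clk)
        sum <= sum_next;           // register captures the settled value
endmodule
```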

In an asynchronous sequential system those assumptions go out of the window. Glitches can matter, and the order of output changes can matter. Both the tools and the FPGAs themselves were designed for synchronous designs. There has been much discussion (google "asynchronous FPGA design" if you want to know more) about the possibility of implementing asynchronous systems either on standard FPGAs or on specially designed ones, but it still lies outside mainstream accepted design practice.

Peter Green

Of course, if your design requirements are slow enough that the internal delays are still orders of magnitude shorter than the times you care about, then it's not a problem, and you can look at the timing report to keep an eye on this; but there's a limit to what you can usefully do with no internal state. If you just want to make something like a 100-input multiplexer then fine, just remember that each input will have a different propagation delay. In fact, you may get some interesting and chaotic effects with large numbers of unpredictable-delay oscillating feedback loops; maybe a fully async FPGA-based synthesiser could be the next 'analogue'.
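The simplest example of such an unpredictable-delay feedback loop is a combinational ring oscillator, whose frequency depends entirely on gate and routing delays. This is a sketch only; the `keep` attribute shown is Xilinx-style syntax, and other tools use different directives to stop the inverter chain being optimised away:

```verilog
// Combinational ring oscillator: an odd number of inverters in a
// loop with no clock. It oscillates at a frequency set purely by
// gate and routing delays, which vary with placement, voltage and
// temperature. Synthesis tools normally flag combinational loops,
// so a tool-specific attribute is needed to keep the chain intact.
module ring_osc (
    output wire osc_out
);
    (* keep = "true" *) wire [2:0] n;   // attribute syntax is tool-dependent

    assign n[0] = ~n[2];   // feedback closes the loop
    assign n[1] = ~n[0];
    assign n[2] = ~n[1];

    assign osc_out = n[2];
endmodule
```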

mikeselectricstuff

As @Andrey pointed out, it is not worth your time. Specifically, the tools don't support this, so you would be completely on your own. Plus, since FPGAs have built-in registers, you wouldn't save anything by not using them.

Brian Carlton

Yes. If you have no process-type constructs, then the tools shouldn't do things like inferring registers. There will be things like on-board memory that require clocks, although if you really want to, you could probably generate these asynchronously.

mikeselectricstuff
  • 10,655
  • 33
  • 34
1

FWIW, I thought I should add that one obvious goal of asynchronous logic solutions would be a global reduction in power consumption.

Those global clock/PLL/buffer networks burn a lot of joules.

As FPGA solutions meander into battery-powered arenas (e.g., the Lattice iCEstick), this aspect will acquire much more attention.

Harrygoz
  • This is a good point, although it's worth considering that a badly designed combinatorial circuit performing the same operation as a sequential circuit could in some cases make a lot of transient transitions as partial results are calculated and the final output is updated to account for them, and in CMOS circuits (as most FPGAs are) power consumption is roughly proportional to the number of transitions. Clocks can cause unnecessary transitions, but you can also make a lot of power reductions by disabling clocks in parts of the circuit that aren't needed at the moment. – Jules Jul 15 '18 at 13:24