
I couldn't find any information about this. Is there a general rule, or does it differ between vendors (Altera or Xilinx)?

Let's assume I have a flip-flop and I want to wire 10 flip-flops to its output. In my university lab, using cheap ($1-$2) ICs, 10 was the maximum. Is it the same for an FPGA? Or can an output carry heavy loads, such as 2000 flip-flops wired to the same output? Is this related to the "drive strength" mentioned in Verilog/VHDL? Neither language mentions such a constraint.

In the future, when I have an FPGA, I want to try building a floating-point compute accelerator that needs to broadcast a variable to all cores.

As Dave Tweed commented, the term is "fanout", and he says it is handled implicitly by the design tools. Is there any more information on this? How many gates are dedicated to it when it is handled implicitly?

I'm asking about the internal (programmable) fabric of the FPGA, where I will build the cores, not the external pins.

Thank you for your time.

  • The general term for what you're asking about is "fanout". In general, the FPGA design tools will handle this for you by replicating logic where necessary, so it isn't something you need to deal with explicitly. – Dave Tweed May 03 '16 at 12:19
  • Are you addressing internal or external loads on the FPGA? If it's external, the rules are the same as for every IC: each pin has a maximum current which it can drive or consume, and the signal's rise and fall times depend on that maximum current and the sum of all load capacitances connected to the pin. For an internal load, the rules are different. The FPGA has internal buffers to drive the short and long routing lines. If needed, the synthesis algorithms will duplicate logic, flip-flops or buffers to split the load. – Paebbels May 03 '16 at 12:27
  • In case you are curious, I have one design where one clock node has a fanout of 20000+ nodes. The largest non-global fanout in that design (i.e. standard routing) is about 2700. So you can connect many many things together. – Tom Carpenter May 03 '16 at 12:58
  • @Tom Carpenter, does that add any latency? I mean, is there some tree structure hidden under these implicit things? – huseyin tugrul buyukisik May 03 '16 at 13:01
  • It works out the connections itself. In my case the design runs at 250 MHz and meets the timing requirements to do so. Timing is one of the key aspects of this, and the fitting tools are good enough to optimise things (location, duplication, etc.) to do their best to meet the design requirements. – Tom Carpenter May 03 '16 at 13:11
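For the external case Paebbels describes, the limit of 10 from the lab is most likely the classic DC fanout of standard TTL: how many inputs one output can sink or source current for. A minimal Python sketch of that arithmetic follows, assuming typical 7400-series and 74LS datasheet limits (illustrative values; check the datasheet of the actual part). It ignores the capacitive rise/fall-time limit mentioned above, and it does not describe routing inside the FPGA fabric.

```python
def dc_fanout(i_ol_ua: int, i_il_ua: int, i_oh_ua: int, i_ih_ua: int) -> int:
    """Max number of inputs one output can drive, from DC current limits only.

    All currents are magnitudes in microamps:
      i_ol_ua: current the output can sink while LOW
      i_il_ua: current each input sources while LOW
      i_oh_ua: current the output can source while HIGH
      i_ih_ua: current each input sinks while HIGH
    """
    low_limit = i_ol_ua // i_il_ua    # loads allowed by the LOW state
    high_limit = i_oh_ua // i_ih_ua   # loads allowed by the HIGH state
    return min(low_limit, high_limit)  # the worse state sets the fanout


# Typical datasheet limits (assumed values; check your part's datasheet)
print(dc_fanout(16000, 1600, 400, 40))  # standard 74xx TTL: 10
print(dc_fanout(8000, 400, 400, 20))    # 74LS driving 74LS: 20
```

The minimum of the two states is taken because the output has to work in both logic levels; whichever state runs out of current first limits the fanout.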

1 Answer


You can send a signal to many, many other destinations (the number of destinations is called the "fanout" of the signal). The more destinations it goes to, though, the longer your critical timing path may become, so the fmax of your design may suffer.
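To make the fmax point concrete, here is a toy Python model in which the net delay grows linearly with fanout. All delay figures (base_ns, per_load_ns, clock-to-Q, logic and setup times) are invented for illustration; real tools use placement-dependent timing models, not this formula.

```python
def net_delay_ns(fanout: int, base_ns: float = 0.3, per_load_ns: float = 0.002) -> float:
    """Toy routing-delay model (assumed): a fixed part plus a cost per load."""
    return base_ns + per_load_ns * fanout


def fmax_mhz(fanout: int, t_clk_to_q_ns: float = 0.3,
             t_logic_ns: float = 1.0, t_setup_ns: float = 0.1) -> float:
    """fmax of a register-to-register path whose driver has the given fanout."""
    period_ns = t_clk_to_q_ns + t_logic_ns + net_delay_ns(fanout) + t_setup_ns
    return 1000.0 / period_ns  # 1 / (period in ns) gives MHz


for n in (10, 100, 1000, 2000):
    print(f"fanout {n:5d}: fmax ~ {fmax_mhz(n):.0f} MHz")
```

Only the trend matters here: with everything else fixed, the period grows with fanout, so fmax drops as the net gets bigger.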

If the timing is slower than you have requested, the tools will usually replicate the logic that drives those high-fanout nets in order to try to meet your timing target.
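A minimal Python sketch of what replication means, assuming a simple limit on how many loads each copy may drive: the driving flip-flop is duplicated (every copy gets the same D input, so they are logically identical) and the loads are divided among the copies. The copy names (q_dup0, ...) and the simple slicing strategy are hypothetical; real tools also choose which loads go to which copy based on placement.

```python
from math import ceil


def replicate(loads: list[str], max_fanout: int) -> dict[str, list[str]]:
    """Split the loads of one flip-flop across duplicated copies of it.

    Each hypothetical copy drives at most max_fanout loads.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be >= 1")
    n_copies = max(1, ceil(len(loads) / max_fanout))
    return {
        f"q_dup{i}": loads[i * max_fanout:(i + 1) * max_fanout]
        for i in range(n_copies)
    }


loads = [f"ff{k}" for k in range(2000)]
copies = replicate(loads, max_fanout=200)
print(len(copies))                           # 10
print(max(len(v) for v in copies.values()))  # 200
```

Note that the signal feeding the copies' D inputs now has a fanout equal to the number of copies, which is why a very large split can end up as a tree.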

For Xilinx, this list of app notes offers advice on reducing fanout when it is on your critical path: http://www.xilinx.com/support/answers/9410.html

Martin Thompson
  • So it trades high fanout against latency/frequency. Either I add multi-level branching to replicate the output, adding latency at each level, or I let the tools do it implicitly and keep an eye on fmax? Thank you. – huseyin tugrul buyukisik May 03 '16 at 14:07
  • If your tools support it, you can add several cycles of latency and let the tools figure out how to spread them across the whole delay path of the system (this is called "re-timing"). – Martin Thompson May 03 '16 at 14:18
  • Well, it doesn't 'usually' duplicate the logic unless you give it a command to do so; other than that, it is correct. – FarhadA May 04 '16 at 10:02
  • @FarhadA good point; yes, I should have written "duplicate the flip-flops" (or at least that is my experience; maybe I turned on a switch in the distant past and propagated it to each new project...) – Martin Thompson May 05 '16 at 11:58
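For the explicit, multi-level branching option discussed in the comments, a Python sketch can count the register levels of a pipelined broadcast tree. It assumes each register may drive at most max_fanout registers or loads (an assumed, design-specific limit, not a vendor figure), and that every added level costs one clock cycle of latency.

```python
from math import ceil


def broadcast_tree(n_loads: int, max_fanout: int) -> list[int]:
    """Register count per level of a pipelined broadcast tree, root first.

    The root is the original source register; every later level is an added
    pipeline stage. Each register drives at most max_fanout registers/loads.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be >= 2")
    levels = []
    count = n_loads
    # Work upward from the loads until a single register can drive the level below.
    while count > max_fanout:
        count = ceil(count / max_fanout)
        levels.append(count)
    levels.append(1)  # the root (source) register
    return levels[::-1]


def extra_latency_cycles(levels: list[int]) -> int:
    # Each register level below the root adds one clock cycle.
    return len(levels) - 1


tree = broadcast_tree(2000, max_fanout=16)
print(tree)                        # [1, 8, 125]
print(extra_latency_cycles(tree))  # 2
```

With re-timing, as suggested above, you would instead add the pipeline registers and let the tool decide where they go; this sketch only estimates how many cycles such a tree might need.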