
I am researching exotic semiconductors for a digital ASIC with a few million logic gates that should run as fast as possible within a $30 million budget. (Specifically, I need to perform a single fully-parallel 4096-bit multiplication repeatedly. For more context, I am building an ASIC to compute this Verifiable Delay Function.)
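To make the workload concrete, here is a minimal sketch of the computation, assuming an RSW-style iterated-squaring VDF (the modulus and iteration count below are illustrative toy values, not the project's real parameters):

```python
def vdf_eval(x: int, n: int, t: int) -> int:
    """Compute x^(2^t) mod n by t sequential squarings.

    Each squaring depends on the previous result, so the t steps
    cannot be parallelised; only the single wide (4096-bit) modular
    multiplication inside each step can use parallel hardware.
    """
    y = x % n
    for _ in range(t):
        y = (y * y) % n  # the one wide multiplication done per step
    return y
```

This is why extra cores don't help: the loop is a strict dependency chain, and the only thing an ASIC can accelerate is the latency of one squaring.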

It seems there are semiconductors that outperform silicon in switching speed, including gallium arsenide, gallium nitride and indium phosphide. My research suggests that these semiconductors are generally used for analogue ASICs rather than digital logic, so it is hard to tell which are viable for my use case.

Which semiconductor is viable for a digital ASIC with millions of logic gates (say, ~20 million) and provides the fastest gate-switching speed?

Edits in response to comments

  • Budget: Our maximum budget is in the tens of millions of dollars, ~$30 million.
  • Speed: To quantify the speed, we ideally need someone spending $1 billion to be at most 2x faster than us. Notice the Verifiable Delay Function (VDF) is inherently sequential, so lots of parallelism does not help.
  • SiGe process technology: I understand GaAs can give a significant speed bump over SiGe. If 100nm GaAs is faster than 7nm SiGe, then the feature size of SiGe is not relevant. As for GaAs, we're only looking to use existing process technology.
  • Pins: We do not need a large number of pins. There is a single 4096-bit input and a single 4096-bit output per VDF run, with the intermediate repeated multiplications spanning 10 minutes. The I/O time is marginal compared to the multiplication time.
  • Power and cooling: The ASIC should be runnable by individuals, without a power supply or cooling much more sophisticated than that of a top-of-the-range GPU.
  • Graphics technology: As I understand, graphics technology is optimised for massively parallel computation. The ASIC we want needs to be optimised for speed of sequential computation, i.e. latency.
  • Obfuscation/reverse engineering: The ASIC will be developed for an open-source project (namely, Ethereum). The ASIC will itself have an open-source circuit design.
  • More context: See these slides that explain the use of the ASIC for a blockchain random number generator.
Randomblue
    It mostly comes down to two things: how fast do you need to clock it? (Does it need to run at 1GHz? 10GHz? 100?) And how much do you want to spend? SiGe can hit 5GHz pretty easily even with half a billion gates [(the PC CPU overclocking record is nearly 9GHz)](https://valid.x86.fr/lpza4n), and companies like Inphi make CMOS chips that can hit [28GHz (as 56Gbit PAM4 uses 28GHz of bandwidth)](https://www.inphi.com/products/optical-phy/). So... how much speed do you really, really need? – Sam Aug 20 '18 at 22:55
    Please don't say "as fast as possible"...that is absolutely meaningless as an engineering specification. How many gigawatts do you have? How many megadollars do you have? Have you investigated what technologies are used by the graphics processors, since they would seem to be in your ballpark? – Elliot Alderson Aug 20 '18 at 23:36
  • This looks like a very odd request. Speed of digital logic clearly depends on feature size of switching elements - transistors. If you want something special fast, you need to make them small, to compete with 7 nm, 10 nm, 14 nm. To make new process small, functional, and repeatable, you need huge investment in technology, deep UV lithography, sophisticated etching/polishing-whatever technology, process bring up and qualification. It takes 100s of man-years and billions of dollars. Maybe you need to look at existing offers from established CMOS technology, where all this was already done? – Ale..chenski Aug 20 '18 at 23:41
  • I think the other general rules for running fast are to use lower voltage and have a very stable power source. It appears you are performing identity verification via a calculation that ordinary processors can't complete fast enough. I see some FPGAs have 26M gates, but they certainly don't have 4096 pins, so I suppose you'll need to allocate some gates to producing multi-pin serial output. That's a pretty large number. – K H Aug 21 '18 at 02:47
  • I have edited the question to answer your queries :) Thanks! – Randomblue Aug 21 '18 at 07:26
    Good edits. You mention that a 1 gigadollar spend by a competitor should ideally only double your performance. What's the basis for this hope/estimation? Is there some reason that this would be more of a concern than someone spending their own 30 megadollars for an equal ASIC, or even less to emulate with a single-run device? – K H Aug 23 '18 at 01:30
    Part of the point is obviously to decrease the benefit of additional cores in trying to complete the calculation on time, but you should also evaluate what you can build into an ASIC that would be difficult or laborious to emulate on FPGA architecture, IE a gate structure that by nature wastes a large amount of the FPGA, or that requires more layers of bus or interconnects, some critical component, than are supported. Force your opponent into ASICs of their own to ensure their spend (including reverse engineering) exceeds your own hopefully. – K H Aug 23 '18 at 01:34
  • @KH: The 1 G$ entity is an "adversary" or "attacker" in a cryptographic sense (or more precisely, cryptoeconomic sense). It is not a competitor. I'm assuming that 1 G$ is enough to build an ASIC, and a better one than a 30 M$ ASIC. At this price/performance range, FPGAs are likely irrelevant. The hope is that the 30 M$ ASIC squeezes out almost all the performance that can be had with an ASIC, within a factor of 2x. – Randomblue Aug 23 '18 at 09:38
  • @KH: Regarding an attacker spending their own 30 M$ to build an equal ASIC, that would be a waste of money because they could just buy the ASIC that would be sold publicly. These slides may help understand what is going on https://docs.google.com/presentation/d/13OAGL42yzOvQUKvJJ0EBsAAne25yA7sv9RC8FfPhtyo/edit#slide=id.g70ee374122251f8f_77 – Randomblue Aug 23 '18 at 09:41
    @Randomblue That makes sense. A blockchain random number generator is what I'm reading from your slides. So I guess in order to attack your blockchain an attacker needs to quite significantly outperform, not just match you. Especially because your application is clock speed dependent, you may wish to consider liquid and or sub-ambient cooling. Pushing your asic to its limits for speed in the same type of scenario you'd use to overclock a cpu. It would likely at least be worth testing one of your asics this way once you've built it, just to be sure. – K H Aug 23 '18 at 10:21
  • My guess is we can design an ASIC that runs at a reasonable clock speed (say, 1GHz) which is immune to overclocking. Immunity to overclocking could be done by designing the propagation delay between two gates in the critical path to be tight, i.e. take exactly 1ns. That way overclocking the ASIC will prevent proper operation and yield junk outputs. – Randomblue Aug 23 '18 at 10:35
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/82123/discussion-between-k-h-and-randomblue). – K H Aug 23 '18 at 10:36
    @Sam: "SiGe can hit 5GHz pretty easily even with half a billion gates" => Can you point to SiGe ASICs with millions of gates? – Randomblue Aug 24 '18 at 23:39
  • @Randomblue I was basing that on work by Intel; they have been using SiGe in the channels of their FETs since [at least 45nm (pg17, they talk about SiGe channels)](http://download.intel.com/pressroom/kits/advancedtech/pdfs/VLSI_45nm_HiKMG-presentation.pdf). Technically it's SiGe on silicon as opposed to pure SiGe, but the FETs at least are SiGe, and the overclocking record of nearly 9GHz dates back to 2012. E2V makes ADCs that go to 5Gsps using a [SiGe BiCMOS process](https://www.e2v.com/news/e2v-selects-jazz-semiconductor-for-next-generation-analogue-to-digital-converter-adc-products/) – Sam Aug 27 '18 at 08:07

4 Answers


I'll bet that you don't want raw speed, but speed per dollar and operations per joule. In that case, silicon CMOS, because of the huge investment behind it, is the 500 lb gorilla you should go with.

Neil_UK

I agree with Neil_UK's answer that a "standard" CMOS process is your only choice.

Sure, there are technologies for making faster logic. I once designed a frequency divider whose input ran at 30 GHz, using SiGe NPN transistors. However, that frequency divider was only a very small part of the chip. Its power consumption was so high that a much more complex digital circuit designed to run at 30 GHz would dissipate far too much power for any practical implementation.

My point is that there is always a compromise between speed and power dissipation. Since your circuit requires a lot of gates (it is fairly complex), power dissipation will be the limiting factor.

You see the same in modern CPUs, which contain many cores. When only one or a few cores are in use, they can run at an increased clock speed. When many cores are used, the clock speed is limited, either immediately or after a while as the CPU heats up (thermal throttling).

As with CPUs, you get the best performance if you parallelize your design as much as possible; that results in a lower clock speed at the circuit level but increased overall throughput.

I understand that parallelization isn't what you're looking for, but personally I think you should consider parallelizing as much as possible. Even if you tried to get around the power-dissipation problem by dividing the circuit over multiple chips running at high speed, that is still parallelization to me. You would then need to distribute the data signals to the chips with equal trace lengths, which introduces delays and will be a challenge to get right.

Bimpelrekkie
  • The algorithm is designed to be "inherently sequential" so parallelism does not help past a certain point. How many gates was your 30GHz frequency divider and how much power was it consuming? – Randomblue Aug 21 '18 at 07:30
  • Think on the order of fewer than 100 gates, with only about 10 running at 30 GHz, 10 at 10 GHz, etc.; total power consumption was about 400 mW. That wasn't the fastest SiGe technology, so there is some room for improvement. Anyway, it is also possible to make CMOS run at 30 GHz or higher; this has been done, but again only for small circuits, as power consumption is the limiting factor. – Bimpelrekkie Aug 21 '18 at 07:39
    To add to this, VLSI is really only done on Si. Most of the exotics are used on very high value *low volume* applications (relative to Si). The EDA tools, processes etc are not as refined as for Si, and much more expensive. If you're looking for 20M gates, Si is the only practical option given your budget (don't forget a lot of that money is going to be burnt on design and verification...). – awjlogan Aug 21 '18 at 09:18
    I agree with @awjlogan practically all digital design flows (Verilog / VHDL => RTL => layout) are for CMOS processes. It does not mean it cannot be done/made for esoteric processes but doing that will cost you lots of (wo)manpower as cell libraries will need to be made. The design can also be made manually but that also requires more effort. – Bimpelrekkie Aug 21 '18 at 09:37

This is building on the other answers so far, but just my thoughts.

Given your budget, and the desire to compete with an entity whose budget is more than 30X your own, you should not try to use exotics for your application. The major costs in designing this ASIC are going to be:

  • People. I assume you will be paying people to work on this full time, as this is not a project that can feasibly be undertaken as an evening project (notwithstanding point 2 below). You will need HDL developers, verification engineers, and implementation engineers. All of these are specialised skills with corresponding price tags. In particular, implementation engineers for exotics are (very) few in number and high in demand (especially if they're good). Don't expect much change from $1-5M (depending on location) per year.

  • EDA tools. These are expensive, just to license. You also need a lot of them, and multiple seat licenses. HDL compilers, RTL synthesis tools, simulators, layout tools etc. Each license is likely to be on the order of $100K per seat. Don't forget you also need the computing power and infrastructure to run them as well; you will need a pretty powerful cluster.

  • Design. Most tools and process design kits are mature for Si given the volume and revenue for this market. For your exotic, expect less-than-ideal models, especially for cutting edge process nodes. You will need to develop or buy standard cells for your exotic substrate. There will be many fewer than for Si.

  • Manufacture. There are speciality exotic fabs, but they are just that: special. The volumes are low, the wafers are (much) smaller, and costs are much higher (a rough estimate is 100-1000X per mm\$^2\$ compared to Si).

Even after this, there's little guarantee that you'll get the improvement you think you'll get just by running faster. A lot of very clever people have invested a huge amount of time and money in Si, and you'll be re-inventing the wheel for a lot of things (e.g. standard cells, power control etc) and probably be doing it worse. Fabs will often provide standard cells optimised for their process; it would be foolish not to use this. This will erode the advantage of using the exotic in the first place.

Unfortunately, open sourcing the design code doesn't allow you to manufacture the ASIC without a lot of other investment. Now, your $1B competitor can eat a lot of these costs and even if you open source the RTL, they can do the rest of the things which you simply can't open source. For example, semiconductor fabs are very cagey about releasing their in-house process models. You should do a very thorough audit as to the advantage of open sourcing in this case; manufacture simply doesn't scale in the same way software distribution does so the pros and cons are very different.

To answer your questions:

  1. Budget limits everything (of course). Given the disparity to your hypothetical competitor, $30M would be much better spent on high quality people to develop a good architecture rather than trying to get "free" performance from the materials and process used. As my comments above hopefully illustrate, this "free" performance will be anything but free!

  2. Good architecture will mitigate a lot of the advantage of going to an in-house exotics design. There is still potential for scaling in GaAs and other exotics. This may become relevant in the (near?) future - keep your powder dry to take advantage of that.

  3. SiGe is closer to Si, so you may be able to use it more freely, although it will still be more expensive than Si. GaAs is more specialised, and is usually used for its high ft in RF designs where area cost is less of a concern. Going from 100 nm to 10 nm gives you (to first order) 100X more transistors to implement your excellent architecture. Of course, architectural improvement usually scales as \$\sqrt{N_\mathrm{transistors}}\$, so probably around 10X performance gain overall. Bear in mind, though, that even $1B is nowhere near enough to push a fully new process through, so the chances are your competitor will still be using Si.

  4. SERDES for 4096 bits is a lot of registers - this is going to cost a lot of power and area for no performance benefit on your exotic wafer. Given you can fit whole processors in fewer than 4096 registers (let alone 8192), this illustrates the issue there. Area is much cheaper on Si.

  5. Going to smaller transistors means higher power density, hence more need for power control, i.e. bits that are turned off (dark silicon). A lot of work has gone into analysing and reducing power consumption while maintaining acceptable performance. A critical factor is your expected activity. Will it be working full throttle 24/7, or will it be periodic? This will make a big difference to your design.

  6. A $1B competitor has no care about obfuscation if the reward is high enough. Don't be hubristic in thinking your design is the perfect implementation.
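The back-of-the-envelope scaling estimate in point 3 can be sketched as follows (the square-law density scaling and the square-root performance rule are rough heuristics, not measured data):

```python
def transistor_scale(old_nm: float, new_nm: float) -> float:
    """Transistor count grows roughly with the square of the linear shrink."""
    return (old_nm / new_nm) ** 2

def perf_scale(transistor_ratio: float) -> float:
    """Heuristic: architectural performance scales ~ sqrt(transistor count)."""
    return transistor_ratio ** 0.5

density = transistor_scale(100, 10)  # 100X more transistors at 10 nm vs 100 nm
speedup = perf_scale(density)        # ~10X overall performance estimate
```

The point of the exercise: a full node shrink buys far more than any plausible material swap, which is why the competitor staying on Si is the safe assumption.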

To summarise, you should spend your money on the people and tools that develop your architecture and algorithm (don't forget that!). This is likely to provide the best return for your relatively constrained budget by leveraging the massive investment in both tools and process for Si. Simply using a faster material is highly unlikely to give you the improvement it seems on paper by raising the clock rate, given all the other steps in designing and making an ASIC.

Personally, I would target a "cheap" Si node (probably something like 22 or 28 nm) to get your design up and running. If it's successful, you can use the scaling benefits to go to smaller (and more expensive) nodes, leveraging the work you've done already and the work done by the fabs. In the interim, as you are developing an ASIC, you can push the operating conditions wider as well, as compared to a CPU/GPU which has to work across a huge and unknown range of conditions. For example, you can specify the cooling equipment that should be used. This will further erode any advantage by going to higher performance materials.

awjlogan
  • Thank you for this detailed answer :) Would you agree that SiGe is the less exotic of the non-Si semiconductors, and that SiGe is probably within the realm of our budget? Do you have any intuition as to whether 90nm SiGe would perform better or worse than 22nm Si? What about 7nm Si? – Randomblue Aug 23 '18 at 15:14
  • @Randomblue It's very hard to say, as a 22 nm Si FET isn't just a 90 nm Si FET shrunk 10X. There's huge amounts of materials engineering (including SiGe channels!) and solid state physics involved. A difficulty in going to 7 nm is the power density; not good for full time full loading (happy to expand answer). SiGe *may* be in your manufacturing budget, but with the factors in my answers I think it would be outside your overall budget, given how much cheaper/mature Si CMOS is and how many gains you can get elsewhere. – awjlogan Aug 23 '18 at 15:53

I looked into these exotic materials as well for our next-generation PoW mining ASICs and, as others have stated, they are just not ready for volume production yet.

For about $20 million, though, you can get a design and masks at 7nm, which, as you likely know, is the best Si process currently available for volume production. Getting fab time at 7nm is quite challenging, however; getting fab time at 14nm or below often means waiting months or years, depending on the process node and specific fab.

In addition, fabs are going to require that you show them that you have the financial ability to follow through with a large enough wafer order to make it worth their while. This makes a 7nm project, including wafers, come in somewhere around $50-100 million depending on the fab's mask cost, cost per wafer, and minimum number of wafers. That's before building the hardware to contain the ASICs, which will typically double the costs. This can vary a lot though based on the number of ASICs required in each unit, power requirements, cooling requirements, etc.

The Obelisk Launchpad program is intended for projects just like this that require transparency and openness. In fact, Launchpad requires that the resulting ASIC design be open sourced. By default, the Launchpad process is oriented around a 22nm design, but that can easily be changed to something else. Disclaimer: I work for Obelisk.

You may also find this blog post on The State of Cryptocurrency Mining useful to understand more about the ASIC manufacturing process.