As a methodology, RTL (register transfer level) has been around for about three decades, and there are many digital design tools that understand it. However, as an approach to digital design it rests on an underlying assumption: transistors are well-behaved, so you can adjust gate sizes and implementation to line up timing against a particular (global) clock efficiently.
In the new world of sub-10nm FinFETs, silicon variability has increased, and that means the underlying assumption no longer holds: a chain of RTL logic blocks is only as fast as its weakest link, and a weak link is now more likely.
A couple of design approaches work around that problem: a) use body-biasing (FDSOI or CiFETs) to speed up or slow down troublesome cells, or b) use asynchronous design.
If you want to use asynchronous design your choices are limited; the only company successfully deploying asynchronous logic is ETA Compute with their DIAL2 technology. There is also a secondary problem: an RTL design specification is somewhat sub-optimal because it includes clocks (a synchronous FSM), whereas a purely data-flow description (an asynchronous FSM) would be easier to work with all round.
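To make the contrast concrete, here is a minimal sketch (module and signal names are made up, not from any particular flow) of the same accumulate step written both ways in plain Verilog, which is also legal on the digital side of Verilog-AMS:

// Synchronous FSM style: progress is tied to a global clock edge.
module acc_sync (
  input  wire       clk, rst_n,
  input  wire [7:0] din,
  output reg  [7:0] acc
);
  always @(posedge clk or negedge rst_n)
    if (!rst_n) acc <= 8'd0;
    else        acc <= acc + din;
endmodule

// Data-flow style: progress is tied to a two-phase req/ack handshake
// instead of a clock; the environment toggles req, the block answers on ack.
module acc_async (
  input  wire       req, rst_n,
  input  wire [7:0] din,
  output reg        ack,
  output reg  [7:0] acc
);
  always @(req or negedge rst_n)
    if (!rst_n) begin
      acc = 8'd0;
      ack = 1'b0;
    end else if (req != ack) begin
      acc = acc + din;   // do the work for this token
      ack = req;         // acknowledge: this handshake phase is complete
    end
endmodule

The asynchronous version needs no clock at all, but its correctness now depends on the req/ack ordering and on real settling delays, which is exactly what pushes verification toward analog-aware simulation.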
Interestingly, the nearest thing to asynchronous FSM descriptions floating around today is neural networks, so one suspects that neural-network descriptions will supplant RTL as the medium for low-level hardware descriptions for synthesis.
NB: if you want to verify your asynchronous IC, or even just DVFS (dynamic voltage and frequency scaling), you need a high-performance analog simulator.
Asynchronous Design
As part of my MSEE program, I took a course on Asynchronous Sequential Switching Circuits back in the late 1980s. I still have the book of the same name by Stephen Unger. After taking the course I tried to incorporate the techniques into the ECL ASIC I was working on for a project at Tektronix. I was designing a 250 MHz ECL ASIC at a time when the fastest CMOS PCs were running at 25 MHz. When I talked to my graduate advisor, a Tektronix PhD, about the design, he reminded me that the fundamental assumption that made asynchronous design work was that gate delays were significantly longer than net-trace delays. He told me that asynchronous design would not work with the very fast ECL gates that I was using, and we have a similar situation with high-speed design today. For asynchronous designs to work, you are going to need a very smart place-and-route tool to match the important trace delays. I have heard that there is some work going on at the University of Utah on this topic, but I have not had a chance to meet with the professor conducting the research.
The reason clocked logic continues to rule the digital design space is that we have very efficient tools to simulate the logic, perform static timing analysis, and capture functional coverage intent. It is much harder to prove that the timing works on an asynchronous design.
For these reasons, I keep my eye on the asynchronous design space, but the need to master placement and skew-controlled routing puts me in the skeptic’s camp.
The main problem is that you need to verify the behavior with an analog simulator, since asynchronous approaches depend heavily on charge transfer and there may be race conditions. Since nobody likes to do that with SPICE, it requires languages like Verilog-AMS, with analog event-driven capability and higher-level behavioral modeling. That’s one of the reasons I posted about doing an AMS simulator with Xyce, and about using analog simulation techniques for catching CDC errors: getting an asynchronous flow together requires tool support that isn’t common yet.
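As an example of the kind of Verilog-AMS modeling I mean (a minimal sketch; the module name and all parameter values are assumptions), here is a Muller C-element whose output settles through a driver resistance into a load capacitance, so the handshake timing depends on charge transfer, with cross() events playing the role a clock edge plays in RTL:

`include "disciplines.vams"

module c_element_ams (a, b, y);
  inout a, b, y;
  electrical a, b, y;

  parameter real vdd   = 1.0;      // supply level (assumed)
  parameter real vth   = 0.5;      // logic threshold (assumed)
  parameter real rout  = 10e3;     // driver resistance (assumed)
  parameter real cload = 5e-15;    // load capacitance (assumed)

  real    target;                  // level the output is settling toward
  integer state;

  analog begin
    @(initial_step) state = 0;

    // C-element behavior: the output changes only when both inputs agree.
    @(cross(V(a) - vth, 0) or cross(V(b) - vth, 0)) begin
      if ((V(a) > vth) && (V(b) > vth))      state = 1;
      else if ((V(a) < vth) && (V(b) < vth)) state = 0;
    end
    target = state ? vdd : 0.0;

    // RC settling toward the target: driver resistance plus load capacitance.
    I(y) <+ (V(y) - target) / rout;
    I(y) <+ cload * ddt(V(y));
  end
endmodule

Because the output trajectory is continuous, an analog or AMS simulator can show whether a downstream gate sees the transition early, late, or in a hazard window, which is the race-condition question a pure digital simulation glosses over.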
SystemVerilog doesn’t work well here because a wire can carry either potential or flow (voltage or current), not both, so accurately modeling the settling of an asynchronous logic circuit is difficult. The same statistical modeling methods used for CDC debug can be applied to asynchronous circuits.
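For contrast, a Verilog-AMS electrical net carries both an across quantity and a through quantity, so even a one-line model can relate the two directly (again just a sketch, with an assumed default value for r):

`include "disciplines.vams"

// Both potential (V) and flow (I) exist on the same pair of nets, which a
// plain SystemVerilog wire cannot express.  A resistor is the simplest case.
module res (p, n);
  inout p, n;
  electrical p, n;
  parameter real r = 1e3 from (0:inf);
  analog I(p, n) <+ V(p, n) / r;   // flow defined in terms of potential
endmodule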