Physical RTL Synthesis

Until now, all synthesis tools have been built the same way: turn the RTL into gates, and then optimize the gates to meet the constraints. It is as if every C compiler worked by translating the C into machine instructions and then optimizing those instructions. In principle, with enough runtime and enough clever optimization techniques, working at the machine-instruction level might rediscover a higher-level optimization such as hoisting a loop-invariant expression out of a loop. However, it is much better to do higher-level optimizations at the higher level in the first place. Modern C compilers are indeed built this way, with global optimizers that work on a high-level representation of the program and a straightforward peephole optimizer that cleans up the final details at the machine-instruction level at the very end. Oasys RealTime Designer is the first synthesis tool built in an analogous way.
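To make the compiler analogy concrete, here is a minimal sketch of the kind of high-level optimization mentioned above: moving an expression that never changes out of a loop. It is written in Python only for brevity. The function names and the gain calculation are illustrative assumptions, not taken from any compiler or from the Oasys tool.

```python
import math


def scale_naive(values, gain_db):
    """Recompute the scale factor on every iteration."""
    out = []
    for v in values:
        # This expression does not depend on v, so it is loop-invariant,
        # yet it is evaluated once per element.
        factor = math.pow(10.0, gain_db / 20.0)
        out.append(v * factor)
    return out


def scale_hoisted(values, gain_db):
    """Same result, with the loop-invariant expression hoisted out."""
    # Evaluated exactly once, before the loop starts.
    factor = math.pow(10.0, gain_db / 20.0)
    return [v * factor for v in values]
```

Seen at the source level, the transformation is obvious: `factor` does not depend on the loop variable. Once the loop has been flattened into individual machine instructions, the same fact has to be rediscovered from many small pieces, which is the position a gate-level optimizer is in with respect to the original RTL.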

The basic approach used by other synthesis tools was built up historically. The first logic synthesis tools simply optimized gate-level netlists derived from schematics. RTL synthesis was added on top of that foundation. The RTL would be read in and reduced to a control/dataflow data structure, which would be turned into gates; the gate-level optimizer would then grind away until the design met its timing constraints. At that point the netlist would go to place and route. When physical information became more important, some placement information was integrated into the optimization phase. But for the last twenty years, gate-level optimization has remained the heart of synthesis.

This has two disadvantages. The first is that gate-level optimization is a very low-level, local optimization, so major changes are either impossible or take a large number of small incremental steps to achieve. This leads to inferior results and to unacceptably long run times.

The second disadvantage is that optimizing at a very low level requires enormous amounts of data, since by definition there are a lot of gates. This limits the maximum size of the design because of memory capacity (and run time), which means that chip-scale designs must be split into smaller blocks for synthesis, and those blocks must then be re-stitched for physical design. The problems with this block-synthesis approach show up as endless iterations: assumptions made in the synthesis tool, which can see only a single block at a time, are invalidated by the physical design tool when all the blocks are considered together. The design that emerges from place and route no longer satisfies its constraints.

Oasys RealTime Designer is the first tool to pull placement ahead of synthesis, enabling high-level optimizations equivalent to the more powerful transformations that modern compilers make to software programs. Physical RTL synthesis works by partitioning the RTL into placeable pieces and then refining those pieces down into actual library cells, so that there is always a full placement to go with the timing values.

Being able to work at the level of the entire chip leads to better results, since all aspects of the chip are taken into account, including modules, all synthesized blocks, all placement, and even routing congestion. Congestion can be displayed as a “temperature map” that shows highly congested areas in hot reds and areas with low congestion in cooler blues.

Furthermore, congestion maps give RTL designers early warning when they are building up problems that will eventually be hard to solve during place and route. For example, a straightforward implementation of a crossbar switch is easy for the RTL designer to write, but the enormous multiplexer it creates results in unmanageable congestion in the center of the design.
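A rough back-of-the-envelope count shows why the crossbar is troublesome. The sketch below assumes a full crossbar in which every output port has its own multiplexer selecting among all input ports; the function name and the port and width figures are hypothetical, and the count is only a proxy for wiring demand, not a model of how any tool measures congestion.

```python
def crossbar_mux_inputs(ports, width):
    """Count the signal bits arriving at mux inputs in a full crossbar.

    Assumes one `ports`-to-1 multiplexer per output port, each `width`
    bits wide, so every input bus must be routed to every output mux.
    """
    inputs_per_output_mux = ports * width
    return ports * inputs_per_output_mux


# Hypothetical sizes: doubling the port count quadruples the wiring.
small = crossbar_mux_inputs(ports=16, width=64)  # 16 * 16 * 64 = 16384
large = crossbar_mux_inputs(ports=32, width=64)  # 32 * 32 * 64 = 65536
```

Because the count grows with the square of the number of ports and all of those wires converge on the same group of multiplexers, a few lines of innocent-looking RTL can demand far more routing in one region than the surrounding logic would suggest.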

Working at a higher level improves performance by orders of magnitude: an ordinary 32-bit PC can synthesize designs of tens of millions of gates in an hour or two. By comparison, gate-level flows on 64-bit workstations can require literally weeks of run time and often still fail to achieve an acceptable (or sometimes any) result. Moreover, since the entire chip is handled at once, the synthesis process is repeatable and predictable. A small change to the input RTL will not produce a wildly different result.

The alternative is to split the design into multiple blocks for synthesis and then attempt to stitch them back together again for physical design. However, the only quality of results that actually counts is what finally comes through place and route, not what an intermediate synthesis step computes for each sub-block. Unfortunately, once a design has been broken down into bite-sized blocks then, as with Humpty Dumpty, it’s just not possible to “put it together again”.


