
Synthesis Needs to Change to Serve Modern Chip Design

By Tets Maniwa, senior contributing editor for New Tech Press. Copyright March 2009 by Footwasher Media.

As EDA tools evolve, each generation tries to increase automation. Unfortunately, the last great advance was the move from schematics to language-based design, starting with the first synthesis tools in the mid-1980s. Designs have grown more challenging and complex over the past 25 years as process geometries have shrunk from 0.5µm to 32nm and designs have grown from 100,000 gates to over 100,000,000 gates.

As process dimensions shrink below 90nm, synthesis tools must provide more information to the back-end tools than previous generations of processes and tools required. While there was hope for the potential of C synthesis (and several such solutions are on the market today), the vast majority of IC design teams still start by writing the RTL themselves because of quality-of-results concerns in high-end chips.

Gary Smith, principal analyst at GarySmithEDA, believes physical synthesis tools are needed for timing closure in all designs below 90nm, and they have been necessary EDA tools for high-performance designs since 2002. As the industry moves to 65nm and below, however, everyone needs physically aware synthesis tools to reach timing closure. The main players in physical synthesis are Cadence, Magma, and Synopsys, the market leader with 54 percent share; the rest is split evenly between Cadence and Magma. Currently, however, Synopsys holds only a slim technology lead, effectively making synthesis a commodity.

Pie in the sky?

The vendors claim their offerings do the job, pointing to advances in algorithms, databases, and internal architectures, along with faster CPUs, that make it possible to synthesize multi-million-gate designs. The basic tools have evolved from simply translating gates and registers into netlists to optimizing compilers that address logic optimization, timing, power, and even some physical effects.

"DC Ultra’s Topographical technology and Design Compiler Graphical improve overall flow predictability, which helps manage overall design flow and reduce iterations," said Gal Hasson, senior director of marketing, RTL synthesis and test at Synopsys.  "When the synthesis has physical knowledge, the result is a better start point for place and route, improvements in turn-around times, and increased productivity."  Hasson claims in the past 3 years, changes in synthesis have lead to much better estimates of loading and timing as the wireload models were replaced by physical information. The resulting improved netlist leads to reduced design time and fewer iterations.

Jonathan Smith, product marketing manager at Magma, observes that Magma has integrated the analysis of physical effects and routing into the main flow, addressing the issues there rather than trying to fix problems at the end. "This integration minimizes the creation of chaotic designs where a fix for one problem creates a new problem. Among the underlying technology advances is a gain-based model for gates, allowing the tool to adjust drive strengths as a function of the implied loading. Integrating RTL, physical, and other analyses helps reduce overall run time and reduces the surprises that happen in physical implementation."
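A gain-based model, in the logical-effort sense, treats delay as proportional to the ratio of output load to input capacitance, so a gate can be resized continuously to hold that ratio. The sketch below is the generic textbook formulation with made-up numbers, not Magma's implementation.

    # Gain-based sizing sketch: delay ~ tau * (g * h + p), where h = Cload/Cin.
    # Numbers are illustrative only.

    def gate_delay(c_load, c_in, logical_effort=1.0, parasitic=1.0, tau=1.0):
        """Stage delay grows with gain = c_load / c_in."""
        gain = c_load / c_in
        return tau * (logical_effort * gain + parasitic)

    def size_for_target_gain(c_load, target_gain=4.0):
        """Pick the input capacitance (drive strength) that hits the target gain."""
        return c_load / target_gain

    # When downstream loading doubles, the drive strength doubles with it,
    # so the stage delay stays the same instead of blowing the timing budget.
    for c_load in (8.0, 16.0):
        c_in = size_for_target_gain(c_load)
        print(c_load, c_in, gate_delay(c_load, c_in))

Because sizing is a simple function of load, the tool can keep delays stable as the physical picture changes, rather than repeatedly swapping discrete cells late in the flow.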

Starting points

Users, however, are struggling with many design challenges, not all of them tool-related. Karl Pfalzer, principal engineer at AMCC, states that place-and-route effects are not addressed at the design level. In SOCs, designers focus on the architecture and micro-architecture levels to optimize system-level performance.

"Part of their problem is while working at optimizing the block they may not be aware of system-level problems such as  complex cross-block interfaces, " Pfalzer said.  "Another is the quality of the IP. They use a lot of different types of  IP.  Often the IP is measured (and purchased) solely on its functional merit."

However, a lot of this IP has poor implementation-level quality in areas such as internal clock-domain crossings, "lint" cleanliness, and constraints. "The IP users don't always understand how the IP interacts with other logic but don't have the time to fully look into the IP blocks, so they use the IP as is," Pfalzer added.

Leon Stok, director of EDA at IBM Systems and Technology Group, notes that especially for high-performance, high-frequency internal designs with clocks greater than 3 GHz, his group uses IBM synthesis and timing tools. These tools allow them to mix gate- and transistor-level synthesis for fine-grained control in areas like data-path circuits. This lets IBM create a design flow based on design requirements and technology.

WYSIWYG?

Most power users follow a "standard RTL flow," a basic methodology in which RTL is run through synthesis with wire-load models to develop a gate-level model, which is then used to create a virtual place-and-route design. For larger designs, the problems include design size and complexity, library selection, hierarchy and partitioning, run times, and the need to fold in design changes while moving to tape-out.

The greatest challenge is design size. There are capacity limits at the block level, driven by cost: companies try to reduce the number of design teams and designers per design, but the capacity limits force more blocks and more design teams. The primary way to overcome the size issue is a hierarchical design flow: teams synthesize lower-level modules, assemble those modules into larger modules, and synthesize the interfaces until they are finished. From synthesis they go to floorplans and then to physical design.
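The bottom-up hierarchical flow can be summarized in a short sketch. The functions below are hypothetical stand-ins (in a real flow these steps are commercial synthesis runs, not Python), but they show how leaf blocks are synthesized first and interfaces are closed as the hierarchy is reassembled.

    # Hypothetical stand-ins for block synthesis and interface stitching.

    def synthesize_block(module):
        return f"netlist({module})"

    def stitch(parent, child_netlists):
        # Assemble child netlists and synthesize the glue logic between them.
        return f"netlist({parent}: {', '.join(child_netlists)})"

    def hierarchical_synthesis(module, hierarchy):
        children = hierarchy.get(module, [])
        if not children:                       # leaf block: synthesize directly
            return synthesize_block(module)
        child_netlists = [hierarchical_synthesis(c, hierarchy) for c in children]
        return stitch(module, child_netlists)  # then close the interfaces

    soc = {"top": ["cpu", "ddr_ctrl", "usb"], "cpu": ["core0", "core1", "l2"]}
    print(hierarchical_synthesis("top", soc))

The cost of this structure is exactly what the designers below describe: every boundary in the hierarchy needs budgets and constraints, and those are guesses made before the final design exists.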

The problem with hierarchical design is that the preliminary partitions may not reflect the designer's final design. "Blocks are not always well defined," Pfalzer noted. "Complex logic is often shared between blocks, and it can be hard to develop proper timing budgets and micro-architectures to achieve optimal performance."

Pfalzer added that they want to use all of the implementation tools as often and as early as possible. Design closure is an iterative process, best done using successive refinement. "We cannot wait until the RTL is functionally done before we start detailed implementation. It is best to find even the smallest problems early and fix them then."

Stok agreed. "In many areas the problem is logic designer productivity, which is actually a combination of design and verification and can still take a lot of changes and improvements. Most of the work is still at the RTL, and without a path to synthesis the higher-level languages just add another layer to the flow without improving productivity."

In comparison, physical design has made a lot of progress in increasing automation over the last 10 years. To a great extent, physical design productivity has kept pace with the increases in design complexity and size brought on by process scaling.

Ameesh Desai, senior director, design tools and methodology at LSI Corporation, said their flow is a standard SOC methodology, but does require them to first partition designs into reasonably sized pieces in order to complete synthesis in a reasonable amount of time. "Because the process is iterative to some degree, it's difficult to determine how much of the time is actually spent only in synthesis and not in other corrections. The problem is if we open all the libraries, synthesis will generally choose only the fastest elements, without regard to power consumption. This bias in synthesis requires significant additional work to bring power back within budget."
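Desai's point about library bias comes down to the objective function. The toy example below (invented cell names and numbers, not LSI's methodology) shows how a timing-only selection always grabs the fastest, leakiest cell, while an objective that also weights leakage picks a more balanced one.

    # Invented cells: (name, delay in ns, leakage in nW).
    cells = [
        ("INVX1_hvt", 0.12, 2),
        ("INVX4_svt", 0.06, 20),
        ("INVX8_lvt", 0.04, 120),
    ]

    # Timing-only objective: always the fastest (and leakiest) cell.
    fastest = min(cells, key=lambda c: c[1])
    print("timing-only pick:", fastest[0])

    # Power-aware objective: trade a little delay for much less leakage.
    def cost(cell, leakage_weight=0.001):
        _, delay_ns, leakage_nw = cell
        return delay_ns + leakage_weight * leakage_nw

    balanced = min(cells, key=cost)
    print("power-aware pick:", balanced[0])

Restricting which libraries are "open" to the tool is effectively a manual way of imposing the second objective, which is why teams spend effort curating the starting library set.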

Desai adds that constraints may not be correct or valid when the design netlist goes to place and route. DC topographical and DC graphical seem to be better, but LSI is not using these tools to the fullest extent possible. In addition, the way they use the tools still does not solve all the problems that show up when the constraints and the synthesized final netlist prevent design closure.

Overall throughput is also a major barrier, according to Yoshi Inoue, chief engineer for the Design Technology Division of the LSI Product Technology Unit at Renesas. Inoue asserts that their synthesis process goes through many iterations to incorporate engineering changes and resolve the mismatches that occur between the design and the physical implementation. Not only does this take a lot of time, it also costs a lot of money. Their benchmark is 20 million gates in 20 hours with a large set of CPUs running Design Compiler. They don't like the high cost of the tools, especially when they have to invoke many licenses at a time. Generally this process takes about one week to complete, but it may take many iterations before design closure.

Ken Saito, senior engineer for Renesas EDA advanced technology development, adds that another part of the problem is the effort it takes for designers and synthesis to fix errors. "We are not always positive their design meets the constraints, and the amount of time it takes to process the design is a problem. One final challenge is that the bottom-up methodology cannot check all of the nets in a single pass, leading to multiple trials and iterations. We have to adjust the libraries per module to ensure a reasonable starting point and match all of the requirements: logic, timing, power, etc."

Wishing upon a star

Increased automation would enable a full-chip flow and eliminate the synthesis bottleneck. Next-generation tools will need better coupling between RTL and physical issues, larger capacity, multivariate analysis, and optimization over a wider range of parameters. Obviously, much greater capacity and speed are a major part of the equation, but they are not necessarily sufficient to change tooling and methodology. The increase in process technology constraints and design rules makes the synthesis job even harder.

Capacity increases, however, will be constrained by memory and runtime. Memory is a limiting factor in multicore computers because shared memory becomes a bottleneck and the total memory is too small to hold the whole design at once. For distributed computers in a CPU farm, non-shared memory removes the size limits, but the flow becomes communications-limited instead.

Gary Smith suggests the next stage in synthesis tools will be power-aware capabilities, especially for power modes and level changes. Physically aware synthesis needs a trial placement mode for better loading estimates for design closure.

"Real physical synthesis would be a panacea to their current situation," Pfalzer opined. "Take RTL , a reasonable floorplan and top-level constraints  in, and then get a good physical design as an output. There are no tools to do this.  Instead intermediate levels are used to build the blocks and then work through design issues to develop good enough constraints, floor plans, integrate the IP, etc. to get to a decent place and route."

Pfalzer said Cadence’s Chip Estimate and InCyte tools may help in this regard, but he's not convinced.

Stok asks for greater capacity, which always helps. "IBM used to have flat flows that give the tools lots of leverage, because they are not hindered by many constraints and small design blocks. A higher-capacity tool would enable fewer constraints and larger blocks, which lets the tool reduce turnaround time while increasing quality of results."

Due to the capacity limits of existing tools, designers must create hierarchies and partitions. A tool with larger capacity would give designers less chance to go wrong in creating the hierarchies and partitions that result from conflicting requirements and targets at the block and chip levels. "We must have good tool support and much greater levels of automation to continue to get the most out of our technology processes," Stok stated.

The increasing complexity and quantity of technology rules for a process mean that the tools must evaluate much more information in their algorithms to get good results. A tool that can simplify this process while providing good density and yield would help tremendously, but it also needs to co-optimize many new and different parameters to address the requirements of the 32nm node. Design teams will need to start experimenting to find the optimal priorities in the mixture of parameters and libraries.

LSI's Desai would like some form of more accurate power optimization. Because of the wide range of parameters within the cells they use, including T-max, channel scaling, power, and speed, they have to perform experiments to confirm which libraries and cells to use as a starting point for synthesis; otherwise, the library bias described earlier forces significant additional work to bring power back within budget.

Due to the limitations of the tools, LSI has to use a hierarchical design process. They would like to see a tool that does not require as much partitioning and is able to handle much larger blocks in a single pass or, conversely, allows more flattening to minimize this level of effort. Some of the floorplanning tools are starting to address some of these inherent structural problems and the discrepancies between synthesis and physical implementation.

Another area that would be very helpful is the ability to evaluate multi-corner effects. The problem here is related to the library issue: synthesis can get you to a design closure point that may not be realizable, or that is only locally optimized rather than globally efficient. And finally, Desai wants the ability to confirm constraints that correlate with the back-end requirements, since constraints that are correct at synthesis may no longer be valid when the netlist reaches place and route.
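The multi-corner concern can be shown with a toy check. In the sketch below (fabricated delays and corner scale factors), a path that meets timing at the typical corner the tool optimized for still fails at the slow corner, which is why corners need to be evaluated together rather than after the fact.

    # Corner name -> per-stage delay scale factor (fabricated values).
    corners = {
        "slow_125C_0p90V":   1.25,
        "typical_25C_1p00V": 1.00,
        "fast_m40C_1p10V":   0.80,
    }

    nominal_path_delay_ns = 1.9   # path delay at the typical corner
    clock_period_ns = 2.2

    for name, scale in corners.items():
        slack = clock_period_ns - nominal_path_delay_ns * scale
        status = "meets" if slack >= 0 else "VIOLATES"
        print(f"{name}: slack {slack:+.2f} ns ({status})")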

Renesas' Inoue considers that an ideal tool would work in a top-down fashion with a capacity of greater than 50 million gates. "It would be very helpful if the tool could handle multiple constraints and optimizations simultaneously. In addition, more effort at physical optimization, like DC topographical at synthesis, would improve the overall design flow tremendously."

This article was sponsored by Oasys Design Systems.