Jay Singh, Plato Networks

Last week I met Jay Singh of Plato Networks. He was one of the earliest users of Oasys RealTime Designer and I wanted to find out what his experience was. Jay got involved with Oasys early on, before the tool was mature, and has given a lot of feedback over the last year and a half.

A bit of background on Plato. They are building 10-gigabit PHY solutions for next-generation data centers: big, complex chips with enormous amounts of interconnect.

Jay thinks it’s great that RealTime Designer is so fast, but that is not its most valuable aspect for him. The biggest strength is the consistency between the results from synthesis and what comes out after place and route. But it is fast: on difficult blocks he found it ran at 20X to 60X the speed of traditional synthesis tools.

If he provides a floorplan, then he reckons the results are 100% predictable: what RealTime Designer says is what you will get. Furthermore, you’ll get it faster, since the netlist seems to be very friendly to the place and route tools, and designs go through the physical flow much more quickly and smoothly. Speed improvements of up to 3X have been observed in the existing backend flow, using the same set of scripts, simply by swapping the traditional tool’s netlist for the one generated by RealTime Designer. You simply don’t get that consistency with other synthesis tools. With them, after place and route the results may be better or worse than predicted; the prediction simply isn’t very good. To cap it off, in some cases the blocks are 20% smaller than with traditional synthesis.

One interesting thing I didn’t know is that RealTime Designer sometimes creates more instances than traditional synthesis, which typically works hard to reduce instance count. For example, Jay saw 2.5X more instances in a connectivity-intensive design, and despite that, the backend physical implementation ran more than 2X faster than with the traditional synthesis netlist.

Why is this? Oasys knows timing and placement from the very beginning. Traditional synthesis tools simply try to minimize the instance count. This used to be a sensible thing to do, but now that interconnect is as big a problem as cell area, it no longer seems to be true. To take a simple example: if a signal is used in two places on a chip and its inverse is also needed in both places, traditional synthesis tends to create one inverter and then run the inverted signal to both places where it is needed. If those two points are close together, this is a good decision. If they are far apart, it makes more sense to put a small inverter at each point, which pushes up the instance count but removes a long wire and reduces congestion. Decisions like this about how to structure the netlist can only be made when the RTL, the placement and the timing are all available at once, which is exactly the approach taken by RealTime Designer.
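To make the inverter trade-off concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions, not how RealTime Designer actually makes the decision: the centroid placement of the shared inverter, the Manhattan wire-length estimate and the cost weights are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float  # placement coordinates in arbitrary units (hypothetical)
    y: float


def manhattan(a: Point, b: Point) -> float:
    """Rectilinear distance, a rough proxy for routed wire length."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def inverted_net_options(sinks: list[Point]) -> dict[str, tuple[int, float]]:
    """Return (inverter count, inverted-net wire length) for each option.

    Assumes at least one sink. The true signal already reaches every sink
    in both options, so only the wiring of the inverted signal differs.
    """
    # Option "shared": one inverter at the centroid of the sinks, with its
    # output star-routed to every sink (a deliberate simplification).
    cx = sum(p.x for p in sinks) / len(sinks)
    cy = sum(p.y for p in sinks) / len(sinks)
    shared_wire = sum(manhattan(Point(cx, cy), p) for p in sinks)
    # Option "duplicated": a small inverter right next to each sink, so the
    # inverted nets are treated as zero length.
    return {"shared": (1, shared_wire), "duplicated": (len(sinks), 0.0)}


def choose(sinks: list[Point], inverter_cost: float, wire_cost: float) -> str:
    """Pick the option with the lower weighted instance-plus-wire cost."""
    options = inverted_net_options(sinks)
    return min(
        options,
        key=lambda k: options[k][0] * inverter_cost + options[k][1] * wire_cost,
    )


if __name__ == "__main__":
    print(choose([Point(0, 0), Point(2, 0)], inverter_cost=5, wire_cost=1))
    print(choose([Point(0, 0), Point(200, 0)], inverter_cost=5, wire_cost=1))
```

With an inverter cost of 5 and a wire cost of 1 per unit, two sinks 2 units apart favor the shared inverter (cost 7 versus 10), while two sinks 200 units apart favor duplication (cost 10 versus 205). The point is only that the right answer depends on placement, which a tool that looks at instance count alone cannot see.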

On adoption, Jay’s view is that the scripts are very simple. The tool is easy to use and the commands are very consistent with those of other synthesis vendors, so there is a shallow learning curve. In fact, because RealTime Designer just takes the entire design and synthesizes it, only a few commands get used much. Cross-probing between the schematic and the RTL source works really well.

It runs perfectly in batch mode, and since it is so fast, Jay likes to set it up to run a whole matrix of different performance trade-offs overnight. In effect, instead of using intellect to work out how to tweak the performance, he just burns lots of (cheap) computer time and does 20 runs. It has the features of other synthesis tools, but Oasys have made it simpler.
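As a rough illustration of that kind of overnight matrix, the Python sketch below enumerates every combination of two knobs into a list of run descriptions. The knob names and values (target clock period and placement utilization) are hypothetical, and the sketch deliberately stops short of emitting RealTime Designer commands, which are not described here.

```python
from itertools import product

# Hypothetical sweep knobs; substitute whatever settings your flow exposes.
CLOCK_PERIODS_NS = [1.0, 1.1, 1.2, 1.3, 1.4]
UTILIZATIONS = [0.60, 0.65, 0.70, 0.75]


def build_run_matrix(periods, utilizations):
    """Return one run description per (clock period, utilization) pair."""
    runs = []
    for period, util in product(periods, utilizations):
        runs.append({
            # Unique, human-readable name such as "run_p1.0_u60".
            "name": f"run_p{period:.1f}_u{round(util * 100)}",
            "clock_period_ns": period,
            "utilization": util,
        })
    return runs


if __name__ == "__main__":
    for run in build_run_matrix(CLOCK_PERIODS_NS, UTILIZATIONS):
        print(run["name"])
```

Each entry could then be turned into a batch script and submitted to a compute farm; five clock periods times four utilizations gives 20 runs.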

Harry the ASIC guy says “plausible”

Harry the ASIC guy had already blogged once about Oasys, taking pretty much the skeptical view that Oasys’s results were too good to be true, that this was Ambit or get2chip all over again, and that nobody cares about synthesis anyway.

But he came back and took a real look, getting the suite demo during DAC. His second blog post on Oasys starts off with the same Groundhog Day, history-repeating-itself point of view. But Paul van Besouw did more than give him a demo; he explained at least a little of the secret sauce inside Oasys:

According to Paul van Besouw, Oasys decided to take an approach they call “place first”. That is, rather than spend a lot of cycles in logic optimization before even getting to placement, they do an initial placement of the design as soon as possible so they are working with real interconnect delays from the start. Because of this approach, RealTime Designer can get to meaningful optimizations almost immediately in the first stage of optimization.

A second key strategy according to van Besouw is the RTL partitioning, which chops the design up into RTL blocks that are floorplanned and placed on the chip. The partitions are fluid, sometimes splitting apart, sometimes merging with other partitions during the optimization process as the design demands. The RTL can be revisited and changed for a new structure during the optimization as well. Since the RTL partitions are higher-level than gates, the number of design objects is much smaller, leading to faster runtime with a lower memory footprint according to van Besouw. Exactly how Oasys does the RTL partitioning and optimizations is the “secret sauce”, so don’t expect to hear a lot of detail.

Interestingly, I’d written a blog entry before DAC giving an overview of some of the technology under the hood, since I thought that people are very skeptical of “trust us, we’re clever” as an explanation of how Oasys can do something that Synopsys et al cannot. And, during demos at DAC, people wanted to have at least an idea of how the tool worked so that they could convince themselves that it might be as good as the writing on the box claimed. But we decided it gave away too much too early and so it hasn’t yet run.

Harry the ASIC guy ends up doing a Mythbusters on Oasys, to decide whether their claim is “confirmed” or “busted.” Having started off very skeptical, once he understood some of the way the tool worked, he at least got himself to “plausible.” Of course, like anyone, he wants to see the tool run on a variety of designs in a non-demo situation. All tools look good in demos since it is a very controlled environment. Plus there are various things that Oasys RealTime Designer does not yet address, most notably support for CPF and UPF.

There are some interesting comments on Harry’s blog entry too. Cadence objects that RTL Compiler isn’t at all like Design Compiler, and Rubix thinks that Oasys needs a clock optimization tool like the one from, say, maybe, Rubix. And everyone points out that marketing claims aren’t worth the paper they’re not printed on any more; it all comes down to benchmarks.

Actually, I don’t think even benchmarks are that interesting. In the end, the most convincing thing of all will be when some of Oasys’s customers go public on the designs that they have successfully taped out. As Harry says, this is the gold standard of EDA: are the dogs eating the dog food?

Ten rules for corporate blogs like this one

I only started blogging on EDAgraffiti at the start of the year, and over here just a couple of weeks ago. The two blogs are very different, one being essentially my opinion on whatever I feel like giving my opinion on, and the other being a corporate blog. I came across an interesting list of ten rules about corporate blogging that is worth thinking about. I won’t cover everything it says; you can read the whole thing.

But here are the ten rules with a couple of comments:

  1. A blog does not magically generate traffic.
  2. A good corporate blog requires long-term commitment.
  3. Teaser feeds are a wasted opportunity.
  4. You are not “engaging” anyone.
  5. Press releases shouldn’t appear on a blog.
  6. You sound like a faceless corporation.
  7. You need to show the warts and all.
  8. Marketeers often make bad bloggers.
  9. You expect too much from your readers.
  10. Your competitors will read your blog; get over it.

Items 1 & 2 are really an admission that you have to earn the right to be listened to and that, no matter how good the content, building a readership takes time.

Items 3 & 4 are not yet relevant to the Oasys blog since we’ve not (or rather not yet, I hope) set up either RSS feeds or comments. So this blog is still very much in broadcast mode rather than anything approaching a conversation.

Items 5, 6 and 7 are about making sure that this blog has a more human face. People like to talk to people, not corporations. Scoble’s blog at Microsoft was wonderful at this, putting a human face on Microsoft at least partially by being tough on Microsoft in areas where Microsoft was not perfect (aka sucked).

Item 8: well, I was an engineer before I was a marketeer. Also, my plan with this blog is to get other people (customers, engineers, etc.) to write some of the content. It can’t just be a marketing channel or it fails to earn the right to be listened to.

Item 9: be punchy. Both in the sense of being brief and in the sense of being edgy.

Item 10: Hi there Synopsoids.

How did Oasys get started?

There were three founders of Oasys: Paul van Besouw, Johnson Limqueco and Harm Arts. When we all worked together at Ambit, Paul was the lead for the “front end”, the RTL synthesis part of the product, and Johnson and Harm were the main engineers working on the “back end,” the gate-level optimization. Everyone knew that wireload models were at the end of their useful life, and so, like every other synthesis team, Ambit also had a physical synthesis project, PKS.

Building an old-style synthesis product and bolting on physical optimization convinced the three of them that this was the wrong approach. The first thing a modern place and route tool usually does is remove all the buffering and downsize all the gates, so it didn’t make a lot of sense to spend a lot of time in synthesis carefully creating them. Still, down to about 0.1um that approach worked well enough to get timing closure. In a current-generation process, however, it fails: the netlist that comes out of synthesis is not a good starting point for reaching one that actually closes timing, because getting there typically requires more aggressive changes to the netlist than the place and route tool is capable of making. So sometimes the netlist that comes out of synthesis closes easily, perhaps wasting area, and sometimes it is impossible to close. The reality is that the “timing” of the netlist that comes out of synthesis bears almost no resemblance to the timing of that netlist once it has been through place and route. This is such a large problem that ASIC companies are apparently asking customers to leave 50% timing margin to give themselves some chance of closing timing.

In an old-style synthesis tool, the RTL synthesis is done naïvely. The RTL is parsed and transformed into a control/dataflow graph (CDFG). This graph is then walked, and an initial implementation in gates and registers is created without any regard to timing or other constraints; in fact, the timing engine isn’t even turned on at that point. A level of logic optimization analogous to Karnaugh maps then takes place. After that the gate-level optimization really goes to work and grinds away. And grinds away. And grinds away, until hopefully the timing constraints are met or heuristics decide that it is time to give up. Only at this stage is physical information taken into account, but due to capacity limitations the synthesis hierarchy doesn’t match the physical design hierarchy, and so the physical information is basically bogus. Johnson and Harm’s complaint at this point, “give us a better netlist, Paul,” was one of the motivations for creating Oasys.

Why not put all the effort into generating a really good placed netlist from the RTL? So good, in fact, that a traditional optimization back end would not be required. It wasn’t clear whether it would be possible to build a front end that good, but it seemed worth a try. So the three of them designed the basic architecture of a Chip Synthesis product and started to code.

That was 5 years ago. If you design a tool in a completely new space, then early customers might be interested in it when it is still half-baked. It may still deliver value that they cannot get in any other way. If you design a tool in an existing space, then early adopters really are not interested until it starts to get better results than the existing solutions, which is a much higher bar. A lot of code and a lot of testing has to take place before any tool is better than Design Compiler, the main incumbent. Eventually Oasys got there and the performance and capacity were compelling to some early adopters. They were on their way.