The team needs to design and run validation simulations to choose an optimal design point. Each simulation run is costly in time, so reducing the time required saves money.
To design world-class server computer interfaces for the future, the HSB-SI team chose IBM Bayesian Optimization Accelerator (BOA), which reduced the simulation time required by 99.3%.
140x faster time to solution with higher accuracy than legacy method
99% less cores utilized to arrive at a higher confidence solution
Less than 1% error in solution using IBO
Business challenge story
The IBM High Speed Bus Signal Integrity (HSB-SI) team was a key part of the design and development of Summit and Sierra, two of the world’s fastest supercomputers. As the team looked to transform the industry again with the next generation of Power architecture, lengthy simulation timelines for chip-to-chip communications were hampering its efforts to make the most of the limited time that any product has in development.
The team performs high speed bus signal integrity design and analysis for server systems using the IBM Power Processors. A key factor giving these supercomputers a performance advantage is being able to run the communication between chips at the fastest speed possible. The team designs the route that the signal takes from one chip to another, so that the signal sent from the transmitter and arriving at the receiver can be deciphered without errors. To design or analyze the signal integrity of a channel between two chips and ensure error-free communication, these engineers need to do numerous simulations, which can take several days.
Dr. Jose Hejase, a senior engineer on the team, explains, “To guarantee error free communication, the [signal integrity] engineer has to make difficult decisions about design constraints on a channel. How to design the printed circuit board? Should I use connectors? What type of cables should I use? Will the channel still work under worst case manufacturing tolerances? Each design is unique and there is no single recipe that answers all the questions. Doing this analysis using our traditional means can be very time consuming, tedious, and engineering intensive.”
Every day, companies that design electronics call upon their signal integrity (SI) teams to make difficult decisions about design options, and this team needed a new solution to free up their engineers from days of simulations to obtain optimal channels for very low bit error rate (practically error-free) chip-to-chip communication.
IBM BOA vs. Traditional
The engineers on the HSB-SI team must ensure error-free communication between chips at maximum speed. The traditional process for analyzing these channel designs is engineering intensive and can take several days to arrive at an optimal combination of channel design options. The team had an opportunity to cut this time dramatically by implementing an optimizer: IBM Bayesian Optimization Accelerator (BOA).
Dr. Hejase believes Bayesian Optimization Accelerator was the optimal solution for the team as they tackled this problem. “Using BOA is a way to greatly reduce the number of simulations needed to come to an optimal point and free up the engineers to think of how to design the systems of the future now that they will no longer need to do less tedious work using traditional methods.”
BOA is a statistical framework that uses machine learning to model and minimize an arbitrary objective function. As Dr. Hejase notes, “To go through all of the channel design combinations and/or tolerances, the engineer has to do substantial thinking and planning to determine a case study to analyze. This analysis consists of multiple simulations, and these simulations can take many days. This is where BOA comes in. BOA is an optimization tool that is driven by machine learning. Instead of us having to do an intensive sweep of all the different combinations and/or tolerances to be considered and the simulation tool settings, BOA is able to learn from a relatively small amount of simulation iterations to arrive at the optimal result for a particular channel.”
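The idea Dr. Hejase describes, learning from a small number of simulations instead of sweeping every combination, can be illustrated with a generic Bayesian optimization loop. The sketch below is not BOA's implementation or API. It assumes a single design parameter scaled to the range 0 to 1, a hypothetical `simulate` function standing in for one costly channel simulation (lower is better), a Gaussian-process surrogate with a squared-exponential kernel, and a lower-confidence-bound rule for picking the next simulation. All names and settings are illustrative.

```python
import math

def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two scalar design points."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=1e-4):
    """Fit a zero-mean GP to the centered observations; factor the kernel once."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - mean_y for y in ys]))
    return L, alpha, mean_y

def predict(model, xs, x_new):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    L, alpha, mean_y = model
    k_star = [rbf(x_new, a) for a in xs]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 0.0)  # rbf(x, x) == 1
    return mu, math.sqrt(var)

def simulate(x):
    """Hypothetical stand-in for one costly channel simulation (lower is better)."""
    return (x - 0.62) ** 2 + 0.05 * math.sin(25 * x)

def bayes_opt(objective, candidates, n_init=3, n_iter=12, kappa=2.0):
    """Minimize objective over candidates using only n_init + n_iter evaluations."""
    step = (len(candidates) - 1) // (n_init - 1)
    xs = [candidates[i * step] for i in range(n_init)]  # evenly spaced seeds
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        model = fit_gp(xs, ys)
        remaining = [c for c in candidates if c not in xs]

        def lcb(c):
            # Lower confidence bound: favor low predictions or high uncertainty
            mu, sigma = predict(model, xs, c)
            return mu - kappa * sigma

        nxt = min(remaining, key=lcb)
        xs.append(nxt)
        ys.append(objective(nxt))
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], len(ys)

candidates = [i / 200 for i in range(201)]  # 201 hypothetical design points
best_x, best_y, n_evals = bayes_opt(simulate, candidates)
print(f"best design {best_x:.3f} found after {n_evals} of {len(candidates)} possible runs")
```

With these settings the loop runs 15 simulations rather than all 201. How close the result comes to the true optimum depends on the kernel, the acquisition rule and the simulation budget, which a production tool would tune.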
Discovering optimal designs faster
After this success, the HSB-SI team is working to incorporate BOA into their mainstream process as they continue to produce world-changing computer systems.
In addition, BOA will allow engineers to do tolerance analysis for the channel. This is important to ensure that channel designs will still perform well even under worst case manufacturing tolerances. “When designing a channel, there are multiple channel design options to consider. Within those channel designs are tolerances. Each individual segment of a particular channel will have tolerance associated with it. Once you sort all the tolerances together you might have more than 1,000 channel combinations. This computation is ideal for BOA. Instead of the engineer having to intensively go through each combination and determine the performance metric, BOA will do the optimization to arrive at the channel worst case performance under tolerance with high confidence.”
BOA is designed to help product and design teams introduce products and features faster by shortening the design time needed to generate new and better products. BOA is a flexible and fast solution that has saved compute time for the HSB-SI team and will allow them to arrive at high-confidence channel designs in far less time. Moreover, the machine-learned models that BOA generates could reduce dependence on certain simulation tools when analyzing channel performance in specific scenarios. Finally, BOA can learn how sensitive channel performance is to particular channel component design properties, giving engineers hints on where to focus their effort in future designs for the best SI performance.
“Machine learning and optimization is a hot area in the electronic design industry. In some way or another companies are moving towards that direction. Having BOA is a great advantage for us at IBM.” – Jose Hejase, Ph.D.
IBM Power Systems
IBM Power Systems is a family of servers fueled by POWER processor technology and software built to crush clients’ most advanced data applications, from the mission-critical workloads they run today to the next generation of AI.
Based on IBM internal testing during POWER10 development comparing IBM Bayesian Optimization Accelerator run on one IBM Power System AC922 server to traditional brute force ‘design of experiments’ methodology implemented by IBM, based on industry best practice and run on non-accelerated x86 Linux architecture. Results valid as of July 2018. In testing, traditional methods required 11,260 minutes to reach final result, and IBM Bayesian Optimization Accelerator required 80 minutes (99.3% reduction).
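The 99.3% reduction and the "140x faster" figure both follow from the two run times quoted in the footnote above. A minimal arithmetic check, with variable names chosen here for illustration:

```python
# Run times quoted in the footnote, in minutes
traditional_minutes = 11260
boa_minutes = 80

# Speedup factor: how many times faster the BOA run finished
speedup = traditional_minutes / boa_minutes

# Percentage of the traditional run time that was eliminated
reduction_pct = 100 * (traditional_minutes - boa_minutes) / traditional_minutes

print(f"speedup: {speedup:.2f}x, time reduction: {reduction_pct:.1f}%")
```

The speedup works out to roughly 140x, and the time reduction rounds to 99.3%.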
Take the Next Step
To learn more about IBM Power Systems, please contact your IBM representative or IBM Business Partner, or visit the following website: https://www.ibm.com/it-infrastructure/power