HPC Deployment of QISKit
The recent surge of interest in quantum computing is largely due to the approach of “quantum advantage,” the point at which quantum computers will exceed the capabilities of the largest classical supercomputers on a relevant and important application. At the same time, simulation of quantum computers on classical hardware is a vital component in the development of quantum applications and libraries. Large-scale simulation of ideal quantum systems enables researchers to debug applications intended for devices that will become available in the future, while high-fidelity noise simulation allows researchers to investigate, under controllable conditions, the behavior and efficiency of libraries when deployed on a realistic, modern quantum system.
A little over one year ago, in preparation for the more widespread use of quantum systems for computation, IBM Research made the Quantum Information Software Kit (QISKit) available to anyone interested in learning how to encode and simulate algorithms designed for a quantum computer. QISKit allows users to run their quantum circuit-based experimental programs either on a real quantum computer or on a quantum circuit simulator running on a classical computer, whether in the Cloud or on a laptop.
As one would expect, given both the potential performance of quantum computers and the differences between quantum and classical computers, a system running such a simulation can face relatively high computational demands. Our recent work focused on meeting those demands by leveraging some of the advanced capabilities of the IBM POWER8 and IBM POWER9 computer architectures, including their high memory bandwidth, efficient multithreading capacity, and high computational throughput.
The HPC Advantage: 30+ Qubits at Your Fingertips
QISKit provides several simulators that allow anyone who wishes to develop quantum computer applications to do so on their personal computer. These simulators become available by installing QISKit. However, simulating quantum circuits of significant breadth (qubit count) requires substantial memory and CPU resources. For example, simulating a 26-qubit configuration using double-precision arithmetic requires 1 GB of memory and, more importantly, the memory requirement grows exponentially, doubling with each additional qubit. Significant CPU resources are also needed to perform the computations and to manipulate the (large) data structures involved. As a single data point, consider that it takes more than 160 seconds to simulate a 26-qubit Quantum Fourier Transform on an early 2015 MacBook Pro, using the standard software installation.
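The memory figure follows from storing one complex amplitude for each of the 2^n basis states of an n-qubit register. The following is a minimal sketch of that arithmetic, assuming 16 bytes per double-precision complex amplitude and ignoring any additional simulator overhead; the function name statevector_bytes is illustrative and is not part of QISKit.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Bytes needed to hold a full state vector of 2**num_qubits amplitudes.

    Assumes one double-precision complex number (2 x 8 bytes) per amplitude
    and no extra working memory, which a real simulator may well need.
    """
    return (2 ** num_qubits) * bytes_per_amplitude


# Each additional qubit doubles the requirement.
for n in (26, 27, 30, 35, 40):
    gib = statevector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:,.0f} GiB")
```

For 26 qubits this gives 2^26 x 16 bytes = 2^30 bytes, matching the 1 GB quoted above.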
To discuss the simulation of a quantum circuit it is advantageous to have a concrete example. As a point of reference, we utilize the example of simulating certain randomized circuits used to benchmark the power of a quantum device, a metric known as Quantum Volume, expressed using the QISKit infrastructure. Below, we provide a brief description of the code, point out some of the salient features of QISKit leveraged by this example, and examine the performance of the code on a classical simulator.
A circuit created in the quantum_volume function is simulated using the execute( ) method. Specifying the “local_qasm_simulator” backend starts a multi-threaded CPU simulation on the computer that runs the program. With its portable and well-designed APIs, QISKit makes developers more productive and, together with the included QISKit Simulator, furnishes a user-friendly environment in which both novice and experienced quantum programmers can develop and deploy quantum simulation experiments.
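To give a sense of the work such a simulation performs, here is a minimal pure-Python sketch of how a state vector simulator can apply a single-qubit gate. It is an illustration under stated assumptions and not QISKit's implementation: the function name apply_single_qubit_gate and the convention that bit k of an amplitude's index holds the value of qubit k are choices made for this example.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector, in place.

    `state` is a list of 2**n complex amplitudes. Bit `target` of an index
    is taken to be the value of that qubit (an assumed convention).
    """
    stride = 1 << target
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset   # index where the target bit is 0
            i1 = i0 + stride     # same index with the target bit set to 1
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return state


# Hadamard gate as a plain nested list.
h = 1 / math.sqrt(2)
HADAMARD = [[h, h], [h, -h]]

# Two qubits, starting in the all-zeros state; apply H to qubit 0.
state = [1 + 0j, 0j, 0j, 0j]
apply_single_qubit_gate(state, HADAMARD, target=0)
probabilities = [abs(a) ** 2 for a in state]
print(probabilities)
```

Note that a single gate reads and writes every amplitude once, so the cost per gate doubles with each added qubit. This is why memory bandwidth and thread count matter as much as raw memory capacity.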
While the QISKit Simulator is usually deployed on a personal laptop or desktop system, simulation in an on-premises POWER8 or POWER9 environment is also available. QISKit supports the POWER architecture and environment, allowing developers to install QISKit and run applications on large POWER SMP systems exactly as they would on their laptops.
The scalability and memory bandwidth of the IBM POWER architecture provide an enhanced and responsive experience for QISKit users, resulting in greater productivity. In addition to on-premises availability, IBM provides high-performance, Cloud-based simulation services that fully utilize the substantial capabilities of the POWER architecture. This version of the simulator is publicly available and free of charge, enabling simulations on server-class POWER systems by simply replacing “local_qasm_simulator” with “ibmq_qasm_simulator” in the above example. Whether the code is run on your private workstation or in IBM’s Cloud environment, the only limit to the scale of your simulation is the amount of memory available on the computing resource; this system has been used to run simulations in excess of 40 qubits.
Execution time of the Quantum Volume benchmark (depth=10) on a laptop and on a POWER8 machine for varying numbers of qubits. POWER8: 8001_22c, 3.4GHz, 10 cores x 8SMT x 2 sockets, 512GB RAM, CentOS Linux 7.2.1511.
The above graphs show some of the advantages of simulation on the POWER architecture and of the larger memory footprint afforded by server-class systems. In the above example, the execution times of the Quantum Volume benchmark were measured on a laptop (MacBook, early 2015) and on a POWER8 machine, using the two simulators and varying the number of qubits. The very large memory capacity of POWER systems (up to 32TB) enables simulation of larger-scale quantum circuits, while the memory bandwidth and the large number of threads and computational units available on these SMP systems allow such large-scale simulations to execute in a reasonable amount of time. As is evident in the above graphs, there is a visible performance difference between the on-premises and cloud-based systems. This is because the cloud-based simulation software optimizes memory accesses to run the simulation more efficiently on the POWER architecture. The cloud-based simulation optimizations will be available to on-premises QISKit users in the near future.
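The link between memory capacity and circuit size can be made concrete with a short calculation. The sketch below estimates an upper bound on the number of qubits whose full state vector fits in a given amount of memory. It assumes 16 bytes per amplitude, treats the quoted 512GB and 32TB figures as binary units, and pretends the whole memory is available to the state vector, so real limits will be lower. The function name max_qubits is hypothetical.

```python
def max_qubits(memory_bytes, bytes_per_amplitude=16):
    """Largest n such that 2**n amplitudes fit in memory_bytes.

    An optimistic upper bound: it ignores the operating system, the
    simulator's own working memory, and everything else on the machine.
    """
    if memory_bytes < bytes_per_amplitude:
        raise ValueError("memory too small to hold a single amplitude")
    amplitudes = memory_bytes // bytes_per_amplitude
    # bit_length() - 1 is floor(log2(amplitudes)) for a positive integer.
    return amplitudes.bit_length() - 1


GIB = 2 ** 30
TIB = 2 ** 40

power8_limit = max_qubits(512 * GIB)   # the 512GB benchmark machine
largest_limit = max_qubits(32 * TIB)   # the 32TB maximum quoted above
print(power8_limit, largest_limit)
```

Under these assumptions the bounds come out at 35 qubits for 512GB and 41 qubits for 32TB.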
The POWER9 Processor
In 2017, IBM announced the availability of the POWER9 architecture, bringing extreme performance to a wide range of application areas, with a special focus on GPU-accelerated AI applications. That focus rests on the enhanced bandwidth available, via NVLink, between the POWER9 CPU and the GPU, as well as on the performance of PCIe Gen4 I/O devices, CAPI, and the scalability of cores.
On-premises support for POWER9 is already available and cloud simulators will be available soon. We have witnessed significant improvements in simulation speed and plan on future enhancements, optimizing performance for POWER9, that we will present in an update to this article.
Conclusions and Future Work
All of the benchmarks mentioned in this article, and the instructions regarding how to reproduce the results described above, are available in the OpenQASM repository. The interested reader will find QFT, Quantum Volume, Bernstein-Vazirani, and Counterfeit-Coin Finding algorithms in that repository.
QISKit opens the door to high-performance simulation of quantum circuits. Its simulators have evolved along with optimizations for the POWER architecture. Our current work focuses on creating a high-performance simulator that will exploit SMP and distributed-memory parallelism, as well as the acceleration opportunities available on POWER processors with multiple GPUs attached via NVLink. Preliminary results on such systems indicate a performance advantage greater than 10x. This simulator will be introduced in the near future, and we encourage those interested to watch this space for an update on our progress.