IBM Brings AI Retrosynthetic Analysis to the Cloud


IBM RXN for Chemistry receives a major upgrade as more AI continues to be computed in the cloud

The future of computing is one of the strongest transformational forces on our planet. Nearly everything we touch has built-in computing capabilities and is generating tremendous volumes of data. The impact is felt not only in our daily lives, which keep speeding up, but also in more traditional industrial sectors, including chemistry.

Last year at the ACS Fall Meeting 2018 in Boston, IBM Research released IBM RXN for Chemistry, a cloud-based app built on the idea of treating organic chemistry as a language. The magic behind the app is a state-of-the-art neural machine translation method, which predicts the most likely outcome of a chemical reaction using sequence-to-sequence (seq2seq) models. We designed the tool around a simple Ketcher drawing interface and made it available in the IBM Cloud to predict the outcome of forward chemical reactions.

Since the launch, the response has been overwhelming, with more than 6,500 users and 50,000 reaction predictions in 12 months. The service is also being used in several laboratories around the world for automation experiments involving AI, where robots learn in real time from feedback provided by IBM RXN and by wet-lab experiments. Humans and machines coming together to discover – that’s powerful.

We call our AI model the Molecular Transformer. It was trained end-to-end and is fully data-driven: it does not query a database or rely on any additional external chemical information. One of the most popular features of the app lets users create projects, share them with friends or colleagues, and collaborate on complex multi-step syntheses or novel chemical reaction designs.

At the time, we outperformed all other data-driven models, achieving more than 90% top-1 accuracy on forward chemical reaction prediction for the first time.
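For readers unfamiliar with the metric, top-1 accuracy is the fraction of test reactions for which the model's single highest-ranked prediction matches the recorded product. The following is a minimal sketch of that calculation, not the evaluation code used for IBM RXN. It assumes molecules are compared as canonical text strings (SMILES notation is used here for illustration), and the function name and toy data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of reactions whose true product is among the first k predictions."""
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must have the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(
        truth in preds[:k]
        for preds, truth in zip(ranked_predictions, ground_truth)
    )
    return hits / len(ground_truth)


# Hypothetical toy data: each inner list is ranked from most to least likely.
predictions = [
    ["CCO", "CCOC"],
    ["CC(=O)O", "CC=O"],
    ["c1ccccc1", "C1CCCCC1"],
    ["CCN", "CCNC"],
]
truth = ["CCO", "CC=O", "c1ccccc1", "CCN"]

top1 = top_k_accuracy(predictions, truth, k=1)
top2 = top_k_accuracy(predictions, truth, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Plain string equality only works if both sides are written in the same canonical form, so a real evaluation would canonicalize the strings before comparing them.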

Today, the forward model performance is still unbeaten.

IBM RXN Goes Retro in the Cloud

Just as the forward prediction problem can be cast as a translation from one language (reactants + reagents) into a second language (products), the team immediately noticed that the translation can be inverted. Instead of predicting the outcome of a possible chemical reaction, the inverse problem determines the chemicals needed to create a given target molecule. This process, referred to as retrosynthesis, is very well known in chemistry. It’s a task mastered today by human experts, specifically synthetic organic chemists.
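The symmetry between the two tasks can be illustrated with a short sketch that builds source/target translation pairs from a single reaction record. It assumes the reaction is written as a reaction SMILES string of the form `reactants>reagents>products`; the post does not specify the text format, and the helper names are hypothetical. Whether the inverse model is asked to predict reagents as well as reactants is also an assumption made here for simplicity.

```python
from typing import NamedTuple, Tuple


class TranslationPair(NamedTuple):
    source: str  # text the model reads
    target: str  # text the model is trained to produce


def split_reaction(reaction: str) -> Tuple[str, str, str]:
    """Split 'reactants>reagents>products' into its three parts."""
    parts = reaction.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected two '>' separators, got: {reaction!r}")
    return parts[0], parts[1], parts[2]


def forward_pair(reaction: str) -> TranslationPair:
    """Forward task: reactants + reagents -> products."""
    reactants, reagents, products = split_reaction(reaction)
    precursors = ".".join(part for part in (reactants, reagents) if part)
    return TranslationPair(source=precursors, target=products)


def retro_pair(reaction: str) -> TranslationPair:
    """Inverse task: products -> reactants + reagents (same data, swapped)."""
    fwd = forward_pair(reaction)
    return TranslationPair(source=fwd.target, target=fwd.source)


# Acid-catalysed esterification of acetic acid with ethanol (water omitted).
rxn = "CC(=O)O.CCO>[H+]>CC(=O)OCC"
print(forward_pair(rxn))
print(retro_pair(rxn))
```

The same reaction record yields a training example for either direction, which is why one dataset can serve both models.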

IBM researchers Philippe Schwaller and Riccardo Petraglia (IBM Research – Zurich) examine the process behind a retrosynthetic analysis.

In the last 12 months, our team worked to improve the user experience on IBM RXN. We also extended the approach to the inverse translation task, gearing up for something big in 2019: retrosynthetic analysis, or finding the sequence of chemical reactions needed to make a given target.

Implementing retrosynthesis is more challenging than training an AI model for forward chemical reaction prediction. While several works on retrosynthesis have been reported in the last couple of years, all of them validated the methodology with only a handful of successful chemical syntheses checked by domain experts.

Our first goal was to construct an evaluation metric from scratch to understand the strengths and limitations of the model and to provide a way to systematically improve the AI technology.

A team effort, involving domain expertise

Collaborating with synthetic organic chemists from the University of Pisa, Italy: Prof. Anna Iuliano and Valerio Zullo

Collaborating with a team of synthetic organic chemists at the University of Pisa (Italy), we examined the human process behind a retrosynthetic analysis in detail. When faced with the challenge of designing a synthesis for a new molecule, a human expert analyzes, in an unbiased way, all possible bonds which, if broken, generate precursors that could react to give the target chemical. The analysis is then iterated on each precursor until commercially available materials are reached.
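The iterative procedure described above can be sketched as a recursive search that keeps disconnecting molecules until every precursor is commercially available. This is a toy illustration under stated assumptions, not the IBM RXN algorithm: the `DISCONNECTIONS` table stands in for whatever proposes precursors (a chemist or a model), `COMMERCIAL` stands in for a supplier catalogue, and all molecule names are placeholders.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Tuple[str, Tuple[str, ...]]  # (molecule to make, precursors used)

# Hypothetical stand-in for a chemist or model proposing disconnections.
DISCONNECTIONS: Dict[str, List[Tuple[str, ...]]] = {
    "target": [("intermediate_A", "reagent_B")],
    "intermediate_A": [("unknown_X",), ("reagent_C", "reagent_D")],
}
# Hypothetical catalogue of commercially available materials.
COMMERCIAL: Set[str] = {"reagent_B", "reagent_C", "reagent_D"}


def plan_route(molecule: str, max_depth: int = 5) -> Optional[List[Step]]:
    """Return the steps needed to make `molecule`, or None if no route is found."""
    if molecule in COMMERCIAL:
        return []  # can be purchased, nothing left to make
    if max_depth == 0:
        return None  # give up on overly long routes
    for precursors in DISCONNECTIONS.get(molecule, []):
        route: List[Step] = [(molecule, precursors)]
        for precursor in precursors:
            sub_route = plan_route(precursor, max_depth - 1)
            if sub_route is None:
                break  # this disconnection leads to a dead end
            route.extend(sub_route)
        else:
            return route  # every precursor was resolved
    return None


route = plan_route("target")
for product, precursors in route or []:
    print(f"{product} <= {' + '.join(precursors)}")
```

The sketch returns the first route it finds; choosing the best route among several requires a scoring step that weighs the alternatives.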

When deciding which bond is the best one to break in a retrosynthetic analysis, a human expert weighs the following factors to judge the quality of the resulting strategy: the number of steps (the fewer the better), the use of high-yield reactions, a high level of selectivity, and the cost of the starting precursors. In our research, a team of synthetic organic chemists validated the quality of the predictions at each stage of development, calibrating the individual components and providing valuable feedback on the AI strategies that required improvement.

Finally, the team at the University of Pisa rated the retrosynthesis predictions using the same metrics they routinely use with students in their courses on retrosynthesis, to eliminate any bias and to offer an objective analysis of the current state of the art. It was a tough exam for our AI model to pass, but IBM RXN for Chemistry succeeded in providing solutions for most of the exam exercises.

On assessing its performance, it became evident that there are certain gaps and biases in the model, which we were able to trace back to the quality of the dataset used.

The dataset, extracted from patents, is the same one used in several other works that, although they report only successful retrosyntheses, suffer from similar challenges. We are currently working on data curation strategies to improve the overall quality of the dataset and prepare IBM RXN to pass its chemistry exam with high marks.

Discover Today

Using statistics, the retrosynthetic architecture weighs all the relevant aspects that humans consider when designing a retrosynthesis. We used the same 2.5 million chemical reactions to train both the forward and backward chemical reaction prediction models.

The core of the forward/backward prediction models is the Molecular Transformer, released in 2018. The entire exploration of the possible disconnection patterns uses a Bayesian-like probability to decide which, among many disconnection possibilities, are the most effective. We also introduced novel statistical metrics to evaluate the performance of the entire model as well as that of the individual components.
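The post does not spell out the formula behind the Bayesian-like probability, so the sketch below shows only one plausible reading, labeled as an assumption: treat the backward model's probability for a disconnection as a prior, multiply it by the forward model's confidence that the proposed precursors really give the target, and normalize over all candidates. The field names, the scoring rule, and the numbers are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    precursors: str
    retro_prob: float    # assumed: backward model's probability for this disconnection
    forward_conf: float  # assumed: forward model's confidence that precursors give the target


def rank_candidates(candidates: List[Candidate]) -> List[Tuple[str, float]]:
    """Rank disconnections by normalized retro_prob * forward_conf (assumed rule)."""
    weights = [c.retro_prob * c.forward_conf for c in candidates]
    total = sum(weights)
    if total == 0:
        raise ValueError("all candidates have zero weight")
    scored = [(c.precursors, w / total) for c, w in zip(candidates, weights)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidates for a single target molecule.
candidates = [
    Candidate("route_A", retro_prob=0.5, forward_conf=0.4),
    Candidate("route_B", retro_prob=0.3, forward_conf=0.9),
    Candidate("route_C", retro_prob=0.2, forward_conf=0.15),
]

ranking = rank_candidates(candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the disconnection the backward model likes most is not ranked first, because the forward model is less confident that its precursors would actually yield the target.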

Try IBM RXN for Chemistry today to prepare for your synthetic organic chemistry class or to get inspiration for chemical problems in your daily research work. Think of it as your chemical companion.

Available now in the IBM Cloud to accelerate the transformational process of chemistry into a high-tech business:


Predicting retrosynthetic pathways using a combined linguistic model and hyper-graph exploration strategy. Philippe Schwaller, Riccardo Petraglia, Riccardo Pisoni, Costas Bekas, Teodoro Laino (IBM Research – Zurich); Valerio Zullo, Anna Iuliano (University of Pisa)


