IBM Launches Free AI Tool in the Cloud for Predicting Chemical Reactions


For more than 200 years, the synthesis of organic molecules has remained one of the most important tasks in organic chemistry. The work of chemists has scientific and commercial implications that range from the production of Aspirin to that of Nylon. Yet little has been done to dramatically change these age-old practices and usher in a new era of productivity based on pioneering artificial intelligence (AI) science and technology.

IBM RXN for Chemistry is a simple, web-based AI tool hosted in the IBM Cloud. The AI model is trained end-to-end, fully data-driven and without the aid of querying a database or any additional external information.

The challenge for organic chemists working in fields such as chemistry, materials science, oil and gas, and the life sciences is that there are hundreds of thousands of reactions. While it is manageable to remember a few dozen within a narrow specialty, it’s impossible to be an expert generalist.

To address this, we asked ourselves: can we use deep learning and artificial intelligence to predict the reactions of organic compounds?

Treating Chemistry like a Language

First, because we studied engineering and materials science, not organic chemistry, we had to hit the books. It wasn’t long before we started seeing organic chemistry everywhere: morning, noon and night. Atoms appeared instead of letters, molecules materialized from words and, then, something incredible happened: an idea was born.

We realized that organic chemistry datasets and language datasets have a lot in common: they both depend on grammar, on long-range dependencies, and a small particle or word like “not” can change the entire meaning of a sentence, just like stereochemistry can turn Thalidomide into either a medication or a deadly poison.

As non-native English speakers, we are both familiar with online translation tools, which work wonders in turning English into French, and German into English, so why not try to use them to turn random chemicals into functional compounds?

A New Level of Accuracy for AI and Chemistry

Last year at the NIPS 2017 conference, we presented our results: a web-based app that takes the idea of treating organic chemistry as a language and applies state-of-the-art neural machine translation methods, namely sequence-to-sequence (seq2seq) models, to go from starting materials to products. At the time, we outperformed existing solutions on their own training and test sets, achieving a top-1 accuracy of 80.3 percent, and we set a first score of 65.4 percent on a noisy dataset of single-product reactions extracted from US patents. Still, we knew we could be even more accurate.
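
Top-1 accuracy counts a reaction as correctly predicted only when the model’s highest-ranked product matches the recorded product; top-k relaxes this to the first k candidates. The following is a minimal sketch of that metric, not IBM’s evaluation code. It assumes predicted and recorded products are already canonical SMILES strings, so that plain string equality is meaningful (real pipelines typically canonicalize with a cheminformatics toolkit first). The function name and the toy data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   targets: Sequence[str],
                   k: int = 1) -> float:
    """Fraction of reactions whose recorded product is among the first k predictions.

    predictions[i] is the ranked list of candidate product SMILES for reaction i
    (best candidate first); targets[i] is the recorded product SMILES.
    Strings are compared verbatim, so both sides are assumed to be canonicalized.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one reaction")
    hits = sum(1 for ranked, target in zip(predictions, targets)
               if target in ranked[:k])
    return hits / len(targets)


# Hypothetical toy data: three reactions, two ranked candidates each.
preds = [
    ["CCOC(C)=O", "CCO"],
    ["c1ccccc1Br", "c1ccccc1"],
    ["CC(=O)O", "CCOC(C)=O"],
]
truth = ["CCOC(C)=O", "c1ccccc1", "CCOC(C)=O"]

print(top_k_accuracy(preds, truth, k=1))  # only reaction 1 is right at rank 1
print(top_k_accuracy(preds, truth, k=2))  # every product appears within the top 2
```

With k=1 only the first reaction counts as a hit, so the score is one third; with k=2 every recorded product appears among the candidates and the score is 1.0.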

On Monday, 20 August, at the American Chemical Society (ACS) meeting in Boston, we are launching the AI tool in the IBM Cloud. It’s called IBM RXN for Chemistry, and we will present a new level of accuracy at 11:25 AM in the Alcott Room. (SPOILER ALERT) By simplifying the model and relying more extensively on the attention mechanism, we have reached an accuracy of 89 percent on the same dataset used in the previous publication and by other groups, which is state of the art.
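
Attention lets the model, at each decoding step, weigh every token of the input sequence when deciding what to generate next. The post does not detail the RXN architecture, so the sketch below only illustrates one common form, scaled dot-product attention for a single query, in plain Python. It is not necessarily the variant used in IBM RXN, and the vectors are made-up numbers rather than learned embeddings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: Vector, keys: List[Vector],
              values: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention for a single query vector.

    Scores each key against the query (dot product divided by sqrt(d)),
    normalizes the scores with softmax, and returns the weighted sum of the
    value vectors together with the attention weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights


# Toy example: three input tokens with 2-d keys and values (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

output, weights = attention(query, keys, values)
print(weights)  # tokens 1 and 3 match the query equally well and get more weight
print(output)
```

Because the weights sum to one, the output is a convex mix of the value vectors, pulled toward the tokens whose keys best match the query.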

Using IBM RXN for Chemistry

Back in high school, we had to draw by hand the hexagons and pentagons and all the various lines representing the bonds of organic molecules. Now we’ve built a system that takes the exact same representation and, with a single click, predicts in seconds how molecules will react. Users can simply use Ketcher, a web-based chemical structure editor designed for chemists, laboratory scientists and technicians, to input their molecules. It’s that easy.

Using SMILES, this molecule is translated into BrCCOC1OCCCC1

In addition, users can create projects, share them with friends or colleagues, and collaborate on complex multi-step syntheses or on novel chemical reaction designs.

The secret behind IBM RXN for Chemistry is the simplified molecular-input line-entry system, or SMILES, which represents a molecule as a sequence of characters. For instance, the molecule in the image on the right becomes BrCCOC1OCCCC1.
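
Before a sequence model can read SMILES, the string has to be split into tokens, and single characters are not always the right unit: “Br” and “Cl” are one atom each, and a bracketed atom such as [NH4+] should stay together. Below is a minimal sketch of a regex-based SMILES tokenizer of the kind commonly used with sequence models. The pattern and the function name are assumptions for illustration, since the post does not describe how RXN tokenizes its input.

```python
import re
from typing import List

# Assumed tokenization pattern: bracketed atoms, two-letter halogens (Br, Cl),
# single-letter and aromatic atoms, bonds, branches, and ring-closure digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens for a sequence model (hypothetical helper)."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Refuse to silently drop characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens


print(tokenize_smiles("BrCCOC1OCCCC1"))
# ['Br', 'C', 'C', 'O', 'C', '1', 'O', 'C', 'C', 'C', 'C', '1']
```

The final check raises an error instead of quietly discarding characters the pattern does not recognize.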

We trained our model on a combination of reaction datasets totaling 2 million reactions, extracted from patents and from textbook examples.
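
To frame reaction prediction as translation, each recorded reaction becomes a pair of sequences: the reactants are the sentence to translate and the product is its translation. The sketch below assumes reactions are stored as reaction SMILES in the standard reactants>agents>products form. The helper name, the TranslationPair type, and the choice to merge agents into the source side are hypothetical conventions, not a description of how the RXN training data were prepared.

```python
from typing import NamedTuple


class TranslationPair(NamedTuple):
    source: str  # reactants (plus any agents): the "sentence" to translate
    target: str  # product(s): the "translation"


def reaction_to_pair(reaction_smiles: str) -> TranslationPair:
    """Split a reaction SMILES 'reactants>agents>products' into a seq2seq pair.

    Agents, if present, are joined to the reactants with '.' so the model sees
    every input molecule on the source side (one possible convention).
    """
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {reaction_smiles!r}")
    reactants, agents, products = parts
    source = ".".join(p for p in (reactants, agents) if p)
    return TranslationPair(source=source, target=products)


# Hypothetical example: acid-catalyzed esterification of acetic acid with
# ethanol (water by-product omitted for simplicity).
pair = reaction_to_pair("CC(=O)O.OCC>[H+]>CCOC(C)=O")
print(pair.source)  # CC(=O)O.OCC.[H+]
print(pair.target)  # CCOC(C)=O
```

Each source and target string would then be tokenized and fed to the encoder and decoder respectively.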

To be clear, we respect our users’ privacy. We will not use any of the molecules saved in your project folders to further train our AI model unless you grant us access to them.

Try it Today

Log in and use IBM RXN for Chemistry today — it’s completely free and available in the IBM Cloud.

For students, we added a special challenge mode in which they can test themselves against the system. It’s a valuable tool for senior chemists as well. In fact, if you are attending ACS in Boston, stop by IBM booth 530 and test your skills against our AI model.

Follow us on Twitter: @teodorolaino, @phisch124 and @TheophileGaudin and please submit your questions and feedback at #rxnforchemistry

“Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models, Philippe Schwaller, Théophile Gaudin, Dávid Lányi, Costas Bekas, Teodoro Laino