The chemical sciences have made great strides in the discovery of novel and useful materials over the past several decades. In polymers alone, the development of thermoplastics and structural polymers has had a huge influence on numerous applications, ranging from new paints to clothing fibers and photographic film. These are all examples of how the chemical sciences have contributed indispensable materials for everyday use.
The discovery of new materials is the driving force in the expansion and improvement of industrial products. From tissue engineering to drug discovery to the discovery of sustainable, environmentally friendly materials, there are an untold number of chemical compounds yet to be discovered. However, the vastness of chemical space, which encompasses all possible design combinations of material structures, exceeds the capability of human experts to explore even a small fraction of it. Just the sound of it is intimidating, especially to chemists.
But what if artificial intelligence (AI) could help crack open the door to the infinite possibilities that chemical space has to offer? This is the question my research team – Toshiyuki Hama, Hsiang Han Hsu, Akihiro Kishimoto – and I at the IBM Research lab in Japan asked ourselves before we embarked on a voyage to find ways to automate the process of designing brand new molecular structures. Today, we are proudly announcing the launch of the IBM Molecule Generation Experience (MolGX) (pronounced “Mol-gee-ex”), an IBM cloud-based, AI-driven molecular inverse-design platform, which automatically designs brand new molecular structures rapidly and diversely. Inverse-design aims to discover tailored materials from the property targets of the product.
MolGX is part of IBM’s accelerated discovery strategy, which aims to supercharge the scientific methodology using AI, hybrid cloud, automation, and eventually quantum computing. Its goal is to speed up the discovery of new materials by 10 to 100 times.
Structured data, which incorporate the structures and characteristics of molecules, are extracted by IBM Deep Search, complemented by AI-accelerated simulation, and integrated into MolGX to train the generative AI. MolGX supports the simplified molecular-input line-entry system, or SMILES, which represents a line drawing of a molecule as a sequence of characters, such as BrCCOC1OCCCC1. Candidate molecules generated by MolGX can be passed to IBM RXN for Chemistry, which predicts chemical reactions or retrosynthesis pathways, and to IBM RoboRXN, which automates chemical synthesis using robots.
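To make the SMILES idea concrete, here is a minimal sketch that splits the example string above into its building blocks and counts the non-hydrogen atoms. It is an illustration only: the token pattern is a simplified assumption covering common SMILES symbols, the function names are hypothetical, and none of it reflects how MolGX itself parses molecules.

```python
import re

# Simplified SMILES token pattern (an assumption, not the full specification):
# bracket atoms, two-letter halogens, single-letter atoms, aromatic atoms,
# two-digit ring closures, single-digit ring closures, bond/branch symbols.
TOKEN_RE = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#\-+\\/().@:~*$]"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms written outside brackets (bracket atoms are skipped)."""
    counts = {}
    for tok in tokenize(smiles):
        if tok[0].isalpha():
            symbol = tok.capitalize()  # aromatic 'c' is counted as 'C'
            counts[symbol] = counts.get(symbol, 0) + 1
    return counts


tokens = tokenize("BrCCOC1OCCCC1")
print(tokens)
print(heavy_atom_counts("BrCCOC1OCCCC1"))
```

The two "1" tokens mark the atoms where a ring opens and closes, which is how a single line of text can describe a cyclic structure.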
Taking a new pathway
The development of new materials follows a number of different pathways, depending on both the nature of the problem being pursued and the means of investigation. Breakthroughs in the discovery of new materials range from pure chance, to trial-and-error approaches, to design by analogy to existing systems. While these methodologies have taken us far, the challenges and requirements for new materials have grown more complex, as have the demands and issues for which new materials are needed. As we face global problems such as pandemics and climate change, the need to design and develop new medicines and materials at a faster pace, from the molecular scale through to the macroscopic level of a final product, is becoming increasingly urgent.
Our aim in developing MolGX is to accelerate molecular inverse design through state-of-the-art molecular generative AI-model technology. You may have heard of AI engines that can draw realistic images of landscapes or portraits of people that don’t exist. These are called generative models, and rather than use them to create imaginary things, we’ve adapted this technology to automatically generate molecules from desired chemical properties such as “solubility in water” and “heatability” by performing three simple steps: observing and selecting a dataset, training the AI model to predict chemical characteristics within the given parameters, and designing molecular structures based on the model built.
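The three steps can be sketched in miniature as follows. This is a toy under stated assumptions, not MolGX's algorithm: the "molecules" are short chains written with only C and O, the target property is molar mass (chosen because it can be computed exactly from standard atomic masses, so no measured data is invented), the model is a plain least-squares fit on atom counts, and design is done by brute-force enumeration. All function names are hypothetical.

```python
from itertools import product

# Standard atomic masses in g/mol, used only to label the toy dataset.
MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def features(smiles):
    """Feature vector [bias, number of C, number of O]."""
    return [1.0, float(smiles.count("C")), float(smiles.count("O"))]


def molar_mass(smiles):
    """Molar mass of an acyclic saturated C/O chain, formula C_n H_(2n+2) O_m."""
    n_c, n_o = smiles.count("C"), smiles.count("O")
    return MASS["C"] * n_c + MASS["H"] * (2 * n_c + 2) + MASS["O"] * n_o


def fit_linear(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]


def predict(weights, smiles):
    return sum(w * f for w, f in zip(weights, features(smiles)))


def design(weights, target, tol, max_len=5):
    """Enumerate C/O chains and keep those predicted to be within tol of target."""
    hits = set()
    for length in range(1, max_len + 1):
        for atoms in product("CO", repeat=length):
            s = "".join(atoms)
            if "OO" in s:  # example of a chemist-imposed rule: no O-O bonds
                continue
            if abs(predict(weights, s) - target) <= tol:
                hits.add(min(s, s[::-1]))  # a chain and its reverse are one molecule
    return sorted(hits)


# Step 1: select a dataset. Step 2: train the property model. Step 3: design.
train = ["C", "CC", "CO", "CCO", "COC", "CCCO", "OCCO", "CCOCC"]
weights = fit_linear([features(s) for s in train], [molar_mass(s) for s in train])
print(design(weights, target=46.07, tol=0.5))
```

With a target of 46.07 g/mol, the search returns CCO and COC, which are ethanol and dimethyl ether. A real platform replaces each piece with something far richer, but the flow from data, to a predictive model, to structures that satisfy a target is the same.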
In collaboration with the IBM Garage team in Tokyo – including Makoto Kogo, Takumi Hongo, and Kumiko Fujieda – we then made it scalable in the IBM Cloud and accessible through an easy-to-understand user interface designed with general users in mind. Essentially anyone, even those without advanced IT skills, can experience the cutting edge of materials informatics as well as learn the basics of AI.
We also created a version for chemistry professionals and industries, which includes additional functionalities such as exportation of results, customized modelling and data uploading to explore beyond the built-in dataset.
MolGX introduces a novel pathway to generating new materials as it reduces the development time and allows for the innovation of new materials that are not bound by fixed ideas.
Accelerating the design experience
Our novel AI-driven molecular inverse-design platform is ready to be deployed and tested with companies making materials, including at IBM, where we have applied it towards the development of a new photoacid generator, an important material in electronics manufacturing.
It offers material R&D three main advantages:
Firstly, it features an algorithm-based encoding and structure generation process, as opposed to a data-driven one, so there is no pre-training on large datasets, nor the major training costs that come with this task.
Secondly, the space and structure generation process are fully interpretable for chemists, and therefore easy to customize with chemical insights about molecular structures.
Finally, hierarchical data structures and a clear UI provide a flexible and intuitive workflow.
The MolGX professional version, which includes additional functionality, is already being deployed as a service at materials manufacturing company NAGASE & CO., LTD., running on IBM’s and NAGASE’s cloud. Inverse design of sugar and dye molecules was carried out more than 10 times faster than by human chemists. In addition, the diversity of molecular structures was expanded while still satisfying chemical rationality.
Just imagine all the new materials we could discover at this accelerated speed, and all the problems we could solve. One of the ways to realize a sustainable future is to discover new materials that can address global issues. For example, we could create new materials with properties that can rapidly adsorb carbon dioxide or convert sunlight into electricity without losing energy.
The properties of materials are determined by the shape of the molecules that make up the material, but there are almost infinite patterns in the shape of molecules, which is why it takes an enormous amount of time to develop new materials – time we can’t afford to waste. There is a growing demand for technologies like MolGX that can optimize materials design in a quicker and more cost-efficient manner by utilizing AI and hybrid cloud. Our platform is a perfect example of how AI and cutting-edge data processing technologies can put us on the fast track to the discovery of innovative materials that can make a significant impact on the environment and on our society.
A free trial version of MolGX, which uses a built-in dataset, is available today. Under license, IBM Research is also offering an unlimited version with additional functionality, including data upload, results exportation, customized modeling, and more.