Using Generative AI to Accelerate Drug Discovery


Novel drug design is difficult, costly, and time-consuming. On average, it takes $3 billion and 12 to 14 years for a new drug to reach the market. One third of this overall cost and time is attributed to the drug discovery phase, which requires synthesizing thousands of molecules to develop a single pre-clinical lead candidate.

At IBM Research AI, we’re researching ways to use artificial intelligence-based models to expedite this discovery phase at a significantly lower cost.

The drug discovery landscape today

Deep generative models, such as variational autoencoders and generative adversarial networks, are considered promising for the computational creation of novel molecules because of their state-of-the-art results in the virtual synthesis of images, text, speech, and image captions. Virtual creation of new and optimal lead candidates requires exploring a vast chemical space and performing multi-objective optimization within it, as the model needs to assess and balance critical factors such as drug activity, selectivity, toxicity, ease of synthesis, and stability. Such multi-objective optimization is handled using either conditional generative models or optimization methods such as Bayesian optimization.

These types of approaches either need access to a large amount of labeled data for training the generative model, involve expensive and/or inefficient optimization techniques, or do not generalize to unseen situations such as designing drug candidates for a novel viral target (e.g. SARS-CoV-2 proteins).
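To make the multi-objective trade-off described above concrete, here is a minimal sketch that keeps only the candidates not beaten on every objective by some other candidate (a Pareto filter). The molecule names and scores are made up for illustration, and this filter is a generic technique, not the conditional generative or Bayesian optimization methods mentioned above.

```python
from typing import Dict

# Hypothetical candidates with made-up scores (illustration only).
# activity and selectivity: higher is better; toxicity: lower is better.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "mol_a": {"activity": 0.9, "selectivity": 0.4, "toxicity": 0.7},
    "mol_b": {"activity": 0.6, "selectivity": 0.8, "toxicity": 0.2},
    "mol_c": {"activity": 0.5, "selectivity": 0.7, "toxicity": 0.3},
    "mol_d": {"activity": 0.3, "selectivity": 0.9, "toxicity": 0.1},
}


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least_as_good = (
        a["activity"] >= b["activity"]
        and a["selectivity"] >= b["selectivity"]
        and a["toxicity"] <= b["toxicity"]
    )
    strictly_better = (
        a["activity"] > b["activity"]
        or a["selectivity"] > b["selectivity"]
        or a["toxicity"] < b["toxicity"]
    )
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Keep every candidate that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


front = pareto_front(CANDIDATES)
print(sorted(front))
```

In this toy data, mol_c is dropped because mol_b is better on all three objectives; the remaining candidates each represent a different trade-off that a chemist would still have to choose between.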

Designed antimicrobial sequences with their experimentally validated broad-spectrum potency, and in vitro and in vivo toxicity.

Using AI to make a complex process smarter

Recently, we and other IBM researchers looked at the antimicrobial peptide (AMP) design problem. Generating new and optimal antimicrobial peptides by learning from a limited repository of known AMP sequences is an incredibly challenging task. We view this research as critical given that AMPs are regarded as a drug of last resort against antimicrobial resistance, one of the biggest threats to global health, food security, and development. It is thought that bacterial co-infections and widespread antibiotic use in COVID-19 could further fuel antibiotic resistance worldwide.

In our paper, “Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics,” we propose a new, sample-efficient approach for the targeted design of optimal molecules and apply it to the AMP design problem. This approach leverages guidance from property predictors trained on the latent features of the molecules. Notably, it does not need any complicated optimization or model training, and it can learn from scarce labels.

This work follows a two-step approach leveraging semi-supervision and self-supervision. First, we learn a latent representation by training a state-of-the-art peptide autoencoder, which consists of two jointly trained neural networks, on the large number of unlabeled peptide sequences available in the UniProt database. Interpolation between peptides in this latent space shows a smooth transition of the physico-chemical and functional properties.
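As a rough illustration of what interpolating between two peptides in latent space means, the sketch below walks a straight line between two latent vectors. The vectors are made up, linear interpolation is an assumption (the post does not say which scheme was used), and the trained encoder and decoder are deliberately left out; in a real pipeline each intermediate point would be passed through the decoder to obtain a peptide sequence.

```python
from typing import List


def interpolate(z_start: List[float], z_end: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points on the line from z_start to z_end, endpoints included."""
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (start) to 1.0 (end)
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical 2-D latent codes standing in for two encoded peptides.
z_peptide_a = [0.0, 2.0]
z_peptide_b = [1.0, 0.0]

for z in interpolate(z_peptide_a, z_peptide_b, steps=5):
    print(z)
```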

We provide a visual platform for exploring the modeled peptide space here.

We then perform attribute-controlled generation of antimicrobial peptides with the new approach by using the latent features from the pre-trained autoencoder. Additional in silico screening using deep learning and high-throughput molecular simulations allowed us to select 20 novel designed peptides for wet lab synthesis and validation. Two of these were confirmed to have low toxicity and broad-spectrum potency, even against a multidrug-resistant strain.
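One simple way to picture attribute-controlled generation with a predictor on latent features is rejection sampling: draw latent vectors from a prior, score each with the predictor, and keep only those that score high enough. The sketch below assumes a standard normal prior and a logistic-regression stand-in with made-up weights; it is an illustration of the general idea, not the procedure or models from the paper, and the decoding step that would turn each accepted vector into a peptide is omitted.

```python
import math
import random
from typing import List

LATENT_DIM = 4
# Hypothetical weights of a linear predictor on latent features (not trained on real data).
WEIGHTS = [1.5, -0.5, 0.8, 0.0]
BIAS = -0.2


def predict_attribute(z: List[float]) -> float:
    """Logistic-regression stand-in for a property predictor; returns P(attribute | z)."""
    logit = BIAS + sum(w * x for w, x in zip(WEIGHTS, z))
    return 1.0 / (1.0 + math.exp(-logit))


def sample_with_attribute(
    n: int, threshold: float, rng: random.Random, max_tries: int = 100_000
) -> List[List[float]]:
    """Draw latent vectors from a standard normal prior and keep those scoring >= threshold."""
    accepted: List[List[float]] = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
        if predict_attribute(z) >= threshold:
            accepted.append(z)
    return accepted


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sample_with_attribute(n=5, threshold=0.8, rng=rng)
for z in samples:
    print([round(x, 2) for x in z], round(predict_attribute(z), 3))
```

Because the predictor operates on latent vectors rather than on sequences, the generative model itself does not have to be retrained when the desired attribute changes; only the predictor does.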

Left. Generated molecules with high SARS-CoV-2 Main Protease binding activity and selectivity. Right. A selected designed molecule docked to the SARS-CoV-2 Main Protease.

Applying the research to COVID-19 antiviral design

The additional advantage of the proposed generative model became evident when we applied it to the computational design of antiviral therapeutic molecules targeting COVID-19. Due to the novel nature of COVID-19, there is very limited binding affinity data between SARS-CoV-2 target proteins and small, drug-like molecules, making it challenging to generate drug molecules with high affinity to novel SARS-CoV-2 proteins. Additionally, accounting for high target selectivity becomes crucial for optimal drug generation in order to avoid potential undesired toxic and adverse effects arising from off-target activities, which could lead to failure in the later stages of discovery.

The proposed generative framework for this work tackles these challenges by learning the protein-ligand binding relationships on the pre-trained latent features of protein sequences and small drug-like molecules, which were obtained using large corpora of unlabeled data. As shown in “Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models,” the framework can generate molecules with high binding affinity to unseen proteins in a target-specific and selective manner. The generated molecules were extensively screened in silico, demonstrating their potential in terms of target structure binding, ease of synthesis, and parent molecule and metabolite toxicity.
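To show why selectivity matters alongside raw affinity, the sketch below ranks a few hypothetical molecules by how much more strongly they are predicted to bind the intended target than their strongest off-target. The scores are made up (higher means tighter predicted binding), the protein labels are placeholders, and defining selectivity as target affinity minus the best off-target affinity is an assumption for illustration, not necessarily the metric used in the paper.

```python
from typing import Dict


def selectivity(affinities: Dict[str, float], target: str) -> float:
    """Predicted affinity for the target minus the strongest predicted off-target affinity."""
    off_target = [score for protein, score in affinities.items() if protein != target]
    if not off_target:
        raise ValueError("need at least one off-target protein to compute selectivity")
    return affinities[target] - max(off_target)


# Hypothetical predicted affinities per molecule (illustration only).
PREDICTED: Dict[str, Dict[str, float]] = {
    "mol_1": {"target": 8.1, "off_target_a": 7.9, "off_target_b": 5.0},
    "mol_2": {"target": 7.4, "off_target_a": 4.2, "off_target_b": 4.8},
    "mol_3": {"target": 6.0, "off_target_a": 6.5, "off_target_b": 3.1},
}

ranked = sorted(
    PREDICTED,
    key=lambda mol: selectivity(PREDICTED[mol], "target"),
    reverse=True,
)
print(ranked)
```

In this toy data, mol_1 has the highest predicted affinity for the target but binds an off-target almost as strongly, so mol_2 ranks first once selectivity is taken into account.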


A snapshot of the COVID-19 Molecule Explorer

A selected set of the generated molecules targeting SARS-CoV-2 is available in the IBM visual molecular explorer platform under an open license. This open sharing of the AI-generated artefacts in the explorer is the first step toward establishing a community to aid in finding optimal designs in the most efficient manner possible. Toward this goal, we are working closely with a number of academic partners, including Oxford University (UK), A*STAR (Singapore), Rensselaer Polytechnic Institute, and Rice University.

Ushering in a new era of accelerated discovery

At IBM Research, we are developing robust and scalable AI tools and platforms that establish and support multi-disciplinary communities of discovery. We believe collaborative efforts like these will accelerate how we currently perform complex scientific discovery tasks, including designing and optimizing novel materials and molecules, and transition us into a new era of accelerated discovery.



Principal Research Staff Member and Manager, Trustworthy AI Generative Modeling Lead, IBM Research

Hendrik Strobelt

Research Staff Member, IBM Research

Vittorio Caggiano

Program Manager, Emerging Technology Experiences (ETX)
