Protein language models can spot which proteins might make promising drug or vaccine targets, turning vast biological data into potential breakthroughs. The catch is that no one really knows how they make those calls. That black-box problem matters: drugmakers are reluctant to gamble millions on a molecule without understanding why it looks good.
A new study from MIT, paired with IBM’s release of open-source biomedical models, points to a shift toward AI that drug companies can actually trust. In a new paper published in the Proceedings of the National Academy of Sciences, the MIT team describes how a sparse autoencoder can extract meaningful biological features from a protein language model’s internal representations. These features align with real-world functions such as protein binding or metabolic activity. The result is a view into the model’s logic that could help researchers understand and verify its predictions.
“These models work fairly well for a lot of tasks,” Onkar Singh Gujral, a PhD student at MIT and lead author of the study, told IBM Think in an interview. “But we don’t really understand what’s happening inside the black box. Interpreting and explaining this functionality would first build trust, especially in situations such as picking drug targets.”
Protein language models use machine learning to study amino acid sequences the way natural language models study words. By training on massive collections of protein data, they learn the “grammar” of how proteins are built and function. This allows them to capture key insights about protein evolution, structure and biological roles.
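To make that analogy concrete, here is a minimal sketch of embedding a protein sequence with a small, publicly available ESM-2 checkpoint through the Hugging Face transformers library. The checkpoint name, the example sequence and the mean-pooling step are illustrative choices for this article, not details drawn from the study.

```python
from transformers import AutoTokenizer, AutoModel
import torch

# A small public ESM-2 checkpoint, used here only for illustration
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = AutoModel.from_pretrained("facebook/esm2_t6_8M_UR50D")

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # an arbitrary amino acid sequence
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding vector per residue, analogous to per-word embeddings in a text model
per_residue = outputs.last_hidden_state        # shape: (1, tokens, hidden_dim)
protein_embedding = per_residue.mean(dim=1)    # simple pooled summary of the whole protein
```

Representations like these, taken from a layer inside the model, are the raw material that interpretability methods then try to explain.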
The goal behind the new MIT paper is not only to explain model behavior, but to speed up validation, the process of checking whether a candidate molecule actually works as predicted. Interpretability can help scientists move faster by clarifying what a model is actually signaling. Michal Rosen-Zvi, Director of Healthcare and Life Sciences at IBM Research, said the benefit is both technical and practical.
“Interpretability accelerates drug discovery by enabling researchers to validate AI-driven hypotheses faster,” she told IBM Think in an interview. “It ensures scientific rigor through transparent reasoning and traceable biological mechanisms.”
Rosen-Zvi pointed out that AI systems are not immune to error. Predictions can be shaped by flawed data or internal quirks. Transparency makes it easier to detect those issues and move quickly to correct them. “Interpretability allows rapid cycles of verification, both of the model’s logic and the integrity of the underlying data,” she said.
There is a tradeoff. Models designed for interpretability are often less powerful than opaque ones, at least in raw performance, Rosen-Zvi said. She added that the most advanced algorithms tend to be the least transparent, requiring post hoc analysis to make sense of their behavior. Even so, she believes the extra effort is worth it, especially in biology, where mistakes can have costly consequences.
IBM recently open-sourced biomedical foundation models designed to give researchers clearer insight into how AI identifies promising drug candidates. The company is betting that transparency will be key to wider adoption of AI in pharmaceutical research.
The MIT team’s method works by taking a compressed representation of a protein and expanding it into a large, sparsely activated space. That makes it easier to see which specific biological features are driving the prediction. Some of the features identified in the study correspond to known protein families and molecular functions, while others align with broader biological categories, such as sensory systems. To make these features easier to interpret, the researchers used a language model to turn complex sequence patterns into plain-language summaries.
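The paper describes the technique in full; as a rough illustration of the general idea, the sketch below shows a minimal sparse autoencoder in PyTorch that expands a dense protein embedding into a much larger feature space and uses an L1 penalty to keep only a handful of features active for any given protein. The layer sizes, penalty weight and training loop are placeholders, not the MIT team's actual configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Expands a dense embedding into a larger, sparsely activated feature space."""
    def __init__(self, embed_dim=1280, n_features=16384):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, n_features)
        self.decoder = nn.Linear(n_features, embed_dim)

    def forward(self, x):
        # ReLU zeroes out negative pre-activations; combined with the L1 penalty
        # below, training pushes most features to stay at zero
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return features, reconstruction

def loss_fn(x, reconstruction, features, l1_weight=1e-3):
    # Reconstruction term keeps the features faithful to the original embedding;
    # the L1 term encourages sparsity so each feature is easier to interpret
    mse = ((x - reconstruction) ** 2).mean()
    sparsity = features.abs().mean()
    return mse + l1_weight * sparsity

# Usage sketch: train on embeddings pulled from a protein language model layer,
# then inspect which sparse features fire for proteins with a known function.
sae = SparseAutoencoder()
x = torch.randn(8, 1280)              # stand-in for a batch of protein embeddings
features, recon = sae(x)
loss = loss_fn(x, recon, features)
loss.backward()
```

Because each sparse feature tends to activate for a narrow slice of proteins, researchers can check whether a feature lines up with a known family or function and use that mapping to read the model's predictions.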
This level of visibility, Gujral said, allows researchers to evaluate not only whether a model is correct, but why, helping teams stay involved in the decision-making process. “You might also be able to discard unworthy candidates with human help if your model is interpretable,” he said.
Rosen-Zvi agreed that models that show their work can help engender trust. “Trustworthy AI enables meaningful collaboration between human expertise and machine intelligence,” she said. “It makes biases and limitations in biomedical data and models more visible.”
In domains like drug development, where data is often incomplete and complex, that visibility can improve both internal workflows and external communication. “Transparency around data provenance, openness in methodology and inclusive benchmarking” are all critical, she said.
Scientific rigor is not the only concern. Rosen-Zvi noted that interpretability also plays a social role, making it easier for scientists to communicate model results to colleagues, regulators or funders and to build trust in the decisions that follow.
“It is both a technical and trust challenge,” she said. “In biomedical sciences, this is further nuanced by the field’s dual reliance on mathematical modeling and narrative reasoning.”
These developments come at a moment when drugmakers are pulling AI deeper into pharmaceutical pipelines, turning to machine learning to identify targets and simulate compound behavior.
In the last few years, companies like Insilico Medicine have advanced AI-generated compounds into clinical trials. Despite that rapid progress, however, no AI-designed drug has yet reached the market.
One challenge is credibility. Researchers, investors and regulators often hesitate to rely on AI systems when they cannot verify the reasoning behind a prediction, Gujral said. He suggested that a more interpretable system could help research teams explain funding decisions, avoid dead ends and better prioritize resources. “It might be easier to justify using funds to pursue a particular candidate if you can clearly explain why the model considers it promising,” he said.
This push for transparency extends beyond academia. Alphabet’s Isomorphic Labs recently raised USD 600 million to support its AI drug discovery platform. It aims to build on the advances made by AlphaFold, which predicts protein structures with impressive accuracy but offers little insight into its internal process. Meanwhile, a new open-source tool called DrugReasoner attempts to predict drug approval likelihoods while showing its reasoning at each step.
Back at MIT, Bonnie Berger, senior author of the study, said her team tested the method across several layers of ESM, a popular protein language model, comparing how interpretable the sparse features were against the original neurons using reference annotations such as GO terms, protein families and protein names. In every layer, the sparse autoencoder features earned higher interpretability scores than the original neurons, and several of the interpretable features the team identified could play a significant role in drug design.
“The goal isn’t to replace human judgment,” she said. “It’s to combine it with what the model sees. That’s only possible if we understand how the model thinks.”