Boosting our understanding of the microbial world with software repurposing

Developing new software for a specific scientific task can be time-consuming and costly. Software repurposing can help — at times it can even improve the results of the task compared to the traditional methods. This is exactly what our global team from IBM Research Daresbury in the UK, and Almaden and Yorktown in the US has achieved.

In our latest paper, “Re-purposing software for functional characterization of the microbiome,” published in the journal Microbiome, we propose a way to improve the speed, sensitivity and accuracy of what’s known as microbial functional profiling: determining what microbes in a specific environment are capable of. Our method is based on clever reuse of bioinformatic tools that were originally developed for a different task.

Microbial functional profiling can help improve our limited understanding of the world of the teeny tiny organisms that live all around and also inside us — our microbiome. When microbes throw a party, we can get stomach aches, bloating and other issues, but doctors may find it hard to treat them effectively.

Because microbes are everywhere, understanding them better matters for our health: it gives us more insight into various diseases and into the environment we live in.

Functional profiling can help. It’s part of metagenomics, a data-intensive science that involves sampling an environment with genomic technologies. Metagenomics takes raw data on a computational journey that lets scientists draw biologically relevant insights about the nature of microbes in a specific environment and assess what they are capable of.

But because it’s so computationally intensive, it can take hours and sometimes days to perform a metagenomic analysis. Each metagenomics experiment can generate several gigabytes of data that have to be processed in computational workflows.

These workflows consist of multiple steps and tools. Typically, the first steps after quality control and filtering include taxonomic classification — identifying which microbes are present — and functional profiling. Functional profiling is often more relevant for practical applications, but the computational effort to run it can be massively higher than that of taxonomic classification.

This is where our research can be of use. We have developed computational techniques that could help improve our limited knowledge of the microbiome by making microbial functional profiling much easier and less computationally intensive to run. And we did it using existing software that is well known within the scientific community.

How repurposing started

The inspiration to perform software repurposing came from our previous work on a classifier we dubbed PRROMenade, as well as on IBM’s Functional Genomics Platform. PRROMenade uses a tree-shaped data structure to propose direct, one-step functional annotation for metagenomics reads. It is powered by the kind of k-mer-based algorithms (a k-mer is a short DNA subsequence of length k) that enable several well-known taxonomic profiling tools, and relies on variable-length sequence matching, which is more flexible than fixed-size k-mer methods.
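To make the difference between fixed-size k-mers and variable-length matching concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: the function names and toy sequences are made up for this example, and the naive substring search stands in for the tree-shaped index a real tool would use. It is not PRROMenade's implementation.

```python
def kmers(sequence, k):
    """Return every overlapping substring of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def longest_match(read, reference):
    """Return the longest substring of `read` that also occurs in `reference`.

    Naive illustration of variable-length matching: the match is extended
    for as long as it still occurs in the reference, instead of being cut
    at a fixed length k. Real tools would query an index rather than scan.
    """
    best = ""
    for start in range(len(read)):
        # Only look for matches longer than the best one found so far.
        end = start + len(best) + 1
        while end <= len(read) and read[start:end] in reference:
            best = read[start:end]
            end += 1
    return best


# Hypothetical toy sequences, for illustration only.
toy_read = "TTACGTAGG"
toy_reference = "CCACGTACC"

fixed = kmers(toy_read, 3)                         # fixed-size pieces of length 3
variable = longest_match(toy_read, toy_reference)  # as long as the data allows
```

With fixed-size k-mers, the match length is decided up front; with variable-length matching, the read and the reference decide how long a shared stretch can be.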

We knew from our experience that k-mer-based algorithms were much faster than traditional functional profiling methods. That’s because they rely on computationally simpler string-matching operations, often performed in memory thanks to the smaller size of the prerequisite look-up database. So we decided to test whether it was possible to repurpose commonly used taxonomic profiling tools to perform both taxonomic and functional profiling.
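The idea of swapping taxonomic labels for functional ones in a k-mer look-up can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method from the paper or the behavior of any particular tool: the labels, sequences and function names are hypothetical, and the simple majority vote stands in for the more elaborate scoring that real classifiers use.

```python
from collections import Counter


def kmers(sequence, k):
    """Return every overlapping substring of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_index(references, k):
    """Build an in-memory look-up table: k-mer -> set of labels.

    `references` maps a label to a reference sequence. The label can be a
    species name (taxonomic profiling) or a function name (functional
    profiling); the look-up logic does not care which.
    """
    index = {}
    for label, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify_read(read, index, k):
    """Assign the label supported by the most k-mers, or None if no k-mer hits."""
    votes = Counter()
    for kmer in kmers(read, k):
        for label in index.get(kmer, ()):
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy database keyed by function instead of by organism.
toy_references = {
    "function_A": "ACGTACGTGA",
    "function_B": "TTGCAAGCCT",
}
toy_index = build_index(toy_references, k=4)

label_1 = classify_read("CGTACGTG", toy_index, k=4)
label_2 = classify_read("GCAAGC", toy_index, k=4)
label_3 = classify_read("GGGGGGGG", toy_index, k=4)
```

Each read is classified with dictionary look-ups only, which is the kind of simple, in-memory string matching described above. Changing what the labels mean is all it takes to move from "which microbe is this?" to "what can it do?" in this simplified picture.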

First, we compared the microbiomes of several people with plant- and animal-based diets, where diet has a visible impact on the gut microbiome and its functions. This takes the saying “you are what you eat” to a whole new level: it’s not just the person who is affected by his or her every meal but their gut bacteria as well. We also compared soil bacterial communities across the globe, linking antioxidant and nutrient reservoir activity with geographical influences. Insights into keeping a healthy soil microbiome can be critical for food security and tackling climate change — soil is a vast carbon sink, effective in removing CO2 from the atmosphere and storing it as carbon via the microbiome.

Our tests showed an improvement in both the speed and the accuracy of functional profiling. We found that repurposed software cuts down the processing time and removes the need for an extra tool. Another advantage is that these tools can run on large machines as well as standard laptops.

We believe that our results could help speed up an important computational step in metagenomics data processing. They also show that software repurposing is not only possible in metagenomics but also has the potential to diversify the usage of existing tools, effectively cutting down the time spent on software development and adaptation. Next, we aim to investigate a diverse range of samples to gain biological insight into microbes’ behavior in different environments.

We hope that our research results could help push the limits of scientists’ understanding of the secret lives of microbes so that we are able to deal with them much more effectively than ever before. The wheel really doesn’t have to be reinvented every time there is a new problem. Software repurposing can help cut down development time, reduce the learning curve and improve the quality of results compared to traditional methods through clever algorithmic improvisations — and we should do it more often.


Gardiner, LJ., Haiminen, N., Utro, F. et al. Re-purposing software for functional characterization of the microbiome. Microbiome 9, 4 (2021).



Computational Genomics Lead, Research Staff Member, IBM Research

Ekaterina Moskvitch

IBM Research Editorial Lead
