Big data enlists microbes in battle for food safety


David Chambliss

Principal Research Staff Member
at IBM Almaden Research Center

A new approach to food safety

Each year the food industry spends billions of dollars to keep the US food supply one of the safest in the world. Despite those efforts and that expense, one in six Americans (about 48 million people) gets sick every year from foodborne illnesses caused by contamination, according to the Centers for Disease Control and Prevention. David Chambliss is one of a number of IBMers hoping to change that.

A Principal Research Staff Member at IBM’s Almaden Research Center, Chambliss is part of a team that’s working with Mars, Inc. and the University of California, Davis, on a new approach to food safety: the Consortium for Sequencing the Food Supply Chain. The goal of the multi-year project is to turn microbes, including some of the very organisms behind many foodborne illnesses, into tiny sentries in the battle for food safety.

Looking for non-biological threats

“If there’s something that’s harmful to the people or animals that are eating the food product, some of the bacterial species that are literally swimming in that material are going to have their lives turned upside down, and some will die off,” Chambliss says. “We want to use biology and analytics to measure both the biological and non-biological things that might be wrong with a given food or food environment.”

That’s not going to be easy. Consider this: scientists estimate that 10,000 to 50,000 species live in just one gram of soil, and by one often-cited estimate, 90 percent of the roughly 100 trillion cells in the human body belong to bacteria and other microorganisms. To advance food safety, researchers first have to determine what a “normal” population of microorganisms looks like. Then they can track departures from that baseline that could signal a hazard. Key to conducting a census of any microbiome (the community of microorganisms living in a given environment) are DNA analysis and lots of computing power.
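The article does not say how the Consortium will measure a departure from a “normal” population. One common yardstick in microbial ecology is the Bray-Curtis dissimilarity between two samples’ relative-abundance profiles. The Python below is a minimal sketch of that idea under stated assumptions: reads are already assigned to taxa, the function names are hypothetical, and the read counts are invented for illustration.

```python
from typing import Dict

def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}

def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared taxa.

    Taxa missing from one sample are treated as having zero abundance there.
    """
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0

# Invented read counts: a "normal" ingredient lot and a new lot to compare.
baseline = relative_abundance({"Lactobacillus": 600, "Pseudomonas": 300, "Bacillus": 100})
sample = relative_abundance({"Lactobacillus": 200, "Pseudomonas": 300, "Bacillus": 500})

shift = bray_curtis(baseline, sample)
print(f"dissimilarity from baseline: {shift:.2f}")
```

A single number like this says how much a community has changed, not why; deciding whether a given shift is worrying still requires knowing how much normal samples vary.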

“DNA is really a great mechanism for doing a census, for telling us who’s there in a sample,” says Chambliss. “And this isn’t just about bacteria, but also fungi, possibly viruses, all sorts of biological entities. But bacteria are the main thing we pay attention to.

“We also examine RNA. And RNA is important for a couple of reasons. RNA is more of a measure of what those organisms are doing, because RNA is what active cells constantly generate to conduct metabolic processes. It is the stuff of life, the stuff of living. So we will only get RNA from things that are alive and biologically active.”

Protecting food for humans and pets

The distinction between what is merely present and what is alive greatly reduces the amount of data that needs to be analyzed. The first phase of the project focuses on ingredients for both human and pet food.

“It’s important to look at the environment and find out if there are nooks and crannies where bacteria are hiding,” Chambliss says. “There is significant concern in food safety about making sure that there aren’t areas that are sheltered from cleaning and sterilization.”

The Consortium’s research projects will, in time, study micro-environments up and down the supply chain. “But in the first phase, our focus is going to be specifically on the ingredients,” he adds. “Imagine you have a pile of meat meal that’s going to be used in a pet food product. Most of the DNA that you’re going to be measuring is from cows, and that’s not a pathogen. But the cow meat is not undergoing any more metabolism once it’s been rendered, and what we want to do is see the things that are active and alive. That’s one of the key reasons for pulling out the RNA.” The RNA that shows up there will come not from the beef itself but from the microbes living in the meat-meal environment.
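The meat-meal example can be pictured as a comparison of DNA and RNA read counts. This is a hypothetical Python sketch, not the Consortium’s pipeline: it assumes reads have already been assigned to taxa, and the host label, taxa, and counts are invented. It shows how most DNA reads can come from the host animal while the RNA points only to microbes that are metabolically active.

```python
from typing import Dict, Set, Tuple

# Hypothetical host label: reads from the rendered beef itself, not from microbes.
HOST_TAXA: Set[str] = {"Bos taurus"}

def host_fraction(dna: Dict[str, int], host: Set[str] = HOST_TAXA) -> float:
    """Share of DNA reads that come from the host animal rather than microbes."""
    total = sum(dna.values())
    return sum(n for t, n in dna.items() if t in host) / total if total else 0.0

def microbial_census(
    dna: Dict[str, int], rna: Dict[str, int], host: Set[str] = HOST_TAXA
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (present, active, no_rna) microbial taxa, ignoring host reads.

    present: detected in DNA (the census of who is there)
    active:  detected in RNA (who is metabolically active)
    no_rna:  present in DNA but with no RNA detected
             (dead, dormant, or simply below detection)
    """
    present = {t for t, n in dna.items() if n > 0 and t not in host}
    active = {t for t, n in rna.items() if n > 0 and t not in host}
    return present, active, present - active

# Invented counts for one meat-meal sample.
dna_reads = {"Bos taurus": 9500, "Salmonella enterica": 300, "Clostridium perfringens": 200}
rna_reads = {"Salmonella enterica": 40}

present, active, no_rna = microbial_census(dna_reads, rna_reads)
print(f"host share of DNA reads: {host_fraction(dna_reads):.0%}")
print("active microbes:", sorted(active))
```

In this toy case, 95 percent of the DNA reads are beef and can be set aside, which is the kind of data reduction the RNA step is meant to provide.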

We’re trying to do something that’s never been done before in metagenomics.

Microbes react to non-biological threats

A big advantage of enlisting microbes in food safety is that they can help detect even non-biological dangers. Current food safety testing depends mostly on looking for known pathogens and known chemical contaminants such as PCBs and insecticides. That means the system can overlook substances that haven’t threatened food supplies before, such as melamine, the additive that poisoned infant formula supplies in China. Microbes, however, are likely to react to almost any substance that could harm humans.

“The usual assays are very good at looking for the things they know about. When something unlike anything you’ve seen before shows up, you need a way to detect that something’s amiss even if you don’t know what it is with the first measurement,” explains Chambliss. “One of our key hopes is that we’ll be in a position to find out something is wrong much, much earlier than you would otherwise.” Such early detection would not only minimize health problems but could also help a company avoid a product recall and the damage it does to a food brand.
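Detecting that “something’s amiss” without knowing the cause is, in general terms, unsupervised anomaly detection: learn how much normal samples vary, then flag a sample that falls well outside that range. The Python below is one simple version of the idea, not the Consortium’s method. The Bray-Curtis distance, the averaged baseline, the three-standard-deviation threshold (`k=3.0`), and all the numbers are assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List

Profile = Dict[str, float]  # taxon -> relative abundance (sums to 1)

def bray_curtis(a: Profile, b: Profile) -> float:
    """0 = identical composition, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0

def centroid(profiles: List[Profile]) -> Profile:
    """Average abundance of each taxon across the baseline samples."""
    taxa = set().union(*profiles)
    return {t: mean(p.get(t, 0.0) for p in profiles) for t in taxa}

def is_anomalous(sample: Profile, baselines: List[Profile], k: float = 3.0) -> bool:
    """True if the sample sits farther from the baseline centroid than
    the baselines' own mean distance plus k standard deviations."""
    if len(baselines) < 2:
        raise ValueError("need at least two baseline samples")
    center = centroid(baselines)
    spread = [bray_curtis(b, center) for b in baselines]
    threshold = mean(spread) + k * stdev(spread)
    return bray_curtis(sample, center) > threshold

# Invented profiles from three "normal" lots of one ingredient.
baselines = [
    {"Lactobacillus": 0.60, "Pseudomonas": 0.30, "Bacillus": 0.10},
    {"Lactobacillus": 0.58, "Pseudomonas": 0.32, "Bacillus": 0.10},
    {"Lactobacillus": 0.62, "Pseudomonas": 0.28, "Bacillus": 0.10},
]
typical = {"Lactobacillus": 0.59, "Pseudomonas": 0.31, "Bacillus": 0.10}
shifted = {"Lactobacillus": 0.20, "Pseudomonas": 0.30, "Bacillus": 0.50}

print(is_anomalous(typical, baselines), is_anomalous(shifted, baselines))
```

Because the flag depends only on how far a sample drifts from normal, it could fire for a contaminant nobody thought to test for. The trade-off is that it reports that something changed, not what changed.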

IT key to project’s success

“A lot of what we bring to this problem is strength in information technology,” he says. “What we’re trying to do is use what IBM already knows how to do well in scaling up new kinds of computations. We want to make it easier to do the biological calculation. Key to this is bringing in lots of data and bringing all these different data sets to bear on one another.

“What really makes this possible is that we’ll be able to put all the data in one place, in one big facility so we can have many terabytes of public reference data from other people’s experiments and sample data from the stream of food samples.

“For one thing, many of the genes we will find in our food samples will turn out to be genes already identified in laboratory studies, and biologists have determined what that gene does in the microorganism. For example, we may see genes that become active to protect the microbes under harsh conditions, possibly in response to sterilization procedures. We want to know when that is happening, and we need to pull in data that biologists have published to let us make that identification. We refer to that as curated reference material.”
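Matching genes found in food samples against curated reference material can be pictured with a toy lookup. In this hypothetical Python sketch, the four-entry table stands in for published annotations (the functions listed are the standard ones for these E. coli genes, but a real system would draw on curated databases), the `triage` function is invented for illustration, and `orf123` is a made-up placeholder for a gene with no annotation yet.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for curated reference material:
# gene symbol -> (published function, category)
CURATED: Dict[str, Tuple[str, str]] = {
    "rpoS": ("general stress response sigma factor", "stress"),
    "dnaK": ("Hsp70 chaperone induced by heat shock", "stress"),
    "katE": ("catalase protecting against oxidative stress", "stress"),
    "gyrA": ("DNA gyrase subunit A", "housekeeping"),
}

def triage(expressed: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Sort expressed genes into stress-response hits, other annotated genes,
    and genes the curated reference does not cover yet."""
    stress: List[str] = []
    other: List[str] = []
    unannotated: List[str] = []
    for gene in expressed:
        entry = CURATED.get(gene)
        if entry is None:
            unannotated.append(gene)
        elif entry[1] == "stress":
            stress.append(gene)
        else:
            other.append(gene)
    return stress, other, unannotated

# Invented list of genes detected as active (via RNA) in one sample.
stress, other, unannotated = triage(["rpoS", "gyrA", "orf123", "dnaK"])
for gene in stress:
    print(f"{gene}: {CURATED[gene][0]}")
```

Genes that land in the unannotated bucket are the ones curated data cannot yet explain.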

Machine learning and advanced analytics

But the curated data is far from complete. Biological research only moves so fast. And there is also a larger volume of un-curated data that has not been fully analyzed. One example is a project led by Bart Weimer at UC Davis to collect genomes of 100,000 variants of foodborne pathogens.

“We are working with the raw data of that project, even before the sequence data is fully understood from a biological standpoint,” says Chambliss. “With data at this scale, we really need machine learning and other advanced analytics to connect the trends in that raw data with our observations from food samples. By treating this as a big data problem, we expect to find meaningful correlations that will be useful for improving food safety and will also trigger deeper biological research.”