Teaching AI to Learn from Non-Experts

Today my IBM team and my colleagues at the UCSF Gartner lab reported in Nature Methods a new approach to generating datasets from non-experts and using them to train machine learning models. Our approach is designed to enable AI systems to learn as well from non-expert annotations as they do from expert-generated training data. We developed a platform that allows non-experts to analyze images (a common task in biomedical research) and create an annotated dataset. The platform is complemented by a set of algorithms specifically designed to interpret this kind of “noisy” and incomplete data correctly. Used together, these technologies can expand the applications of machine learning in biomedical research.

Non-experts and noisy data

The limited availability of high-quality annotated datasets is a bottleneck in advancing machine learning. By creating algorithms that can deliver accurate results from lower-quality annotations—and a system for rapidly collecting such data—we can help alleviate the bottleneck. Analyzing images for features of interest is a great example. Expert image annotation is accurate but time-consuming, and automated analysis techniques such as contrast-based segmentation and edge detection perform well under defined conditions but are sensitive to changes in experimental setup and can produce unreliable results.

Image annotations by non-experts versus an expert

Non-expert image annotations are noisy. Ten non-experts outlined the dark black circles in the image, which are cell nuclei. Their results (shown in orange) do not match up exactly. Our algorithms are able to infer a consensus outline (shown in purple) from the noisy data. Compare this consensus with expert annotation of the same image (shown in green).

Enter crowd-sourcing. Using the platform, we obtained crowd-sourced image annotations 10 to 50 times faster than a single expert could have analyzed the same images. But, as one might expect, annotations from non-experts were noisy: some correctly identified a feature and others were off-target. We developed algorithms to process the noisy data, inferring the correct location of a feature from the aggregation of both on- and off-target hits. When we trained a deep convolutional regression network on the crowd-sourced dataset, it performed nearly as well, in terms of precision and recall, as a network trained on expert annotations. Along with the paper describing our approach and strategy, we released the source code for our algorithm.
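To make the idea of aggregating noisy hits concrete, here is a minimal sketch in Python. It is not the algorithm from the paper or its released source code. It is a simple stand-in that greedily groups nearby clicks, keeps only the groups supported by several annotators, and scores the resulting consensus points against an expert's points using precision and recall. The function names (`consensus_points`, `precision_recall`), the parameters (`radius`, `min_votes`, `tol`), and the sample coordinates are all hypothetical and chosen only for illustration.

```python
from math import hypot


def centroid(points):
    """Mean (x, y) position of a non-empty list of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def consensus_points(clicks, radius=5.0, min_votes=3):
    """Greedily cluster clicks and return centroids of well-supported clusters.

    Illustrative assumption: a click joins the first cluster whose current
    centroid lies within `radius`; clusters with fewer than `min_votes`
    clicks are treated as off-target noise and dropped.
    """
    clusters = []
    for x, y in clicks:
        for members in clusters:
            cx, cy = centroid(members)
            if hypot(x - cx, y - cy) <= radius:
                members.append((x, y))
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([(x, y)])
    return [centroid(m) for m in clusters if len(m) >= min_votes]


def precision_recall(predicted, reference, tol=5.0):
    """Match each predicted point to at most one reference point within `tol`."""
    unmatched = list(reference)
    true_positives = 0
    for px, py in predicted:
        for ref in unmatched:
            if hypot(px - ref[0], py - ref[1]) <= tol:
                unmatched.remove(ref)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up clicks from several annotators on one image.
clicks = [
    (10, 10), (11, 9), (9, 11), (10, 12),  # four clicks near one nucleus
    (50, 50), (51, 49), (49, 51),          # three clicks near a second nucleus
    (80, 20),                              # a single off-target click
]
# Made-up expert annotation; the third nucleus was missed by the crowd.
expert = [(10, 10), (50, 50), (90, 90)]

consensus = consensus_points(clicks)
precision, recall = precision_recall(consensus, expert)
print(consensus, precision, recall)
```

In this toy example the stray click is discarded because only one annotator supports it, so every consensus point matches an expert point, while the nucleus that no annotator clicked lowers recall. A real pipeline would likely need a more robust clustering method and would have to handle outlines rather than single points.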

Applications in cellular engineering

Image analysis is central to many fields of quantitative biology and medicine. A few years ago we and our collaborators announced the NSF-funded Center for Cellular Construction (CCC), a science and technology center that is pioneering the new scientific discipline of cellular engineering. CCC facilitates close collaboration between experts from different disciplines, such as machine learning, physics, computer science, cell and molecular biology, and genomics, to drive progress in cellular engineering. We aim to study and create cells that can be used as automated machines, or ad hoc sensors, to learn new and vital information about a variety of biological entities and their relationship with the environment they live in. We use image analysis to pinpoint the position and size of internal cell components. But even with advanced imaging techniques, measurements of cellular substructures can be very noisy, making it difficult to operate on the cell’s components. Our technique can use this noisy data to predict where the relevant cellular structures are likely to be, allowing better identification of organelles involved in the production of important chemicals or of potential drug targets in a disease.

We believe our algorithms are an important first step toward more complex AI platforms. Such systems may use additional “human in the loop” paradigms, for example by involving a biologist to correct mistakes during the training phase, to further improve performance. We also see an opportunity to apply our method beyond biology to other fields where high-quality annotated datasets may be scarce.

a tool for rapid, flexible, crowd-based annotation of images
Alex J. Hughes, Joseph D. Mornin, Sujoy K. Biswas, Lauren E. Beck, David P. Bauer, Arjun Raj, Simone Bianco and Zev J. Gartner
Nature Methods, 31 July 2018
