Advancing the Potential of AI in Medical Imaging at MICCAI 2020

I believe one of the most promising areas for AI to make an impact is in the field of medical imaging. Through advancements in AI that allow for more intelligent and accurate analysis of video and still images, there is hope that clinicians will soon be able to widely augment the data and information they use in clinical practice with the help of machine learning and AI algorithms.

This year at the International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI), our IBM Research team presented new work that moves applicable AI models and techniques for medical imaging closer to clinical reality.

This includes new AI video analysis algorithms that combine perfusion biophysics with machine learning to help medical researchers identify malignant tumors with 95 percent accuracy. Our work also shows how AI models may give radiologists and radiology teams an additional layer of review by flagging images that should be double-checked. In addition, our team presented progress on the efficacy and potential impact of AI analysis for specific diseases and for prevention in wider populations, including breast cancer screening and skin cancer.

An IBM Research team also received the best paper award at GRAIL, the third international workshop on Graphs in biomedical Image anaLysis, a satellite event of MICCAI 2020. This work demonstrates how Graph Neural Networks (GNNs) can assist in describing complex structures, such as heterogeneity in human tissues, in an interpretable and data-driven way.

Some of our notable new research being presented at the conference includes:

Learning to Read Tumors from Endoscopic Videos 

In what holds the potential to be a significant breakthrough in the area of medical camera technology and imaging, IBM researchers have created new AI video analysis algorithms that are designed to combine biophysics of perfusion with machine learning. These models are based on the hypothesis that observation of differences in structure of vasculature and perfusion patterns using fluorescence could be used to differentiate between benign, malignant, and healthy tissue, and that perfusion patterns can serve as a marker to identify most of the benign and malignant tumors intra-operatively. Now being tested and explored at a major Irish hospital, these models have proven successful at matching the intra-operative interpretation of an expert surgeon with 95% accuracy with 100% sensitivity and 92% specificity for patient-based correctness (i.e., compared with post-operative pathology findings on excised tissue). (1)

Figure: Panel A: NIR-intensity time-series for two ROIs, I0(t) (ROI 0) and I1(t) (ROI 1). At time instant t, the value Ii(t) equals the mean of the intensity taken over ROI i, i = 0, 1, and the bands denote ± one standard deviation around that mean. Panel B: White – visible light video sequence, NIR – NIR light sequence. Panel C: ROI with surgical team annotation, and a classification result showing ROIs correctly classified as normal (green) and cancer (light blue).
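
To make the time-series in the caption concrete, here is a minimal sketch of how a per-ROI mean intensity and its one-standard-deviation band could be computed frame by frame. The function name, the rectangular ROI format and the tiny synthetic frames are assumptions for illustration only; they are not the code or data used in the study.

```python
from statistics import mean, pstdev

def roi_intensity_series(frames, roi):
    """Return one (mean, std) pair per frame for the pixels inside a ROI.

    frames: list of 2-D lists of pixel intensities, one per time instant t
    roi: (row_start, row_end, col_start, col_end), end-exclusive
         (a hypothetical rectangular ROI format chosen for this sketch)
    """
    r0, r1, c0, c1 = roi
    series = []
    for frame in frames:
        pixels = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        # Mean over the ROI is I_i(t); the std gives the width of the band.
        series.append((mean(pixels), pstdev(pixels)))
    return series

# Two tiny synthetic 2x4 "NIR frames" (made-up values, not study data).
frames = [
    [[10, 10, 50, 70], [10, 10, 50, 70]],
    [[20, 20, 60, 80], [20, 20, 60, 80]],
]
I0 = roi_intensity_series(frames, (0, 2, 0, 2))  # ROI 0: left half
I1 = roi_intensity_series(frames, (0, 2, 2, 4))  # ROI 1: right half
```

Each entry of `I0` and `I1` pairs the ROI mean at one time instant with the standard deviation that would define the band around it.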

The Case of Missed Cancers: Applying AI as a Radiology “Safety Net”

According to recent papers [2-7], the performance of AI models in identifying breast cancer within 12 months is similar to, if not better than, that of radiologists. This indicates that AI systems could potentially assist radiologists in breast cancer screening. However, most of these reports lack information about the models’ expected performance in real-life settings and about the presence of radiologists’ false negative cases in their datasets.

In this work, we investigated the potential contribution of an AI system as a safety net application for radiologists in breast cancer screening. The AI models we developed alert radiologists to cases suspected to be malignant which the radiologist did not recommend for a recall, while maintaining a low number of false alerts. Reducing AI’s false alarms is key, as computer-aided diagnosis systems have been shown in the past to generate a large number of false positive findings, slowing the radiologist’s work without contributing to their performance [8]. Using held-out data enriched with missed cancers, the safety net demonstrated a significant contribution to the radiologists’ performance even when they utilized both mammography and ultrasound images. In a multi-reader study with five radiologists over 120 exams, 10 of which were originally missed cancers, the AI system was able to assist 3 out of the 5 radiologists in detecting missed cancers without raising any false alerts. (9)
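
The safety-net idea can be sketched as a simple rule: raise an alert only when the radiologist did not recommend a recall and the AI score is high enough that false alerts stay rare. The field names, the threshold value and the four synthetic exams below are assumptions for illustration; the post does not describe the actual decision rule or operating point.

```python
def safety_net_alerts(exams, threshold=0.9):
    """Return IDs of exams the radiologist did not recall but the AI flags.

    The threshold is a hypothetical operating point; a high value trades
    some sensitivity for fewer false alerts.
    """
    return [
        exam["id"]
        for exam in exams
        if not exam["radiologist_recall"] and exam["ai_score"] >= threshold
    ]

def alert_counts(exams, alerts):
    """Count alerts on true cancers (caught) and on non-cancers (false alerts)."""
    alerted = set(alerts)
    caught = sum(1 for e in exams if e["id"] in alerted and e["malignant"])
    false_alerts = sum(1 for e in exams if e["id"] in alerted and not e["malignant"])
    return caught, false_alerts

# Synthetic exams (made-up values, not study data).
exams = [
    {"id": "A", "radiologist_recall": False, "ai_score": 0.97, "malignant": True},
    {"id": "B", "radiologist_recall": False, "ai_score": 0.40, "malignant": False},
    {"id": "C", "radiologist_recall": True, "ai_score": 0.99, "malignant": True},
    {"id": "D", "radiologist_recall": False, "ai_score": 0.60, "malignant": True},
]
alerts = safety_net_alerts(exams)
caught, false_alerts = alert_counts(exams, alerts)
```

In this toy data, exam C never triggers an alert because the radiologist already recalled it, and exam D shows the cost of a strict threshold: a missed cancer that the safety net also lets through.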

Chest X-Ray Report Generation Through Fine-Grained Label Learning

Chest X-rays are among the most common imaging modalities read by radiologists in hospitals and in tele-radiology practices today. AI can help in obtaining automated preliminary reads that can potentially expedite clinical workflows, improve accuracy and reduce overall costs. However, the quality of reports generated by current automated approaches is not yet clinically acceptable, as they cannot yet ensure the correct detection of a broad spectrum of radiographic findings, nor describe them accurately in terms of laterality, anatomical location and severity. At MICCAI, we’re presenting a new paper that unveils a domain-aware automatic chest X-ray radiology report generation algorithm, which combines deep learning with document retrieval ideas to retrieve reports from a large report database. We’ve also developed an automatic labeling algorithm for assigning such descriptors to images, and built a novel deep learning network that recognizes both coarse and fine-grained descriptions of findings. The resulting report generation algorithm significantly outperforms the state of the art on established metrics. The spectrum of findings covered by this work is the largest to date, and the semantically meaningful reports produced establish a new benchmark for automatic report generation methods. (19, 20)
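
As a rough illustration of the retrieval idea, the sketch below picks the stored report whose finding labels overlap most with the labels predicted for a new image. Jaccard overlap, the label strings and the two-entry report database are all assumptions made for this example; the paper's actual label vocabulary, similarity measure and report customization step are not described in this post.

```python
def jaccard(a, b):
    """Jaccard overlap between two label collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def retrieve_report(predicted_labels, report_db):
    """Return the text of the stored report with the most similar label set.

    report_db: list of (labels, report_text) pairs (hypothetical structure).
    """
    best = max(report_db, key=lambda entry: jaccard(predicted_labels, entry[0]))
    return best[1]

# Hypothetical fine-grained labels and reports, for illustration only.
report_db = [
    ({"opacity", "left", "lower lobe", "mild"},
     "Mild opacity in the left lower lobe."),
    ({"cardiomegaly", "moderate"},
     "Moderate enlargement of the cardiac silhouette."),
]
predicted = {"opacity", "left", "lower lobe", "moderate"}
report = retrieve_report(predicted, report_db)
```

Here the predicted labels share three of five distinct labels with the first entry and only one of five with the second, so the first report is retrieved.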

Multi-Task Learning for Detection and Classification of Breast Cancer in Screening Mammography

It’s well known that breast screening is often an effective method to identify breast cancer in asymptomatic women. Deep learning provides a valuable tool to help support this critical decision point.

Algorithmically, accurate assessment of breast mammography requires both detection of abnormal findings (object detection) and a correct decision whether to recall a patient for additional imaging (image classification). In this paper, we present a multi-task learning approach that we argue is ideally suited to this problem. We train a network for both object detection and image classification based on state-of-the-art models, and demonstrate significant improvement in the recall vs. no recall decision on a multi-site, multi-vendor data set, measured by concordance with biopsy proven malignancy. We also observe improved detection of micro-calcifications, and detection of cancer cases that were missed by radiologists, demonstrating that this approach could provide meaningful support for radiologists in breast screening (especially non-specialists). Moreover, we argue this multi-task framework is broadly applicable to a wide range of medical imaging problems that require a patient-level recommendation, based on specific imaging findings. (10 – 12)
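
One generic way to train a single network on both tasks is to add an image-level classification loss to a detection loss. The sketch below uses binary cross-entropy for the recall decision and a smooth L1 loss on box coordinates, combined with a weight `det_weight`. These loss choices, the weighting and the assumption that predicted and true boxes are already matched are illustrative only; the post does not specify the losses or architecture used in the paper.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """Classification loss for the recall (1) vs. no-recall (0) decision."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def smooth_l1(pred_box, true_box):
    """Mean smooth L1 loss over the coordinates of one matched box pair."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred_box)

def multi_task_loss(recall_prob, recall_label, pred_boxes, true_boxes, det_weight=1.0):
    """Weighted sum of the image-level and finding-level losses."""
    cls_loss = binary_cross_entropy(recall_prob, recall_label)
    pairs = list(zip(pred_boxes, true_boxes))  # assumes boxes are already matched
    det_loss = sum(smooth_l1(p, t) for p, t in pairs) / len(pairs) if pairs else 0.0
    return cls_loss + det_weight * det_loss

# One image, one matched finding (made-up numbers).
loss = multi_task_loss(0.5, 1, [(0, 0, 10, 10)], [(0, 0, 10, 12)])
```

Because both terms are computed from one forward pass in such a setup, gradients from the finding-level task can shape the features used for the patient-level recall decision, which is the intuition behind a multi-task approach.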

Fairness of classifiers across skin tones in dermatology

Recent advances in computer vision and deep learning have led to breakthroughs in the development of automated skin image analysis [13, 14]. In particular, skin cancer classification models have achieved performance higher than that of trained expert dermatologists [16]. However, no attempt has been made to evaluate the consistency of machine learning models’ performance across populations with varying skin tones [15, 17]. In this paper, we present an approach to estimate skin tone in benchmark skin disease datasets and investigate whether model performance is dependent on this measure (18). We find that the data analyzed in this work have a higher proportion of lighter skin samples, while there is a consistent under-representation of darker skinned populations. We also find no measurable correlation between the performance of machine learning models and skin tone values, though more comprehensive data is needed for further validation. Our findings point to the need for further expansion of racially and geographically diverse datasets that are accessible for the training of machine learning models. Moving in this direction will allow machine learning tools to reach their full potential and to be incorporated successfully and fairly into clinical practice.

The skin tone distribution figure shows the distributions of the ITA values estimated from the non-diseased skin regions of the images in the entire ISIC2018 and SD-136 datasets. Both datasets are found to predominantly lie in the Light category.
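
For readers unfamiliar with ITA (Individual Typology Angle), it is conventionally computed from the CIELAB lightness L* and the yellow-blue component b* as ITA = arctan((L* − 50) / b*) × 180 / π. The sketch below applies that formula and maps the angle to commonly cited category thresholds; these thresholds are an assumption here and may differ from the exact binning used in the paper.

```python
import math

def individual_typology_angle(L, b):
    """ITA in degrees from CIELAB L* and b* (b* must be non-zero)."""
    return math.degrees(math.atan((L - 50.0) / b))

def skin_tone_category(ita):
    """Map an ITA value to a category using commonly cited thresholds
    (assumed for illustration; the paper's exact bins may differ)."""
    if ita > 55:
        return "Very Light"
    if ita > 41:
        return "Light"
    if ita > 28:
        return "Intermediate"
    if ita > 10:
        return "Tan"
    if ita > -30:
        return "Brown"
    return "Dark"

# Made-up L*, b* values for a non-diseased skin region.
ita = individual_typology_angle(70.0, 20.0)
category = skin_tone_category(ita)
```

With L* = 70 and b* = 20 the angle is 45 degrees, which falls in the Light category under these thresholds.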

HACT-Net: A Hierarchical Cell-to-Tissue Graph Neural Network for Histopathological Image Classification

Histopathological diagnosis, prognosis, and therapeutic response prediction are often heavily influenced by the relationship between the histopathological structures and the function of the tissue. Recent approaches acknowledging the structure-function relationship have linked the structural and spatial patterns of cell organization in tissue via cell-graphs to tumor grades. Though cell organization is imperative, it can be insufficient to entirely represent the histopathological structure. (21)

At MICCAI, we proposed a novel hierarchical cell-to-tissue-graph (HACT) representation to improve the structural depiction of the tissue. It consists of a low-level cell-graph, capturing cell morphology and interactions, a high-level tissue-graph, capturing morphology and spatial distribution of tissue parts, and cells-to-tissue hierarchies, encoding the relative spatial distribution of the cells with respect to the tissue distribution. Further, a hierarchical graph neural network (HACT-Net) is proposed to efficiently map the HACT representations to histopathological breast cancer subtypes.
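
The three parts of the HACT representation can be pictured with a small data-structure sketch: a cell-graph over cell centroids, a tissue-graph over tissue regions, and a mapping from each cell to the region that contains it. Building the cell-graph with k-nearest neighbors and a distance cutoff, and all names and values below, are assumptions for illustration; the post does not describe how the graphs are actually constructed, and node features and the graph neural network itself are omitted.

```python
import math

def knn_edges(points, k, max_dist):
    """Undirected edges linking each point to its k nearest neighbors
    that lie within max_dist (a hypothetical construction rule)."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            if d <= max_dist:
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)

def build_hact(cell_centroids, cell_region, region_adjacency, k=2, max_dist=50.0):
    """Assemble a toy cell-graph, tissue-graph and cell-to-tissue hierarchy."""
    return {
        "cell_graph": knn_edges(cell_centroids, k, max_dist),  # low level
        "tissue_graph": sorted(region_adjacency),              # high level
        "cell_to_tissue": dict(enumerate(cell_region)),        # hierarchy
    }

# Four cells in two tissue regions (made-up coordinates).
cells = [(0, 0), (10, 0), (100, 0), (110, 0)]
hact = build_hact(cells, cell_region=[0, 0, 1, 1], region_adjacency=[(0, 1)], k=1)
```

In this toy example the two nearby cell pairs form separate cell-graph edges, the two tissue regions are linked in the tissue-graph, and the hierarchy records which region each cell belongs to.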

We assess the methodology on BRACS (BreAst Cancer Subtyping), a large set of annotated tissue regions of interest from H&E stained breast carcinoma whole-slides. Upon evaluation, the proposed method outperformed recent convolutional neural network and graph neural network approaches for breast cancer multi-class subtyping. The proposed entity-based topological analysis is more in line with the pathological diagnostic procedure of the tissue. It provides more control over the tissue modeling and therefore encourages the further inclusion of pathological priors into task-specific tissue representation.

Currently, the BRACS dataset is pending approval for release to the research community. It contains tumor regions of interest from seven breast tumor subtypes, including the challenging atypical categories. Further, the tumor regions in BRACS demonstrate high variability by accommodating more realistic scenarios, such as varying tumor sizes, varying grades, stain variance, tissue preparation artifacts and tissue marking artifacts. (22)

We look forward to working with other leaders in the medical imaging and technology space at this year’s MICCAI and at future events as these advances take hold and evolve.


  2. Akselrod-Ballin, A., et al., Radiology 2019
  3. Yala A., et al., Radiology 2019
  4. McKinney, S.M., et al., Nature 2020
  5. Kim, H.-E., et al., Lancet Digit. Health 2020
  6. Schaffter, T., et al., JAMA Netw. Open 2020
  7. Rodriguez-Ruiz, A., et al., J. Natl Cancer Inst., 2019
  8. Lehman, C.D., et al., JAMA Intern. Med. 2015
  13. Celebi, M.E., Codella, N., Halpern, A.: Dermoscopy image analysis: Overview and future directions. IEEE J. Biomed. Health 23(2), 474–478 (Mar 2019)
  14. Celebi, M.E., Codella, N., Halpern, A., Shen, D.: Guest editorial: Skin lesion image analysis for melanoma detection. IEEE J. Biomed. Health 23(2), 479–480 (Mar 2019)
  15. Barocas, S., Selbst, A.D.: Big data’s disparate impact. Calif. Law Rev. 104(3), 671–732 (Jun 2016)
  16. Haenssle, H.A., Fink, C., Schneiderbauer, R., Toberer, F., Buhl, T., Blum, A., Kalloo, A., Ben Hadj Hassen, A., Thomas, L., Enk, A., Uhlmann, L.: Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann. Oncol. 29(8), 1836–1842 (Aug 2018)
  17. Wilson, B., Homan, J., Morgenstern, J.: Predictive inequity in object detection. arXiv:1902.11097 (Feb 2019)

Director, Health Informatics, IBM Research
