IBM Research AI at NeurIPS 2019

Deep learning to better understand video. 8-bit training. AI for genomics and beer. The 33rd Conference on Neural Information Processing Systems (NeurIPS) in Vancouver is just around the corner, kicking off on December 8, 2019. Many of IBM Research AI’s scientists are getting ready to showcase the results of their work – some in early stages of research, and some that’s getting closer to commercial applications.

IBM researchers from our labs around the world will present more than 100 papers across regular sessions and workshops at NeurIPS, each focused on different core technologies and use cases of AI. A number of them will be on display in booth #111, where scientists will present demos throughout the week.

This past year, IBM researchers have been busy investigating a plethora of topics. Advancing AI methods for learning is one of them, and IBM Research will have several papers on various aspects of it.

New conditional dependency method

One, by a team led by Youssef Mroueh from IBM Research, is on the new conditional dependency method, the Sobolev Independence Criterion. Sounds like a mouthful but it’s crucial when it comes to predicting a particular outcome. Typically, one needs to determine specific features among plenty of candidates in a dataset – a process called feature selection. It’s important, for instance, in genomics, when it comes to establishing the genes responsible for a specific disease.

IBM researchers have now come up with a new method to determine the features, by using the Sobolev Independence Criterion. It outranks previous feature selection algorithms by leveraging the expressiveness of deep learning architectures, according to the authors.
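
For context, feature selection in its simplest form can be sketched as a correlation filter. The snippet below is not the Sobolev Independence Criterion; it is a classic linear baseline, shown only to illustrate what scoring and ranking candidate features means. The gene names and measurements are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Return feature names sorted from most to least correlated with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression levels for three genes across six patients.
columns = {
    "gene_a": [0.1, 0.9, 0.2, 0.8, 0.15, 0.95],
    "gene_b": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
    "gene_c": [1.0, 0.1, 0.9, 0.3, 0.8, 0.2],
}
disease = [0, 1, 0, 1, 0, 1]  # hypothetical outcome label per patient

ranking = rank_features(columns, disease)
print(ranking)
```

A filter like this only sees linear, one-feature-at-a-time relationships, which is the kind of limitation that methods built on more expressive models are meant to address.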

Speeding deep learning with 8-bit training

Another team has developed a method to dramatically speed up the process of creating complex deep learning models that typically takes weeks. The new approach is based on very low precision (8-bit) arithmetic and builds on previous 8-bit training techniques – including the ones pioneered by the same team at NeurIPS 2018.

The previous techniques “were shown to work well with specific computer vision deep learning models, including with ResNet50,” says Kailash Gopalakrishnan, one of the researchers on the team. Still, they “show considerable accuracy degradation,” he adds, in other AI models including MobileNets, used in mobile and IoT devices, and transformers, used in Natural Language Processing (NLP).

The latest results enable 8-bit training on the entire spectrum of complex AI models – in speech, vision and NLP – while fully preserving model accuracy. The key insight, says Gopalakrishnan, is that different data structures encountered during deep learning model training have different requirements on precision and dynamic range. The system is able to customize the 8-bit number format for these tensors differently.
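
The trade-off between precision and dynamic range can be illustrated with a toy simulation. The sketch below rounds a number to the nearest value of a small floating-point format with one sign bit and a chosen split of exponent and mantissa bits. The function name, the two splits compared (4 exponent and 3 mantissa bits versus 5 exponent and 2 mantissa bits) and the IEEE-style handling of the exponent range are assumptions made for illustration, not details taken from this post.

```python
import math


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a toy float format (1 sign bit + exp_bits + man_bits)."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    e_min = 1 - bias                     # smallest normal exponent
    e_max = (2 ** exp_bits - 2) - bias   # simplification: top exponent code reserved, IEEE-style
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** e_max
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    _, e = math.frexp(mag)               # mag = m * 2**e with 0.5 <= m < 1
    e -= 1                               # rewrite as (2m) * 2**e with 1 <= 2m < 2
    e = max(e, e_min)                    # below e_min, fall back to subnormal spacing
    step = 2.0 ** (e - man_bits)         # gap between neighbouring representable values
    rounded = round(mag / step) * step   # round to nearest, ties to even
    return sign * min(rounded, max_val)  # saturate instead of overflowing


for value in (1.1, 1000.0, 1e-4):
    print(value, quantize(value, 4, 3), quantize(value, 5, 2))
```

With these assumed formats, the extra mantissa bit keeps 1.1 closer to its true value (1.125 versus 1.0), while the extra exponent bit keeps 1000 from saturating and 0.0001 from being flushed to zero. That is the sense in which different tensors can favor different 8-bit layouts.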

The paper also proposes a new training protocol that greatly reduces the chip-to-chip communication overhead in large training systems – while using the same 8-bit training format. This is important because in the future, training systems using the technology could have two to four times higher performance than today’s best systems, says Gopalakrishnan.

Causal Graphs

Another team, a joint collaboration between MIT-IBM Watson AI Lab, Purdue University and Columbia University, will present their results on characterizing the set of plausible causal graphs from observational and interventional data. This work is all about learning causal relationships – the classic aim of which is to characterize all possible sets that could produce the observed data. In the paper, the researchers provide a complete characterization of all possible causal graphs with observational and interventional data involving so-called ‘soft interventions’ on variables when the targets of soft interventions are known.

The novelty is that the work accommodates hidden variables and uses testable relationships across dynamic distributions that form the popular “do-calculus” of Judea Pearl in causal inference for mapping observational distributions. “We also provide a novel sound learning algorithm,” says one of the authors, IBM Research scientist Karthikeyan Shanmugam. “This work potentially could lead to discovery of other novel learning algorithms that are both sound and complete.”

Learning Efficient Video Representations

Then there will be a paper on learning efficient video representations – important for a wide range of applications such as video indexing and retrieval, automatic video content enrichment and human-robot interactions. In recent years, there has been a lot of progress in video understanding, with approaches that use complex 3D convolutional neural networks (3D CNNs) to learn spatio-temporal representations for recognition. But there is a problem: such models need to be quite deep and require the analysis of many video frames. This makes training and inference of a 3D model very expensive.

The IBM team has developed a simple, memory-friendly 2D network architecture of entangled spatial and temporal information that outperforms more sophisticated 3D ones – at a fraction of the cost. “This lightweight architecture can combine frames at different resolutions to learn video information in both space and time efficiently,” says IBM scientist Quanfu Fan, who led the research. The approach, he adds, shows strong performance on several action recognition benchmarks, including Kinetics and Moments in Time. It also cuts FLOPs by a factor of three to four and memory use roughly in half compared to the baseline. “Future research can focus more on capturing temporal information effectively, the crux of a matter for video understanding,” says Fan.

Selective Rationalization

A joint team from IBM and MIT led by Shiyu Chang will present a paper on a game theoretic approach to class-wise selective rationalization, or rationale. A rationale is a hard selection of input features that are sufficient to explain the output prediction. The researchers give a nice example of a rationale – a beer review: “This beer pours ridiculously clear with tons of carbonation that forms a rather impressive rocky head that settles slowly into a fairly dense layer of foam. This is a really good lookin’ beer.” The output prediction is the rating of the beer appearance: “Look: 5 stars.” The rationale is the set of words that explain why the rating is 5 stars, so “pours ridiculously clear with tons of carbonation,” and “This is a really good lookin’ beer.”

The problem, though, is that such rationales typically support only the label class: if a hotel review has sentences that convey both negative and positive sentiment but is labeled negative, only the negative evidence is highlighted. The team proposes a method to have both, similar to a human weighing pros and cons – thus allowing for a more structural interpretation of deep-learning models. They call it class-wise selective rationalization, or CAR – which can find rationales explaining any given class. It’s a game theoretic framework, and it highlights pros and cons in reviews, helping understand them much more accurately.
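
To make the idea of a hard selection concrete, a rationale can be represented as a binary mask over the input tokens, with one mask per class in the class-wise setting. The sketch below uses a made-up hotel review and hand-written masks; in the actual method the masks would be produced by trained models, which this example does not attempt to reproduce.

```python
def apply_mask(tokens, mask):
    """Keep only the tokens whose mask entry is 1 (a hard, all-or-nothing selection)."""
    return [token for token, keep in zip(tokens, mask) if keep]


# Hypothetical review with evidence for both classes.
review = "the room was spotless but the staff were rude".split()

# Hand-written masks for illustration only, one per class.
class_masks = {
    "positive": [0, 1, 1, 1, 0, 0, 0, 0, 0],
    "negative": [0, 0, 0, 0, 0, 0, 1, 1, 1],
}

rationales = {
    label: " ".join(apply_mask(review, mask)) for label, mask in class_masks.items()
}
print(rationales)
```

A conventional rationale would return only the entry matching the review's label; the class-wise view keeps both, which is what lets pros and cons be read side by side.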

These are just some of the papers IBM Research is excited to share and discuss with you at NeurIPS 2019. Please visit us in booth 111 to learn more about our research and meet our scientists, who will be giving demos on some of our emerging technologies, including:

  • Science Summarizer — retrieves and summarizes relevant and recent scientific papers that meet specific information needs.
  • GAAMA — short for Go Ahead Ask Me Anything, GAAMA is a reading comprehension system that is currently a top system on the Google NQ leaderboard (on short answers).
  • NeuNetS for IoT Applications — automatically synthesizes customized neural networks for IoT devices. In this demo, a camera sensor on a Raspberry-Pi performs real-time classification and results are displayed on a screen.
  • Command Line AI Toolkit (CLAI) — Project CLAI aims to bring the power of AI to the shell by augmenting the user experience with natural language support, troubleshooting, and automation, as well as providing researchers with an easily extensible API to develop their own AI plugins.
  • Live Sports Commentator — automatically generates natural language fully expressive speech commentary for soccer games.
  • IBM-MIT Three-D World (TDW) — a highly realistic 3D virtual world platform for interactive multi-modal physical simulation to train and test AI models and agents.
  • Interactive Visual Exploration of Latent Space (IVELS) – a fully automated pipeline for an interactive visual tool to enable evaluation and exploration of the hidden space of text sequences.
  • LALE — an open-source Python library for semi-automated data science — is compatible with scikit-learn, adding a simple interface to existing machine-learning automation tools.

We look forward to seeing you in Vancouver!

Papers and sessions

Tuesday, December 10 – morning sessions

(Poster) Algorithms — Adversarial Learning
A Game Theoretic Approach to Class-wise Selective Rationalization   #1

(Poster) Algorithms — Adversarial Learning
Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers   #4

(Poster) Algorithms — Adversarial Learning
ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization   #16

(Poster) Algorithms — Clustering
Subquadratic High-Dimensional Hierarchical Clustering   #40

(Poster) Algorithms — Components Analysis (e.g., CCA, ICA, LDA, PCA)
Sobolev Independence Criterion   #46

(Poster) Applications — Privacy, Anonymity, and Security
Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation   #86

(Poster) Deep Learning — Generative Models
Generalization in Generative Adversarial Networks: A Novel Perspective from Privacy Protection   #127

(Poster) Probabilistic Methods — Causal Inference
Characterization and Learning of Causal Graphs with Latent Variables from Soft Interventions   #181

(Poster) Reinforcement Learning and Planning — Markov Decision Processes
A Family of Robust Stochastic Operators for Reinforcement Learning   #199

Tuesday, December 10 – afternoon sessions

(Demo) exBERT: A Visual Analysis Tool to Explain BERT’s Learned Representations  |  #801

(Demo) “How can this Paper get in?” – A game to advise researchers when writing for a top AI conference  |  #804

(Poster) Algorithms — Large Scale Learning
SySCD: A System-Aware Parallel Coordinate Descent Algorithm   #39

(Poster) Applications — Body Pose, Face, and Gesture Analysis
Deep Structured Prediction for Facial Landmark Detection   #65

(Poster) Deep Learning — Optimization for Deep Networks
Constrained deep neural network architecture search for IoT devices accounting for hardware calibration   #88

(Poster) Optimization — Convex Optimization
Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD   #114

(Poster) Probabilistic Methods — Causal Inference
Sample Efficient Active Learning of Causal Trees   #138

(Poster) Probabilistic Methods — Distributed Inference
Statistical Model Aggregation via Parameter Matching   #145

(Poster) Reinforcement Learning and Planning — Reinforcement Learning
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement   #205

(Spotlight) Track 2 | Session 2  SySCD: A System-Aware Parallel Coordinate Descent Algorithm

(Spotlight) Track 3 | Session 2  Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Wednesday, December 11 – morning sessions

(Poster) Neuroscience and Cognitive Science — Neuroscience
Scalable Spike Source Localization in Extracellular Recordings using Amortized Variational Inference   #148

(Poster) Probabilistic Methods — Graphical Models
Counting the Optimal Solutions in Graphical Models   #179

(Poster) Probabilistic Methods — Topic Models
Scalable inference of topic evolution via models for latent geometric structures   #190

(Spotlight) Track 4 | Session 3  Counting the Optimal Solutions in Graphical Models

Wednesday, December 11 – afternoon sessions

(Demo) Passcode: A cooperative word guessing game between a human and AI agent  |  #809

(Demo) Project BB: Bringing AI to the Command Line Interface  |  #804

(Poster) Algorithms — Multitask and Transfer Learning
Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks   #42

(Poster) Applications — Audio and Speech Processing
DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging   #75

(Poster) Applications — Computer Vision
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries   #84

(Poster) Applications — Natural Language Processing
Hierarchical Optimal Transport for Document Representation   #105

(Poster) Applications — Visual Question Answering
Visual Concept-Metaconcept Learning   #120

(Poster) Optimization — Convex Optimization
A unified variance-reduced accelerated gradient method for convex optimization   #153

Thursday, December 12 – morning sessions

(Poster) Applications — Privacy, Anonymity, and Security
Differentially Private Distributed Data Summarization under Covariate Shift   #89

(Poster) Applications — Privacy, Anonymity, and Security
Private Hypothesis Selection   #90

(Poster) Deep Learning — CNN Architectures
Cross-channel Communication Networks   #146

(Poster) Deep Learning — Efficient Training Methods
Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks   #159

(Poster) Optimization — Stochastic Optimization
Alleviating Label Switching with Optimal Transport   #206

(Poster) Reinforcement Learning and Planning — Planning
Depth-First Proof-Number Search with Heuristic Edge Cost and Application to Chemical Synthesis Planning   #214

Thursday, December 12 – afternoon sessions

(Poster) Applications — Activity and Event Recognition
More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation   #72

(Poster) Applications — Network Analysis
KerGM: Kernelized Graph Matching   #145

(Poster) Data, Challenges, Implementations, and Software — Data Sets or Data Repositories
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models   #177

(Poster) Deep Learning — Embedding Approaches
Quantum Embedding of Knowledge for Reasoning   #186

(Spotlight) Track 3 | Session 6  KerGM: Kernelized Graph Matching

Workshops and papers

Friday, December 13

CiML 2019: Machine Learning Competitions for All

  • MOCA: An Unsupervised Algorithm for Optimal Aggregation of Challenge Submissions
  • Organizing crowd-sourced AI challenges in enterprise environments: opportunities and challenges
  • The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem

EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition)

  • Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference

Graph Representation Learning

  • Natural Question Generation with Reinforcement Learning Based Graph-to-Sequence Model
  • Tensor graph neural network for learning time varying graphs

Information Theory and Machine Learning

  • Multilabel prediction in log time with data-dependent grouping

KR2ML – Knowledge Representation and Reasoning Meets Machine Learning

  • Domain-agnostic construction of domain-specific ontologies
  • KAFE: Automated Feature Enhancement for Predictive Modeling using External Knowledge
  • Learning Logical Representations from Natural Languages with Weak Supervision and Back Translation
  • Learning Multi-Step Spatio-Temporal Reasoning with Selective Attention Memory Network
  • Phenotypical Ontology Driven Framework for Multi-Task Learning
  • Populating Web-Scale Knowledge Graphs using Distantly Supervised Relation Extraction and Validation
  • Towards a Coalition Focused Neural-Symbolic Generative Policy Model
  • TransINT: Embedding Implication Rules in Knowledge Graphs with Isomorphic Intersections of Linear Subspaces

Learning Meaningful Representations of Life

  • Generative Models for Target Specific Drug Design

Learning with Rich Experience: Integration of Learning Paradigms

  • Advancing Sequence Models with Joint Paraphrase Learning
  • MUTE: Data-Similarity Driven Multi-hot Target Encoding for Neural Network Design

Machine Learning for Health (ML4H): What makes machine learning in medicine different?

  • Advancing Seq2seq Semantic Parsing with Joint Paraphrase Learning
  • Combining human cell line transcriptome analysis and Bayesian inference to build trustworthy machine learning models for prediction of animal toxicity in drug development
  • Drug Repurposing for Cancer: An NLP Approach to Identify Low-Cost Therapies
  • Federated Learning for Sensitive Health Data
  • Pi-PE: A Pipeline for Pulmonary Embolism Detection using Sparsely Annotated 3D CT Images
  • Privacy Preserving Human Fall Detection using Video Data


  • Learning to Tune XGBoost with XGBoost

MLSys: Workshop on Systems for ML

  • 5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory
  • Breadth-first, Depth-next Training of Random Forests
  • CrossLang: the system of cross-lingual plagiarism detection
  • GradZip: Gradient Compression using Alternating Matrix Factorization for Large-scale Deep Learning

Optimal Transport for Machine Learning

  • Wasserstein Style Transfer
  • Unsupervised Hierarchy Matching with Optimal Transport Over Hyperbolic Spaces

Robust AI in Financial Services: Data, Fairness, Explainability, Trustworthiness, and Privacy

  • Exploring Graph Neural Networks for Stock Market Predictions with Rolling Window Analysis
  • Exploring Multi-Banking Customer-to-Customer Relations in AML Context with Poincare Embeddings
  • Subgroup Preservation in Financial Data Anonymized by a Variational Autoencoder
  • Towards Federated Graph Learning for Collaborative Financial Crimes Detection

Safety and Robustness in Decision-making

  • Distributional Actor-Critic for Risk-Sensitive Multi-Agent Reinforcement Learning
  • Efficient Training of Robust and Verifiable Neural Networks
  • Formal Verification of End-to-End Learning in Cyber-Physical Systems: Progress and Challenges
  • Toward Resilient Reinforcement Learning: Causal Inference Q-Networks
  • Towards Verifying Robustness of Neural Networks against Semantic Perturbations

Shared Visual Representations in Human and Machine Intelligence

  • Bio-Inspired Hashing for Unsupervised Similarity Search

Visually Grounded Interaction and Language

  • Visually Grounded Video Reasoning in Selective Attention Memory

Workshop on Human-Centric Machine Learning

  • MonoNet: Towards Interpretable Models by Learning Monotonic Features
  • (Invited Talk) EMC2: Energy Efficient Machine Learning and Cognitive Computing (5th edition)  |  Rogerio Feris
  • (Invited Talk) KR2ML – Knowledge Representation and Reasoning Meets Machine Learning  |  Francesca Rossi

Saturday, December 14

Smooth Games Optimization and Machine Learning

  • Decentralized Parallel Algorithm for Training Generative Adversarial Nets

Deep Reinforcement Learning

  • Fully Bayesian Recurrent Neural Networks for Safe Reinforcement Learning

Document Intelligence

  • CrossLang: the system of cross-lingual plagiarism detection
  • Doc2Dial: a Framework for Dialogue Composition Grounded in Business Documents

Fair ML in Healthcare

  • Understanding racial bias in health using the Medical Expenditure Panel Survey data
  • Estimating Skin Tone and Effects on Classification Performance in Dermatology Datasets

Joint Workshop on AI for Social Good

  • Can We (and Should We) Use AI to Detect Dyslexia in Children’s Handwriting?

Machine Learning and the Physical Sciences

  • Data-driven Chemical Reaction Classification
  • Evaluation Metrics for Single-Step Retrosynthetic Models
  • PaccMannRL: Designing anticancer drugs from transcriptomic data via reinforcement learning

ML For Systems

  • CodeCaption: A dataset for captioning data science code

NeurIPS Workshop on Machine Learning for Creativity and Design 3.0

  • Machine learning based co-creative design framework

Privacy in Machine Learning (PriML)

  • Diffprivlib: The IBM Differential Privacy Library

Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI

  • Local Unsupervised Learning for Image Analysis

Robot Learning: Control and Interaction in the Real World

  • Enhanced Adversarial Strategically-Timed Attacks on Deep Reinforcement Learning

Science meets Engineering of Deep Learning

  • A Simple Dynamic Learning Rate Tuning Algorithm for Automated Training of DNNs

(Competition Track) Live Malaria Challenge

  • Qualifying – December 9 | Live Competition – December 11  |  Presentation – December 14