Expanding AI’s Field of Computer Vision at CVPR 2018


The annual Conference on Computer Vision and Pattern Recognition (CVPR), a premier event in computer vision, took place June 18-22 in Salt Lake City. CVPR is highly competitive: in 2018 it accepted fewer than 1,000 of 3,300 submissions. IBM Research AI presented multiple technical papers and hosted two workshops at CVPR 2018. We were also a platinum sponsor, at booth #845 in the conference expo. IBM Research AI’s papers describe recent advances in our quest to give AI systems sight.

In the paper “BlockDrop: Dynamic Inference Paths in Residual Networks,” our team, in collaboration with the University of Texas at Austin and the University of Maryland, describes a new approach to image classification in which deep convolutional neural networks dynamically choose which layers of the network to execute during inference. By optimizing this selection process, our approach reduces total computation and speeds up inference without degrading prediction accuracy. In addition, the policies that the network devises to guide its selections reflect the input’s complexity and encode meaningful visual information about the image.
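The idea of executing only some residual blocks can be illustrated with a minimal Python sketch. It is not the BlockDrop implementation: the residual branch `make_block` and the hand-written `toy_policy` are hypothetical stand-ins (the paper's policy is learned), and vectors replace image feature maps. What the sketch does show is that a dropped residual block reduces to the identity, so skipping it costs nothing at inference.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_block(scale: float) -> Block:
    """Hypothetical residual branch F(x): a scaled ReLU standing in for conv layers."""
    def branch(x: Vector) -> Vector:
        return [scale * max(v, 0.0) for v in x]
    return branch


def run_with_policy(x: Vector, blocks: Sequence[Block], keep: Sequence[int]) -> Vector:
    """Compute x <- x + F_i(x) for kept blocks; a dropped block is the identity."""
    if len(blocks) != len(keep):
        raise ValueError("policy must hold one keep/drop decision per block")
    for block, decision in zip(blocks, keep):
        if decision:  # only kept blocks are evaluated, which is where compute is saved
            fx = block(x)
            x = [a + b for a, b in zip(x, fx)]
    return x


def toy_policy(x: Vector, num_blocks: int) -> List[int]:
    """Assumed heuristic, not the learned policy: keep more blocks for larger inputs."""
    complexity = sum(abs(v) for v in x) / len(x)
    budget = min(num_blocks, max(1, math.ceil(complexity * num_blocks)))
    return [1] * budget + [0] * (num_blocks - budget)


if __name__ == "__main__":
    blocks = [make_block(0.5) for _ in range(4)]
    for sample in ([0.1, 0.2, 0.3], [1.0, 2.0]):
        keep = toy_policy(sample, len(blocks))
        print(sample, keep, run_with_policy(sample, blocks, keep))
```

In this toy setting the keep vector plays the role of the per-image policy: a simple input runs through one block, while a more complex one uses all four.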

In the paper “A Low Power, High Throughput, Fully Event-Based Stereo System,” IBM Research introduces the first stereo correspondence system implemented fully on event-based digital hardware, using a fully graph-based, non-von Neumann computation model with no frames, arrays, or other such data structures. We use a cluster of TrueNorth neurosynaptic processors to process bilateral event-based inputs streamed live by Dynamic Vision Sensors (DVS) and to produce high-fidelity disparities at up to 2,000 disparity maps per second. These disparities are in turn used to reconstruct, at low power, the depth of events produced by rapidly changing scenes. The system takes full advantage of the asynchronous and sparse nature of DVS output for low-power depth reconstruction in environments with rapidly moving objects, where conventional frame-based cameras connected to synchronous processors would be inefficient.
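The post does not say how depth is recovered from disparity. As a rough illustration only, the sketch below uses the standard pinhole relation for a rectified stereo pair, depth = focal length × baseline / disparity. The function names, the event tuple layout, and the camera parameters are all assumptions made for this example and are not taken from the paper or from TrueNorth.

```python
from typing import List, Tuple

# Hypothetical event record: (x pixel, y pixel, disparity in pixels).
Event = Tuple[int, int, float]


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Standard rectified-stereo relation Z = f * B / d; zero disparity maps to infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


def depths_for_events(events: List[Event], focal_px: float, baseline_m: float) -> List[float]:
    """Depth is computed per event, so only pixels that changed are processed."""
    return [depth_from_disparity(d, focal_px, baseline_m) for _, _, d in events]


if __name__ == "__main__":
    # Assumed camera parameters, chosen only to make the arithmetic easy to follow.
    sparse_events = [(12, 40, 10.0), (13, 40, 20.0)]
    print(depths_for_events(sparse_events, focal_px=200.0, baseline_m=0.1))
```

Because the computation runs per event rather than per frame, its cost scales with scene activity, which is the property the event-based design exploits.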

The paper “A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos” addresses the challenge of tracking an unspecified number of human faces in unconstrained video, maintaining their individual identities across multiple shots despite partial occlusion and drastic appearance changes. We developed a new method with three algorithmic components: a co-occurrence model of multiple body parts is used to create face tracklets; the tracklets are recursively linked to extract clusters with strong associations; and a Gaussian Process model is introduced to compensate for the insufficiency of deep features and to refine the linking results. Experiments on two distinct video datasets demonstrated that our method outperforms state-of-the-art methods that require either intensive training to fine-tune the networks or manual video analysis to obtain the number of clusters.
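To make the tracklet-linking step concrete, here is a minimal Python sketch of linking without a preset number of identities. It is a generic greedy agglomerative merge, not the paper's algorithm: it omits the body-part co-occurrence model and the Gaussian Process refinement, and the feature vectors, the cosine similarity, and the `threshold` value are assumptions chosen for illustration.

```python
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def centroid(features: Sequence[Sequence[float]], members: Sequence[int]) -> List[float]:
    """Mean feature vector of the tracklets in one cluster."""
    dim = len(features[members[0]])
    return [sum(features[m][k] for m in members) / len(members) for k in range(dim)]


def link_tracklets(features: Sequence[Sequence[float]], threshold: float = 0.9) -> List[List[int]]:
    """Repeatedly merge the most similar pair of clusters until none passes the threshold.

    The number of clusters is never given; it falls out of the stopping rule.
    """
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best_score, best_i, best_j = threshold, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine(centroid(features, clusters[i]), centroid(features, clusters[j]))
                if score >= best_score:
                    best_score, best_i, best_j = score, i, j
        if best_i < 0:  # no pair is associated strongly enough; stop linking
            break
        clusters[best_i] = sorted(clusters[best_i] + clusters[best_j])
        del clusters[best_j]
    return clusters


if __name__ == "__main__":
    # Four hypothetical tracklet features belonging to two faces.
    feats = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.98]]
    print(link_tracklets(feats))
```

The stopping rule stands in for the "prior-less" property described above: linking ends when no remaining pair of clusters is strongly associated, so nobody has to count the faces in advance.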

CVPR 2018 Challenge Winners

CVPR Challenge Look-into-Person

An example of multi-person parsing

The IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) team was ranked #1 in three tasks of the CVPR Look-into-Person (LIP) Challenge. This challenge aimed to advance AI technologies for parsing, or understanding the semantic meaning of, a scene that includes people. To do this, an AI program needs to precisely segment the various parts of each person (such as hair, face, upper clothes, left arm, and right arm) down to the pixel level. A C3SR team led by Yunchao Wei developed an effective solution to “human parsing” using deep neural networks. It achieved the highest accuracy in all three tracks of the LIP Challenge in which the team participated (there were five tasks in total): single-person parsing, multi-person parsing, and fine-grained multi-person parsing.

Winners of the CVPR Challenge Look-Into-Person

Three members of the team, Honghui Shi (second from left), Yunchao Wei (third from left) and Jinjun Xiong (third from right), accepted their certificates Monday in Salt Lake City.

In addition, a C3SR team led by Honghui Shi from IBM won third place in the traffic speed estimation task of the CVPR AI City Challenge. This challenge focused on enabling the use of data captured by sensors to make transportation systems smarter. Honghui also participated in the first AI City Challenge in 2017 and was ranked number one in the traffic detection task.

Accepted papers at CVPR 2018

BlockDrop: Dynamic Inference Paths in Residual Networks (Spotlight)
Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. Davis, K. Grauman and R. S. Feris
#167 Thursday 2:50-4:30 @Ballroom [Orals/Spotlights]

Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Improved Human Pose Estimation
Z. Tang, X. Peng, F. Yang, R. S. Feris and D. Metaxas
#285 Tuesday 12:30-2:50 @Halls C-E

Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation (Spotlight)
Y. Wei, H. Xiao, H. Shi, Z. Jie, J. Feng and T. Huang
#812 Thursday 8:30-10:10 @Ballroom [Orals/Spotlights]

Image Super-Resolution via Dual-State Recurrent Networks
W. Han, S. Chang, D. Liu, M. Yu, M. Witbrock and T. Huang
#3225 Tuesday 12:30-2:50 @Halls C-E

A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos
C.-C. Lin and Y. Hung
#3502 Tuesday 10:10-12:30 @Halls C-E

A Low Power, High Throughput, Fully Event-Based Stereo System (Spotlight)
A. Andreopoulos, H. J. Kashyap, T. K. Nayak, A. Amir and M. D. Flickner
#3791 Thursday 8:30-10:10 @Room 255 [Orals/Spotlights]

NISP: Pruning Networks using Neuron Importance Score Propagation
R. Yu, A. Li, C.-F. (Richard) Chen, J.-H. Lai, V. Morariu, X. Han, M. Gao, C.-Y. Lin and L. Davis
#601 Thursday 2:50-4:30 @Room 255 [Orals/Spotlights]

The Excitement of Sports: Automatic Highlights Using Audio/Visual Cues
M. Merler, D. Joshi, K.-N. C. Mac, Q.-B. Nguyen, J. Kent, S. Hammer, J. Xiong, M. N. Do, J. R. Smith and R. S. Feris
#11 Friday, June 22 (PM) Sight&Sound Workshop @Room 251 – C

The Riemannian Geometry of Deep Generative Models
H. Shao, A. Kumar and T. Fletcher
4th Intl. Workshop on Differential Geometry in Computer Vision and Machine Learning, 2018

Dialog-based Interactive Image Retrieval
H. Wu, X. Guo, Y. Cheng, S. J. Rennie, G. Tesauro and R. S. Feris
Monday, June 18, VQA Challenge and Visual Dialog Workshop, @Room 155 – A

Benchmarks and workshops at CVPR 2018

Workshop on Disguised Faces in the Wild
Monday June 18

Moments in Time Challenge
Friday June 22

IBM Research was a gold sponsor of the Women in Computer Vision Workshop (June 22).

Principal RSM and Manager, Computer Vision and Multimedia Department, IBM Research
