For the second year in a row, researchers at the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) won a competition challenging experts worldwide to design low-power embedded systems for Internet-of-Things (IoT) applications. The 2019 Design Automation Conference (DAC) System Design Contest’s objective: create algorithms that can accurately detect and locate objects in images taken by aerial drones.
Such aircraft are currently used to, among other things, assist with search-and-rescue operations, take aerial photos and perform infrastructure inspection in places inaccessible to workers. In the near future, AI-enhanced drones could be used to perform these tasks, and perhaps even deliver packages, with higher levels of precision and efficiency. Those functional improvements will come at a cost, however, as AI-enhanced algorithms will demand more computations and draw more power, posing a significant design challenge for drones with a limited pool of resources.
This year’s DAC System Design Contest, sponsored by ACM SIGDA and DAC, tasked researchers with implementing deep neural network (DNN)-based object detection and localization algorithms for images taken from drones. Just like last year’s competition, drone manufacturer DJI provided all contestants with a training dataset of about 100,000 images with objects of interest across 95 categories, including boats, buildings, cars, and even whales. Most of the objects are smaller than five percent of the image they appear in. DJI also provided a hidden dataset of about 50,000 images that only the contest organizers could access. Organizers used that second dataset to evaluate the accuracy, throughput (frames per second) and energy consumption of contestants’ designs.
Teams competed by building algorithms that could run on one of two low-power platforms: a GPU-based (Nvidia Jetson TX2) board and an FPGA-based (Xilinx Avnet Ultra96) board. Our first-place entries for both categories set a new record for accuracy.
A total of 110 teams registered for the System Design Contest: 52 competed on the GPU platform and the remaining 58 on the FPGA platform. Our C3SR researchers formed two teams:
- iSmart3-Skynet (GPU): Xiaofan Zhang*, Haoming Lu*, Jiachen Li, Cong Hao, Yuchen Fan, Yuhong Li, Sitao Huang, Bowen Cheng, Yunchao Wei, Thomas Huang, Jinjun Xiong, Honghui Shi, Wen-mei Hwu, and Deming Chen.
- iSmart3 (FPGA): Cong Hao*, Xiaofan Zhang*, Yuhong Li, Yao Chen, Xingheng Liu, Sitao Huang, Kyle Rupnow, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. Yao Chen is Prof. Deming Chen’s Ph.D. student at the Advanced Digital Sciences Center, and Dr. Kyle Rupnow is CTO of Inspirit IoT, a company co-founded by Prof. Deming Chen.
Researchers from the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) receive first place in the 2019 Design Automation Conference (DAC) System Design Contest from Robert Aitken (far left), General Chair of DAC 2019. Left to right: Robert Aitken, Cong Hao, Xiaofan Zhang, Deming Chen, Jinjun Xiong, Wen-mei Hwu, and Zhuo Li.
Officials evaluated the designs on detection accuracy and energy consumption, subject to a minimum throughput measured in frames per second (fps). This year’s minimum target throughput for FPGA increased from last year’s five fps to 10 fps, while the GPU’s minimum target throughput remained 20 fps.
Our approach was to carefully design parameterized DNN IP templates that are computationally friendly to the hardware (requiring fewer resources) and run fast. We then built resource and computation models for each DNN template and ran coarse- and fine-grained template evaluations to identify the most promising templates, those offering the best tradeoffs between accuracy and throughput.
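To make the idea of a computation model concrete, here is a minimal sketch in Python. It is not our contest code: the `BundleTemplate` class, its depthwise-plus-pointwise structure, and the assumption that the hardware sustains a fixed number of multiply-accumulate operations (MACs) per second are all illustrative choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleTemplate:
    """Hypothetical template: a 3x3 depthwise conv followed by a 1x1 pointwise conv."""
    in_channels: int
    out_channels: int
    height: int  # feature-map height (stride 1, same padding assumed)
    width: int   # feature-map width

    def macs(self) -> int:
        pixels = self.height * self.width
        # Depthwise 3x3: 9 MACs per channel per output pixel.
        depthwise = 9 * self.in_channels * pixels
        # Pointwise 1x1: in_channels MACs per output channel per pixel.
        pointwise = self.in_channels * self.out_channels * pixels
        return depthwise + pointwise

    def params(self) -> int:
        # Weights only; biases are ignored in this sketch.
        return 9 * self.in_channels + self.in_channels * self.out_channels


def estimated_fps(template: BundleTemplate, macs_per_second: float) -> float:
    """Coarse estimate: assumes the device sustains macs_per_second on this template."""
    return macs_per_second / template.macs()


def coarse_filter(templates, macs_per_second, min_fps):
    """Keep only templates whose estimated throughput meets the fps floor."""
    return [t for t in templates if estimated_fps(t, macs_per_second) >= min_fps]
```

A coarse pass like `coarse_filter` only rules out templates that clearly cannot meet the throughput floor; a finer-grained evaluation would still have to measure accuracy and real on-board performance.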
C3SR team members in front of their winning designs at the DAC Demo Booth. Left to right: Xiaofan Zhang, Deming Chen, Jinjun Xiong, Cong Hao and Haoming Lu.
We then performed a hardware-implementation-aware neural architecture search (optimization) to find the neural architecture that would run as fast and as accurately as possible within the hardware’s limited resources. The general problem formulation of DNN architecture search is also called “auto-ML” in the field. We were among the first to introduce hardware resource constraints into auto-ML, and our winning designs for both GPU and FPGA platforms demonstrated the effectiveness of this approach.
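The core idea of a resource-constrained search can be sketched in a few lines of Python. This is a toy exhaustive grid search, not the search method we used; the `toy_cost` and `toy_score` functions are hypothetical stand-ins for a hardware resource model and a measured validation accuracy.

```python
import math
from itertools import product


def toy_cost(depth: int, width: int) -> int:
    """Hypothetical resource model: cost grows with depth and quadratically with width."""
    return depth * width * width


def toy_score(depth: int, width: int) -> float:
    """Hypothetical accuracy proxy; a real search would train and validate each candidate."""
    return depth * math.sqrt(width)


def constrained_search(depths, widths, budget, cost=toy_cost, score=toy_score):
    """Return the best-scoring (depth, width) whose cost fits the budget, or None."""
    feasible = [(d, w) for d, w in product(depths, widths) if cost(d, w) <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda c: score(*c))


# The budget rules out the largest network, so a narrower one wins.
best = constrained_search(depths=[2, 4, 6], widths=[16, 32], budget=5000)
```

With these toy functions, the unconstrained optimum `(6, 32)` exceeds the budget, so the search settles on `(6, 16)`. Folding the constraint into the search itself, rather than checking it afterward, is the point of the sketch.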
There were some key design differences between FPGA and GPU. For example, on the FPGA platform, we designed a fine-grained tile-based pipeline for each type of convolutional layer in the DNN. The pipeline accelerates the computation and minimizes the required computation resources through IP reuse. Additional information on our FPGA work is available in a companion paper published at the DAC’19 conference, “FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge.”
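As a rough software analogy for tiling with reuse, the Python sketch below splits a feature map into small tiles and runs the same convolution routine on each one. It illustrates tiling and reuse only, not pipelining, and it is not the FPGA design itself; the function names and the single-channel, stride-1 setting are assumptions made to keep the example short.

```python
def conv2d_valid(image, kernel):
    """Plain 2D convolution (cross-correlation, as in DNNs) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def conv2d_tiled(image, kernel, tile):
    """Compute the same result tile by tile, reusing one convolution unit."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r0 in range(0, oh, tile):
        for c0 in range(0, ow, tile):
            th, tw = min(tile, oh - r0), min(tile, ow - c0)
            # Input patch for this output tile, including the kernel-sized halo.
            patch = [row[c0:c0 + tw + kw - 1]
                     for row in image[r0:r0 + th + kh - 1]]
            result = conv2d_valid(patch, kernel)  # same unit reused for every tile
            for i in range(th):
                out[r0 + i][c0:c0 + tw] = result[i]
    return out
```

Because each tile needs only a small input patch, the working buffer stays a fixed size regardless of the image size, which is the property that lets a single hardware block serve the whole layer.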
Final evaluation of the designs considered three metrics for an end-to-end solution: accuracy, power/energy, and throughput. These three metrics were then used to calculate weighted ranking scores. Final results from the top three teams are shown in the table on the right.
C3SR is a collaboration between IBM and UIUC, co-directed by myself and UIUC’s Prof. Wen-mei Hwu. The center was founded in 2016 as part of IBM’s AI Horizons Network (AIHN) to advance state-of-the-art AI systems research. C3SR research focuses on optimizing all three layers of AI systems research (applications, software, and hardware) to improve cognitive computing applications.
* equal contributors