Researchers at the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) developed a prize-winning solution in the 2018 DAC System Design Contest, whose results were announced last week at the Design Automation Conference (DAC) in San Francisco.
The contest, jointly sponsored by ACM SIGDA and DAC, challenged experts worldwide to design low-power embedded systems for Internet-of-Things (IoT) applications. Specifically, contestants were asked to implement deep neural network (DNN)-based object detection and localization algorithms for images captured by drones. There were 95 categories of objects of interest, including boats, buildings, cars, and whales. The training dataset (about 100,000 images) was provided by the drone manufacturer DJI and made available to all contestants, while a hidden dataset (about 50,000 images), accessible only to the contest organizers, was used to evaluate the accuracy and energy consumption of the designs.
The contest targeted two low-power platforms: a GPU-based board (Nvidia Jetson TX) and an FPGA-based board (Xilinx PynQ Z-1). A total of 114 teams registered for the competition, 61 of which focused on the FPGA platform. Our C3SR iSmart2 team was one of them. The team included talented members from the University of Illinois at Urbana-Champaign (UIUC), IBM, Inspirit IoT, and Boeing, and was led by C3SR's Prof. Deming Chen.
The competition was fierce. It began in January 2018, and all teams were asked to submit their designs monthly so that the organizers could maintain an up-to-date leaderboard. Designs were evaluated on detection accuracy (measured as the average intersection-over-union (IoU) over all hidden test images) and on energy consumption, subject to a minimum throughput requirement of more than 5 frames per second (fps).
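To make the accuracy metric concrete, here is a minimal sketch of how IoU and its average could be computed. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and exactly one predicted and one ground-truth box per image; these conventions, and the helper names `iou`, `average_iou`, and `meets_throughput`, are illustrative assumptions, not the organizers' actual scoring code.

```python
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 if they don't overlap)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_iou(preds: List[Box], truths: List[Box]) -> float:
    """Mean IoU over a test set, assuming one prediction per ground-truth box."""
    if not preds or len(preds) != len(truths):
        raise ValueError("need one prediction per ground-truth box")
    return sum(iou(p, t) for p, t in zip(preds, truths)) / len(preds)


def meets_throughput(num_images: int, seconds: float, min_fps: float = 5.0) -> bool:
    """Hypothetical check of the '> 5 fps' constraint from total wall-clock time."""
    return num_images / seconds > min_fps
```

For two 2×2 boxes offset by one unit in each direction, the overlap is 1 and the union is 7, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7, about 0.143.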
Because of the limited computing resources and on-chip memory on the FPGA board, we took a balanced approach to the problem. We designed a customized DNN topology that was co-designed with its FPGA implementation, taking both IP reuse and data reuse into account. The proposed DNN consists mainly of depth-wise 3×3 convolutional layers, point-wise 1×1 convolutional layers, 2×2 max pooling layers, and a bounding box regression stage that locates objects (big or small) in the test images.
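The layer types named above can be illustrated with a small pure-Python sketch: a depth-wise 3×3 convolution (one filter per channel), a point-wise 1×1 convolution (channel mixing), and 2×2 max pooling, plus a weight count showing why this pairing is attractive under tight memory limits. This is not the team's implementation. The post does not specify stride, padding, activations, or layer widths, so the code assumes stride 1, zero padding of 1, no biases, and no activation, and the channel counts in the example are hypothetical.

```python
from typing import List, Tuple

Tensor = List[List[List[float]]]  # indexed as [channel][row][col]


def depthwise_conv3x3(x: Tensor, w: Tensor) -> Tensor:
    """One 3x3 filter per input channel; assumed stride 1 and zero padding 1."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in range(3):
                    for dj in range(3):
                        r, s = i + di - 1, j + dj - 1
                        if 0 <= r < H and 0 <= s < W:  # zero padding outside
                            acc += w[c][di][dj] * x[c][r][s]
                out[c][i][j] = acc
    return out


def pointwise_conv1x1(x: Tensor, w: List[List[float]]) -> Tensor:
    """Mix channels at each pixel: w[o][c] maps C inputs to len(w) outputs."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(C)) for j in range(W)]
             for i in range(H)] for o in range(len(w))]


def max_pool2x2(x: Tensor) -> Tensor:
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    return [[[max(ch[2 * i][2 * j], ch[2 * i][2 * j + 1],
                  ch[2 * i + 1][2 * j], ch[2 * i + 1][2 * j + 1])
              for j in range(len(ch[0]) // 2)]
             for i in range(len(ch) // 2)] for ch in x]


def param_counts(c_in: int, c_out: int, k: int = 3) -> Tuple[int, int]:
    """Weights of a standard k x k conv vs. depth-wise k x k + point-wise 1x1."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable
```

For a hypothetical layer mapping 64 to 128 channels, `param_counts(64, 128)` gives (73728, 8768): the depth-wise/point-wise pair needs roughly 8.4× fewer weights than a standard 3×3 convolution, which eases pressure on the limited on-chip memory the post mentions.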
Final evaluation of the designs considered three metrics for an end-to-end solution: accuracy, power, and throughput. These three metrics were then used to calculate weighted ranking scores. Our team won third place out of 61 international teams in the FPGA track of the contest. Results from the top three teams are shown in the table below.
The top team (TGIIF) from Tsinghua University and DeePhi optimized for accuracy (IoU = 0.623798) but consumed more power (4200 mW). The second-place team (SystemsETHZ) from ETH Zurich focused on throughput (25.9678 fps) and power (2450 mW) at the cost of lower accuracy (IoU = 0.491926). In contrast, we achieved a more balanced design, with relatively high accuracy (IoU = 0.5733, close to the TGIIF team) and low power consumption (2590 mW, close to the SystemsETHZ team), which was consistent with our original design goal.
C3SR is a collaboration between IBM and UIUC that began in 2016 as part of IBM's ongoing academic initiatives to help students develop skills in and understanding of cognitive computing, meeting the growing demand for highly skilled technology professionals. Our research focuses on optimizing all three layers of the computing stack (applications, software, and hardware) to improve cognitive computing applications. C3SR teams also recently won first place in three tasks of the Look-into-Person (LIP) Challenge and placed third in the traffic speed estimation task of the AI City Challenge at the 2018 Conference on Computer Vision and Pattern Recognition (CVPR).
The System Design Contest was a great experience for us, and we are excited to announce that the organizers plan to open-source all the contest submissions. Please stay tuned for the release so you can check out our design.