In an upcoming presentation at the 2018 AAAI Conference, our team of deep learning experts at IBM Research India proposes a new, exploratory technique that automatically ingests published research papers, infers the deep learning algorithms they describe, and recreates them in source code for inclusion in libraries for multiple deep learning frameworks (TensorFlow, Keras, Caffe).
With this research, which we call IBM Deep Learning IDE, we are chasing the big dream of democratizing deep learning by reducing the effort involved in creating deep learning-based models, increasing the reuse of existing models, and making it easier to get past some of the current hurdles encountered when using multiple libraries/frameworks.
Consider, for example, a recently published and highly cited deep learning research paper at AAAI 2017, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.” For a software developer with minimal experience in deep learning, it would be quite hard to understand the research paper and implement its details. In fact, at NIPS 2016, roughly 685 of 2,500 papers were related to deep learning or neural networks, but only about 18 percent of the accepted papers made their source code available. In one of our surveys involving more than 100 software developers, 86 percent of the participants who rated themselves as “experts in programming ability” responded that it takes them at least a couple of days to implement, successfully or unsuccessfully, an existing deep learning model from a research paper or document.
Consider another example where, as a developer, you want to build an image captioning system. Below are sample implementations of highly referenced research papers tackling this problem:
- Show, Attend and Tell: Original implementation available in Theano; https://github.com/kelvinxu/arctic-captions
- NeuralTalk2: Original implementation available in Torch; https://github.com/karpathy/neuraltalk2
- LRCN: Implementation available in Caffe; http://jeffdonahue.com/lrcn/
Since the implementations of the proposed models are available in different libraries, a developer cannot easily use or combine the models. Further, if the remaining components of the system are written in Java (say, with DL4J), directly leveraging any of these public implementations would be daunting.
Hence, keeping up with the fast-growing deep learning community is becoming a challenge, as reproducing research papers in code is a hard and time-consuming task. What if you could have a system that reads and understands deep learning research papers and implements the proposed model design in any language and library of your choice? This is the primary motivation of our research work.
In carrying out our research work, we observed that the architecture details of a deep learning model proposed in a research paper are typically available as a flow diagram or described in a tabular format. We leverage these structures by following a step-by-step procedure, as explained in the image and steps below:
- Extract all the images and tables from the PDF of a research paper.
- Train a binary classifier to detect which images and tables describe a deep learning model flow.
- Parse the image to extract the nodes, edges, and flow to construct the computational graph, as shown in the image below. Perform OCR on the image to extract the textual content.
- A table may be laid out in either row-major or column-major format. Based on the table alignment in the PDF of the research paper, the table is independently parsed to extract the deep learning model flow.
- If a table and an image describe the same design flow, we combine the two extractions to improve the accuracy of the resulting model design.
- From the extracted design, represented in a JSON format, we support source code generation in Keras (v2.1.2), Caffe (v1), TensorFlow (v1.4), and PyTorch (v0.3) using a manually curated, template-based code generation method.
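To make the last step more concrete, here is a minimal sketch of template-based code generation from a JSON design. It is not our actual implementation: the JSON schema, the `KERAS_TEMPLATES` table, and the function names `topological_order` and `generate_keras_source` are all hypothetical, chosen only for illustration, and the sketch handles only simple linear chains of layers. It orders the graph nodes so that every layer appears after its input, then fills one hand-written template per layer to emit Keras functional-API source.

```python
import json
from collections import deque

# Hypothetical design schema (an assumption for this sketch, not the real format).
DESIGN_JSON = """
{
  "name": "tiny_cnn",
  "nodes": [
    {"id": "inp",   "type": "Input",   "params": {"shape": [28, 28, 1]}},
    {"id": "conv1", "type": "Conv2D",  "params": {"filters": 32, "kernel_size": 3, "activation": "relu"}},
    {"id": "flat",  "type": "Flatten", "params": {}},
    {"id": "out",   "type": "Dense",   "params": {"units": 10, "activation": "softmax"}}
  ],
  "edges": [["inp", "conv1"], ["conv1", "flat"], ["flat", "out"]]
}
"""

# Hand-written templates, one per layer type (illustrative subset only).
KERAS_TEMPLATES = {
    "Input": "{id} = layers.Input(shape={shape})",
    "Conv2D": "{id} = layers.Conv2D({filters}, {kernel_size}, activation={activation!r})({src})",
    "Flatten": "{id} = layers.Flatten()({src})",
    "Dense": "{id} = layers.Dense({units}, activation={activation!r})({src})",
}


def topological_order(nodes, edges):
    """Return node ids so that each node follows its predecessors (Kahn's algorithm)."""
    indegree = {n["id"]: 0 for n in nodes}
    successors = {n["id"]: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(i for i in indegree if indegree[i] == 0)
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("design graph contains a cycle")
    return order


def generate_keras_source(design):
    """Emit Keras source text for a linear-chain design by filling templates."""
    nodes = {n["id"]: n for n in design["nodes"]}
    predecessors = {node_id: [] for node_id in nodes}
    has_successor = set()
    for src, dst in design["edges"]:
        predecessors[dst].append(src)
        has_successor.add(src)
    order = topological_order(design["nodes"], design["edges"])

    lines = ["from keras import layers, models", ""]
    for node_id in order:
        node = nodes[node_id]
        params = dict(node["params"])
        if "shape" in params:
            params["shape"] = tuple(params["shape"])  # JSON lists become tuples
        preds = predecessors[node_id]
        if len(preds) > 1:
            raise ValueError("this sketch handles linear chains only")
        src = preds[0] if preds else ""
        template = KERAS_TEMPLATES[node["type"]]
        lines.append(template.format(id=node_id, src=src, **params))

    inputs = [i for i in order if nodes[i]["type"] == "Input"]
    outputs = [i for i in order if i not in has_successor]
    lines.append("model = models.Model(inputs={}, outputs={})".format(inputs[0], outputs[0]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_keras_source(json.loads(DESIGN_JSON)))
```

In a sketch like this, supporting another framework would amount to adding a second template table over the same design, which is what keeps the intermediate JSON independent of any one library.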
Thus, for a given research paper saved as a PDF, execution-ready source code can be automatically generated for four different frameworks. One of the major caveats of the proposed approach is that the figures in research papers can be highly unstructured and complex. We did a thorough analysis and broadly classified these images into five categories: (i) Neurons Plot, (ii) 2D Box, (iii) Stacked2D Box, (iv) 3D Box, and (v) Pipeline plot. Currently, we support the extraction of design flow information from “2D Box” images, which account for roughly 50 percent of the figures in existing research papers, but in the future we hope to be able to interpret model details from multiple different representations.
To evaluate the grammar proposed in our work, we created source code implementations for more than 216,000 deep learning models from their corresponding 2D Box visualizations for the Keras and Caffe frameworks. Experiments on this data set show that the proposed approach has an accuracy greater than 93 percent in extracting flow diagram content.
Another important aspect of our work is an intuitive drag-and-drop UI editor, which can be used to manually edit and perfect the extracted design and to generate the source code in real time. Currently, we are in the process of building a model zoo consisting of designs and source code for models from 5,000 core deep learning research papers from arXiv. We hope to share this dataset soon with the larger research community to use and improve.
The larger goals of our research are:
- To democratize deep learning by making it easier to reproduce research efforts and by increasing the consumption of deep learning models by developers.
- To standardize the format in which deep learning models are expressed in research papers for easy understanding and re-use of models.
- To standardize the format in which deep learning models are described in a library-agnostic manner.
In this work, we have taken the initial exploratory steps toward these larger research goals and look forward to the community’s feedback.