Coders worldwide help computers understand natural language


“What’s taking up all that space?!” That’s what you’d probably say to a human to find out which file was eating up all of the space on your hard drive. But when dealing with a computer, you’d have to be more precise and say, somewhat boringly: “Display the top result from a list of files sorted in decreasing order of size, with sizes shown in gigabytes or another human-readable format.”

This is what researchers badly want to change. Getting a machine to ‘understand’ natural language – the way you’d speak to a human – has been a hot area of research for years. So hot, in fact, that in July 2020 an IBM team led by computer scientists Mayank Agarwal, Tathagata Chakraborti, and Kartik Talamadupula co-organized a competition to improve natural language translation. The task was to build an algorithm that translates an English description of a command line task into its corresponding command line syntax.
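To make the task concrete, here is a minimal sketch of what an input/output pair might look like. The `Example` class, the two sample pairs and the `translate_by_lookup` function are hypothetical illustrations, not part of the competition's data or code, and the first command assumes GNU versions of `du` and `sort`. The exact-match lookup only marks where a learned model would sit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Example:
    """One training pair: an English description and its Bash command."""
    invocation: str
    command: str


# Hypothetical pairs written for illustration; not taken from the competition data.
PAIRS = [
    Example(
        "Show the largest file under the current directory with a human-readable size",
        "find . -type f -exec du -h {} + | sort -rh | head -n 1",  # assumes GNU du/sort
    ),
    Example("Count the lines in notes.txt", "wc -l notes.txt"),
]


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def translate_by_lookup(invocation: str, pairs: Sequence[Example]) -> Optional[str]:
    """Toy 'translator': return the command of an exactly matching description.

    A real system would generalize to unseen descriptions; this one cannot.
    """
    wanted = normalize(invocation)
    for pair in pairs:
        if normalize(pair.invocation) == wanted:
            return pair.command
    return None
```

A lookup like this fails on any rephrasing of the request, which is exactly the gap a learned translation model is meant to close.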

The competition, called NLC2CMD for ‘Natural Language to Command,’ ran as part of the NeurIPS 2020 program until December – and this Saturday, we’ll finally see what the winners have come up with.

We already know who they are. In the accuracy track that measures the ability of algorithms and the systems built on them to accurately predict the correct utility and flags for a given natural language task, it’s Team Magnum – Quchen Fu and Zhongwei Teng from Vanderbilt University – with an accuracy score of 0.53. The second prize in the accuracy track goes to Team Hubris (Jaron Maene), with an accuracy score of 0.51. And in the energy consumption track, which looks at the power drawn by the models while making accurate predictions, the winner is Team AICore, composed of Kangwook Lee, Hurnjoo Lee, Haebin Shin, Sechun Kang and Yonghyun Ryu from Samsung. Their system drew 596.9 milliwatts of power per invocation with an accuracy of 0.49.
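The accuracy track rewards predicting the correct utility and flags. The sketch below shows one simplified way such a comparison could be scored. It is not the official NLC2CMD metric, which is documented on the competition website; the function names and the scoring rule (zero for a wrong utility, otherwise the overlap between predicted and reference flags) are assumptions made for illustration, and it handles single commands only.

```python
import shlex
from typing import FrozenSet, Tuple


def utility_and_flags(command: str) -> Tuple[str, FrozenSet[str]]:
    """Split a single command into its utility name and its set of flags.

    Assumption: any argument starting with '-' counts as a flag.
    """
    tokens = shlex.split(command)
    if not tokens:
        return "", frozenset()
    flags = frozenset(t for t in tokens[1:] if t.startswith("-"))
    return tokens[0], flags


def toy_score(predicted: str, reference: str) -> float:
    """Illustrative score in [0, 1]; NOT the competition's metric.

    Returns 0.0 when the utility is wrong, otherwise the Jaccard overlap
    between the predicted and reference flag sets.
    """
    pred_utility, pred_flags = utility_and_flags(predicted)
    ref_utility, ref_flags = utility_and_flags(reference)
    if pred_utility != ref_utility:
        return 0.0
    union = pred_flags | ref_flags
    if not union:
        return 1.0  # same utility, and neither command uses flags
    return len(pred_flags & ref_flags) / len(union)
```

For instance, predicting `tar -x -f a.tar` against the reference `tar -x -z -f a.tar` gets the utility right but misses one of three flags, so this toy rule gives partial credit rather than a flat failure.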

In all, the competition featured 16 teams from around the world, with nine of those beating the baseline system in the final (test) phase. Team Magnum won by a margin of one percent, and its result was a 37 percent improvement over the baseline. Overall, the competition received more than 200 submissions, split over the development (38), validation (119) and test (47) phases.


From softbots to ‘clever’ AI

AI assistance on the command line interface (CLI) isn’t new; it goes back to the 1990s, when researchers at the University of Washington studied internet softbots. Later in that decade, Microsoft introduced various assistive agents for PC users, like the Clippy bot for the Office suite of tools.

Today, the terminal has many command line agents. But the majority of these agents and tools are rule-based and are extremely hard to abstract and scale – so to create a more generalized solution to assistive CLI-centric AI, one has to learn the rules of the underlying system first.

The command line is even older, dating back to the 1970s. It has become more complex over the years, and the number of options or flags to change the effect of utilities (or “commands”) has surged too. For example, the popular ‘tar’ command – used to create and extract file archives, optionally compressed – now has 139 different flags, compared to only 12 in 1979. The ‘find’ and ‘ps’ commands, used to find files and list running processes respectively, now feature over 80 options each.

And over the years, computer users have had to adapt to this ever-increasing complexity – and keep learning and re-learning the evolving command line. Internet forums have been of some help, but recent analysis from AskUbuntu – the most popular support forum for the Ubuntu Linux distribution – shows that the number of unanswered and unsupported questions has also been increasing.

That’s where the NLC2CMD competition comes in. The hope is that the solutions will help inject new natural language processing (NLP) and AI techniques into the command line – making it ‘smarter’.

For months now, participants have been building models to transform English descriptions of command line tasks into Bash syntax – the command language of the default shell on many Linux systems. The ultimate goal of the NLC2CMD competition is to ensure that users do not have to leave the terminal to look for answers on how to accomplish a specific task. Whenever users have a question about performing a task, they can ask it in natural language – and get a response from the AI. And when users already know exactly what command they want to use, their interaction with the terminal continues unchanged.
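One way to picture this "ask only when needed" behavior is a small router that decides whether a line typed at the prompt is already a command or should go to a translator. The sketch below is a deliberately naive heuristic and not how CLAI itself decides; the `route` function, its return labels and the first-word check are assumptions made for illustration.

```python
import shlex
import shutil
from typing import Callable


def on_path(name: str) -> bool:
    """Default check: is there an executable with this name on PATH?"""
    return shutil.which(name) is not None


def route(user_input: str, is_utility: Callable[[str], bool] = on_path) -> str:
    """Return 'execute' for input that looks like a command, else 'translate'.

    Naive assumption: input is a command if its first word names a known utility.
    """
    try:
        tokens = shlex.split(user_input)
    except ValueError:
        # Unbalanced quotes, e.g. the apostrophe in "What's ...", are not valid
        # shell input, so treat the line as natural language.
        return "translate"
    if tokens and is_utility(tokens[0]):
        return "execute"
    return "translate"
```

With a known-utility set of `{"ls", "tar"}`, `route("ls -la", ...)` yields `"execute"` and the user's command runs untouched, while `route("show me the biggest file", ...)` yields `"translate"`. The heuristic misfires on requests that happen to start with a utility name, such as "find all large files", which is one reason a real assistant needs more than a first-word check.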

The competition builds on the CLAI (Command Line AI) system from IBM Research. CLAI is designed to keep the current user experience on the command line interface intact and is built as a set of AI skills in a plug-and-play fashion. One current AI skill on CLAI – and the current state-of-the-art on the NLC2CMD task – is Tellina, based on the Tellina system developed by Victoria Lin and others at the University of Washington.

Since the command line has a formal grammar, a successful algorithm needs to understand the concept of utilities, the fact that each utility has its own set of flags and options, and the possibility of chaining multiple commands to accomplish a more complex task. The training and test sets of the competition included tasks that required such chaining.
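The structure described above (utilities, per-utility flags, and chaining) can be sketched with a small parser that breaks a piped command into stages. This is a simplified illustration rather than a full Bash grammar: the `parse_pipeline` function is hypothetical, it assumes pipes are separated from other tokens by whitespace, and it treats any argument starting with `-` as a flag.

```python
import shlex
from typing import List, Tuple


def parse_pipeline(command: str) -> List[Tuple[str, List[str]]]:
    """Split a piped command into (utility, flags) stages.

    Assumptions: '|' appears as its own whitespace-separated token, and any
    argument beginning with '-' is a flag of the stage it appears in.
    """
    stages: List[List[str]] = []
    current: List[str] = []
    for token in shlex.split(command):
        if token == "|":
            stages.append(current)
            current = []
        else:
            current.append(token)
    stages.append(current)

    # Drop empty stages, then keep the utility name and its flags in order.
    return [
        (stage[0], [t for t in stage[1:] if t.startswith("-")])
        for stage in stages
        if stage
    ]
```

For the chained command `ls -la /var/log | grep -i error | head -n 5`, this yields three stages: `ls` with `-la`, `grep` with `-i`, and `head` with `-n`. A model that translates English into Bash has to get each of these pieces right, and in the right order.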

More details about the specific metrics used in NLC2CMD can be found on the NLC2CMD website. The competition leaderboard, hosted on the EvalAI platform, will stay open as we transition from the NeurIPS setting to an open competition, to encourage improvements in the state of the art on the NLC2CMD task.

Stay tuned to see the winning solutions, as well as those by the rest of the participants, at the NeurIPS 2020 Competition Session on Saturday, 12 December. The NLC2CMD competition’s session starts at 2pm ET / 11am PT. Please access it via the NeurIPS website – and you can also get more information on the competition’s website.


The NLC2CMD competition was hosted on EvalAI, a popular open source platform for hosting competitions on machine learning and AI algorithms at scale. Many thanks to Rishabh Jain and the rest of the EvalAI team for their amazing support throughout the competition.

IBM Research AI is proudly sponsoring NeurIPS 2020 as a Platinum Sponsor, as well as the Women in Machine Learning and Black in AI workshops. We are pleased to report that IBM has had its best year so far at NeurIPS: 46 main track papers, of which eight are spotlight papers, with one oral presentation. In addition, IBM has 26 workshop papers and six demos, and is also organizing three workshops and a competition. We hope you can join us from December 6 – 12 to learn more about our research.




IBM Research Editorial Lead
