With the launch of its new lightweight Llama 3.2 models last week, Meta became the latest company to bet big on going small, following Apple, IBM, Google, Microsoft and other tech giants that have introduced small language models (SLMs) in the last 18 months.
Yes, SLMs cost less, use less energy and often perform better than their larger counterparts on specialized tasks. But perhaps their biggest draw is that they can be implemented on smartphones and other mobile devices that operate at the edge, like car computers or smart sensors on a factory floor.
“Smaller models will hugely impact productivity,” says Maryam Ashoori, Director of Product Management at IBM watsonx.ai. “Finally, many of the generative AI use cases will actually be accessible to a larger group of people and enterprises.”
Beyond being able to run on even very modest hardware, SLMs eliminate the need to transmit sensitive proprietary or personal data to off-network servers, which can help improve security and protect privacy.
One size doesn’t fit all
Large language models (LLMs) have started to dramatically transform the consumer and enterprise markets. Generative AI can automate information extraction, classification, content generation, question answering and summarization, to name only a few applications.
The reality, however, is that traditional LLMs cost millions of dollars to train and deploy, and a larger LLM also demands more GPU capacity and greater energy consumption. Furthermore, individuals and enterprises may not be comfortable sharing their data with large public LLMs that are hosted in the cloud and trained on unstructured internet data, and building a comparable large language model in-house can be prohibitively expensive.
Enter SLMs. With approximately 1-3 billion parameters, SLMs can be developed and deployed at a fraction of the cost, making them more accessible to enterprises of all sizes, as well as regular smartphone-toting citizens.
In addition to being lower in cost, SLMs can “deliver much higher accuracy with a much, much smaller footprint,” says Shobhit Varshney, a VP and Senior Partner at IBM Consulting focusing on AI, in a recent Mixture of Experts podcast.
In the past few months, Varshney has seen many IBM clients in manufacturing and government deploy SLMs on local devices in contexts where reliable internet access may be lacking, such as on the factory floor or out in the field.
“When you can fine-tune these [models] and then run them on devices, that opens up a whole lot of use cases for our clients,” says Varshney of the new mini Llama 3.2 models, the smallest Llama models to date.
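To make that concrete, here is a minimal sketch of on-device inference with a small open model, using the Hugging Face transformers library. The checkpoint name, prompt and generation settings are illustrative assumptions rather than details from the article; any comparably sized open-weight model would work the same way.

```python
# Minimal sketch: running a ~1B-parameter model locally with Hugging Face
# transformers. The model ID and prompt are assumptions for illustration.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # gated repo; needs an approved HF access token
    device_map="auto",  # uses the accelerate package; falls back to CPU if no GPU is present
)

messages = [
    {"role": "user", "content": "Summarize the safety checklist for line 4 in three bullets."}
]
result = generator(messages, max_new_tokens=128)

# The pipeline returns the full chat; the last turn is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```

Because the weights fit in a few gigabytes of memory, a sketch like this can run on a laptop or an edge gateway with no network connection at all, which is precisely the property the factory-floor and field deployments rely on.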
For regulated industries and sectors, such as healthcare or finance where data security is paramount, SLMs can maximize privacy.
Individuals stand to benefit, too. By November of this year, iPhone users will be able to use the AI-powered Apple Intelligence Writing Tools to rewrite, proofread and summarize text as they write on their devices.
As Apple explained in a press release, “Apple Intelligence allows users to choose from different versions of what they have written, adjusting the tone to suit the audience and task at hand. From finessing a cover letter, to adding humor and creativity to a party invitation, Rewrite helps deliver the right words to meet the occasion.”
Since SLMs can work offline, more people around the globe can access them.
“SLMs could be used in rural areas that lack cell service,” says Luis Vargas from Microsoft, which introduced its SLM, Phi-3-mini, in April. “Consider a farmer inspecting crops who finds signs of disease on a leaf or branch. The farmer could take a picture of the crop at issue and get immediate recommendations on how to treat pests or disease.”
Unlocking value at “the edge”
While the tech sector has snapped up language models large and small, some experts expect more traditional industries, such as manufacturing, to see the greatest benefit from SLMs and smaller AI models, particularly at the edge, where computing happens on or near the devices that generate the data, such as on the factory floor, rather than in a centralized data center.
At the edge, “You don’t have as much compute power or storage, but you do have massive amounts of data,” says Francis Chow, VP and GM for In-Vehicle Operating Systems and Edge Computing at Red Hat. “Currently, only 1-5% of the real-time data available is being used. There is tremendous business potential if you can get value from more of that data.”
While industries like manufacturing tend to move more slowly than IT, many organizations are already testing the waters with language models that digest instruction manuals so technicians can ask questions and receive relevant summaries.
Using SLMs and other smaller AI models in edge computing with computer vision is another promising area, says Chow. Currently, computer vision algorithms in automobiles can stop a car if they detect a ball or other object within a certain proximity of the vehicle. As SLMs become more sophisticated, they can learn from past experience, detect patterns and make predictions. For example, if a car can detect and recognize a soccer ball, it might predict that a child will run out to pick up the ball a few seconds later, and react accordingly.
Balancing accuracy and latency
“No size fits all” applies to language models too, says Dr. Christine Ouyang, a Distinguished Engineer and Master Inventor at IBM Consulting’s Center of Excellence (CoE) for generative AI. “Large language models are very powerful, but they can be overkill for some tasks.”
The AI CoE is collaborating with IBM Research to create SLMs for so-called “client zero” use cases, in which IBM acts as the first client for its own technology before bringing it to customers. IBM Research builds these small models using various techniques, including fine-tuning large models before distilling them, or distilling larger models before fine-tuning them.
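IBM has not published the exact recipe, but the distillation step generally follows the standard teacher-student pattern: a small model is trained to match the softened output distribution of a larger one, alongside the usual loss on ground-truth labels. Below is a rough PyTorch sketch of that objective; the temperature and weighting values are illustrative assumptions, not IBM’s settings.

```python
# Generic knowledge-distillation loss (a sketch, not IBM Research's recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soften both distributions; KL divergence pulls the student toward
    # the teacher's full probability distribution, not just the top label.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    kd = kd * (temperature ** 2)  # standard correction for gradient scale

    # Ordinary cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    # Blend the two objectives; alpha trades teacher imitation vs. labels.
    return alpha * kd + (1.0 - alpha) * ce
```

Whether distillation happens before or after fine-tuning, per the article, the loss itself looks much the same; what changes is which data and which teacher the student sees at each stage.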
“When it comes to model size, it’s a tradeoff,” says Dr. Ouyang. “For non-mission critical applications, you can sacrifice 2% of accuracy to save significant cost and decrease latency.” Latency here refers to the lag between a user’s prompt being sent to a cloud-hosted LLM and the generated answer coming back; a model running locally avoids that round trip entirely.
In the past, Dr. Ouyang’s team worked with IBM Supply Chain Engineering to develop AI-powered edge applications for quality inspection in IBM manufacturing. Use cases included defect detection, such as spotting missing screws on the back of servers, or bent or missing connector pins.
“This type of task would have previously taken a quality control engineer ten minutes,” says Dr. Ouyang. “The AI-powered edge device solution completed this task in less than one minute.”
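The production solution is not public, but conceptually the inspection step can be as simple as a small image classifier running on local hardware, with no cloud round trip. Here is a hedged sketch assuming a MobileNetV3 model already fine-tuned on pass/defect images; the checkpoint file, image path and class labels are all hypothetical.

```python
# Simplified sketch of on-device defect detection (the IBM solution itself
# is not public). A small classifier is assumed to have been fine-tuned on
# labeled inspection photos and exported to a local checkpoint file.
import torch
from torchvision import models, transforms
from PIL import Image

CLASSES = ["pass", "missing_screw", "bent_pin"]  # hypothetical labels

model = models.mobilenet_v3_small(num_classes=len(CLASSES))
model.load_state_dict(torch.load("defect_classifier.pt", map_location="cpu"))  # hypothetical checkpoint
model.eval()

# Preprocessing must match whatever the model was fine-tuned with.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

image = preprocess(Image.open("server_back_panel.jpg")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(image), dim=-1)[0]
print({c: round(p.item(), 3) for c, p in zip(CLASSES, probs)})
```

A single forward pass like this takes well under a second on modest hardware, which is how an inspection that once took an engineer ten minutes can finish in under one.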
While SLMs are still a work in progress, promising results such as these suggest that these tiny but mighty models are here to stay.