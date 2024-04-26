Enterprises that operate in specialized domains, such as telcos, healthcare providers or oil and gas companies, have a laser focus. While they can and do benefit from typical gen AI scenarios and use cases, they would be better served by smaller models.

In the case of telcos, for example, some of the common use cases are AI assistants in contact centers, personalized offers in service delivery and AI-powered chatbots for enhanced customer experience. Use cases that help telcos improve the performance of their network, increase spectral efficiency in 5G networks or help them determine specific bottlenecks in their network are best served by the enterprise’s own data (as opposed to a public LLM).

That brings us to the notion that smaller is better. Small Language Models (SLMs) are, as the name suggests, smaller than LLMs: SLMs have tens of billions of parameters, while LLMs have hundreds of billions. More importantly, SLMs are trained on data pertaining to a specific domain. They might not have broad contextual information, but they perform very well in their chosen domain.

Because of their smaller size, these models can be hosted in an enterprise’s data center instead of the cloud. SLMs might even run on a single GPU chip at scale, saving thousands of dollars in annual computing costs. However, the delineation between what can only be run in a cloud or in an enterprise data center becomes less clear with advancements in chip design.
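To make the single-GPU point concrete, here is a rough back-of-the-envelope sketch in Python. It estimates only the memory needed to hold a model's weights (parameter count times bytes per parameter) and ignores activations, caches and runtime overhead, so real requirements are higher. The model sizes, the 80 GB GPU capacity and the function name `weight_memory_gb` are assumptions chosen for illustration, not figures from any specific product.

```python
# Rough weights-only memory estimate. It ignores activations, KV cache
# and runtime overhead, so treat the results as a lower bound.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate memory needed to hold the weights, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical model sizes, chosen for illustration only.
models = {"SLM (7B params)": 7e9, "LLM (175B params)": 175e9}
GPU_MEMORY_GB = 80  # assumed capacity of a single data-center GPU

for name, params in models.items():
    for precision in ("fp16", "int8"):
        gb = weight_memory_gb(params, precision)
        verdict = "fits" if gb <= GPU_MEMORY_GB else "does not fit"
        print(f"{name} at {precision}: {gb:.0f} GB, {verdict} on one {GPU_MEMORY_GB} GB GPU")
```

Under these assumptions, the smaller model's weights fit comfortably on one GPU at 16-bit precision, while the larger model's weights would have to be split across several GPUs.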

Whether it is because of cost, data privacy or data sovereignty, enterprises might want to run these SLMs in their data centers. Most enterprises do not like sending their data to the cloud. Another key reason is performance. Gen AI at the edge performs the computation and inferencing as close to the data as possible, making it faster and more secure than through a cloud provider.

It is worth noting that SLMs require less computational power and are ideal for deployment in resource-constrained environments and even on mobile devices.

An on-premises example might be an IBM Cloud® Satellite location, which has a secure high-speed connection to the IBM Cloud region hosting the LLMs. Telcos could host these SLMs at their base stations and offer this option to their clients as well. It comes down to optimizing the use of GPUs: the distance that data must travel decreases, which lowers latency and makes better use of available bandwidth.
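The effect of shorter distance can be sketched with a simple propagation-delay calculation. The Python snippet below assumes signals travel through optical fiber at roughly 200,000 km per second (about two-thirds of the speed of light in a vacuum) and ignores routing, queuing and processing time, which usually add more delay. The site names and distances are hypothetical, as is the function name `round_trip_ms`.

```python
# Approximate signal speed in optical fiber (assumption: about 2/3 of c).
SPEED_IN_FIBER_KM_PER_S = 200_000


def round_trip_ms(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, in milliseconds."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_S * 1000


# Hypothetical distances from a device to where inference runs.
sites = {
    "base station (edge)": 5,
    "regional data center": 200,
    "distant cloud region": 2000,
}

for name, km in sites.items():
    print(f"{name}: {round_trip_ms(km):.2f} ms round trip")
```

Propagation delay is only one component of end-to-end latency, but it is the part that no amount of extra compute can remove, which is why placing inference closer to the data helps.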