Date: 2025-03-31

As reasoning models like OpenAI’s o1, DeepSeek-R1 and Google’s Gemini 2.5 compete to top AI intelligence benchmarks, enterprises looking to integrate AI are becoming increasingly wary of “model bloat”: the phenomenon whereby models become unnecessarily large or complex, driving up computational costs and training time and slowing the responses enterprises need.

OpenAI’s o1 and DeepSeek-R1 use chain of thought (CoT) reasoning to break complex problems into steps, achieving unprecedented performance and greater accuracy than prior models. But CoT also demands substantial computational resources during inference, leading to lengthy outputs and higher latency, says Volkmar Uhlig, a VP and AI Infrastructure Portfolio Lead at IBM, in an interview with IBM Think.

Enter a new class of prompting techniques, described in several recent papers and ranging from atom of thought (AoT) to chain of draft (CoD). They aim to make CoT more efficient and accurate by helping models solve problems more quickly, thereby cutting costs and latency.
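To make the contrast concrete, here is a minimal sketch of how a verbose step-by-step prompt might differ from a terse, draft-style one. The instruction wording, the `build_prompt` helper and the `max_words` parameter are illustrative assumptions, not text taken from the papers or from any vendor's documentation, and no model is called.

```python
# Illustrative prompt templates only; the wording is an assumption,
# not quoted from any paper or vendor documentation.
COT_INSTRUCTION = (
    "Think step by step, explaining each step in full, "
    "then give the final answer."
)
DRAFT_INSTRUCTION = (
    "Think step by step, but keep each step to a short draft of at most "
    "{max_words} words, then give the final answer."
)


def build_prompt(question: str, style: str = "cot", max_words: int = 5) -> str:
    """Return a prompt string for the requested reasoning style.

    style="cot" asks for fully explained steps; style="draft" asks for
    terse steps capped at max_words words each (hypothetical parameter).
    """
    if style == "cot":
        instruction = COT_INSTRUCTION
    elif style == "draft":
        instruction = DRAFT_INSTRUCTION.format(max_words=max_words)
    else:
        raise ValueError(f"unknown style: {style!r}")
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"
    for style in ("cot", "draft"):
        print(f"--- {style} ---")
        print(build_prompt(question, style))
```

The two prompts differ only in how much text the model is asked to produce per step. Because the model generates fewer output tokens under the terse instruction, a draft-style prompt could plausibly reduce cost and latency, though any actual savings would have to be measured on a given model and task.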

AI scientist and startup founder Lance Elliott sees the new offshoots of chain of thought as variations in a prompt engineer’s toolkit. “Your typical home handiwork toolkit might have a regular hammer; that would be CoT,” he tells IBM Think. “AoT would be akin to using a specialized hammer used for situations involving cutting and adjusting drywall. You could use a regular hammer for drywall work, but it would be advisable to use a drywall hammer if you had one and knew how to use it properly.”

Vyoma Gajjar, an AI Technical Solution Architect at IBM, sees potential in these new CoT cousins, especially for enterprises “looking for more cost-efficient ways to prompt small models to get accurate answers for their specific use cases,” she says.