September 24, 2024 By Sascha Brodsky 3 min read

OpenAI’s latest o1-preview models tackle complex problems by breaking them into steps, potentially changing how industries approach challenges.

This approach, known as chain of thought reasoning, marks a significant shift from previous AI models that often produced answers without explaining their reasoning. These advancements could reshape how businesses and researchers approach complex problem-solving tasks.

“These models are better at tasks requiring more logic and reasoning because they take time to think through the problem,” said IBM Distinguished Engineer Chris Hay. “It’s like they’re showing their work, step by step.”

Chain of thought reasoning

The chain of thought approach allows users to see how the AI arrives at its conclusions. Hay explained the process: “If you ask a child, for example, what’s 25 multiplied by 10 plus five, there’s three steps there. They might just blurt out the answer. But you said, no, no, you need to break this down… it’s like in school, you’re showing your work.”
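Hay’s arithmetic example can be sketched in code. This is only an illustration of the “show your work” idea, not how the models are implemented: each intermediate step is recorded alongside the final answer, the way a student writes out their working.

```python
def solve_step_by_step(a: int, b: int, c: int):
    """Compute a * b + c while recording each intermediate step,
    mimicking chain-of-thought style 'shown work'."""
    steps = []
    product = a * b
    steps.append(f"{a} * {b} = {product}")
    total = product + c
    steps.append(f"{product} + {c} = {total}")
    return total, steps

# Hay's example: 25 multiplied by 10, plus 5
answer, work = solve_step_by_step(25, 10, 5)
# work holds the visible reasoning: ["25 * 10 = 250", "250 + 5 = 255"]
```

The point of the transparency Hay and Baracaldo describe is the `work` list: the conclusion comes with a trace a human can inspect, not just a bare number.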

Nathalie Baracaldo, an IBM AI Security Senior Research Scientist, emphasized the significance of this development: “The main difference is related to how we can know how the model arrived at a decision. We have explanations about what the agent did that are very useful for understanding why something happened.”

This level of transparency could have far-reaching implications across various industries. In software development, for instance, the models are showing improved coding abilities with fewer errors. Hay noted, “They’re coding better and hallucinating less,” referring to instances where AI produces plausible but incorrect information.

The new models also incorporate reinforcement learning in their training process. Hay explained, “They’ve also changed the way that they are trained in the base models. They talk about how they’re using reinforcement learning… to teach and train those models.”
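The reinforcement learning idea Hay mentions can be sketched at a toy level. This is an assumed simplification for illustration only, not OpenAI’s actual training procedure: candidate reasoning traces are scored with a reward, and higher-reward traces are the ones the training process would reinforce.

```python
def reward(trace: dict, correct_answer: int) -> float:
    """Toy reward signal: 1.0 if the trace ends in the right answer, else 0.0."""
    return 1.0 if trace["answer"] == correct_answer else 0.0

# Two hypothetical reasoning traces for "25 * 10 + 5" (correct answer: 255)
traces = [
    {"steps": ["25 * 10 = 250", "250 + 5 = 255"], "answer": 255},
    {"steps": ["25 * 10 = 250", "250 + 5 = 260"], "answer": 260},
]

rewards = [reward(t, 255) for t in traces]
best = traces[rewards.index(max(rewards))]
# 'best' is the trace a reward-based trainer would preferentially reinforce
```

Real systems use far richer reward models and optimization methods, but the core loop — generate reasoning, score it, reinforce what scores well — is the mechanism Hay is referring to.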

Human-AI collaboration

The most effective use of these advanced AI models will likely involve a partnership between human expertise and machine capability. “The human will always have to provide input, be okay with the planning, and verify these things,” Hay said.

Hay cautioned against overestimating the models’ capabilities: “I think you can get great outputs. I think when people hear the words AGI, they’re thinking of this big pulsating head in the clouds… actually, if I think about it, the models, as they are, with their next token prediction and good training data and their planning, etc., they do a pretty good job—better than humans in quite a lot of tasks.”

The development of these models raises questions about the nature of artificial intelligence and its comparison to human cognition. The new models have demonstrated remarkable prowess in certain areas—outperforming humans on standardized tests like the bar exam and SATs. Yet they still struggle with tasks that most humans find intuitive.

Hay elaborated on where the models fall short: “The model excels at specific, individual tasks. However, it currently has difficulty distinguishing between different parts of a conversation. This leads to confusion in its ability to handle multiple concepts simultaneously. The model overemphasizes context, often considering too much irrelevant information when processing requests.”

Baracaldo added a note of caution: “Even though this model is super impressive, sometimes it makes mistakes. And if you read the technical report, sometimes it creates solutions that a real expert, a human being, will think are not feasible, but the model does not know all the assumptions.”

The implications of these advancements extend beyond the tech industry. In research and academia, they might accelerate the pace of discovery by assisting in complex data analysis and hypothesis generation. In fields like medicine and law, they could serve as tools to augment human expertise, potentially leading to more accurate diagnoses or more comprehensive legal analyses.

Hay summarized the practical value of the new models for enterprises: “They are a lot better coders than they were before.”
