OpenAI's New o3 Model Won't Come Cheap. Why Microsoft Is Paying Close Attention. -- Barrons.com

Dow Jones 12-24 16:30

By Adam Levine

With the reveal of its o3 AI model, OpenAI is advancing what artificial intelligence can do, but customers like Microsoft may find the cost prohibitive.

AI companies have struggled to reach the next stage of advanced models, best exemplified by OpenAI's delays following up GPT-4 with GPT-5.

The main hindrance to the latest AI development is a lack of data to train the next generation of bleeding-edge models.

Faced with a world in which the models aren't advancing in "intelligence," OpenAI has pivoted to enhancing outputs through "reasoning," making the model work through problems by breaking them down into manageable chunks and self-correcting at each stage. This adds time to the chatbot's output: from seconds for simple queries to minutes for complex math and science questions.

The "chain-of-thought" procedure has been considered a best practice for how humans use chatbots, but the new reasoning models do it on their own. Because they correct themselves, they cut down on so-called hallucinations, in which AIs say something that is wrong, albeit in convincing fashion.

Reasoning models began with the September release of ChatGPT o1, followed by this past Friday's announcement of o3. Alphabet's Google made a similar announcement with Gemini 2.0 Flash Thinking, also last week.

The "mini" version of o3 is being seeded to select researchers, with a goal of releasing it to the general public in early 2025. OpenAI did not provide a schedule for the full o3 release.

o1 showed the promise of reasoning to improve model output, and o3 is a big step up in that regard, at least in the advanced AI benchmarks that OpenAI presented in its livestream announcement. The improvement in math is especially impressive, with o3 scoring 25% on a test of advanced mathematics; OpenAI says no other available AI model scores above 2%.

It remains to be seen how o3 will perform in real-world tasks.

The o3 advancements may prove particularly important to Microsoft, since OpenAI models are the foundation of its AI assistant, Microsoft 365 Copilot, a paid upgrade for Office suite users. For business customers it's crucial that hallucinations are minimized, lest Office users make mistakes because of Copilot. Microsoft had been counting on GPT-5 to solve this problem, but o3 may be the next best thing.

Along with the added time for queries, there's another downside to reasoning: it costs more, as is so often the case with AI. OpenAI's business customers will have to pay for all the extra under-the-hood computing that the reasoning requires.

In the case of o1, text input and output are already charged at six times the rate of its predecessor, 4o. We don't yet know what o3 will cost.

The expenses can add up quickly if users are asking difficult questions. In one of the advanced benchmarks that OpenAI presented in its o3 livestream, the cost per task was $20, with an average task completion time of 1.3 minutes. That was using a so-called high-efficiency version of the model, in which reasoning is limited.

The same tasks using the model's full-reasoning capabilities took an average of 13.8 minutes and used 172 times the computing power, the benchmarking organization said. The group didn't provide a cost breakdown for those tasks, but, at least initially, they could be cost-prohibitive.
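To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. The time, cost, and compute numbers are the ones cited above; the idea that cost would scale linearly with computing power is purely an assumption made for illustration, since the benchmarking organization gave no cost breakdown, and the function names are hypothetical.

```python
# Figures reported for the benchmark run cited in the article.
HIGH_EFFICIENCY_COST_USD = 20.0  # cost per task, high-efficiency mode
HIGH_EFFICIENCY_MINUTES = 1.3    # average completion time, high-efficiency mode
FULL_REASONING_MINUTES = 13.8    # average completion time, full reasoning
COMPUTE_MULTIPLIER = 172         # full reasoning used 172x the computing power


def slowdown_factor(fast_minutes: float, slow_minutes: float) -> float:
    """Return how many times longer the slow mode takes than the fast mode."""
    return slow_minutes / fast_minutes


def naive_cost_estimate(base_cost_usd: float, compute_multiplier: float) -> float:
    """Estimate per-task cost in the more expensive mode.

    ASSUMPTION (not from the article): cost scales linearly with compute.
    Real pricing could differ substantially in either direction.
    """
    return base_cost_usd * compute_multiplier


if __name__ == "__main__":
    ratio = slowdown_factor(HIGH_EFFICIENCY_MINUTES, FULL_REASONING_MINUTES)
    cost = naive_cost_estimate(HIGH_EFFICIENCY_COST_USD, COMPUTE_MULTIPLIER)
    print(f"Full reasoning is about {ratio:.1f}x slower per task")
    print(f"Hypothetical per-task cost under linear scaling: ${cost:,.0f}")
```

Under that linear-scaling assumption, a single full-reasoning task would run into the thousands of dollars, which is why the figure should be read as an illustration of scale rather than a price.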

For Microsoft, which might like to use o3, the costs present a problem. It charges users $30 a month for Microsoft 365 Copilot, so even a few complex o3-powered tasks could erase its profit margin.
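The margin math can be sketched in a few lines of Python. This is a simplified illustration, assuming Microsoft would pay something like the $20-per-task benchmark cost cited earlier, which the article does not establish, and ignoring every other cost and revenue source. The function names are hypothetical.

```python
MONTHLY_FEE_USD = 30.0    # Microsoft 365 Copilot price per user, per the article
COST_PER_TASK_USD = 20.0  # ASSUMPTION: benchmark cost per task applies to Microsoft


def tasks_fully_covered(monthly_fee_usd: float, cost_per_task_usd: float) -> int:
    """Number of whole tasks the monthly fee pays for before going negative."""
    return int(monthly_fee_usd // cost_per_task_usd)


def remaining_after_tasks(monthly_fee_usd: float, cost_per_task_usd: float,
                          tasks: int) -> float:
    """Revenue left per user after paying for the given number of tasks."""
    return monthly_fee_usd - cost_per_task_usd * tasks


if __name__ == "__main__":
    covered = tasks_fully_covered(MONTHLY_FEE_USD, COST_PER_TASK_USD)
    print(f"Tasks covered by one month's fee: {covered}")
    for n in range(1, 4):
        left = remaining_after_tasks(MONTHLY_FEE_USD, COST_PER_TASK_USD, n)
        print(f"After {n} task(s): ${left:.2f} remaining")
```

On these assumed numbers, one complex task per user per month fits inside the subscription fee and a second one pushes it into the red.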

The smaller, cheaper, and faster o3 mini model, due in the coming months, may be the more likely candidate for OpenAI's business customers, including Microsoft.

While it's more prone to wrong answers than its larger sibling, it's still a significant improvement over the 4o model that currently underpins Office Copilot.

Microsoft has yet to fully adopt o1, and the reason may be the added cost with limited upside. We don't know yet what o3 mini will cost to query, but its skills represent a bigger leap over 4o, and -- since limiting mistakes in AI business applications is key to adoption -- it could tip the scales.

Write to Adam Levine at adam.levine@barrons.com

This content was created by Barron's, which is operated by Dow Jones & Co. Barron's is published independently from Dow Jones Newswires and The Wall Street Journal.

(END) Dow Jones Newswires

December 24, 2024 03:30 ET (08:30 GMT)

Copyright (c) 2024 Dow Jones & Company, Inc.

