(Bloomberg) — OpenAI was on the verge of a major milestone. The startup completed an initial round of training in September for a massive new artificial intelligence model that it hoped would significantly outperform previous versions of the technology behind ChatGPT and move closer to its goal of powerful AI that outperforms humans.
But the model, known internally as Orion, did not achieve the performance the company desired, according to two people familiar with the matter, who spoke on the condition of anonymity to discuss company business. As of late summer, for example, Orion fell short when trying to answer coding questions it hadn’t been trained on, the sources said. Overall, Orion is so far not seen as being as significant an advancement over OpenAI’s existing models as GPT-4 was over GPT-3.5, the system that originally powered the company’s flagship chatbot, the sources said.
OpenAI isn’t the only one experiencing difficulties recently. After years of pushing out increasingly sophisticated AI products at a breakneck pace, three of the biggest AI companies are now seeing diminishing returns from their costly efforts to create new models. At Alphabet Inc.’s Google (GOOG, GOOGL), an upcoming iteration of its Gemini software is not meeting internal expectations, according to three people with knowledge of the matter. Anthropic, for its part, has seen the release schedule slip for its highly anticipated Claude model called 3.5 Opus.
The companies face several challenges. It has become increasingly difficult to find new, untapped sources of high-quality, human-made training data that can be used to create more advanced AI systems. Orion’s unsatisfactory coding performance was partly due to a lack of sufficient coding data to train on, two people said. At the same time, even modest improvements may not be enough to justify the enormous costs associated with building and operating new models, or to meet the expectations that come with presenting a product as a major upgrade.
There is plenty of room to improve these models. OpenAI put Orion through a months-long process often called post-training, according to one of the people. This procedure, which is common before a company publicly releases new AI software, includes, among other things, incorporating human feedback to improve responses and refining the tone the model uses when interacting with users. But Orion is still not at the level OpenAI would want before offering it to users, and the company is unlikely to deploy the system before early next year, one person said.
These issues challenge the gospel that has taken hold in Silicon Valley in recent years, particularly since OpenAI released ChatGPT two years ago. Much of the tech industry has bet on so-called scaling laws, which hold that more computing power, more data and bigger models will inevitably pave the way for greater leaps in the power of AI. The recent setbacks also raise doubts about the heavy investment in AI and the feasibility of achieving an overarching goal that these companies are aggressively pursuing: artificial general intelligence. The term generally refers to hypothetical AI systems that would match or surpass humans in many intellectual tasks. The CEOs of OpenAI and Anthropic have previously said that AGI could be just a few years away.
“The AGI bubble is bursting a little bit,” said Margaret Mitchell, chief ethics scientist at AI startup Hugging Face. It has become clear, she said, that “different training approaches” might be needed to make AI models work really well on a variety of tasks, an idea that a number of artificial intelligence experts echoed to Bloomberg News.
In a statement, a Google DeepMind spokesperson said the company is “pleased with the progress we’re seeing on Gemini and will share more when we’re ready.” OpenAI declined to comment. Anthropic declined to comment, but referred Bloomberg News to a five-hour podcast featuring CEO Dario Amodei, released Monday.
“People call it scaling laws. It’s a misnomer,” he said on the podcast. “These are not the laws of the universe. These are empirical regularities. I’ll bet in their favor, but I’m not sure.”
Amodei said there were “a lot of things” that could “derail” the process of achieving more powerful AI in the coming years, including the possibility that “we could run out of data.” But Amodei said he is optimistic AI companies will find a way to overcome any obstacles.
Plateauing performance
The technology behind ChatGPT and a wave of competing AI chatbots was built on a trove of social media posts, online comments, books, and other data freely scraped from around the web. That was enough to create products capable of spitting out intelligent essays and poems, but building AI systems smarter than a Nobel Prize winner, as some companies hope to do, may require data sources other than Wikipedia posts and YouTube captions.
Efforts to obtain such data are slower and more expensive than simple web scraping. Tech companies are also turning to synthetic data, such as computer-generated images or text intended to mimic content created by real people. But there, too, there are limits. “It’s less about quantity and more about the quality and diversity of data,” said Lila Tretikov, head of AI strategy at New Enterprise Associates and former deputy chief technology officer at Microsoft (MSFT). “We can generate quantity synthetically, but we struggle to obtain unique, high-quality datasets without human guidance, especially when it comes to language.”
Yet AI companies continue to follow a more-is-better strategy. In their quest to create products that approach the level of human intelligence, tech companies are increasing the amount of computing power, data and time they use to train new models, and in doing so, driving up costs. Amodei has said companies will spend $100 million to train a cutting-edge model this year and that this amount will reach $100 billion in the coming years.
As costs rise, so do the stakes and expectations for each new model being developed. Noah Giansiracusa, an associate professor of mathematics at Bentley University in Waltham, Massachusetts, said AI models will continue to improve, but the speed at which that happens is questionable.
“We were very excited about a brief period of very rapid progress,” he said. “It just wasn’t sustainable.”
The Silicon Valley conundrum
This conundrum has become central in recent months in Silicon Valley. In March, Anthropic released a set of three new models and said the most powerful option, called Claude Opus, outperformed OpenAI’s GPT-4 and Google’s Gemini on key benchmarks, such as reasoning and college-level coding.
Over the next few months, Anthropic released updates to the other two Claude models, but not Opus. “It was the one that everyone was excited about,” said Simon Willison, an independent AI researcher. In October, Willison and other industry observers noticed that wording related to 3.5 Opus, including indications that it would arrive “later this year” and “soon”, had been removed from some pages of the company’s website.
Like its competitors, Anthropic faced behind-the-scenes challenges developing 3.5 Opus, according to two people familiar with the matter. After training it, Anthropic found that 3.5 Opus performed better than the older version in benchmarks, but not by as much as it should have, given the size of the model and the cost of building and running it, one of the people said.
An Anthropic spokesperson said language regarding Opus was removed from the website as part of a marketing decision to show only available and benchmarked models. When asked if Opus 3.5 would still be released this year, the spokesperson pointed to Amodei’s remarks on the podcast. In the interview, the CEO said Anthropic still planned to release the model, but repeatedly declined to commit to a timeline.
Tech companies are also starting to wonder whether they should continue offering their older AI models, perhaps with some additional enhancements, or incur the costs of supporting extremely expensive new versions that may not perform much better.
Google has released updates to its flagship Gemini AI model to make it more useful, including restoring the ability to generate images of people, but has introduced few major advances in the quality of the underlying model. OpenAI, meanwhile, has focused on a number of relatively incremental updates this year, like a new version of a voice assistant feature that lets users have smoother spoken conversations with ChatGPT.
More recently, OpenAI rolled out a preliminary version of a model called o1 that spends more time calculating an answer before responding to a query, a process the company calls reasoning. Google is working on a similar approach, aiming to handle more complex queries and get better answers over time.
Technology companies also face significant tradeoffs by diverting too many of their coveted computing resources toward developing and running larger models that may not be significantly better.
“All of these models have become quite complex and we can’t ship as many things in parallel as we would like,” wrote Sam Altman, CEO of OpenAI, in response to a question during a recent Ask Me Anything session on Reddit. The creator of ChatGPT faces “a lot of limitations and difficult decisions,” he said, about how it decides what to do with its available computing power.
Altman said OpenAI would have “very good releases” later this year, but that list would not include GPT-5, a name many in the AI industry would expect the company to use for a major release after GPT-4, which was introduced over 18 months ago.
Like Google and Anthropic, OpenAI is now shifting its focus from the scale of these models to new use cases, including a series of AI tools called agents that can book flights or send emails on behalf of a user. “We will have better models,” Altman wrote on Reddit. “But I think what will seem to be the next giant breakthrough will be agents.”
©2024 Bloomberg LP