(Refiles to fix formatting, no change to story content)
By Krystal Hu and Anna Tong
(Reuters) – Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-larger language models by developing training techniques that use more human-like ways for algorithms to “think.”
A dozen AI scientists, researchers and investors told Reuters they believe the techniques, which are behind OpenAI’s recently released o1 model, could reshape the AI arms race and have implications for the types of resources AI companies have insatiable demand for, from energy to types of chips.
OpenAI declined to comment for this story. Since the release of the viral chatbot ChatGPT two years ago, tech companies, whose valuations have benefited greatly from the AI boom, have publicly argued that “scaling up” current models by adding more data and computing power would consistently lead to improved AI models.
But today, some of the most prominent AI scientists are speaking out about the limits of this “bigger is better” philosophy.
Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, recently told Reuters that results from scaling up pre-training – the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures – have plateaued.
Sutskever is widely recognized as an early advocate for making massive strides in generative AI through the use of more data and computing power during pre-training, which eventually gave rise to ChatGPT. Sutskever left OpenAI earlier this year to found SSI.
“The 2010s were the era of scale, now we are back in the age of wonder and discovery. Everyone is looking for the next thing,” Sutskever said. “It’s more important than ever to scale the right thing.”
Sutskever declined to share more details about how his team is approaching the problem, saying only that SSI is working on an alternative approach to ramping up pre-training.
Behind the scenes, researchers at leading AI labs have experienced delays and disappointing results in the race to release a large language model that outperforms OpenAI’s nearly two-year-old GPT-4 model, according to three sources familiar with private matters.
So-called “training runs” for large models can cost tens of millions of dollars from simultaneously running hundreds of chips. They are more prone to hardware-induced failures given how complicated the system is, and researchers may not know how the models will perform until the end of the run, which can take months.
Another problem is that large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hindered training runs, as the process requires vast amounts of energy.
To overcome these challenges, researchers are exploring “test-time compute,” a technique that improves existing AI models during the so-called “inference” phase, that is, when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.
This method allows models to devote more processing power to difficult tasks such as math or coding problems or to complex operations that require human-like reasoning and decision-making.
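The article does not describe how any lab implements this. As a rough, hypothetical illustration of the general “generate several candidates, keep the best” idea, here is a minimal best-of-N sketch in Python; `generate` and `score` are assumed stand-ins for a language-model sampler and an answer evaluator, not any real API.

```python
# Hypothetical sketch of best-of-N test-time compute; not any lab's
# published method. `generate` and `score` are placeholder stand-ins.
import random

def generate(prompt: str) -> str:
    # Placeholder: a real system would sample one candidate answer
    # from a language model here.
    return f"candidate {random.randint(0, 9)} for: {prompt}"

def score(prompt: str, answer: str) -> float:
    # Placeholder: a real system would rate the answer with a
    # verifier or reward model.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Spend extra compute at inference time: sample n candidate
    answers and return the highest-scoring one instead of the
    first sample."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("What is 17 * 24?"))
```

A larger n trades more inference-time compute for a better chance that at least one sampled candidate is strong, which is the trade-off the researchers quoted here are describing.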
“It turned out that having a bot think for just 20 seconds in a hand of poker got the same performance boost as scaling up the model by 100,000x and training it 100,000 times longer,” said Noam Brown, a researcher at OpenAI who worked on o1, at the TED AI conference in San Francisco last month.
OpenAI adopted this technique in its new model known as “o1,” formerly known as Q* and Strawberry, which Reuters first reported on in July. The o1 model can “think” through problems in multiple steps, similar to how humans do. It also involves using data and feedback curated from PhDs and industry experts. The secret sauce of the o1 series is an additional round of training performed on top of “base” models like GPT-4, and the company plans to apply this technique to increasingly larger base models.
At the same time, researchers at other major AI labs, including Anthropic, xAI and Google DeepMind, have also been working to develop their own versions of the technique, according to five people familiar with the efforts.
“We see a lot of low-hanging fruit that we can pick to improve these models very quickly,” Kevin Weil, chief product officer at OpenAI, said at a technology conference in October. “As people catch up, we’ll try to be three steps ahead again.”
Google and xAI did not respond to requests for comment and Anthropic had no immediate comment.
The implications could change the competitive landscape for AI hardware, so far dominated by insatiable demand for Nvidia’s AI chips. Prominent venture capitalists from Sequoia to Andreessen Horowitz, who have invested billions to fund costly development of AI models at several AI labs, including OpenAI and xAI, are taking note of the transition and evaluating the impact on their expensive bets.
“This shift will move us from a world of massive pre-training clusters to inference clouds, which are distributed, cloud-based servers for inference,” said Sonya Huang, a partner at Sequoia Capital.
Demand for Nvidia’s most advanced AI chips has helped drive its rise to become the world’s most valuable company, overtaking Apple in October. Unlike training chips, where Nvidia dominates, the chip giant could face more competition in the inference market.
Asked about the possible impact on demand for its products, Nvidia pointed to the company’s recent presentations on the importance of the technique behind the o1 model. Its CEO, Jensen Huang, cited growing demand to use its chips for inference.
“We have now discovered a second scaling law, and that is the scaling law at a time of inference… All of these factors have led to incredibly high demand for Blackwell,” Huang said last month at a conference in India, referring to the company’s latest AI chip.
(Reporting by Krystal Hu in New York and Anna Tong in San Francisco; editing by Kenneth Li and Claudia Parsons)