New York: Artificial intelligence companies, including OpenAI, are looking to move beyond the limitations of scaling large language models by exploring more human-like approaches to training algorithms. These new techniques, which are integral to OpenAI’s recently released o1 model, are expected to reshape the AI arms race and could impact the growing demand for resources like energy and chips.
Since the release of ChatGPT two years ago, technology companies have relied on the idea that adding more data and computing power to current models will always lead to better results. However, leading AI researchers are now speaking out about the limitations of the “bigger is better” philosophy that has driven advancements until now.
Ilya Sutskever, co-founder of Safe Superintelligence (SSI) and OpenAI, revealed that scaling up pre-training—the process of using vast amounts of unlabeled data to understand language patterns—has reached a plateau. Sutskever, known for advocating data-driven breakthroughs, acknowledged the shift, saying, “The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Everyone is looking for the next thing. Scaling the right thing matters more now than ever.”
Sutskever’s team at SSI is working on an alternative to the current pre-training scaling approach, though details remain sparse. Meanwhile, major AI labs have experienced delays and disappointing outcomes in efforts to surpass OpenAI’s GPT-4 model, which has been out for nearly two years.
Training large language models is an expensive and time-consuming endeavor. The “training runs” for these models can cost tens of millions of dollars as they run on hundreds of chips. The complexity of the process often results in hardware failures, with performance only revealed at the end of months-long runs. Additionally, these models require vast amounts of data, and many AI researchers have exhausted the accessible data sources. Energy shortages are also a significant concern, as training AI models consumes enormous amounts of power.
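As a rough illustration of how those costs add up, the back-of-envelope sketch below simply multiplies chips by hours by an hourly rate. Every number in it is an assumption chosen for the example, not a figure reported for any particular model or lab.

```python
# Back-of-envelope training-run cost: chips x hours x hourly rate.
# All figures below are illustrative assumptions, not reported numbers.

chips = 800                 # "hundreds of chips" running simultaneously (assumed)
rate_per_chip_hour = 10.0   # USD per accelerator-hour (assumed cloud rate)
run_days = 120              # a months-long training run (assumed)

cost = chips * rate_per_chip_hour * 24 * run_days
print(f"Estimated compute cost: ${cost:,.0f}")  # roughly $23 million
```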
To tackle these challenges, AI researchers are turning to “test-time compute”, a technique designed to improve models during the “inference” phase—when they are actively used. Instead of quickly selecting one answer, the model could generate and assess multiple possibilities in real-time, dedicating more power to difficult tasks that demand human-like reasoning and decision-making.
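To make the idea concrete, the sketch below shows one simple form of test-time compute, best-of-N sampling with a verifier: generate several candidate answers and keep the one a scoring function rates highest. The `generate` and `score` functions are hypothetical stand-ins, not OpenAI APIs; how o1 actually allocates inference compute has not been published.

```python
# Best-of-N sampling with a verifier: one simple form of test-time compute.
# generate() and score() are placeholders for a real language model and a
# real verifier/reward model.
import random

def generate(prompt: str, seed: int) -> str:
    """Stand-in for sampling one candidate answer from a model."""
    return f"candidate #{seed} for: {prompt}"

def score(prompt: str, answer: str) -> float:
    """Stand-in for a verifier that rates how good an answer looks."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Spend extra inference compute: sample n candidates, keep the best.
    Raising n is the knob that trades more 'thinking' time for quality."""
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

if __name__ == "__main__":
    print(best_of_n("Plan the cheapest route visiting five cities.", n=16))
```

The design choice here is that extra compute is spent only at inference time, on the specific question being asked, rather than baked into a larger model up front.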
OpenAI’s o1 model, which was unveiled this year, uses this technique. According to Noam Brown, an OpenAI researcher involved with o1, this method boosts performance significantly. “It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer,” Brown said at the TED AI conference in San Francisco last month.
The o1 model works through problems in a human-like, multi-step manner and is trained on data curated from experts. It represents a shift in training strategy, and OpenAI plans to apply the approach to even larger models in the future.
Other AI labs, including Anthropic, xAI, and Google DeepMind, are also working on similar techniques, according to multiple sources. Kevin Weil, OpenAI’s chief product officer, remarked at a tech conference in October, “We see a lot of low-hanging fruit that we can go pluck to make these models better very quickly. By the time people do catch up, we’re going to try and be three more steps ahead.”
The implications of these changes could significantly alter the competitive landscape for AI hardware. Nvidia, currently dominant in the AI chip market, may face more competition as inference-focused cloud servers become more critical. Sequoia Capital partner Sonya Huang pointed out that a move toward inference clouds, distributed systems for running models, could affect demand for the training chips, such as Nvidia’s, that have fueled the company’s rise to become the world’s most valuable.
Nvidia’s CEO Jensen Huang has acknowledged this shift in scaling laws, emphasizing that demand for its Blackwell chips, used in inference applications, remains strong.