At Secret Math Meeting, Researchers Struggle to Outsmart AI
The world's leading mathematicians were stunned by how adept artificial intelligence is at doing their jobs
On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., some of them from far away. The group's members faced off in a showdown with a "reasoning" chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover that it was capable of answering some of the world's hardest solvable problems. "I have colleagues who literally said these models are approaching mathematical genius," says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.
The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex math problems than traditional LLMs.
To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with coming up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which were unlike anything in their training data, the most successful solved less than 2 percent, showing that these LLMs lacked the ability to reason. But o4-mini would prove to be very different.
Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions across varying tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By April 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: a set of questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
Each problem that o4-mini could not solve would earn the mathematician who came up with it a $7,500 reward. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would finalize the last batch of challenge questions. The 30 attendees were split into groups of six. For two days, the academics competed against themselves to devise problems that they could solve but that would trip up the AI reasoning bot.
By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group's progress. "I came up with a problem which experts in my field would recognize as an open question in number theory, a good Ph.D.-level problem," he says. He asked o4-mini to solve it and watched as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot began by finding and working through the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler version of the question first. It eventually presented a correct but sassy solution. "It was starting to get really cheeky," says Ono, who is also a freelance mathematical consultant for Epoch AI. "And at the end, it says, 'No citation necessary because the mystery number was computed by me!'"
Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. "I was not prepared to be contending with an LLM like this," he says. "I've never seen that kind of reasoning before in models. That's what a scientist does."
Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a "strong collaborator." Yang Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in mathematics, says, "This is what a very, very good graduate student would be doing. In fact, more."
The bot was also much faster than a professional mathematician, taking mere minutes to do what it would take such a human expert weeks or months to complete.
While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He expressed concern that o4-mini's results might be trusted too much. "There's proof by induction, proof by contradiction, and then proof by intimidation," He says. "If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence."
By the end of the meeting, the group had started to consider what the future might look like for mathematicians. Discussions turned to the inevitable "tier five": questions that even the best mathematicians could not solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians might shift to simply posing questions and interacting with reasoning bots that help them discover new mathematical truths, much as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics going for future generations.
"I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer," Ono says. "I don't want to add to the hysteria, but in many ways these large language models are already outperforming most of our best graduate students in the world."