Artificial intelligence

AI-led solutions of Erdős problems spark debate over the future of mathematics

29 May 2026 Amy Searle 
[Image: artificial-intelligence illustration. Courtesy: iStock/iMrSquid]

News that large language models (LLMs) have made major advances in solving Erdős problems – a set of problems formulated by the renowned 20th-century mathematician Paul Erdős – has provoked a mixture of uproar and interest among mathematicians. The past month alone has seen two significant LLM-generated solutions. The first relates to prime sets, a generalization of prime numbers, and came after Liam Price, an amateur mathematician from the US, fed the problem statement into GPT-5.4 Pro without other information. The second came last week when the company behind ChatGPT, OpenAI, announced that it had used artificial intelligence to disprove Erdős’ planar unit distance conjecture.

LLMs have solved Erdős problems before, but the one Price chose wasn’t just any Erdős problem. It was one that human mathematicians had worked on for 60 years without success. The nature of the solution was also unusual. While previous LLM solutions to Erdős problems used standard techniques, this one took an entirely different approach. Rather than starting from Erdős’ original probability-theory-based framing of the problem, as human mathematicians had, the LLM found an alternative route – one that led naturally, in less than a page, to a correct proof.

“Paul Erdős had a concept of ‘Proofs from The Book’, meaning that the argument is so compact and elegant that this is the proof God would’ve written down in ‘The Book,’” Jared Lichtman, a mathematician at Stanford University in the US, wrote on the social media site X after the proof was announced. “After reading the GPT5.4 proof of Erdős #1196, I would say this is a Book Proof of the result.”

The planar unit distance conjecture, meanwhile, concerns a deceptively simple question: if you have $n$ points in a plane, how many pairs of points can be exactly one distance unit apart? Erdős thought the limit was $n^{1 + C/\log\log n}$, where $C$ is a positive constant, but OpenAI’s model identified a higher bound. What’s more, the company claims it did so not by rehashing prior work, but by “bring[ing] unexpected, sophisticated ideas from algebraic number theory to bear on an elementary geometric question”.
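
To make the counting question concrete, here is a minimal Python sketch. It is an illustration only and has nothing to do with the method OpenAI's model used. It counts the pairs of points exactly one unit apart in a plain k × k square grid. The function names `count_unit_pairs` and `square_grid` are hypothetical, and integer coordinates are assumed so that the distance check is exact.

```python
from itertools import combinations


def count_unit_pairs(points):
    """Count unordered pairs of integer-coordinate points at distance exactly 1."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        # Compare squared distances to avoid floating-point error.
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1
    )


def square_grid(k):
    """Return the k * k points of a square grid with unit spacing."""
    return [(x, y) for x in range(k) for y in range(k)]


if __name__ == "__main__":
    for k in (2, 4, 8):
        pts = square_grid(k)
        print(f"n = {len(pts):3d} points, unit-distance pairs = {count_unit_pairs(pts)}")
```

In this simple arrangement the count is 2k(k − 1), which grows only in proportion to the number of points. The conjecture concerns how much better than this the cleverest possible arrangement can do.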

Some members of the mathematics community have greeted these proofs, and the advent of AI in mathematics in general, with enthusiasm. OpenAI’s announcement quotes Arul Shankar, a number theorist at the University of Toronto, Canada, as saying that the new proof “demonstrates that current AI models go beyond just helpers to human mathematicians – they are capable of having original ingenious ideas, and then carrying them out to fruition”.

Others, however, are more cautious. David Bessis, a mathematician-turned-science writer who previously worked on algebra, geometry and topology, claims that even such apparent successes stem from a misconception of mathematics as a logically direct process of churning out theorems, given some rules. Writing in his Substack newsletter, Bessis argues that the method used to verify AI-generated proofs, which involves a computer program called Lean, may reduce the benefit the mathematics community gains from proofs. Notably, proofs that are verifiable in Lean are not always parsable by humans, which detracts from (and in certain cases removes) the insights researchers typically get from new proofs.

How AI is being used in mathematics…

To evaluate the merits of these arguments, it’s useful to understand how AI is currently used within mathematics research. The first strategy is the one Price used to solve Erdős #1196: directly prompting an LLM. “Large language models have proven their worth at literature search: finding similar instances of a problem, or a proof, in past literature,” notes François Charton, an AI engineer at the California-based start-up AxiomMath, which is using AI to accelerate mathematics research.

The second strategy is to use AI models trained on other types of data. According to Charton, these models are especially good at spotting “weak signals and correlations” and thereby uncovering patterns in data that might be too laborious or convoluted for humans to identify.

Both methods have shown promise for generating new results, but they are not universal – at least, not yet. “It [AI] seems to do a lot better at certain types of maths than others,” says Thomas Bloom, a mathematician at the University of Manchester, UK, who maintains a webpage that tracks solutions to Erdős problems. In particular, Bloom says that to the best of his knowledge, AI “hasn’t done anything interesting in category theory” – a field whose reputation for abstraction is matched only by its track record of bridging supposedly distinct areas of mathematics.

[Photo: Paul Erdős' grave, made of white marble and consisting of stacked rectangular blocks. The largest block bears the name and the birth and death dates of his father Lajos (1879-1942). A smaller block below refers to his mother and gives her birth and death dates (1880-1970), followed by the name Erdős Pál and the dates 1913-1996.]

Another challenge is that with AI systems churning out new proofs at scale, there are simply not enough people with the skills needed to check them. A process called autoformalization could solve this problem by turning human proofs into what Bessis calls “bulletproof, machine-verifiable logical derivations” expressed in Lean or other specialized languages. At that point, AI-generated proofs could be checked automatically. The question is, what knowledge will humans gain in the process?

For doubters like Bessis, who refers to autoformalization (at least as practiced by certain firms) as “AI slop”, the answer is very little. But within the broader mathematics community, there is considerable interest in autoformalization, if done correctly. “I see autoformalization as the bridge in both directions, as important as proving itself,” Charton argues. “We can use Lean to translate between these two languages so that a Lean proof can be reverse-translated into a sketch, lemmas or natural language a human mathematician can engage with. That bidirectional translation preserves and extends mathematical knowledge at scale.”
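
For readers who have never seen one, a machine-verifiable derivation can be very short. The toy Lean 4 statement below is not taken from any of the proofs discussed here, and the theorem name `toy_add_comm` is hypothetical. It simply states that adding two natural numbers gives the same result in either order, and it justifies the claim by citing a lemma, `Nat.add_comm`, that ships with Lean.

```lean
-- Toy example, not drawn from any proof discussed in this article.
-- Statement: for natural numbers a and b, a + b equals b + a.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b  -- the proof: appeal to a lemma from Lean's core library
```

Lean accepts the file only if every step checks. Research-level formal proofs follow the same principle but can run to thousands of lines, which is where the readability concerns arise.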

…and how it isn’t

In the 18th century, when Leonhard Euler began arranging the logical thought processes of mathematics into theorems, definitions and proofs, mathematicians were primarily interested in solving problems with underpinnings in the physical world: questions of volume and distance, and, more generally, geometry and counting. Since then, though, mathematics has become a discipline that is at least as concerned with coming up with interesting problems as it is with solving them.

Two aspects of this change seem relevant to debates over AI’s utility. The first is that posing problems requires a broader skillset than solving them. The second is that solving posed problems sometimes requires mathematicians to invent new structures, tools or objects. Fermat’s Last Theorem, which posits that there are no three positive integers $a$, $b$ and $c$ that satisfy the equation $a^n + b^n = c^n$ for any integer value of $n$ greater than 2, is a good example. At face value, this nearly 400-year-old theorem seems simple. However, proving it was the life’s work of a modern mathematician, Andrew Wiles, who won the Abel Prize in 2016 for developing the numerous new tools required, as well as for the proof itself.

Coming up with such tools – or indeed whole new frameworks – is a challenging and hugely creative endeavour. There are no rules as to the kinds of objects you are allowed to create, and unlike a proof (which is either correct or incorrect), there is no finality, either. If the new framework is a good one, it will crop up frequently and naturally in various branches of mathematics, and other mathematicians will incorporate it into their own work. If it isn’t, they won’t.

Currently, not even AI enthusiasts like Charton think machines are capable of such leaps. “Theory building is completely out of reach right now,” he tells Physics World. “Models, especially generative models, can provide a mathematician with interesting examples, or discover surprising relations that may bring a theoretical breakthrough, but the breakthrough still depends on the mathematician. I believe this will remain the case for some time.”

A new tool for scientists and mathematicians alike

In many areas of science, AI works in a way that is entirely distinct from human thinking. In physics, for example, machine learning algorithms are trained to analyse large amounts of data, find patterns and use them to infer underlying laws. This strategy could advance our understanding of some of the most fundamental questions in physics, but it is very different from how a human scientist would do it, and therefore perhaps more likely to be seen as a welcome new tool.

On the theorem-proving side of mathematics, the distinction between methods a human might use and those an algorithm might use is more blurred. Yet in some ways, Bloom thinks incorporating AI into mathematics could bring the field closer to other sciences. In particle physics, for example, “you don’t go in and take these individual recordings [of data]. It’s all automated,” he tells Physics World. “Until now, there has been no equivalent for maths. It takes time and attention to prove theorems, and maybe this had been a bottleneck.”

AxiomMath’s Charton agrees. “Every new math tool in history has automated something that used to be the work of a human mathematician – from the abacus all the way to symbolic algebra,” he says. “With each new tool, the role of the mathematician evolved rather than disappeared. Tasks got automated, and problems that felt impossible became trivial – but mathematicians just keep moving up the stack to the next set of questions. I see AI as the latest shift rather than a categorical break from history.”

Copyright © 2026 by IOP Publishing Ltd and individual contributors