AI Solves 60-Year-Old Math Problem, Signaling New Era in Research

26/04/2026
Artificial Intelligence, Mathematics, Machine Learning, OpenAI

An Amateur and an AI Crack a Legendary Math Puzzle

In a story that blurs the lines between human intuition and machine intelligence, a 23-year-old amateur with no advanced mathematics training has prompted an AI to solve a 60-year-old problem posed by the legendary mathematician Paul Erdős. Liam Price, armed only with a ChatGPT Pro subscription, submitted a single prompt to GPT-5.4 Pro and received a solution to a conjecture about "primitive sets" of numbers—a problem that had stumped prominent minds for decades.

The solution, posted on erdosproblems.com in late April 2026, is notable not just for its result, but for its method. According to experts like UCLA mathematician Terence Tao and Stanford's Jared Lichtman, the AI employed a novel approach, sidestepping a "mental block" that had hindered human researchers. "The humans that looked at it just collectively made a slight wrong turn at move one," Tao observed.

This breakthrough arrives just as OpenAI unveils its next-generation model, GPT-5.5, designed explicitly for intensive tasks like coding, research, and mathematics. The timing underscores a pivotal moment: AI is no longer just regurgitating known solutions but is beginning to generate genuinely novel mathematical insights.

The Erdős Problem: Primitive Sets and a Stubborn Conjecture

The solved problem revolves around "primitive sets": sets of whole numbers in which no member evenly divides another. Erdős connected these to prime numbers and devised a scoring system called the Erdős sum. He conjectured that the lowest this score can go for such a set is exactly one, a limit approached as the numbers in the set grow without bound.
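For readers who want to experiment, the two ideas above can be sketched in a few lines of Python. This is only an illustration, not the method used in the proof. The article does not spell out the scoring formula, so the sketch assumes the standard definition from the number theory literature, in which each member a of the set contributes 1/(a log a). The function names is_primitive and erdos_sum are made up for this example.

```python
import math
from itertools import combinations


def is_primitive(numbers):
    """Return True if no member of the set divides a different member."""
    values = sorted(set(numbers))
    # combinations() on a sorted list yields pairs (a, b) with a < b,
    # so only "a divides b" needs to be checked.
    return all(b % a != 0 for a, b in combinations(values, 2))


def erdos_sum(numbers):
    """Sum of 1 / (a * log a) over the set (assumed standard definition).

    Every member must be at least 2, because log(1) is zero.
    """
    return sum(1.0 / (a * math.log(a)) for a in set(numbers))


if __name__ == "__main__":
    primes = [2, 3, 5, 7]
    print(is_primitive(primes))     # True: no prime divides another prime
    print(is_primitive([2, 3, 6]))  # False: 2 and 3 both divide 6
    print(erdos_sum(primes))        # partial score for the first four primes
```

The primes are the classic example of a primitive set, while {2, 3, 6} fails because 2 and 3 both divide 6. A finite check like this only illustrates the definitions; the conjecture itself concerns what happens as the numbers in the set grow without bound.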

Lichtman had previously proven the maximum of this sum in his 2022 doctoral thesis but, like others, was stuck on proving the minimum. Price, unaware of this history, fed the problem to ChatGPT on an "idle Monday afternoon." The AI's raw output was messy, but its core idea—applying a well-known formula from a different area of mathematics to this question—was the key breakthrough. Experts have since refined and shortened the proof.

"We have discovered a new way to think about large numbers and their anatomy," said Tao. The method's novelty suggests it could have broader applications, confirming a long-held intuition among mathematicians that these problems were interconnected.

GPT-5.5: OpenAI's Power Play for Research and Agentic AI

Concurrent with this mathematical milestone, OpenAI released GPT-5.5 on April 30, 2026, just seven weeks after GPT-5.4. The company positions it as a model built for serious work. "It's likely going to be the most useful for people doing research or other intensive tasks, like coding," according to CNET. OpenAI President Greg Brockman emphasized its intuitive, agentic capabilities: "It can look at an unclear problem and figure out just what needs to happen next."

Chief Scientist Jakub Pachocki highlighted the shift in focus it enables: "The challenge transitions from figuring out the details of the implementation... to the higher-level goals. It allows you to make progress much more quickly." Mark Chen, OpenAI's Chief Research Officer, stated the model "shows meaningful gains on scientific and technical research workflows" and could "help expert scientists make progress," including in fields like drug discovery.

Benchmark Dominance and the Competitive Landscape

OpenAI released extensive benchmark data showing GPT-5.5 outperforming its predecessors and key rivals. According to VentureBeat, it "narrowly beats" Anthropic's Claude Mythos Preview on Terminal-Bench 2.0 (82.7 vs. 82.0). Its performance in mathematical reasoning is particularly striking:

  • FrontierMath Tier 1–3: 51.7 (vs. Claude Opus 4.7's 43.8)
  • FrontierMath Tier 4: 35.4 (vs. Claude's 22.9)

The model also posts strong scores in coding (Expert-SWE: 73.1), computer use (OSWorld-Verified: 78.7), and specialized tasks like investment banking modeling (88.5). These scores support OpenAI's claim that GPT-5.5 is a significant step toward a true "digital assistant" capable of managing complex, multi-step projects across a user's entire computer ecosystem.

Context and Caveats: The State of AI in Problem-Solving

This achievement sits within a broader, and sometimes skeptical, narrative about AI's role in mathematics. AI has recently solved several Erdős problems, but experts have warned they are an "imperfect benchmark," varying greatly in significance and difficulty. Some past AI solutions were less original than they first appeared.

The Price/ChatGPT case is seen as different because of the problem's pedigree and the genuinely novel method. However, the story also highlights the continued essential role of human experts. Lichtman noted the AI's initial proof was "quite poor," requiring experts to "sift through and actually understand what it was trying to say." The collaboration is a prototype: human intuition guiding AI exploration, and AI offering paths humans might not consider.

This comes as other studies caution about over-reliance on AI. A separate report from Let's Data Science found general chatbots like ChatGPT delivered problematic health advice 52% of the time, underscoring that domain expertise and rigorous verification remain critical, even as capabilities grow.

Why This Matters: A New Tool for Discovery

The confluence of a specific AI-aided discovery and the launch of a more powerful general-purpose model is not coincidental. It signals a maturation of AI from a pattern-recognition engine into a tool for open-ended exploration and discovery. For researchers, the promise is less about automation and more about augmentation—freeing cognitive bandwidth for high-level conceptual work.

As GPT-5.5 rolls out to Plus, Pro, Business, and Enterprise users, its impact will be tested in real-world labs and development environments. Early feedback, like a developer's comment that it's "the first coding model I've used that has serious conceptual clarity," suggests it may cross a utility threshold. For mathematicians and scientists, the Erdős solution offers a tantalizing glimpse of a future where AI acts as a collaborative partner, capable of making unexpected, intuitive leaps that accelerate the pace of fundamental research.