ChatGPT 5.5 Pro Solves PhD-Level Math, Raises AI Research Bar
AI Crosses a New Frontier in Mathematical Research
Large language models have been steadily encroaching on territory once considered uniquely human. But a recent, detailed experiment by mathematician Timothy Gowers reveals a potential paradigm shift. ChatGPT 5.5 Pro, an advanced model from OpenAI, didn't just assist with research—it autonomously produced a novel, publishable-grade mathematical proof in under two hours, with zero serious mathematical input from its human collaborator.
The achievement was not a fluke. Gowers, a Fields Medalist, presented the AI with a specific problem from additive number theory concerning the possible sizes of h-fold sumsets. The model not only improved upon an existing human proof but did so by introducing an “original and clever” idea that impressed the original paper’s author, MIT student Isaac Rajagopal.
The Experiment: From Gentle Problem to Genuine Insight
Gowers contextualizes the current state of AI in mathematics. Initially, LLM “solutions” to famous problems like the Erdős problems were often trivial or easily deduced from the known literature, and were easy to dismiss. That dismissiveness is no longer warranted: if a problem has remained open because an “easy” argument was missed, an LLM now has a good chance of spotting it.
Seeking a tougher test, Gowers chose a paper by Mel Nathanson exploring the parameter N(h,k), defined as the minimal integer N such that all possible sizes of an h-fold sumset for a k-element set can be realized within the interval {0,1,...,N}. For h=2, Nathanson had an exponential bound. Gowers asked ChatGPT 5.5 Pro to improve it.
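To make the quantity concrete, here is a brute-force sketch in Python of the definition as stated above. The function names and the search cap are illustrative choices, not taken from Nathanson's paper: it enumerates k-element subsets of {0, ..., N}, records which h-fold sumset sizes they realize, and reports the smallest N at which every achievable size already appears.

```python
from itertools import combinations, combinations_with_replacement

def sumset_size(A, h):
    """Size of the h-fold sumset hA = {a1 + ... + ah : each ai in A}."""
    return len({sum(c) for c in combinations_with_replacement(A, h)})

def realizable_sizes(h, k, N):
    """All values of |hA| over k-element subsets A of {0, 1, ..., N}."""
    return {sumset_size(A, h) for A in combinations(range(N + 1), k)}

def N_of(h, k, cap=None):
    """Smallest N such that every achievable |hA| already occurs for some
    k-element A contained in {0, ..., N}. The cap is a heuristic search
    bound for this toy demo, not a proven one."""
    cap = cap or 3 * h * k
    target = realizable_sizes(h, k, cap)
    for N in range(k - 1, cap + 1):
        if realizable_sizes(h, k, N) == target:
            return N

print(sorted(realizable_sizes(2, 3, 4)))  # [5, 6]
print(N_of(2, 3))                          # 3
```

For h=2 and k=3, the only achievable sumset sizes are 5 (an arithmetic progression such as {0,1,2}) and 6 (any other 3-element set, e.g. {0,1,3}), and both already occur inside {0,1,2,3}, so this toy N(2,3) is 3. The brute force is only feasible for tiny parameters, which is exactly why nontrivial upper bounds on N(h,k) matter.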
In 17 minutes, the AI provided a construction yielding a quadratic upper bound—clearly optimal. It then formatted the proof in LaTeX. For a related problem on restricted sumsets, it succeeded “with no trouble at all.” The real challenge was generalizing the result for any h.
The Breakthrough: An “Ingenious” Polynomial Construction
The core of Rajagopal’s earlier work relied on combining geometric series sets S and T, whose elements grow exponentially large with k. This led to an exponential bound for N(h,k). ChatGPT’s task was to tighten this.
Its first attempt, after 16 minutes, improved the bound from exponential in k to exponential in k^α for any α > 1/2. But the real leap came when it was prompted to push for a polynomial bound.
“ChatGPT came back with an answer, constructing sets G and H which behave like ‘half a geometric series squeezed into a polynomial interval,’ which is counterintuitive,” Rajagopal writes in his evaluation. The AI used h²-dissociated sets, an algebraic concept dating back to Bose and Chowla (1963), to create components with polynomial-sized elements that mimicked the crucial sumset-size properties of the exponential geometric series.
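For intuition about the underlying property: in the classical sense, a set of integers is dissociated when all of its subset sums are distinct, which powers of 2 satisfy at the cost of exponentially large elements. The exact “h²-dissociated” condition used in the proof is a stronger h-fold variant not spelled out here; this sketch checks only the basic subset-sum definition.

```python
from itertools import combinations

def is_dissociated(S):
    """True if all 2^|S| subset sums of S are pairwise distinct
    (the classical notion of a dissociated set)."""
    seen = set()
    for r in range(len(S) + 1):
        for subset in combinations(S, r):
            total = sum(subset)
            if total in seen:
                return False
            seen.add(total)
    return True

print(is_dissociated([1, 2, 4, 8]))  # True: powers of 2, binary expansions are unique
print(is_dissociated([1, 2, 3]))     # False: 1 + 2 == 3
```

Bose–Chowla-type constructions achieve analogous distinct-sums behavior with elements only polynomially large in the set size (for fixed h), which is what makes a polynomial bound on N(h,k) possible at all.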
Rajagopal states this idea “is quite impressive” and “the sort of idea I would be very proud to come up with after a week or two of pondering.” The final construction combines these polynomial-sized components to prove N(h,k) = O(k^(10h³)), a monumental reduction from an exponential to a polynomial dependence on k.
Converging with a Broader AI Evolution
This mathematical feat coincides with the broader rollout of GPT-5.5 Instant as ChatGPT's new default model. OpenAI claims it produces 52.5% fewer hallucinations on high-stakes prompts in medicine, law, and finance and scores 81.2 on the AIME 2025 math test, up from 65.4.
The update also emphasizes enhanced personalization and memory. GPT-5.5 Instant can now leverage past chats, uploaded files, and connected Gmail accounts to tailor responses, reducing repetitive prompting. New “memory source” controls show users what context was used, allowing for correction or deletion.
These advances highlight a dual trajectory: raw reasoning power is increasing, as seen in the math proof, while the interface is becoming more seamlessly integrated into personal and professional workflows.
The Human Cost and the New Research Paradigm
Gowers’ experiment forces a reckoning with the future of mathematical training and discovery. He notes the traditional path for a beginning PhD student—solving a “gentle” open problem—may be vanishing. “The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove.”
This creates an access dilemma. As commenter Olof Sisask points out, top models are expensive, creating inequality. “The age of equality... is sadly over in research math,” adds another.
Gowers proposes a collaborative future: the task becomes “proving something in collaboration with LLMs that LLMs cannot manage on their own.” He has found LLMs make useful, if not yet game-changing, contributions in his own work.
Where Do AI-Generated Proofs Belong?
The successful proof created by ChatGPT 5.5 Pro exists in a publication limbo. “Had the result been produced by a human mathematician, it would definitely have been publishable,” Gowers writes. But arXiv bans AI-written content.
Gowers suggests a need for a new, moderated repository for AI-produced results, certified by human mathematicians or formal proof assistants. Until then, the proof lives on a Google Drive link—a fittingly ephemeral home for a disruptive new form of knowledge creation.
The implications extend beyond mathematics. As one commenter noted, “We’re going to see similar questions raised for most intellectually fulfilling activities.” For now, ChatGPT 5.5 Pro’s foray into additive combinatorics stands as a stark marker of how far and how fast AI’s cognitive capabilities have advanced.