Hacker News
21 days ago by outlace

The headline may make it seem like AI just discovered some new result in physics all on its own, but reading the post, humans started off trying to solve some problem, it got complex, and GPT simplified it and found a solution with the simpler representation. It took 12 hours for GPT Pro to do this. In my experience LLMs can make new things when they are some linear combination of existing things, but I haven't been able to get them to do something totally out of distribution yet from first principles.

21 days ago by CGMthrowaway

This is the critical bit (paraphrasing):

Humans have worked out the amplitudes for integer n up to n = 6 by hand, obtaining very complicated expressions that correspond to a "Feynman diagram expansion" whose complexity grows superexponentially in n. But no one had been able to greatly reduce the complexity of those expressions into much simpler forms. And from those base cases, no one had been able to spot a pattern and posit a formula valid for all n. GPT did that.

Basically, they used GPT to refactor a formula and then generalize it for all n. Then verified it themselves.

I think this was all already figured out in 1986 though: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.56... see also https://en.wikipedia.org/wiki/MHV_amplitudes

21 days ago by godelski

  > I think this was all already figured out in 1986 though
They cite that paper in the third paragraph...

  Naively, the n-gluon scattering amplitude involves order n! terms. Famously, for the special case of MHV (maximally helicity violating) tree amplitudes, Parke and Taylor [11] gave a simple and beautiful, closed-form, single-term expression for all n.
It also seems to be a main talking point.

I think this is a prime example of where it is easy to think something is solved when looking at things from a high level, but to reach an erroneous conclusion due to lack of domain expertise. Classic "Reviewer 2" move. Though I'm not a domain expert either, so if there really was no novelty over Parke and Taylor, I'm pretty sure this will get thrashed in review.

21 days ago by CGMthrowaway

You're right. Parke & Taylor showed the simplest nonzero amplitudes have two minus helicities, while one-minus amplitudes vanish (generically). This paper claims that vanishing theorem has a loophole: a new hidden sector exists, and one-minus amplitudes are secretly there, but distributional.

21 days ago by nyc_data_geek1

[flagged]

21 days ago by btown

It bears repeating that modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite. It seems like this problem did (at least for some finite subset of n)!

This result, by itself, does not generalize to open-ended problems, though, whether in business or in research in general. Discovering the specification to build is often the majority of the battle. LLMs aren't bad at this, per se, but they're nowhere near as reliably groundbreaking as they are on verifiable problems.

20 days ago by utopiah

> modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite.

Feels like it's a bit what I tried to express a few weeks ago https://news.ycombinator.com/item?id=46791642 namely that we are just pouring computational resources into verifiable problems and then claiming that, astonishingly, it sometimes works. Sure, LLMs even have a slight bias, namely they rely on statistics, so it's not purely brute force, but the approach is still pretty much the same: throw stuff at the wall, see what sticks, and once something finally does, report it as grandiose and claim it is "intelligent".
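To make the "throw stuff at the wall, keep what verifies" point concrete, here is a minimal sketch of that loop. The function names (`propose_candidate`, `verifies`, `search`) are made up for illustration; a real system would sample from an LLM and run a test suite or proof checker instead of guessing integers.

```python
import random

def propose_candidate(rng):
    # Stand-in for an LLM sampling a candidate solution; here, blind guessing.
    return rng.randint(0, 100)

def verifies(candidate):
    # Stand-in for the verification suite (unit tests, proof checker, etc.).
    return candidate * candidate == 49

def search(budget=10_000, seed=0):
    # Spend compute proposing candidates; report only the one that sticks.
    rng = random.Random(seed)
    for _ in range(budget):
        c = propose_candidate(rng)
        if verifies(c):
            return c
    return None  # budget exhausted with nothing verified

print(search())
```

The only "intelligence" in this toy version is in the verifier; the generator is pure noise. The argument above is that LLMs sit somewhere between this and genuine reasoning, with statistics biasing the proposals toward plausible candidates.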

21 days ago by QuercusMax

Yes, this is where I just cannot imagine completely AI-driven software development of anything novel and complicated without extensive human input. I'm currently working in a space where none of our data models are particularly complex, but the trick is all in defining the rules for how things should work.

Our actual software implementation is usually pretty simple; often writing up the design spec takes significantly longer than building the software, because the software isn't the hard part - the requirements are. I suspect the same folks who are terrible at describing their problems are going to need help from expert folks who are somewhere between SWE, product manager, and interaction designer.

20 days ago by D-Machine

Even more generally than verification: just being tied to a loss function that represents something we actually care about. E.g. compiler and test errors, Lean verification in Aristotle, basic physics energy configurations in AlphaFold, or win conditions in RL, such as in AlphaGo.

RLHF is an attempt to push LLMs pre-trained with a dopey reconstruction loss toward something we actually care about: imagine if we could find a pre-training criterion that actually cared about truth and/or plausibility in the first place!

21 days ago by lupsasca

That paper from the 80s (which is cited in the new one) is about "MHV amplitudes" with two negative-helicity gluons, so "double-minus amplitudes". The main significance of this new paper is to point out that "single-minus amplitudes" which had previously been thought to vanish are actually nontrivial. Moreover, GPT-5.2 Pro computed a simple formula for the single-minus amplitudes that is the analogue of the Parke-Taylor formula for the double-minus "MHV" amplitudes.
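For reference, the Parke-Taylor formula being referred to gives the color-ordered tree-level MHV amplitude in spinor-helicity notation (gluons $i$ and $j$ carry negative helicity, all others positive; coupling constants and the momentum-conserving delta function are omitted here):

```latex
A_n^{\mathrm{MHV}}\bigl(1^+,\dots,i^-,\dots,j^-,\dots,n^+\bigr)
  = \frac{\langle i\,j\rangle^{4}}{\langle 1\,2\rangle\,\langle 2\,3\rangle \cdots \langle n\,1\rangle}
```

A single term, valid for all n, replacing a naive expansion with order n! terms. The new paper's claim is an analogous closed form for the single-minus amplitudes.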

21 days ago by woeirua

You should probably email the authors if you think that's true. I highly doubt they didn't do a literature search first though...

21 days ago by emp17344

You should be more skeptical of marketing releases like this. This is an advertisement.

21 days ago by coldtea

It's hard to get someone to do a literature search first when they get free publicity by not doing one and claiming some major AI-assisted breakthrough...

Heck, it's hard to get authors to do literature searches, period: never mind not thoroughly looking for prior art, even well-known disgraced papers continue to get positive citations all the time...

21 days ago by godelski

They also reference Parke and Taylor. Several times...

21 days ago by suuuuuuuu

Don't underestimate the willingness of physicists to skimp on literature review.

21 days ago by randomtoast

> but I haven't been able to get them to do something totally out of distribution yet from first principles

Can humans actually do that? Sometimes it appears as if we have made a completely new discovery. However, if you look more closely, you will find that many events and developments led up to this breakthrough, and that it is actually an improvement on something that already existed. We are always building on the shoulders of giants.

21 days ago by davorak

> Can humans actually do that?

From my reading yes, but I think I am likely reading the statement differently than you are.

> from first principles

Doing things from first principles is a known strategy, so is guess and check, brute force search, and so on.

For an LLM to follow a first-principles strategy, I would expect it to take in a body of research, come up with some first principles or guesses at them, then iteratively construct a tower of reasoning/findings/experiments.

Constructing a solid tower is where things are currently improving for existing models, in my mind, but when I try the OpenAI or Anthropic chat interfaces, neither does a good job for long, at least not independently.

Humans also often have a hard time with this; in general it is not a skill that everyone has, and I think you can be a successful scientist without ever heavily developing first-principles problem solving.

20 days ago by dsign

"Constructing a solid tower" from first principles is already super-human level. Sure, you can theorize a tower (sans the "solid") from first principles; there's a software architect at my job that does it every day. But the "solid" bit is where things get tricky, because "solid" implies "firm" and "well anchored", and that implies experimental grounds, experimental verification all the way, and final measurable impact. And I'm not even talking particle physics or software engineering; even folding a piece of paper can give you surprising mismatches between theory and results.

Even the realm of pure mathematics and elegant physics theories, where you are supposed to take a set of axioms ("first principles") and build something with it, has cautionary tales such as Russell's paradox or the lack of a measure for Feynman path integrals, and let's not talk about string theory.

20 days ago by samrus

Yes. Thats how all advancement in human knowledge happened. Small and incremental forays out of our training distribution.

These have been identified as various things. Eureka moments, strokes of genius, out of the box thinking, lateral thinking.

LLMs have not been shown to be capable of this. They might be in the future, but they haven't yet.

21 days ago by dotancohen

Relativity comes to mind.

You could nitpick a rebuttal, but no matter how many people you give credit, general relativity was a completely novel idea when it was proposed. I'd argue for special relativity as well.

21 days ago by Paracompact

I am not a historian of science, or even a physicist, but IMO relativity has a weak case for being a completely novel discovery. Critique of the absolute time and space of Newtonian physics was already well underway, and much of the methodology for exploring this relativity (by way of gyroscopes, inertial reference frames, and synchronized mechanical clocks) was already in use. Many of the phenomena that relativity would later explain under a consistent framework already had independent quasi-explanations hinting at the more universal theory. Poincare probably came the closest to unifying everything before Einstein:

> In 1902, Henri Poincaré published a collection of essays titled Science and Hypothesis, which included: detailed philosophical discussions on the relativity of space and time; the conventionality of distant simultaneity; the conjecture that a violation of the relativity principle can never be detected; the possible non-existence of the aether, together with some arguments supporting the aether; and many remarks on non-Euclidean vs. Euclidean geometry.

https://en.wikipedia.org/wiki/History_of_special_relativity

Now, if I had to pick a major idea that seemed to drop fully-formed from the mind of a genius with little precedent to have guided him, I might personally point to Galois theory (https://en.wikipedia.org/wiki/Galois_theory). (Ironically, though, I'm not as familiar with the mathematical history of that time and I may be totally wrong!)

21 days ago by DonaldFisk

Agreed.

General relativity was a completely novel idea. Einstein took a purely mathematical object (now known as the Einstein tensor) and realized that, since its covariant derivative was zero, it could be equated (apart from a constant factor) to a conserved physical object, the energy-momentum tensor. It didn't just fall out of Riemannian geometry and what was known about physics at the time.

Special relativity was the work of several scientists as well as Einstein, but it was also a completely novel idea - just not the idea of one person working alone.

I don't know why anyone disputes that people can sometimes come up with completely novel ideas out of the blue. This is how science moves forward. It's very easy to look back on a breakthrough and think it looks obvious (because you know the trick that was used), but it's important to remember that the discoverer didn't have the benefit of hindsight that you have.

21 days ago by johnfn

Even if I grant you that, surely we've moved the goalposts a bit if we're saying the only thing we can think of that AI can't do is the life's work of a man whose last name is literally synonymous with genius.

21 days ago by poplarsol

That's not exactly true. Lorentz contraction is a clear antecedent to special relativity.

21 days ago by CooCooCaCha

Depends on what you think is valid.

The process you’re describing is humans extending our collective distribution through a series of smaller steps. That’s what the “shoulders of giants” means. The result is we are able to do things further and further outside the initial distribution.

So it depends on if you’re comparing individual steps or just the starting/ending distributions.

21 days ago by stouset

When chess engines were first developed, they were strictly worse than the best humans. After many years of development, they became helpful to even the best humans even though they were still beatable (1985–1997). Eventually they caught up and surpassed humans but the combination of human and computer was better than either alone (~1997–2007). Since then, humans have been more or less obsoleted in the game of chess.

Five years ago we were at Stage 1 with LLMs with regard to knowledge work. A few years later we hit Stage 2. We are currently somewhere between Stage 2 and Stage 3 for an extremely high percentage of knowledge work. Stage 4 will come, and I would wager it's sooner rather than later.

20 days ago by MITSardine

There's a major difference between chess and scientific research: setting the objectives is itself part of the work.

In chess, there's a clear goal: beat the game according to this set of unambiguous rules.

In science, the goals are much more diffuse, and setting those in the first place is what makes a scientist more or less successful, not so much technical ability. It's a very hierarchical field where permanent researchers direct staff (postdocs, research scientists/engineers), who direct grad students. And it's at the bottom of the pyramid that technical ability is the most relevant/rewarded.

Research is very much a social game, and I think replacing it with something run by LLMs (or other automatic process) is much more than a technical challenge.

21 days ago by bluecalm

The evolution was also interesting: first the engines were amazing tactically but pretty bad strategically so humans could guide them. With new NN based engines they were amazing strategically but they sucked tactically (first versions of Leela Chess Zero). Today they closed the gap and are amazing at both strategy and tactics and there is nothing humans can contribute anymore - all that is left is to just watch and learn.

20 days ago by undefined
[deleted]
21 days ago by TGower

With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth. It's worth keeping in mind just how little we understand about LLM capability scaling. Ask 10 different AI researchers when we will get to Stage 4 for something like programming and you'll get wild guesses or an honest "we don't know".

21 days ago by stouset

That is not what happened with chess engines. We didn’t just throw better hardware at it, we found new algorithms, improved the accuracy and performance of our position evaluation functions, discovered more efficient data structures, etc.

People have been downplaying LLMs since the first AI-generated buzzword garbage scientific paper made its way past peer review and into publication. And yet they keep getting better and better to the point where people are quite literally building projects with shockingly little human supervision.

By all means, keep betting against them.

21 days ago by baq

Chess grandmasters are living proof that it's possible to reach grandmaster level in chess on 20W of compute. We've got orders of magnitude of optimizations left to discover in LLMs and/or future architectures, both software and hardware, and with the amount of progress we see basically every month, those ten people will answer "we don't know, but it won't be too long". Of course they may be wrong, but the trend line is clear; Moore's law faced similar issues and they were successively overcome for half a century.

IOW respect the trend line.

21 days ago by blt

And their predictions about Go were wrong, because they thought the algorithm would forever be α-β pruning with a weak value heuristic.
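For readers who haven't seen it, here is a minimal sketch of the α-β pruning scheme mentioned above, run over a toy explicit game tree (leaves are static evaluations; a real engine would generate moves and use an evaluation heuristic at a depth cutoff):

```python
# Minimal alpha-beta pruning over an explicit game tree (illustration only).
# Leaves are ints (static evaluations); internal nodes are lists of children.

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):  # leaf: return its static evaluation
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:  # beta cutoff: opponent won't allow this line
                break
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:  # alpha cutoff: we already have a better option
                break
        return best

# Classic textbook tree: the minimax value is 6, and the third subtree
# is cut off after its first leaf.
tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))
```

The point of the Go prediction failure is that this scheme scales with branching factor and depth, so a weak value heuristic plus search was never going to crack Go; AlphaGo replaced the heuristic with a learned value network and the exhaustive search with Monte Carlo tree search.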

21 days ago by NitpickLawyer

> With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth.

And the same practitioners said right after deep blue that go is NEVER gonna happen. Too large. The search space is just not computable. We'll never do it. And yeeeet...

21 days ago by guluarte

so we are going back to physical labor then

21 days ago by bpodgursky

I don't want to be rude but like, maybe you should pre-register some statement like "LLMs will not be able to do X" in some concrete domain, because I suspect your goalposts are shifting without you noticing.

We're talking about significant contributions to theoretical physics. You can nitpick but honestly go back to your expectations 4 years ago and think — would I be pretty surprised and impressed if an AI could do this? The answer is obviously yes, I don't really care whether you have a selective memory of that time.

21 days ago by RandomLensman

I don't know enought about theoretical physics: what makes it a significant contribution there?

21 days ago by terminalbraid

It's a nontrivial calculation valid for a class of forces (e.g. QCD) and apparently a serious simplification of a specific calculation that hadn't been completed before. But for what it's worth, I spent a good part of my physics career working in nucleon structure and have not run across the term "single-minus amplitudes" in my memory. That doesn't necessarily mean much, as there's a very broad space that work like this takes place in, and some of it gets extremely arcane and technical.

One way I gauge the significance of a theory paper is by the measured quantities and physical processes it would contribute to. I see none discussed here, which should tell you how deep into math it is. I personally would not have stopped to read it on my arXiv catch-up:

https://arxiv.org/list/hep-th/new

Maybe to characterize it better, physicists were not holding their breath waiting for this to get done.

21 days ago by epolanski

Not every contribution has immediate impact.

21 days ago by undefined
[deleted]
21 days ago by outlace

I never said LLMs will not be able to do X. I gave my summary of the article and my anecdotal experiences with LLMs. I have no LLM ideology. We will see what tomorrow brings.

21 days ago by nozzlegear

> We're talking about significant contributions to theoretical physics.

Whoever wrote the prompts and guided ChatGPT made significant contributions to theoretical physics. ChatGPT is just a tool they used to get there. I'm sure AI-bloviators and pelican bike-enjoyers are all quite impressed, but the humans should be getting the research credit for using their tools correctly. Let's not pretend the calculator doing its job as a calculator at the behest of the researcher is actually a researcher as well.

21 days ago by famouswaffles

If this worked for 12 hours to derive the simplified formula along with its proof, then it guided itself and made significant contributions by any useful definition of the word, hence OpenAI having an author credit.

21 days ago by bpodgursky

If a helicopter drops someone off on the top of Mount Everest, it's reasonable to say that the helicopter did the work and is not just a tool they used to hike up the mountain.

21 days ago by square_usual

It's interesting to me that whenever a new breakthrough in AI use comes up, there's always a flood of people who come in to handwave away why this isn't actually a win for LLMs. Like with the novel solutions GPT 5.2 has been able to find for Erdős problems - many users here (even in this very thread!) think they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, LLMs have driven these proofs: https://github.com/teorth/erdosproblems/wiki/AI-contribution...

21 days ago by loire280

It's easy to fall into a negative mindset when there are legions of pointy haired bosses and bandwagoning CEOs who (wrongly) point at breakthroughs like this as justification for AI mandates or layoffs.

20 days ago by threethirtytwo

I think it's more insidious than this.

It's easy to fall into a negative mindset because the justification is real and what we see is just the beginning.

Obviously we are not at a point where developers aren't needed. But one developer can do more, and that is a legitimate reason to hire fewer developers.

The impending reality of the upward-moving trend line is that AI becomes so capable that it can replace the majority of developers. That future is so horrifying that people construct elaborate justifications to deny it.

20 days ago by anon84873628

What does "pointy haired" mean? (Presumably not literally?)

20 days ago by FeteCommuniste

The "pointy-haired boss" was a character in the Dilbert comics, an archetypical know-nothing manager who spews jargon, jumps on trends, and takes credit for ideas that aren't his.

20 days ago by brokencode

Crazy that an honest question like this gets downvoted.

I honestly think the downvote button is pretty trash for online communities. It kills diversity of thought and discussion and leaves you with an echo chamber.

If you disagree with or dislike something, leave a response. Express your view. Save the downvotes for racism, calls for violence, etc.

21 days ago by dakolli

Yes, all of these stories and frequent model releases are just intended to psyop "decision makers" into validating their longstanding belief that labour shouldn't be as big of a line item in a company's expenses, and perhaps can be removed altogether. They can finally go back to the good old days of having slaves (in the form of "agentic" bots); they yearn to own slaves again.

CEOs/decision makers would rather give all their labour budget to tokens if they could, just to validate this belief. They are bitter that anyone from a lower class could hold any bargaining chips, and thus any influence over them. It has nothing to do with saving money; they would gladly pay the exact same engineering budget to Anthropic for tokens (just like the ruling class in times past would gladly pay for slaves) if it could patch the bitterness they have about the working class's influence over them.

The inference companies (who are also from this same class of people) know this and are exploiting this desire. They know that if they create the idea that AI progress is at an unstoppable velocity, decision makers will begin handing them their engineering budgets. These things don't even have to work well; they just need to be perceived as effective, or soon to be, for decision makers to start laying people off.

I suspect this is going to backfire on them in one of two ways.

1. French Revolution V2: they all get their heads cut off in 15 years, or an early retirement on a concrete floor.

2. Many decision makers will make fools of themselves, destroy their businesses, and come begging to the working class for our labor, giving the working class more bargaining chips in the process.

Either outcome is going to be painful for everyone; let's hope people wake up before we push this dumb experiment too far.

21 days ago by janalsncm

I’m reminded of Dan Wang’s commentary on US-China relations:

> Competition will be dynamic because people have agency. The country that is ahead at any given moment will commit mistakes driven by overconfidence, while the country that is behind will feel the crack of the whip to reform. That drive will mean that competition will go on for years and decades.

https://danwang.co/ (2025 Annual letter)

The future is not predetermined by trends today. So it’s entirely possible that the dinosaur companies of today can’t figure out how to automate effectively, but get outcompeted by a nimble team of engineers using these tools tomorrow. As a concrete example, a lot of SaaS companies like Salesforce are at risk of this.

21 days ago by lovecg

Let's have some compassion; a lot of people are freaking out about their careers now and defense mechanisms are kicking in. It's hard for a lot of people to say "actually, yeah, this thing can do most of my work now, and the barrier to entry has dropped to the ground".

21 days ago by Toutouxc

I am constantly seeing this thing do most of my work (which is good actually, I don't enjoy typing code), but requiring my constant supervision and frequent intervention, and always trying to sneak in subtle bugs or weird architectural decisions that, I feel with every bone in my body, would bite me in the ass later. I see JS developers with little experience and zero CS or SWE education rave about how LLMs are so much better than us in every way, when the hardest thing they've ever written was bubble sort. I'm not even freaking out about my career, I'm freaking out about how much today's "almost good" LLMs can empower incompetence and how much damage that could cause to systems that I either use or work on.

21 days ago by kilroy123

I agree with you on all of it.

But _what if_ they work out all of that in the next 2 years and it stops needing constant supervision and intervention? Then what?

20 days ago by nprateem

Yes, and look how far we've come in 4 years. If programming has another 4, that's all it has.

I'm just not sure who will end up employed. The near-term state is obviously Jira-driven development, where agents just pick up tasks from Jira, etc. But will that mean the PMs go and we have a technical PM, or will we be the ones binned? Probably for most SMEs it'll just be maybe 1 PM and 2 or so technical PMs churning out tickets.

But whatever. It's the trajectory you should be looking at.

20 days ago by threethirtytwo

Have you ever thought about the fact that 2 years ago AI wasn't even good enough to write code? Now it's good enough.

Right now you state the current problem is: "requiring my constant supervision and frequent intervention and always trying to sneak in subtle bugs or weird architectural decisions"

But in 2 years that could be gone too, given the objective and literal trendline. So I actually don't see how you can hold this opinion: "I'm not even freaking about my career, I'm freaking about how much today's "almost good" LLMs can empower incompetence and how much damage that could cause to systems that I either use or work on." when all logic points away from it.

We need to be worried, LLMs are only getting better.

20 days ago by threethirtytwo

I'm all for this. But it's the delusion and denialism of people not wanting to face reality.

Like, I have compassion, but I can't healthily respect people who try so hard to rewrite reality so that the future isn't so horrifying. I'm a SWE and I'm affected too, but it's not like I'm going to lie to myself about what's happening.

21 days ago by dakolli

Yeah but you know what, this is a complete psyop.

They just want people to think the barrier to entry has dropped to the ground and that the value of labour is getting squashed, so society writes a permission slip for them to completely depress wages and remove bargaining chips from the working class.

Don't fall for this; they want to destroy any labor that deals with computer I/O, not just SWE. This is the only value "agentic tooling" provides to society: slaves for the ruling class. They yearn for the opportunity to own slaves again.

It can't do most of your work, and you know that if you work on anything serious. But if the C-suite, who haven't dealt with code in two decades, think this is the case because everyone is running around saying it's true, they're going to make sure they replace humans with these bot slaves. They really do just want slaves; they have no intention of innovating with them. People need to work to eat, and unless LLMs are creating new types of machines that need new types of jobs, like previous forms of automation did, I don't see why they should be replacing the human input.

If these things are so good for business and are pushing software development velocity, why is everything falling apart? Why does the bulk of low-stakes software suck? Why is Windows 11 so bad? Why aren't top hedge funds and medical device manufacturers (places where software quality is high stakes) replacing all their labor? Where are the new industries? They don't do anything novel; they only serve to replace inputs previously supplied by humans so the ruling class can finally get back to the good old feeling of having slaves that can't complain.

20 days ago by D-Machine

"It's interesting to me that whenever some new result in AI use comes up, there's always a flood of people who come in to gesticulate wildly that the sky is falling and AGI is imminent. Like with the recent solutions GPT 5.2 has been able to find for Erdős problems, even though in almost all cases such solutions rely on poorly-known past publications, or significant expert user guidance and essential tools like Aristotle, which do non-AI formal verification - many users here (even in this very thread!) think they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, though these are not interesting proofs to most modern mathematicians, LLMs are a major factor in a tiny minority of these mostly-not-very-interesting proofs: https://github.com/teorth/erdosproblems/wiki/AI-contribution..."

The thing about spin and AI hype (besides being trivially easy to write) is that it isn't even trying to be objective. It would help if a lot of these articles would more carefully lay out what is actually surprising and what is not, given current tech and knowledge.

Only a fool would think we aren't potentially on the verge of something truly revolutionary here. But only a fool would also be certain that the revolution has already happened, or that e.g. AGI is necessarily imminent.

The reason HN has value is because you can actually see some specifics of the matter discussed, and, if you are lucky, an expert even might join in to qualify everything. But pointing out "how interesting that there are extremes to this" is just engagement bait.

20 days ago by famouswaffles

>It's interesting to me that whenever some new result in AI use comes up, there's always a flood of people who come in to gesticulate wildly that that the sky is falling and AGI is imminent.

Really? Is that happening in this thread? Because I can barely see it. Instead you have a bunch of asinine comments, butthurt about acknowledging a GPT contribution that would have been acknowledged any day had a human done it.

>they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, though these are not interesting proofs to most modern mathematicians, LLMs are a major factor in a tiny minority of these mostly-not-very-interesting proofs

This is part of the problem, really. Your framing is disingenuous, and I don't really understand why you feel the need to downplay it so. They are interesting proofs. They are documented for a reason. It's not cutting-edge research, but it is LLMs contributing meaningfully to formal mathematics, something that was speculative just years ago.

20 days ago by D-Machine

> Your framing is disingenuous

I am not surprised that you can't understand that the quote I am making is obviously parodying the OP as disingenuous. Given our previous interactions (https://news.ycombinator.com/item?id=46938446), it is clear you don't understand much about AI and/or LLMs, or, perhaps, basic communication, at all.

20 days ago by Leynos

Can we not just say "this is pretty cool" and enjoy it rather than turning it into a fight?

20 days ago by threethirtytwo

>Only a fool would think we aren't potentially on the verge of something truly revolutionary here. But only a fool would also be certain that the revolution has already happened, or that e.g. AGI is necessarily imminent.

This sentence sounds contradictory. You're a fool to not think we're on the verge of something revolutionary and you are a fool if you think something revolutionary like AGI is on the verge of happening?

But to your point, if "revolutionary" and "AGI" are different things, I'm certain the "revolution" has already happened. ChatGPT was the step function change and everything else is just following the upward trendline post release of ChatGPT.

Anecdotally I would say 50% of developers never code things by hand anymore. That is revolutionary in itself, and by that measure the revolution has literally already happened.

20 days ago by krackers

Because most times results like this are overstated (see the Cursor browser thing, "moltbook", etc.). There is clear market incentive to overhype things.

And in this case "derives a new result in theoretical physics" is again overstating things, it's closer to "simplify and propose a more general form for a previously worked out sequence of amplitudes" which sounds less magical, and closer to something like what Mathematica could do, or an LLM-enhanced symbolic OEIS. Obviously still powerful and useful, but less hype-y.
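The "simplify base cases, conjecture a closed form, then verify" loop is indeed standard CAS territory. A toy sketch with sympy, where the classic sum-of-cubes identity stands in for the amplitudes (it has nothing to do with the actual physics):

```python
from sympy import symbols, summation, simplify

n, k = symbols("n k", integer=True, positive=True)

# "Base cases" computed by brute force: 1^3 + 2^3 + ... + m^3 for small m.
base_cases = {m: sum(i**3 for i in range(1, m + 1)) for m in range(1, 7)}

# Closed form conjectured from the pattern of the base cases: (n(n+1)/2)^2.
conjecture = (n * (n + 1) / 2) ** 2

# Step 1: the conjecture must reproduce every base case...
assert all(conjecture.subs(n, m) == v for m, v in base_cases.items())

# Step 2: ...and the difference from the exact sum must vanish for all n.
assert simplify(summation(k**3, (k, 1, n)) - conjecture) == 0
```

The interesting part in the paper's claim is precisely the middle step (conjecturing the general form), which a plain CAS does not automate.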

20 days ago by newswasboring

> it's closer to "simplify and propose a more general form for a previously worked out sequence of amplitudes"

How is this different from a new result? Many a career in academia has been built on simplifying mathematics.

21 days ago by Davidzheng

"An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."

When I use GPT 5.2 Thinking Extended, I get the impression that it's consistent enough / has a low enough rate of errors (or enough error-correcting ability) to autonomously do math/physics for many hours if it were allowed to [but I guess Extended cuts off around the 30-minute mark and Pro maybe 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists/mathematicians at large will soon be able to play with tools that think at this time-scale and see how much capability these machines really have.

21 days ago by mmaunder

Yes, and 5.3 and the latest Codex CLI client are incredibly good across compactions. Anyone know the methodology they're using to maintain state and manage context for a 12-hour run? It could be as simple as a single dense document and its own internal compaction algorithm, I guess.
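Nobody outside OpenAI knows the actual scaffold, but the "single dense document plus compaction" guess is easy to sketch. Everything below is hypothetical: the crude truncation stands in for a model-written summary, and the budget numbers are arbitrary:

```python
class CompactingContext:
    """Running transcript for a long agent run: when the transcript
    exceeds a character budget, older entries are collapsed into one
    summary entry (truncation stands in for an LLM-written summary)."""

    def __init__(self, budget_chars=500, keep_recent=4):
        self.budget_chars = budget_chars
        self.keep_recent = keep_recent
        self.entries = []

    def add(self, text):
        self.entries.append(text)
        if sum(len(e) for e in self.entries) > self.budget_chars:
            self._compact()

    def _compact(self):
        old = self.entries[:-self.keep_recent]
        if not old:
            return
        # Collapse everything but the most recent entries into one line.
        summary = "SUMMARY: " + " | ".join(e[:40] for e in old)
        self.entries = [summary] + self.entries[-self.keep_recent:]

    def render(self):
        return "\n".join(self.entries)


ctx = CompactingContext()
for step in range(200):  # stand-in for a very long run of reasoning steps
    ctx.add(f"step {step}: " + "reasoning " * 10)

# The rendered context hovers near the budget instead of growing unboundedly.
assert len(ctx.render()) < 4 * ctx.budget_chars
```

A real scaffold would presumably also persist the document to disk and let the model decide what survives each compaction, but the shape of the loop is the same.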

21 days ago by knicholes
21 days ago by slopusila

After those 30 min you can manually ask it to continue working on the problem.

21 days ago by Davidzheng

It's a bit unclear to me what happens if I do that after it thinks for 30 minutes and ends with no response. Does it pick up where it left off? Does it start from scratch again? I don't know how the compaction of their prior thinking traces works.

21 days ago by cpard

AI can be an amazing productivity multiplier for people who know what they're doing.

This result reminded me of the C compiler case that Anthropic posted recently. Sure, agents wrote the code for hours but there was a human there giving them directions, scoping the problem, finding the test suites needed for the agentic loops to actually work etc etc. In general making sure the output actually works and that it's a story worth sharing with others.

The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding. It works great for creating impressions and building brand value, but it also does a disservice to the actual researchers, engineers and humans in general who do the hard work of problem formulation and validation, and who in the end solve the problem using another tool in their toolbox.

21 days ago by supern0va

>AI can be an amazing productivity multiplier for people who know what they're doing.

>[...]

>The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

You're sort of acting like it's all or nothing. What about the humans that used to be that "force multiplier" on a team with the person guiding the research?

If a piece of software required a team of ten people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.

For a more current example: do you think all the displaced Uber/Lyft drivers aren't going to think "AI took my job" just because there's a team of people in a building somewhere handling the occasional Waymo low confidence intervention, as opposed to being 100% autonomous?

21 days ago by guluarte

Where I work, we're now building things that were completely out of reach before. The 90% job loss prediction would only hold true if we were near the ceiling of what software can do, but we're probably very, very far from it.

A website that cost hundreds of thousands of dollars in 2000 could be replaced by a WordPress blog built in an afternoon by a teenager in 2015. Did that kill web development? No, it just expanded what was worth building.

20 days ago by matwood

> If a piece of software required a team of ten to people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.

Yes, but this assumes a finite amount of software that people and businesses need and want. Will AI be the first productivity increase where humanity says ‘now we have enough’? I’m skeptical.

20 days ago by kaibee

> Yes, but this assumes a finite amount of software that people and businesses need and want.

A lot of software exists because humans are needy and kinda incompetent, but still need to process data at scale. Like, would you build SAP as it is today for LLMs?

20 days ago by throwaway743

This is all inevitable with the trajectory of technology, and has been apparent for a long time. The issue isn't AI, it's that our leaders haven't bothered to think or care about what happens to us when our labor loses value en masse due to such advances.

Maybe it requires fundamentally changing our economic systems? Who knows what the solution is, but the problem is most definitely rooted in a lack of initiative by our representatives and an economic system that doesn't accommodate us when shit inevitably hits the fan in labor markets.

21 days ago by cpard

There's 90% job loss only if you assume this is a zero-sum situation where humans and agents compete for a fixed amount of work.

I'm curious why you think I'm acting like it's all or nothing. What I was trying to communicate is the exact opposite, that it's not all or nothing. Maybe it's the way I articulate things, I'm genuinely interested what makes it sound like this.

20 days ago by ramathornn

Fully agree with your og comment and I didn’t get the same read as the person above at all.

This is a bizarre time to be living in, on one hand these tools are capable of doing more and more of the tasks any knowledge worker today handles, especially when used by an experienced person in X field.

On the other, it feels like something is about to give. All the Super Bowl ads, AI in what feels like every single piece of copy coming out these days, AI CEOs hopping from one podcast to another warning about the upcoming career apocalypse. I'm not fully buying it.

21 days ago by jonahx

> The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

It's also a legitimate concern. We happen to be in a place where humans are needed for that "last critical 10%," or the first critical 10% of problem formulation, and so humans are still crucial to the overall system, at least for most complex tasks.

But there's no logical reason that needs to be the case. Once it's not, humans will be replaced.

21 days ago by cpard

The reason there is a marketing opportunity is because, to your point, there is a legitimate concern. Marketing builds and amplifies the concern to create awareness.

When the existing systems become trivial to manage with the new tooling, humans build more complex ones or add more layers on top.

20 days ago by krethh

The logical reason is that humans are exceptionally good at operating at the edge of what the technology of the time can do. We will find entire classes of tech problems which AI can't solve on its own. You have people today with job descriptions that even 15 years ago would have been unimaginable, much less predictable.

To think that whatever the AI is capable of solving is (and forever will be) the frontier of all problems is deeply delusional. AI got good at generating code, but it still can't even do a fraction of what the human brain can do.

20 days ago by jonahx

> To think that whatever the AI is capable of solving is (and forever will be) the frontier of all problems is deeply delusional. AI got good at generating code, but it still can't even do a fraction of what the human brain can do.

AGI means fully general, meaning everything the human brain can do and more. I agree that currently it still feels far (at least it may be far), but there is no reason to think there's some magic human ingredient that will keep us perpetually in the loop. I would say that is delusional.

We used to think there was human-specific magic in chess, in poker, in Go, in code, and in writing. All those have fallen, the latter two albeit only in part but even that part was once thought to be the exclusive domain of humans.

21 days ago by decidu0us9034

I'm not sure you can call something an optimizing C compiler if it doesn't optimize or enforce C semantics (well, it compiles C but also a lot of things that aren't syntactically valid C). It seemed to generate a lot of code (wow!) that wasn't well-integrated and didn't do what it promised to, and the human didn't have the requisite expertise to understand that. I'm not a theoretical physicist but I will hold to my skepticism here, for similar reasons.

21 days ago by cpard

Sure, I won't argue that, although it did manage to deliver the marketing value they were looking for. In the end their goal was not to replace gcc but to get people talking about AI and Anthropic.

What I said in my original comment is that AI delivers when it's used by experts. In this case the person driving it was definitely not a C compiler expert; what would happen with a real expert doing this?

21 days ago by BrouteMinou

Deliver what exactly? False hope and lies?

https://github.com/anthropics/claudes-c-compiler/issues/228

21 days ago by elzbardico

Actually, the results were far worse and way less impressive than what the media said.

21 days ago by cpard

The C compiler results, or the physics results this post is about?

21 days ago by elzbardico

The C compiler.

21 days ago by NewsaHackO

His point is going to be some copium like: since the C compiler is not as optimized as gcc, it was not impressive.

21 days ago by nilkn

It would be more accurate to say that humans using GPT-5.2 derived a new result in theoretical physics (or, if you're being generous, humans and GPT-5.2 together derived a new result). The title makes it sound like GPT-5.2 produced a complete or near-complete paper on its own, but what it actually did was take human-derived datapoints, conjecture a generalization, then prove that generalization. Having scanned the paper, this seems to be a significant enough contribution to warrant a legitimate author credit, but I still think the title on its own is an exaggeration.

20 days ago by uh_uh

Would you be similarly pedantic if a high-schooler did the same?

20 days ago by nilkn

Yes. Someone making one contribution among many to a paper clearly does not deserve anything like sole authorship credit of the entire paper, which is what the title from OpenAI implies to me. I don't believe I'm being pedantic at all. And, by the way, high schoolers or college students make co-author-level contributions to real papers quite frequently in the US at least (I was one of them).

The text of the post is much more honest. The title is where the dishonesty is.

20 days ago by lupsasca

Hi, I'm an author on the paper. It was definitely a human-AI collaboration, but it is also true that the final simplified formula, Eq. 39 in the paper (which is what we had been seeking, without success), was conjectured and proved by GPT. So it derived a new result in theoretical physics. I'm genuinely puzzled by your complaint.

21 days ago by turzmo

Physicist here. Did you guys actually read the paper? Am I missing something? The "key" AI-conjectured formula (39) is an obvious generalization of (35)-(38), and something a human would have guessed immediately.

(35)-(38) are the AI-simplified versions of (29)-(32). Those earlier formulae look formidable to simplify by hand, but they are also the sort of thing you'd try to use a computer algebra system for.
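To be concrete about the "computer algebra system" point: collapsing a formidable-looking rational expression is bread-and-butter CAS work. A toy example with sympy, using a classic partial-fractions identity that has nothing to do with the actual amplitudes:

```python
from sympy import symbols, simplify, prod

xs = symbols("x1:5")  # four distinct symbols x1..x4

# A sum of products of pole factors that looks formidable expanded out...
ugly = sum(prod(1 / (xi - xj) for xj in xs if xj != xi) for xi in xs)

# ...but collapses entirely: sum_i prod_{j != i} 1/(x_i - x_j) == 0.
assert simplify(ugly) == 0
```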

I'm willing to (begrudgingly) admit the possibility for AI to do novel work, but this particular result does not seem very impressive.

I picture ChatGPT as the rich kid whose parents privately donated to a lab to get their name on a paper for college admissions. In this case, I don't think I'm being too cynical in thinking that something similar is happening here and that the role of AI in this result is being well overplayed.

20 days ago by radioactivist

Also a physicist here -- I had the same reaction. Going from (35-38) to (39) doesn't look like much of a leap for a human. They say (35-38) was obtained from the full result by the LLM, but if the authors derived the full expression in (29-32) themselves presumably they could do the special case too? (given it's much simpler). The more I read the post and preprint the less clear it is which parts the LLM did.

21 days ago by cxvwK

[dead]

20 days ago by refulgentis

Random anonymous HN driveby claiming something that'd be horrible PR, or the coauthors on the GPT-5.2 paper... plus the belief that OpenAI isn't aggressively stupid, especially after the earlier negative press. Gotta say, I'm going with the coauthors, after seeing their credentials.

20 days ago by turzmo

I think you're misunderstanding my claim. There's no scandal here, just run-of-the-mill academic politicking. I fully believe that ChatGPT did the work they say it did, but that it deserves about as much credit as Mathematica does in "deriving a new result".

21 days ago by Insanity

They also claimed ChatGPT solved novel Erdős problems when that wasn't the case. I'll take this with a grain of salt until more external validation happens. But very cool if true!

21 days ago by famouswaffles

Well, they (OpenAI) never made such a claim. And yes, LLMs have made novel solutions/contributions to a few Erdős problems.

21 days ago by smokel

How was that not the case? As far as I understand it ChatGPT was instrumental to solving a problem. Even if it did not entirely solve it by itself, the combination with other tools such as Lean is still very impressive, no?

21 days ago by emil-lp

It didn't solve it; it simply found that it had already been solved in a publication and that the list of open problems hadn't been updated.

21 days ago by Davidzheng

My understanding is there have been around 10 Erdős problems solved by GPT by now. Most of the solutions have been found to be either already in the literature or very close to a problem solved in the literature, but one or two are quite novel.

https://github.com/teorth/erdosproblems/wiki/AI-contribution... may be useful

21 days ago by undefined
[deleted]
21 days ago by vonneumannstan

Wasn't that like some marketing bro? This is coming out the front door with serious physicists attached.

20 days ago by castigatio

I'm not sure where people think humans are getting these magical leaps of insight that transcend combinations of existing things. Magic? Ghost in the machine? The simplest explanation is that "leaps of insight" are simply novel combinations that demonstrate themselves to have some utility within the boundaries of a test case or objective.

Snow + stick + need to clean driveway = snow shovel. Snow shovel + hill + desire for fun = sled

At one point people were arguing that you could never get "true art" from linear programs. Now you get true art and people are arguing you can't get magical flashes of insight. The will to defend human intelligence / creativity is strong but the evidence is weak.

20 days ago by hiAndrewQuinn

Some people defend it because they are nondualists. They think the moral value of human life rounds to zero against the existence of something which can effortlessly outclass them in all domains. This is obviously confused, but they can't bring themselves to say "Very cool, and also I think humans are inherently special and deserve to continue existing even if all we do is lie around all day and watch the Hallmark channel."

Happy Valentine's day to those who celebrate btw <3
