Here’s one way to think about the difference between coming up with a formal proof and having something other mathematicians can use:
> A clear explanation can be found in Alex Kontorovich’s account of his own learning curve with formalized mathematics. In a nutshell: Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics. It exposes clean APIs and abstractions, without which no autoformalization could take place. By contrast, Math Inc’s autoformalized proof of Viazovska’s results exposes no intelligible interface. Who in their right mind would merge a 200,000-line unaudited vibe-coded blob into the master branch of global human science?
Imo, the proved theorem is the API. And that's really all it has to be.
If there are other lemmas, etc buried inside that 200k blob that can be factored out and proved and used themselves, so much the better. But denying a machine-valid proof just because it's incomprehensible with what a human being considers a reasonable effort made to unpack it just seems odd to me. I see no reason not to accept the vibe coded blob if Lean says it's kosher except anthropocentrism.
(i) accepting that a piece of code is a valid Lean proof
(ii) merging a valid Lean proof into Mathlib.
Valid Lean proofs need maintenance. Mathlib is a living blob of code. People care about how fast the proofs typecheck. Many other properties of code play a role.
Not everything that is true is worthy of immediately including in the Encyclopaedia Brittanica.
So add it as an axiom with a comment pointing to the proof. If you don't trust the proof because you don't personally understand it, why are you using a proof assistant in the first place?
You can't deny it if it's true, but the point is to find new techniques and abstractions. A proof you can't extrapolate and learn from is just a checkmark, and about as useful.
I don't think that's the point. I think the point is to prove the statement. The techniques and abstractions are a means to an end; making them the point is being seduced by the beauty of the weapon.
Most of the interesting research I’ve ever done started while reading through the intermediate steps in an unrelated paper.
As far as I can tell from colleagues in other domains, it’s the same there. One paper will mention something off-hand and that’ll cause someone else to have a spark of insight, which turns into it’s own valuable research
New techniques and abstractions is how mathematics expand. Mathematics is about studying structures, proving statements is a part of it but it is not all what mathematics is about. If anything, proofs themselves are a means to an end (understanding). Eg Galois developed some techniques and abstractions to prove that there is no general solution to polynomial equations of degree >=5, but these techniques and abstractions gave rise to whole new mathematical fields.
Mathematics has to be also understood from the perspective of theory building, not just problem solving.
A lot of mathematical "problems" are almost entirely pointless. Nobody genuinely cares about the moving sofa problem, or about square packing, or about the minimum number of colors needed to draw a map - it is the math that is developed during the solving process that is valuable!
An answer to a question like "what is the exact area of a unit circle" is a mere curiosity. Calculating a good-enough approximation is trivial, after all. But wanting an exact answer leads to developing calculus, which leads to most modern physics. Science was able to make a giant leap forwards due to the techniques developed, while the actual answer itself is mostly useless.
My instinct is to agree with you. I believe that the drive to a deeper understanding of the problems is what helps us unlock new areas of study, and find opportunities to transfer techniques or bridge otherwise unconnected domains.
But let’s consider a hypothetical: what if an intuitive understanding of the true “boundaries” of mathematics (if such things exist) is beyond the capabilities of a human mind? If there truly is no way to simplify some proofs down from 200,000 line incomprehensible gibberish to something you could teach to a high schooler or undergraduate or even a PhD. Is the proof still worthless? Sure, at the moment, it might be. Finding such a proof and understanding the implications of it are different skills, the latter of which AI almost certainly does not possess at the moment. But there may come a time where the AI can view the bigger picture and make the leaps you described (say, an eka-Calculus from an eka-unit-circle). These leaps may be as unintelligible to us as the proof in OP is.
I guess the question is: assuming that we can’t make the proof beautiful enough to spark deeper human understanding, do we still want it if it sparks deeper AI understanding?
Personally I would hate to live in a universe where the boundaries of science are beyond intuitive human understanding, but I think it’s almost certainly the case. The idea that the rules are all within our grasp reeks of anthropocentrism to me. I would love for the universe to prove me wrong though. It’d be a pleasant, hilarious coincidence if they do fit within the boundaries of our understanding.
> what if an intuitive understanding of the true “boundaries” of mathematics (if such things exist) is beyond the capabilities of a human mind?
By extension: why should we assume that a human would still understand the problems - or the answers? If all of it is complete gibberish to a human and can never be applied in any way, shape, or form, then what's the point?
The way I view it there are two options here: either you completely ignore it and end up burning a massive amount of electricity on what is essentially a bunch of LLMs jerking each other off, or you blindly follow it and end up with a Machine God who can justify a genocide with a "This is the correct thing to do. Trust me bro, I have irrefutable proof - you won't understand it". There's just no sensible way to do post-human math in an inherently human world.
That’s a fair point. I wouldn’t want to take machine god’s proclamations on faith either. I’d prefer it if the knowledge was always within our grasp. There is also a possible middle ground where we don’t understand the questions or the answers but we still benefit from the effects of the application of that knowledge. “Sufficiently advanced technology is indistinguishable from magic” and all that. Hopefully we can still judge the results based on the effects. Rejecting a call to genocide should be easy enough, but on the other hand the Native Americans couldn’t foresee what the smallpox blankets would do to them. Working from a position where you are the weaker side of a knowledge gap is a scary thought. It’s a rational fear and I think a lot of people will end up on that side of the divide if AI continues to advance.
"Mathematics isn't about proving the statement! It's about the collection of substatements which lead to the proof of the statement."
"Okay, so what determines what is and is not allowed in the collection?"
"Whether the given substatement is true or not, of course."
Like this is obviously silly, right. In your view you could have two guys both trying to prove or disprove that the area of the unit circle is 3, and yet only the guy doing it with some vision of nobility where he's building up to this grand theory of approximations is the one actually doing "real" mathematics. The guy who's doing it just because he thinks it's neat and would like an answer to the problem itself doesn't count, and you suspect he couldn't even exist.
"Dear 13th-century Magical Oracle, It is possible to transmute lead into gold?" "Yes. This pile of giant incomprehensible spaghetti[0] is the irrefutable proof that it is possible."
What did the alchemist truly learn from this interaction? Is the answer in any way helpful for him? Will it lead to him making gold, and the implied endless riches it entails? Why should it bring him him to ask follow-up questions with useful answers like "The technology to do this doesn't exist yet and can't be developed within several lifetimes" and "It'll never be economically profitable to turn lead into gold", instead of blindly being fed piecemeal steps in the impossible task of one medieval guy building a LHC?
Writing true statements is absolutely trivial, anyone can do that. There's an endless amount of true statement which have absolutely no value to humanity whatsoever. Their proof is meaningless without something to give it context. "Substatement 1304 is true" has significantly less value than "here's something I call 'group theory', and it might also be helpful solving open issues in fields X, Y, and Z".
Mathematics is about widening our collective understanding. There will of course be some people out there who'll hear "The answer to life, the universe, and everything is forty-two", think "Neato!", and go on with their day without giving it any further thought - but I'd have a hard time calling them mathematicians.
That doesn't mean it is completely useless, of course. A "prove that no efficient algorithm exists to break this new encryption scheme" would be very helpful indeed. Until you realize it neglected to take quantum computing into account, of course.
The argument of your imaginary dialogue is very weak. To run a valid marathon one must cover 42 kms, but we do not run marathons because we want to be 42 kms away. Building a robot to do that would be probably an interesting feat for robotics, but it would not leave any mark in the history of the sport.
Isn't the question at hand whether its bad for mathematics if "prove the circle has area 3" devolves into 500 lines of inscrutable lean code, vs just having `pi r^2 == 3`? Sure they both "proved" it true/false, but knowing an answer isn't as useful as knowing why its an answer. Knowing an answer does have some value, its just not as valuable. If I can't work it out myself, I just trust the oracle.
Now if you ask "does the area of unit circle equal 4?", I don't really know, but we can go back to the oracle and ask again (we haven't learned the general pattern).
Also, I'm not sure that assuming this 'area of circle' question was cutting edge math, that the oracle wouldn't say 'yes, to a certain level of tolerance'. Can't count how many times I've seen agent decide a test needs to be loosened or deleted because its an "edge case" or "blocking". If you don't understand the proof you might get back 'yes' for some versions of 'is the area of unit circle 3' (depending on complexity of that ask).
It’s like the Hilbert’s problem and the birth of computers. Turing’s paper is like 66 pages, but the most valuable output is the Turing machine which can be explained in a few paragraphs.
> just as we don't re-prove Fermat's little theorem every time I use it in a proof
Exactly! There's a shared foundation, and everyone builds upon it. A mathematical paper is a whole bunch of Lego blocks being added to that foundation, and combining them in a hopefully-useful new interface.
But if the entire paper is just one giant black box, you only get to use the final interface: you lose the ability to meaningfully repurpose the individual Lego blocks to build a different interface. You end up having to reinvent the same Lego blocks over and over again, just with a slightly different color each time.
Don't want to re-prove each and every Lego brick? Then you shouldn't accept giant black box proofs.
I think the interesting part here is that the black box gives us a starting point. And it's not a true black box: if a model is generating Lean, we can examine the entire source, iteratively refine by refactoring into smaller self-contained units, and the compiler gives us the certainty that our decomposition is correct.
The theorem doesn't exist in a vacuum. It talks about objects that must be formally defined. And if that formal definition (which is part of the API) is not immediately compatible with those others use, and every single theorem comes with its own definitions of the objects they're working on, you're going to be reinventing the wheel over and over again.
Replacing thought and curation with repeated automation is tech debt, pushed down to fundamental knowledge and understanding.
> I see no reason not to accept the vibe coded blob if Lean says it's kosher except anthropocentrism.
The laws of mathematics exist and their truths hold before they are proven by humans or our machines, so in a very real sense the entire point of proving anything in the first place is anthropocentrism.
That, plus cleaning it up may reveal it contains proofs of other things we also want to know. Imagine if this happened to also contain as a sub-part a proof of all the open Millennium Prize Problems? We don't know until we investigate. (If it was a specific list of things to check from rather than expanding humanity's library, we could just ask an AI to do it for us… but as The Wachowski sisters wrote in their most famous script: "I say your civilization because as soon as we started thinking for you, it really became our civilization").
These proof checkers all have bugs, every single one, and since AI is still 100% incapable of understanding simple mathematics we should assume agents are likely to cheat by exploiting a kernel bug. So a human really does have to be able to read and understand the proof. There's no difference between blindly trusting Lean and blindly trusting Grigori Perelman: yes you can be reasonably confident the proof is correct. But you gotta check.
This future of "the AI built the compiler and it's totally incomprehensible spaghetti, but don't worry, it verified the compiler works using this AI-generated proof assistant whose codebase is also pure spaghetti" terrifies me.
That's not what I mean. I am talking about quirks and bugs that really are pretty subtle, and which really might turn up in e.g. a verified systems programming context: https://github.com/James-Hanson/junk-theorems-in-lean/blob/m... (there is a proof of 0=1 at the end - it is easy to understand where it comes from, but low-level compiler stuff like this is always a possibility.)
LLM agents will a) discover dumb counterintuitive stuff like this and b) exploit it to satisfy the kernel with c) the questionable lines being buried behind millions of lines of math slop. Humans have to check this stuff.
The idea behind the challenge to provide a few lines of human-written/checked scaffolding that defines an "API surface". The API surface is resistant to junk/slop filling out the sorry's. This is supposed to reduce the number of lines that humans have to read to only a handful, while the full valid AI-generated proof is tens of thousands of lines. The challenge was just solved a couple weeks, here's the discussion about it: https://leanprover.zulipchat.com/#narrow/channel/583336-Auto...
You might get the Riemann hypothesis robo-proven and unintelligibly dense. Would mathematic professionals trust it, or would they always be putting asterisks next to proofs downstream of it ?
We put asterisks next to proofs that use the axiom of choice. If a machine comes up with proof that’s not verifiable by humans, you can very well be sure that there will be asterisks.
>But denying a machine-valid proof just because it's incomprehensible with what a human being considers a reasonable effort made to unpack it just seems odd to me.
Why not just fork the original master branch of human science to an ai-enhanced one and see where that brings us?
But part of the point of mathematics is human understanding. I think most would be willing to accept the proof. They just wouldn't think it's nearly as useful as one that could be understood.
> Here’s one way to think about the difference between coming up with a formal proof and having something other mathematicians can use:
Here is a other one: hello_world.c versus hello_world.exe (apologies for windows extensions, just for illustration).
One is made by a human for human consumption and extension (though legible by a machine). The source code.
One is made by a machine for a machine. Unreadable by a human. The "binary", though that's a terrible misnomer. (Sure you can disassemble but any nontrivial program is too much to cope with as a whole).
Source vs binary. Both are useful but only one is useful for human consumption.
Binaries are executed by machines but are not yet understandable by machines. (Now that we live in an era where machines can understand, imperfectly, like us)
Perhaps in the future we'll see golfing of formalized proofs as a valid and valuable form of mathematics, not just proving a theorem for the first time.
Of course there's no reason AI couldn't do that too.
> Who in their right mind would merge a 200,000-line unaudited vibe-coded blob
Anyone who understands type theory and how theorem provers work? It's sort of akin to saying "how do you know that a massive C++ program that compiles to machine code compiled to the correct machine code that will actually run and it's just not a random string of bits!?!?!", you know because the compilation would have failed otherwise(this is different than saying the program behaves correctly, but that's precisely the difference between formal proofs and compiled programs).
The entire argument both you and Bessis are implying is that mathematics must be human intelligible. But there's absolutely no reason to assume that every mathematical statement must have a human intelligible representation. There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.
Just because people who don't want this to be true, and I can understand the motivation, doesn't mean that it isn't still the case.
Imagine tomorrow all source code for all software disappears.
Would we still be able to use computers? Of course! They don't need the source code to run.
Would we nevertheless be in big trouble? Oh definitely. We'd need to write all software again, from scratch. Some critical parts we could reverse engineer. Maybe even derive some structures that translate back into source code, but only because a human wrote that source code in the first place.
Hopefully the point is clear: A proof, even if it is correct, that is totally obscure and unintelligable by humans is not very useful for mathematics. It's a black box that doesn't further understanding of the structures and approaches to think about them, and that's what math is all about.Just having a big binary blob of a program doesn't help much if you want to add a feature.
That's also why biology is so hard. There is no source code. It's just millions and millions of years of evolution and things have evolved in weird ways that don't really make it easy to understand them, even though they clearly work.
You didn’t answer why merge it into a library focused on humans developing mathematics though.
It remains all of those things, sitting alone in its own repository of 200kLOC; what benefit comes from merging it into mathlib?
> There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.
This is obviously silly:
Things that aren’t human intelligible aren’t human usable, so the restriction is necessary to have a collection of things humans can utilize.
> Things that aren’t human intelligible aren’t human usable
This is objectively false, people use things every single day they don't understand. We still have plenty of things about the world we don't understand but still find useful.
You are saying anything we know to be the case, but cannot understand why cannot be used? Can we just stop sleeping because we haven't reasoned why sleep is necessary even though we know it is necessary? I mean we still don't really understand gravity (we know how but not why)
If you’re fine with a future like Warhammer 40k where we all have to be Tech Priests making prayers and performing opaque rituals to get things out of the machine god because we no longer understand things, that’s fine, but that’s not a future the rest of us want.
It’s not a matter of what we want or don’t want, or are fine with. The universe doesn’t owe us explanations. Of course I’d prefer to understand everything, or as much as possible, but we don’t always get what we want.
In the 40k universe, modern humanity is below not just their peak of technological prowess, but even their peak of pre-AI prowess. Because they landed in a valley of superstition involving machines they do not understand — and have no way to rebuild towards their peak, without yet further losses by moving away from the magical machines.
That has nothing to do with the universe at large — and everything to do with human society and choices. Which we do have control over.
I’m familiar with 40K, I played Laserburn back in the 80s. I met Bryan Ansel before 40K existed.
The comment I replied to equivocated between absolute limits to comprehensibility to humans and sociologically constructed limits, implying that thinking there may be the former makes someone ‘fine with’ the latter. That’s nonsense.
The problem is that you cant stop it. If there is a wrong way to do something, then someone will do it. Thankfully we understand almost nothing already so it will be easy to adapt - and surrender to the will of the machine.
No — people don’t successfully use things they don’t understand every day.
They approximately use them with varying degrees of success, but also mistakes, broken inferences, etc.
My exact point is that your view reduces our ability to do mathematics to that broken, flawed usage and thereby undermines its utility for logical precision: mathematics is only useful because we cleanly understand it.
When you try to use mathematics without understanding, you cause disasters: stock market crashes from mispricing options, Amazon’s 2018 hiring freeze from misallocating $1B, etc.
Note: neither of your examples (sleep, gravity) are things that people intentionally use. They just happen to people.
I think it’s very telling you couldn’t think of an example.
It's easy to find counterexamples: the entire science of pharmacology is based on macroscopic effects that often lack a fundamental understanding of the underlying mechanisms of action. Psychopharmacology is the extreme example. Often, the fact that a drug worked made scientists investigate and discover the mechanism behind it, but for many drugs used every day by billions it's still a mystery, or it's understood only in very broad terms.
So what will you do if the doctor prescribes you an LLM-vibecoded drug that nobody understands how it works, yet it cures some deadly affliction with close to 100% efficacy?
What if, say, these incomprehensible math results lead to a revolution in quantum physics which unlocks chip topologies that are orders of magnitude faster than human comprehensible designs?
Would the high priestess of human reason pass her divining rod over such chips or life-saving drugs and reject it as the work of the AI devil?
Psychopharmacology is a great example of “doesn’t reliably work” as their products have serious and even disastrous side effects at times — including SSRIs triggering violent acts and suicides.
Again, my exact point is that mathematics loses its utility when you reduce it to that inaccurate usage. You no longer can have any faith in the conclusions — just like sometimes psychiatrists kill their patients with an SSRI prescription because they don’t understand the drugs.
> Would the high priestess of human reason pass her divining rod over such chips or life-saving drugs and reject it as the work of the AI devil?
My point is you can’t know if you’re turning it over for life saving drugs or poison, if you don’t understand what you’re getting.
This is true for any drug, any drug can presumably become a poison, can interact with some genetic or biological trait and trigger a side effect and so on. The complexity of the biological systems is so great that they defy clear deterministic understanding, but stochastic empiric knowledge and treatments still have immense value.
If you give me an inference chip that runs 200x faster, yes, it could be backdoored to take control of my dishwasher and kill me in my sleep - but I can't deny it runs 200x faster an account of nobody being able to explain why. The same for the mistery cancer drug that cured everyone who took it up to now, but could, without doubt, kill the next patient.
How many people drive cars without knowing how an engine works? Or make a phone call without knowing how voice compression for a cellular network does it's thing? Or eats food without knowing how it came together from the supply chain?
The mechanic who repairs the cars knows how the engine works.
The telco that manages loads and allocates networks knows how voice compression works.
The farmers and supermarkets know how the supply chain works.
None of your questions show why mathematics should include blobs of incomprehensible gloop, where no mathematician, no logician, no philosopher, no man on the street can make sense of said gloop, or use it in any way to further human knowledge.
When it's been decomposed down we can discuss this further, but now it's like saying red is red, just because.
Well, you said people can't use things they don't understand.
But I'll take your expanded statement, to include riding a horse, something even older than the engine. We don't understand fully how a horse works -- biology is still a matter of seeing fragments of the whole -- but people had no problem riding and breeding them before the invention of the car, and before the discovery of genetics.
Meanwhile, understanding the math of a thing -- like stock markets, or nuclear bombs -- does not prevent its use from going badly.
Math is useful and beautiful, and a helpful tool for expanding our understanding of the world, but it is not the whole of understanding, or the sole factor in successful application of science to the world .
Why not? One can surely use math even if they have no clue about how to prove theorems. I suck at math, but I use it every day, without knowing how to advance it.
I think it might be fair to say that a proof cannot be without value if it proves something meaningful to a human, that a human can use somehow? But such proof probably doesn’t belong in a library seemingly explicitly dedicated to human-graspable proofs. Just because it violates the intent.
> Can we just stop sleeping because we haven't reasoned why sleep is necessary even though we know it is necessary? I mean we still don't really understand gravity (we know how but not why
If you go to the fiction side of things, you’ll get a lot of things that humans would like to do, but can’t right now as we’re blocked by physics laws and our own biology. We have harnessed electricity, why can’t we harness gravity?
You would think that the people maintaining Mathlib are a subset of
{understands type theory and how theorem provers work}...
Yet they don't merge this stuff, for many different reasons, including that those 200K lines are free as in puppies.
You can have all the type theory of the world, but the library still needs maintenance.
Change a simp-lemma in a file close to the root. Oooh noes, now there's 987 errors in the 200K loc that we merged last week. And there's nobody there who understands how to fix them.
Just use AI to fix. Or maybe just don't merge the code and let it sit in a downstream library?
And wait until there's evidence that the code is stable, high-quality, with well-designed APIs. And then decide that it might be worthy including in a more foundational library.
> In particular, the process of converting a medium-sized Lean document (containing a few thousand lines of code, with some proofs AI-generated) into a nicely golfed and structured Mathlib submission has been an interesting experience. AI agents can be used to perform local golfs that can shave the size of the code somewhat, but global refactoring decisions, such as noticing that a certain argument appears multiple times across the document and can be abstracted into a standalone lemma that can may have additional utility beyond the file, is still largely beyond the reach of current AI tools. (I find that I can explain such a refactor to an AI agent, who can then execute it, but they struggle to spontaneously discover such refactors on their own.)
It's a mixed bag. Some people don't use AI, for the usual reasons that people don't use AI. Others have used AI to write Lean programs. For example the 200K loc example mentioned in OP. But generally speaking, the quality isn't good enough yet. And when you've looked at 13 AI-generated examples and found them lacking on various axes compared to Mathlib, then you'll take a break before being motivated to look at example 14.
I don't quite understand the objection. The 200,000 line Lean proof can be used in other proofs without needing to understand it. This is the biggest advantage of formally verified proofs.
Loved Bessis's book and his writings because of his human-centric view which is missed by the ai maximalists. I just cant wait for the upcoming AI bust to skim the froth in the debate.
Interactive theorem provers are what verifies (or proves) the proof here. This means you don’t even have to look at the actual proof to check its correctness. You just have to make sure the theorem definition is what you wanted (not to say this is trivial) and that no nonsense axioms were defined etc. Now for verifying the theorem prover itself, this is kind of a chicken and the egg problem. We know the mathematical foundations are solid. But to check the implementation of said foundations would require… another theorem prover. In practice, most theorem provers try to make their kernel as small as possible so it can be reasoned about by humans. Coming back to the original topic, mathematics isn’t all proofs though. Someone has to come up with new theories and models. I guess this is what mathematicians would be doing in the future if AI becomes better at proving things than humans. But I could see it narrowing the field, just like it’s doing in others.
I think 6) is a very good point. The simple reaction to it is, well, I just define a small verification kernel that I trust, and the rest is just scaffolding that does not need to be trusted in order to have full confidence in the verification. Of course, that is not really true, because how do you know that the data arrives properly at the kernel, and is properly read off the kernel? In practice though, the small verification kernel idea works very well.
I don't think that this is the final stage of how these systems are designed, though. I think we need to model the full system inside the system, and verify it this way. We can trust this verification because we are verifying it with a system with a small verification kernel, but afterwards, we can replace the small kernel with the modules we have proven to be correct now.
That’s an interesting way to think about it. While tests don’t satisfy mathematicians‘ standards for rigor one could instead look at interactive proofs from complexity theory. These are of interest if a problem doesn’t allow for short proofs, i.e. when the problem is not in NP [1].
In your scenario an adapted AI-assisted theorem prover would be the prover, and a mathematician the verifier.
I guess the question becomes about if we are meaningfully capable of doing this kind of verification any better than just adding another layer of automation
What _should_ happen is that we create abstraction. We create a proof for a concept or algorithms. Then we can move on. We need to develop artifacts that can carry proof. We cannot keep writing piles and piles of texts and Python code to build infrastructure. We would be rewriting the same thing 1000x.
What does that even mean? Sorry, this is just a nonsensical term.
The issue is not that the proofs could be wrong. It's that humans don't understand them even if they are verifibly correct.
In contrast, with software you don't know if it's correct. That's what you have a test for. Even if you understand it, there could be a bug. And tests could have bugs too, so you can have tests for tests. But proofs that are verified are correct. That's it.
Imagine you have a program that is formally proved to be correct. You don't need a test for it. However you might not understand it. Having a test or not does not change that.
TL;DR: Correctness and understandability are (mostly) independent properties.
One thing I might add is that not all programs can be proved to be correct for the simple reason that not all purposes of a program can be mathematically specified. For example, for (even "closed world" domain programs like) a chess engine, the one thing that matters (in the absence of a complete solution of a game like there exists in checkers) is "can beat world champions", which can only be tested empirically. Or sometimes, e.g. business logic software, the purpose can be mathematically specified but not in a simpler way than the code itself.
It cannot be mathematically specified because there’s no way to formalize the statements.
Also code is a mathematical object, but running code (the process) is not because it’s just electricity in metals and semiconductors. We modify the voltage somewhere and that leads to a transformation (light in displays, sound in speakers,..) somewhere else. We do have models for all of this but they are approximations, not the real representation.
Beyond the technical aspects, this new technology also leads to fundamental social changes.
In particular, until now, mathematics were one of the rare sciences were great scientists could emerge from any country with a good education system.
With the raise of strong AI tools, only scientists in rich countries with access to those tools might be able to advance faster on the most difficult problems like the millennium problems.
Mathematics might become like experimental sciences were you need to build expensive machines to make further progress, such as nuclear fusion.
Actually, even now, the strongest models in mathematics are only available to a few engineers and a few mathematicians selected by Openai and Google.
> mathematics were one of the rare sciences were great scientists could emerge from any country with a good education system.
But would they be able to realize their potential without further (mostly monetary and institutional) support from their country? In that way, things now are not so different from before. However, the story will probably change in a few years' time.
Computational intelligence obviates the role of the human in making capital reproduce and multiply. Mathematics as its own field of study might become pointless other than as a hobby - an endeavour reserved for those entitled to the machine's output. What the machine develops in the place of mathematics may not be recognizable enough to even share the same label - that is, if it even uses mathematical principles rather than just finding correlations that are 'good enough' for its use.
The use of computers in mathematics has been somewhat controversial from the very start.
There are of course all the computer-assisted proofs (see 4 color theorem), as well as the partially-assisted ones (see Viazovska et al on packing problems in dimensions 8, 24). But even finding a solution numerically, then rigorously verifying its properties can leave a lingering sense of incompleteness, of a gap in understanding. I like this one quote by (allegedly) Wigner that illustrates it well:
"It is nice to know that the computer understands the problem, but I would like to understand the problem, too."
Just that the set of proofs a human can interpret and the set of statements a human can understand overlap; conversely, you require that the statements/theorems humans can understand be a larger class than the proofs they can understand.
To me, it’s not obvious which of those should be true:
- can we only understand theorems for which we comprehend their proof?
- or can we understand theorems despite not comprehending the proof structure?
Within the mathematics community, opinions differ. But you’re elevating your perspective on that question into a law, without any evidence.
I don't know what your distinction between "understand" and "comprehend" but my point was not about these words, but about being "useful" and being "understandable".
I'm saying there's no relationship between a mathematical statement being useful and it being understandable.
If it is true that "understanding is a prerequisite for usefulness" (where "understanding" means that a statement can be proven in a way that is intelligible to humans) was a property of mathematical expressions, then this fact would certainly be useful (we could exclude any statements that no human understand from the world of useful mathematical expression). But, by that definition, we would need to understand that statement, so you would need to be able to prove that "understanding is a prerequisite for usefulness" in a human intelligible way.
Now what I just wrote is in itself not a proof that we can't know, but proving the above statement would involve expressing the claim in a mathematically verifiable way that was also understandable by humans, which does imply something remarkable about human cognition (something that would be intelligible no less!)
Well, there is something remarkable in human intelligence. We have yet to find anything like it in the known universe. As for the rest, the wise mathematicians are leaning, sorry, hard to lean. TT and co.
Mathematics has always been an experimental science to some extent. While Newton, Euler and Gauss would spend a lot time calculating numerical approximations by hand, modern mathematicians have been doing the same using computers and software. And once an a clear picture emerge about what’s going, you can start to formalize that and attempt to prove and communicate your results in the standard definition, proposition, lemma,
theorem scheme. (Btw there is even a journal called Experimental Mathematics devoted to this approach).
I don’t see that LLMs will fundamentally change this,
but rather accelerate the speed
of mathematical research.
Some computer generated proofs might of course be hard to understand, but at least their existence gives another data point work with.
Doing Mathematics is more than proving something, that’s just the end of a long road spent pondering at one’s desk about how things could work out.
> In some sense I always considered programming to be more trustworthy than maths arguments without the certainty of a solver proof.
But programming is a subset of mathematics. They are both formal languages. I suspect the trustworthiness is more in your comfort level than the ability to verify
Tests only work for a limited set of programming verification. In many cases you don’t actually know what the output for any given input should be, so there’s no way of verifying the AI-generated code. You just kind of have to trust it. The only exception I can think of is robotics and quantitative trading. Which have already been extensively utilizing AI.
I disagree, software engineering is a mature discipline now, and at this point we have so many testing frameworks (unit testing, syntax testing, regression testing, fuzzing, testing end to end, live, with a subset of known good and incorrect inputs, chaos monkeys, etc, etc, that to say "there's no way of verifying the AI-generated code" is frankly incorrect.
Or, if you insist, defend the "there's no way of verifying the code, at all", and not only AI-generated.
(if it helps I work in the company where before the code even starts being written, several extensive tests for it must be ready first. It's hard to even commit a broken code, and later in the pipeline it's very easy to catch the subtly broken or incorrect code)
So… more peer review backlog. That sounds fun. Oh, you want someone to review your paper, Mr phd in mathematics with 20 years of experience? Get in line behind chatGPT.
Well, if you can formalise the problem statement (this is the hard part) sufficiently well that the computer can produce a proof, you can be very sure the proof is sound.
A fundamental property of any formal proof is that it can be checked by a fairly stupid machine, automatically, because every step is a simple mechanical operation that names one of a handful of axioms and refers to a handful of earlier steps, the truth of which has already been established. So while coming up with a proof may require genius-level thinking, checking an existing fully fleshed out proof is simple -- just potentially very tedious because of the sheer number of steps.
That said, a typical human-written proof omits many steps considered "obvious" to a trained mathematician. Converting this to a formal proof involves interpreting what the original author "must have meant", which requires a lot of expertise and can go wrong -- or it may reveal that there is some inconsistency in the original claim itself.
> This is one of the great things about formalization: it would have avoided this entire debacle.
It's also a MASSIVE amount of SUPER TEDIOUS work. And it's the kind of work that folks who think up advanced math proofs tend to loathe. It's along the lines of programming by toggling in the code from front panel switches.
So what current mathematics does is judge a proof by whether or not the application of the proof somehow coincides with the result from some other adjacent branch of mathematical knowledge. So, a "novel" proof is expected to either help prove something in a slightly different branch of mathematics or simplify some already existing proof.
And that is, as I understand it, the crux of the matter with the Mochizuki proof of the "abc Conjecture." Solving the "abc Conjecture" should provide tools for solving other similar problems just like Wiles' proof of Fermat's Last Theorem provided an entire class of tools for dealing with modularity and elliptic curves. And yet Mochizuki seems to unable to do or demonstrate any of that.
So, Mochizuki's work is like someone dropping a gigantic and impenetrable proof of exactly and only Fermat's Last Theorem that doesn't apply to anything else. Sure, it would be an interesting (and true!) thing, but without the ability to use it further, it's a curiosity rather than a pillar.
How does this involve computer checking of a formal proof?
Last time I checked, it was a disagreement over whether an informal proof is sound, which is exactly the problem with informal proofs.
ETA: There might be a misunderstanding about what "formal proof" means. Even a very detailed, precise English-language description of a proof is generally not a formal proof. The bar is essentially: "It could be checked by a machine that follows simple rules." If different interpretations of a "proof" are possible, the "proof" is by definition informal. Informal proofs are valuable because they are strong evidence that there's a corresponding formal proof "underneath" that would establish the theorem's truth, and because they are (usually) much easier to understand.
There are also ways to cheat like that in Lean, but they are all easily identifiable. So when people talk about formalization, they mean formalization without such cheats.
By necessity perhaps, but they are desperate to find out why they work so doctors can kill less people. Almost half a million people die each year from medical errors in the USA.
Now they have a valid excuse that the human body is incredibly complex and not yet understood. We don’t have that excuse, because we build all of our software from the ground up. If we don’t understand it, that is our fault.
The big question is: why is government (or society) funding mathematics? What justifies this use of resources?
If it's utility, then why doesn't AI deliver utility? Isn't this an argument for AI?
One can expand this to various kinds of utility. Is knowing mathematical statements are true useful? In the ability to produce proofs useful? Does being able to prove carry over to other kinds of reasoning? Is producing a cadre of math-capable people useful in wartime? Is national prestige of value? There's a pro-AI reading of all these.
It it's beauty not utility, then why isn't math funding over with art funding, along with hills covered with acres of fabric, motorized embalmed animal carcasses, and religious symbols in bottles of urine?
Reading the article from the perspective of a non-mathematician the line that rang true for me was "I realized they derived joy, satisfaction, and meaning from the long journey toward understanding." My eldest child loves math and her whole life has been chaos except when she is solving math problems. I see that human element in science and I know deep down we need that.
I've been in the software industry for 30 years and I understand that sentiment perfectly on a personal level. However, also having used ChatGPT for the first time this year to help solve a technical problem, I was surprised to learn that feeling didn't go away. In fact, it was because I leaned on my experience and technical understanding to get to the point where I realized I needed help that I decided to try a new tool. I didn't feel any shame or disappointment in myself, rather I felt excited to learn something new that came out of the solution. It sent me on a new path of learning.
Now, that was research and not implementation. While plenty of code options were presented by ChatGPT, I analyzed the solution and educated myself from it. My final fix looked very different from the proposal because I do things a certain way, based on experience and learning from my many mistakes. In this scenario I was the secondary verification. My peers the tertiary. AI made the proposal, humans did the verification and could not have done so without cumulative knowledge and experience.
I have to assume that all fields utilizing AI will remain as they have always been, human education and experience will come first no matter what the tools available, because we are the ones impacted by the data produced by AI. As many math-oriented commenters here have already noted, human verification is a necessity and to do that, you must understand the discipline within which the data is being produced.
Personally, the idea of reaching solutions in math and computing (for example) exponentially beyond human capacity is exciting; I want certain answers before I die! But it still must be human-verified and the solutions should be for humans, not for machines, and not for "time to market". Repositories full of unvetted AI-generated code is bad enough, but once you start engineering structures, spacecraft and medicine strictly with AI, well...
I'm not a mathematician but AFAICT AI in mathematics only helped for paradigmatic works, either verification of well explained, and even researched, conjectures. Ideas that have been labelled as potentially interesting, are verifiable, and in a well known area. It is already quite useful... but it is NOT about paradigm change. It's not about revolutionary ideas. It's about being able to slightly more efficiently do more of of the same.
It is precisely NOT about big questions but rather potentially covering and thus cornering away all the "small" questions.
PS: already argued for something similar in recent Ergos related work, see comment history.
In my field that sometimes involves some applied mathematics, it definitely help with busy work, giving raw ideas or writing some chain in latex so I don’t have to spell everything in details myself in the doc
Isn’t that somewhat similar to everywhere else AI is being employed? More of the same — just faster? We aren’t getting novel software architecture out of AI, those things are still important, but it does help us rule out bad things (bugs, security vulnerabilities, etc) and help us focus on the important things. In that vein, AI could help mathematicians by ruling things out faster.
To assume that the universe is disorderly presumes knowing some esoteric objective truth about how the universe works. Since the field of natural sciences does not provide that truth (by design), and practices that do aim to provide that truth (metaphysics, religion, etc.) seem to have different versions of it, I would say that there’s a fair chance that the universe is not that disorderly and we just lack better models for describing it.
The way I look at it, the absurd complexity that rule systems can get into is a check and punishment for the arrogant and the tyrannical. 'Look I designed these rules myself just follow them and everything will be fine.' Ah but what about this edge case. 'Hm, I guess we can add a few more rules.' Bam, next thing you know it's a Byzantine bureaucratic nightmare any your empire refuses to grow anymore.
Meanwhile, if you put honorable people in charge and trust them to do the right thing, they can figure things out ad hoc.
Unfortunately, mathematics suffers as a bystander. But the good news is that the muggles will always want to keep the eggheads around because they would rather die than think about the negative consequences of having their dreams come true exactly like they wanted.
Human mathematicians could become “priests to oracles.”
Priests were interpreting the oracles (at least at a place like Delphi) according to the context of the people asking the questions aka participating in politics of that ancient times.
Subjectivity was a feature and I’m not sure that fits to mathematics though.
I wonder if mathematics as a science field moves more into engineering or if a different branch will emerge that is closer to that because to the point of the article, science is about understanding not just results.
However given the history of mathematics that's not really out of order. There's a lot of mysticism intertwined with early mathematics and it's only relatively recently that mathematics has taken the beads out of her hair, put on a business suit, and gotten to work.
I mean even Galileo's, the patron saint of skepticism, job was observing the stars so that he could write horoscopes.
And we see the attitude repeated even today. I work in a technical field and whenever I mention that I think type theory is kind of cool the response more often then not is for them to give me the ol' "Thank goodness for people like you, I'm just not a math person."
You don't have to be as good as the model you are overseeing, but it sure helps, otherwise you will only be able to evaluate partial claims, missing mistakes, and potentially the big picture.
The article poses if AI will be a tool, a collaborator or an oracle. Why not all 3?
If mathematics is human understanding of logical consequences, understanding is the priority. But if AI proves something we can't understand but can utilize, that is a different sort of useful.
We are getting awfully close to "the answer of the universe is 42" and having it not be a joke...
I don’t know about “close”,
but there are certainly results in math that are considered deep because they require the use of a “Hard Theorem” at some point. That kind of building on top of something Very Difficult is still possible without understanding the “Very Difficult” part. I’d say a lot of not-amazing math is built by believing the platform works but not being able to built it yourself.
I couldn’t build an internal combustion engine or even a plastic box, so maybe there’s nothing wrong with this approach.
I am a mathematician, and I was never the kind to like to struggle by working on problems, but I developed a lot of intuition by 1) thinking deeply about definitions and proofs and why are they this way and not another 2) reading a lot of blogs and expository papers by great mathematicians, even (the more philosophically minded) mathoverflow q&a's (so I absorbed their way of thinking unconsciously). For example, I tell my students to read all 300 of John Baez's This Week's Finds posts [1] and they will deeply understand more math than 99% of their peers.
This is not the "standard" advice that usually gets peddled but for me it "worked".
As an aspiring amateur of recreational mathematics, it gives me joy to find this recommendation and start reading the series. It reminds me of Martin Gardner's "Mathematical Games" column, and its successor, Douglas Hofstadter's "Metamagical Themas".
For a while now I've been chewing on Baez's paper, "Physics, Topology, Logic and Computation: A Rosetta Stone". I'm intrigued by it. At first baffled, then continued studying related topics, coming back to re-read and appreciate what it's saying. I'm exploring what it means from the computation angle, in the context of Haskell Curry, combinators, lambda calculus, cellular automata, rewriting systems. And more far-fetched thoughts, like Stephen Wolfram, Max Tegmark, or how it may relate to biology such as Michael Levin's research.
On the role of AI in mathematics, it brings up more questions about the use of computers in experimental/exploratory mathematics. Part of the concensus emerging, it seems, is the formalization of programs as proofs, in particular the Lean language ecosystem. It's like the field of software engineering is rediscovering its roots.
The deeper I go the more I want to participate, write and talk about the fun things I'm discovering. Maybe one day write a paper of my own, if I can manage to do it in a respectable rigorous manner. For that I know I still have a lot of basics to learn, cultural etiquette too, and probably lean on something like Lean to be able to demonstrate that the ideas are sound.
>more recently, a new general-purpose AI system from OpenAI disproved an important conjecture in combinatorial geometry. This result would have been worthy of publication in a major mathematics journal if humans had been the authors
The quality of the mathematics is a function of who has authored it?
Argument started when 4 colour map proof was machine assisted. So 1976. (Which the fine article says: I remember my dad being in this argument at the time from the comp. sci side, albiet as a mathematicianby training)
There's yet another major issue of the centralization of power and knowledge:
> Some worry about the accessibility of AI tools. Traditionally, mathematicians have required little more than intuition, training, and a pen and paper to advance their field. If this slow, deliberative process is no longer valued by society, and particularly by research funders, then mathematics could become an elitist activity, only practiced by select organizations that can afford to work with proprietary AI models.
This can be true of anything LLMs do better than existing options. Because the best LLMs require enormous resources to develop, access to them can be very limited. Right now they are priced for market share. What happens when your small law firm attorney, or self-representation, goes up against a large firm with LLM expertise? Can the kid from the poor high school compete with the kid from the rich school with premium LLM access, in mathematics for example?
the poor kid always had disadvantages, had to help the family, while the rich kid could focus on the math, and maybe get into a good math place with family help
What's always been true is change. We decide our future. Sickness has always been there, but that doesn't mean we can't have a enormous influence on our fates.
So much that has been true for millenia we've changed in the last century or three.
Much can be resolved when it is understood math is discovered not created. AI is a tool. if it makes discovery or proof easier that is still mathematics. A proof stands on its own logic regardless how it is derived. The root concern is how ai may provide uplift for mathematical discovery outside of socially expected channels.
I only did undergraduate level in Maths, and to me there is a key aesthetic element which makes it created. The choice of axioms to use, the choice of with theorems are interesting.
Yes the "truth" (doesn't exist, see Gödels theorem) is discovered in a vast, wild landscape that Mathematicians explore.
But which areas are worth exploring is a critical question. Partly driven by application, partly aesthetic. It's a quest for simple things that are a bit surprising, or that were hard to make the statements so simple.
It’s a well known problem in higher mathematics that even if you’ve solved a problem, often the proofs are incredibly long and complex and require an extensive amount of time spent by peers to review it.
It would be great if someone could explain to me how AI improves this situation. Even if AI thinks it’s solved a problem, unless the proof is incredibly efficient and well explained, it will be difficult to verify the correctness. One hallucination in 300 steps of logic is enough to destroy the entire proof.
In 2012 Mochizuki claimed to have proved the abc conjecture by developing a new branch of mathematics. He was a respected mathematician, but the theories he had developed were so complex no one could determine if he was correct. It took six years until two number theorists dissected the proof and found a fatal flaw in it.
Mochizuki and a group of mathematicians still claim that Stix and Scholze didn't actually identify a flaw and his proof was published in a journal (where Mochizuki is the chief editor and the reviewers were people from Mochizuki's instituion). I think most of the math community don't believe his proof although Mochizuki and some others claim it's valid.
If you have the LLM generate Lean code, and it compiles, then the proof is correct and you don't need to bother checking its working. (You still need to check that it is proving the theorem you asked it to prove).
If there's concern, I understand Lean can dump out a proof object, and this object can be checked for validity by a separate, smaller program. One could even write that smaller program multiple times.
It would be interesting to see how Lean is being tested. Pummeling it with inputs from AI proof search engines sounds like a good way to expose hidden bugs, just as pummeling a compiler with randomly generated programs is a good way to expose previously unknown compiler bugs.
But let's assume Lean is perfect:
- you have to trust your file system
- you have to trust your OS
- are you sure your monitor displays "proof is valid" correctly
- maybe there's a bug in your CPU hardware?
- or background radiation flipped a bit in your RAM
- do you trust your brain? Maybe you went to bed last night firmly believing "that proof is wrong" and you woke up this morning with a different belief that "the proof is valid".
> It would be great if someone could explain to me how AI improves this situation.
It's main utility is in the search step, not the verification step. The search is the bulk of the work and creativity. Separately, as the sibling commenter pointed out, it will likely get better at the verification step as well, with integrations of tools like Lean.
> One hallucination in 300 steps of logic is enough to destroy the entire proof.
The situation with human mathematicians is not much different. Eg, Wiles original proof of Fermat's Last Theorem contained errors found by reviewers, which he later repaired.
>The situation with human mathematicians is not much different. Eg, Wiles original proof of Fermat's Last Theorem contained errors found by reviewers, which he later repaired.
In fact, it was Wiles himself who realized there was an error.
Yes, at the prodding of the reviewer (from wikipedia):
> After the announcement, Nick Katz was appointed as one of the referees to review Wiles's manuscript. In the course of his review, he asked Wiles a series of clarifying questions that led Wiles to recognise that the proof contained a gap.
It seems likely Katz played a crucial role here.
Wiles is just a prominent example. In practice, other humans often play this "verification role" for mathematicians, whether via a formal review process or informal discussion.
I would imagine that in the future AI will be doing proofs in Lean or whatever the successor to it, which gives you a pretty good confidence it’s correct.
> even if you’ve solved a problem, often the proofs are incredibly long and complex and require an extensive amount of time spent by peers to review it.
I hope mathematicians have the self-respect of not establishing rules such as "if a proof is longer than X, break it up in smaller ones, otherwise it'll be outright rejected".
It's amazing how much attention this issue has gotten. What is lost in the hype is no AI can tell you if a proof is correct. An AI can produce a convincing looking proof, but it can have a subtle but critical error or make an assumption that is unfounded. Thus, it ultimately comes down to humans. A mathematician has to craft the prompt, and mathematician to interpret/check the results. Also, these programs are very expensive and propitiatory. They are not like the commercial AI that regular people use. It takes considerable prompting and trial an error to solve even Olympiad/Putnam problems, and tons of work by humans pouring over the results to see if it's correct. For every Erdos problem that captures the headlines, there are many where it failed or untold hours of prompting and token burn to get that result, and manhours verify it.
I don't think you understand the workflow. Terrence Tao has done a lot of work using them in conjunction with LEAN.
You aren't having the AI check the proof, you interactively work on the same LEAN proof, handing off between the AI assistant and having LEAN check it and provide feedback for both of you when there's a mistake.
AI can't yet come up with any new ideas to make the inductive leap to solve a math problem. New ideas are what get the accolades and using an old idea just means the original author missed something. We are still at the author missed something stage that AI is doing today.
It can definitely be a good research assistant though
I had assumed that the recent OpenAI solution to an Erdos problem represented original mathematical thought.
I went looking for more details, and found this Scientific American article which provided some nuance I had previously missed, namely that the mathematicians involved don't think genAI created any really new mathematics - just applied what existed in an intelligent, elegant way:
All well and good, but we should also admit that under this criterion 95% of published math papers don't contain new ideas but are just filling in stuff that previous authors missed.
News at 5: Team of corporate mathematicians working for OpenAI rewrites the foundation of mathematics in a billion lines of vibe-coded theorems and proofs that are so complex it's impossible for humans to review or understand. Next generation of children don't need to learn math anymore, they just get a subscription to OpenMath™.
> A clear explanation can be found in Alex Kontorovich’s account of his own learning curve with formalized mathematics. In a nutshell: Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics. It exposes clean APIs and abstractions, without which no autoformalization could take place. By contrast, Math Inc’s autoformalized proof of Viazovska’s results exposes no intelligible interface. Who in their right mind would merge a 200,000-line unaudited vibe-coded blob into the master branch of global human science?
https://davidbessis.substack.com/p/the-fall-of-the-theorem-e...
If there are other lemmas, etc buried inside that 200k blob that can be factored out and proved and used themselves, so much the better. But denying a machine-valid proof just because it's incomprehensible with what a human being considers a reasonable effort made to unpack it just seems odd to me. I see no reason not to accept the vibe coded blob if Lean says it's kosher except anthropocentrism.
(i) accepting that a piece of code is a valid Lean proof (ii) merging a valid Lean proof into Mathlib.
Valid Lean proofs need maintenance. Mathlib is a living blob of code. People care about how fast the proofs typecheck. Many other properties of code play a role.
Not everything that is true is worthy of immediately including in the Encyclopaedia Brittanica.
As far as I can tell from colleagues in other domains, it’s the same there. One paper will mention something off-hand and that’ll cause someone else to have a spark of insight, which turns into it’s own valuable research
Mathematics has to be also understood from the perspective of theory building, not just problem solving.
I couldn't disagree more.
A lot of mathematical "problems" are almost entirely pointless. Nobody genuinely cares about the moving sofa problem, or about square packing, or about the minimum number of colors needed to draw a map - it is the math that is developed during the solving process that is valuable!
An answer to a question like "what is the exact area of a unit circle" is a mere curiosity. Calculating a good-enough approximation is trivial, after all. But wanting an exact answer leads to developing calculus, which leads to most modern physics. Science was able to make a giant leap forwards due to the techniques developed, while the actual answer itself is mostly useless.
But let’s consider a hypothetical: what if an intuitive understanding of the true “boundaries” of mathematics (if such things exist) is beyond the capabilities of a human mind? If there truly is no way to simplify some proofs down from 200,000 line incomprehensible gibberish to something you could teach to a high schooler or undergraduate or even a PhD. Is the proof still worthless? Sure, at the moment, it might be. Finding such a proof and understanding the implications of it are different skills, the latter of which AI almost certainly does not possess at the moment. But there may come a time where the AI can view the bigger picture and make the leaps you described (say, an eka-Calculus from an eka-unit-circle). These leaps may be as unintelligible to us as the proof in OP is.
I guess the question is: assuming that we can’t make the proof beautiful enough to spark deeper human understanding, do we still want it if it sparks deeper AI understanding?
Personally I would hate to live in a universe where the boundaries of science are beyond intuitive human understanding, but I think it’s almost certainly the case. The idea that the rules are all within our grasp reeks of anthropocentrism to me. I would love for the universe to prove me wrong though. It’d be a pleasant, hilarious coincidence if they do fit within the boundaries of our understanding.
By extension: why should we assume that a human would still understand the problems - or the answers? If all of it is complete gibberish to a human and can never be applied in any way, shape, or form, then what's the point?
The way I view it there are two options here: either you completely ignore it and end up burning a massive amount of electricity on what is essentially a bunch of LLMs jerking each other off, or you blindly follow it and end up with a Machine God who can justify a genocide with a "This is the correct thing to do. Trust me bro, I have irrefutable proof - you won't understand it". There's just no sensible way to do post-human math in an inherently human world.
"Okay, so what determines what is and is not allowed in the collection?"
"Whether the given substatement is true or not, of course."
Like this is obviously silly, right. In your view you could have two guys both trying to prove or disprove that the area of the unit circle is 3, and yet only the guy doing it with some vision of nobility where he's building up to this grand theory of approximations is the one actually doing "real" mathematics. The guy who's doing it just because he thinks it's neat and would like an answer to the problem itself doesn't count, and you suspect he couldn't even exist.
What did the alchemist truly learn from this interaction? Is the answer in any way helpful for him? Will it lead to him making gold, and the implied endless riches it entails? Why should it bring him him to ask follow-up questions with useful answers like "The technology to do this doesn't exist yet and can't be developed within several lifetimes" and "It'll never be economically profitable to turn lead into gold", instead of blindly being fed piecemeal steps in the impossible task of one medieval guy building a LHC?
Writing true statements is absolutely trivial, anyone can do that. There's an endless amount of true statement which have absolutely no value to humanity whatsoever. Their proof is meaningless without something to give it context. "Substatement 1304 is true" has significantly less value than "here's something I call 'group theory', and it might also be helpful solving open issues in fields X, Y, and Z".
Mathematics is about widening our collective understanding. There will of course be some people out there who'll hear "The answer to life, the universe, and everything is forty-two", think "Neato!", and go on with their day without giving it any further thought - but I'd have a hard time calling them mathematicians.
That doesn't mean it is completely useless, of course. A "prove that no efficient algorithm exists to break this new encryption scheme" would be very helpful indeed. Until you realize it neglected to take quantum computing into account, of course.
[0]: https://home.cern/alice-detects-conversion-lead-gold-lhc/
Now if you ask "does the area of unit circle equal 4?", I don't really know, but we can go back to the oracle and ask again (we haven't learned the general pattern).
Also, I'm not sure that assuming this 'area of circle' question was cutting edge math, that the oracle wouldn't say 'yes, to a certain level of tolerance'. Can't count how many times I've seen agent decide a test needs to be loosened or deleted because its an "edge case" or "blocking". If you don't understand the proof you might get back 'yes' for some versions of 'is the area of unit circle 3' (depending on complexity of that ask).
So yeah, generated machine-valid proof can be denied if it's incomprehensible, same as human machine-valid proof can be denied for same reasons.
just as we don't re-prove Fermat's little theorem every time I use it in a proof, because well, it's a theorem.
Exactly! There's a shared foundation, and everyone builds upon it. A mathematical paper is a whole bunch of Lego blocks being added to that foundation, and combining them in a hopefully-useful new interface.
But if the entire paper is just one giant black box, you only get to use the final interface: you lose the ability to meaningfully repurpose the individual Lego blocks to build a different interface. You end up having to reinvent the same Lego blocks over and over again, just with a slightly different color each time.
Don't want to re-prove each and every Lego brick? Then you shouldn't accept giant black box proofs.
Replacing thought and curation with repeated automation is tech debt, pushed down to fundamental knowledge and understanding.
The laws of mathematics exist and their truths hold before they are proven by humans or our machines, so in a very real sense the entire point of proving anything in the first place is anthropocentrism.
That, plus cleaning it up may reveal it contains proofs of other things we also want to know. Imagine if this happened to also contain as a sub-part a proof of all the open Millennium Prize Problems? We don't know until we investigate. (If it was a specific list of things to check from rather than expanding humanity's library, we could just ask an AI to do it for us… but as The Wachowski sisters wrote in their most famous script: "I say your civilization because as soon as we started thinking for you, it really became our civilization").
This future of "the AI built the compiler and it's totally incomprehensible spaghetti, but don't worry, it verified the compiler works using this AI-generated proof assistant whose codebase is also pure spaghetti" terrifies me.
Please show me a proof of `False` in Lean.
LLM agents will a) discover dumb counterintuitive stuff like this and b) exploit it to satisfy the kernel with c) the questionable lines being buried behind millions of lines of math slop. Humans have to check this stuff.
The idea behind the challenge to provide a few lines of human-written/checked scaffolding that defines an "API surface". The API surface is resistant to junk/slop filling out the sorry's. This is supposed to reduce the number of lines that humans have to read to only a handful, while the full valid AI-generated proof is tens of thousands of lines. The challenge was just solved a couple weeks, here's the discussion about it: https://leanprover.zulipchat.com/#narrow/channel/583336-Auto...
Why not just fork the original master branch of human science to an ai-enhanced one and see where that brings us?
The novel approaches to solving challenging proofs is very useful.
Here is a other one: hello_world.c versus hello_world.exe (apologies for windows extensions, just for illustration).
One is made by a human for human consumption and extension (though legible by a machine). The source code.
One is made by a machine for a machine. Unreadable by a human. The "binary", though that's a terrible misnomer. (Sure you can disassemble but any nontrivial program is too much to cope with as a whole).
Source vs binary. Both are useful but only one is useful for human consumption.
Of course there's no reason AI couldn't do that too.
Anyone who understands type theory and how theorem provers work? It's sort of akin to saying "how do you know that a massive C++ program that compiles to machine code compiled to the correct machine code that will actually run and it's just not a random string of bits!?!?!", you know because the compilation would have failed otherwise(this is different than saying the program behaves correctly, but that's precisely the difference between formal proofs and compiled programs).
The entire argument both you and Bessis are implying is that mathematics must be human intelligible. But there's absolutely no reason to assume that every mathematical statement must have a human intelligible representation. There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.
Just because people who don't want this to be true, and I can understand the motivation, doesn't mean that it isn't still the case.
Would we still be able to use computers? Of course! They don't need the source code to run.
Would we nevertheless be in big trouble? Oh definitely. We'd need to write all software again, from scratch. Some critical parts we could reverse engineer. Maybe even derive some structures that translate back into source code, but only because a human wrote that source code in the first place.
Hopefully the point is clear: A proof, even if it is correct, that is totally obscure and unintelligable by humans is not very useful for mathematics. It's a black box that doesn't further understanding of the structures and approaches to think about them, and that's what math is all about.Just having a big binary blob of a program doesn't help much if you want to add a feature.
That's also why biology is so hard. There is no source code. It's just millions and millions of years of evolution and things have evolved in weird ways that don't really make it easy to understand them, even though they clearly work.
It remains all of those things, sitting alone in its own repository of 200kLOC; what benefit comes from merging it into mathlib?
> There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.
This is obviously silly:
Things that aren’t human intelligible aren’t human usable, so the restriction is necessary to have a collection of things humans can utilize.
This is objectively false, people use things every single day they don't understand. We still have plenty of things about the world we don't understand but still find useful.
You are saying anything we know to be the case, but cannot understand why cannot be used? Can we just stop sleeping because we haven't reasoned why sleep is necessary even though we know it is necessary? I mean we still don't really understand gravity (we know how but not why)
In the 40k universe, modern humanity is below not just their peak of technological prowess, but even their peak of pre-AI prowess. Because they landed in a valley of superstition involving machines they do not understand — and have no way to rebuild towards their peak, without yet further losses by moving away from the magical machines.
That has nothing to do with the universe at large — and everything to do with human society and choices. Which we do have control over.
The comment I replied to equivocated between absolute limits to comprehensibility to humans and sociologically constructed limits, implying that thinking there may be the former makes someone ‘fine with’ the latter. That’s nonsense.
They approximately use them with varying degrees of success, but also mistakes, broken inferences, etc.
My exact point is that your view reduces our ability to do mathematics to that broken, flawed usage and thereby undermines its utility for logical precision: mathematics is only useful because we cleanly understand it.
When you try to use mathematics without understanding, you cause disasters: stock market crashes from mispricing options, Amazon’s 2018 hiring freeze from misallocating $1B, etc.
Note: neither of your examples (sleep, gravity) are things that people intentionally use. They just happen to people.
I think it’s very telling you couldn’t think of an example.
So what will you do if the doctor prescribes you an LLM-vibecoded drug that nobody understands how it works, yet it cures some deadly affliction with close to 100% efficacy?
What if, say, these incomprehensible math results lead to a revolution in quantum physics which unlocks chip topologies that are orders of magnitude faster than human comprehensible designs?
Would the high priestess of human reason pass her divining rod over such chips or life-saving drugs and reject it as the work of the AI devil?
Again, my exact point is that mathematics loses its utility when you reduce it to that inaccurate usage. You no longer can have any faith in the conclusions — just like sometimes psychiatrists kill their patients with an SSRI prescription because they don’t understand the drugs.
> Would the high priestess of human reason pass her divining rod over such chips or life-saving drugs and reject it as the work of the AI devil?
My point is you can’t know if you’re turning it over for life saving drugs or poison, if you don’t understand what you’re getting.
If you give me an inference chip that runs 200x faster, yes, it could be backdoored to take control of my dishwasher and kill me in my sleep - but I can't deny it runs 200x faster an account of nobody being able to explain why. The same for the mistery cancer drug that cured everyone who took it up to now, but could, without doubt, kill the next patient.
The telco that manages loads and allocates networks knows how voice compression works.
The farmers and supermarkets know how the supply chain works.
None of your questions show why mathematics should include blobs of incomprehensible gloop, where no mathematician, no logician, no philosopher, no man on the street can make sense of said gloop, or use it in any way to further human knowledge.
When it's been decomposed down we can discuss this further, but now it's like saying red is red, just because.
But I'll take your expanded statement, to include riding a horse, something even older than the engine. We don't understand fully how a horse works -- biology is still a matter of seeing fragments of the whole -- but people had no problem riding and breeding them before the invention of the car, and before the discovery of genetics.
Meanwhile, understanding the math of a thing -- like stock markets, or nuclear bombs -- does not prevent its use from going badly.
Math is useful and beautiful, and a helpful tool for expanding our understanding of the world, but it is not the whole of understanding, or the sole factor in successful application of science to the world .
Signed, a mathematician.
I think it might be fair to say that a proof cannot be without value if it proves something meaningful to a human, that a human can use somehow? But such proof probably doesn’t belong in a library seemingly explicitly dedicated to human-graspable proofs. Just because it violates the intent.
It’s not like such proofs mustn’t exist at all.
If you go to the fiction side of things, you’ll get a lot of things that humans would like to do, but can’t right now as we’re blocked by physics laws and our own biology. We have harnessed electricity, why can’t we harness gravity?
Reasoning by analogy...
Yet they don't merge this stuff, for many different reasons, including that those 200K lines are free as in puppies.
You can have all the type theory of the world, but the library still needs maintenance.
Change a simp-lemma in a file close to the root. Oooh noes, now there's 987 errors in the 200K loc that we merged last week. And there's nobody there who understands how to fix them.
Just use AI to fix. Or maybe just don't merge the code and let it sit in a downstream library?
And wait until there's evidence that the code is stable, high-quality, with well-designed APIs. And then decide that it might be worthy including in a more foundational library.
So why don;t they use AI to write Lean programs? That should make the AI-proofs more readily human undersrndable.
> In particular, the process of converting a medium-sized Lean document (containing a few thousand lines of code, with some proofs AI-generated) into a nicely golfed and structured Mathlib submission has been an interesting experience. AI agents can be used to perform local golfs that can shave the size of the code somewhat, but global refactoring decisions, such as noticing that a certain argument appears multiple times across the document and can be abstracted into a standalone lemma that can may have additional utility beyond the file, is still largely beyond the reach of current AI tools. (I find that I can explain such a refactor to an AI agent, who can then execute it, but they struggle to spontaneously discover such refactors on their own.)
[1] https://mathstodon.xyz/@tao/116789373239346609
When a codebase gets too large, you eventually can't understand all of it. Even code I wrote myself, I can't fully grasp it.
In those cases, we usually write tests.
But when tests get too big, we end up writing tests for the tests.
Eventually, it feels like we're heading into an era of proofs for proofs.
For me, this problem usually unfolds like this:
1.I can't trust SDKs or Stack Overflow code.
2.So I write tests.
3.But I can't trust the tests either.
4.So I use test coverage, mutation testing, property testing, and fuzzing.
5.If that's still not enough, I add formal verification.
6.And then the problem becomes: can I trust the verifier?
That's how it ends up. Wouldn't human work shift toward verifying the verification systems themselves?
[1] https://en.wikipedia.org/wiki/Interactive_proof_system
What does that even mean? Sorry, this is just a nonsensical term.
The issue is not that the proofs could be wrong. It's that humans don't understand them even if they are verifibly correct.
In contrast, with software you don't know if it's correct. That's what you have a test for. Even if you understand it, there could be a bug. And tests could have bugs too, so you can have tests for tests. But proofs that are verified are correct. That's it.
Imagine you have a program that is formally proved to be correct. You don't need a test for it. However you might not understand it. Having a test or not does not change that.
TL;DR: Correctness and understandability are (mostly) independent properties.
One thing I might add is that not all programs can be proved to be correct for the simple reason that not all purposes of a program can be mathematically specified. For example, for (even "closed world" domain programs like) a chess engine, the one thing that matters (in the absence of a complete solution of a game like there exists in checkers) is "can beat world champions", which can only be tested empirically. Or sometimes, e.g. business logic software, the purpose can be mathematically specified but not in a simpler way than the code itself.
Also code is a mathematical object, but running code (the process) is not because it’s just electricity in metals and semiconductors. We modify the voltage somewhere and that leads to a transformation (light in displays, sound in speakers,..) somewhere else. We do have models for all of this but they are approximations, not the real representation.
In particular, until now, mathematics were one of the rare sciences were great scientists could emerge from any country with a good education system.
With the raise of strong AI tools, only scientists in rich countries with access to those tools might be able to advance faster on the most difficult problems like the millennium problems.
Mathematics might become like experimental sciences were you need to build expensive machines to make further progress, such as nuclear fusion.
Actually, even now, the strongest models in mathematics are only available to a few engineers and a few mathematicians selected by Openai and Google.
But would they be able to realize their potential without further (mostly monetary and institutional) support from their country? In that way, things now are not so different from before. However, the story will probably change in a few years' time.
Computational intelligence obviates the role of the human in making capital reproduce and multiply. Mathematics as its own field of study might become pointless other than as a hobby - an endeavour reserved for those entitled to the machine's output. What the machine develops in the place of mathematics may not be recognizable enough to even share the same label - that is, if it even uses mathematical principles rather than just finding correlations that are 'good enough' for its use.
There are of course all the computer-assisted proofs (see 4 color theorem), as well as the partially-assisted ones (see Viazovska et al on packing problems in dimensions 8, 24). But even finding a solution numerically, then rigorously verifying its properties can leave a lingering sense of incompleteness, of a gap in understanding. I like this one quote by (allegedly) Wigner that illustrates it well:
"It is nice to know that the computer understands the problem, but I would like to understand the problem, too."
But a montage about weight lifting does not a body builder make.
But why should it be the case that this is always possible?
It's entirely reasonable that the set of useful mathematical proofs is a proper superset of human intelligible useful proofs.
In fact, to argue the contrary would imply there is something incredibly remarkable about human cognition.
If you can't explain something in a way that a child could understands it, you don't fully understand it either.
Just that the set of proofs a human can interpret and the set of statements a human can understand overlap; conversely, you require that the statements/theorems humans can understand be a larger class than the proofs they can understand.
To me, it’s not obvious which of those should be true:
- can we only understand theorems for which we comprehend their proof?
- or can we understand theorems despite not comprehending the proof structure?
Within the mathematics community, opinions differ. But you’re elevating your perspective on that question into a law, without any evidence.
I don't know what your distinction between "understand" and "comprehend" but my point was not about these words, but about being "useful" and being "understandable".
I'm saying there's no relationship between a mathematical statement being useful and it being understandable.
If it is true that "understanding is a prerequisite for usefulness" (where "understanding" means that a statement can be proven in a way that is intelligible to humans) was a property of mathematical expressions, then this fact would certainly be useful (we could exclude any statements that no human understand from the world of useful mathematical expression). But, by that definition, we would need to understand that statement, so you would need to be able to prove that "understanding is a prerequisite for usefulness" in a human intelligible way.
Now what I just wrote is in itself not a proof that we can't know, but proving the above statement would involve expressing the claim in a mathematically verifiable way that was also understandable by humans, which does imply something remarkable about human cognition (something that would be intelligible no less!)
I don’t see that LLMs will fundamentally change this, but rather accelerate the speed of mathematical research.
Some computer generated proofs might of course be hard to understand, but at least their existence gives another data point work with.
Doing Mathematics is more than proving something, that’s just the end of a long road spent pondering at one’s desk about how things could work out.
If you don’t understand the problem you can’t be sure that the computer does.
But I also definitely don't understand the problem if I can't get the computer to understand it, with tests.
In some sense I always considered programming to be more trustworthy than maths arguments without the certainty of a solver proof.
With all of these questions in the air, epistemology might be making a comeback.
Type theory can also be an independent synthetic foundation atop which you build mathematics.
I disagree, software engineering is a mature discipline now, and at this point we have so many testing frameworks (unit testing, syntax testing, regression testing, fuzzing, testing end to end, live, with a subset of known good and incorrect inputs, chaos monkeys, etc, etc, that to say "there's no way of verifying the AI-generated code" is frankly incorrect.
Or, if you insist, defend the "there's no way of verifying the code, at all", and not only AI-generated.
(if it helps I work in the company where before the code even starts being written, several extensive tests for it must be ready first. It's hard to even commit a broken code, and later in the pipeline it's very easy to catch the subtly broken or incorrect code)
A fundamental property of any formal proof is that it can be checked by a fairly stupid machine, automatically, because every step is a simple mechanical operation that names one of a handful of axioms and refers to a handful of earlier steps, the truth of which has already been established. So while coming up with a proof may require genius-level thinking, checking an existing fully fleshed out proof is simple -- just potentially very tedious because of the sheer number of steps.
That said, a typical human-written proof omits many steps considered "obvious" to a trained mathematician. Converting this to a formal proof involves interpreting what the original author "must have meant", which requires a lot of expertise and can go wrong -- or it may reveal that there is some inconsistency in the original claim itself.
The controversy around Mochizuki and the "abc Conjecture" proof is a contrary example.
This is one of the great things about formalization: it would have avoided this entire debacle.
It's also a MASSIVE amount of SUPER TEDIOUS work. And it's the kind of work that folks who think up advanced math proofs tend to loathe. It's along the lines of programming by toggling in the code from front panel switches.
So what current mathematics does is judge a proof by whether or not the application of the proof somehow coincides with the result from some other adjacent branch of mathematical knowledge. So, a "novel" proof is expected to either help prove something in a slightly different branch of mathematics or simplify some already existing proof.
And that is, as I understand it, the crux of the matter with the Mochizuki proof of the "abc Conjecture." Solving the "abc Conjecture" should provide tools for solving other similar problems just like Wiles' proof of Fermat's Last Theorem provided an entire class of tools for dealing with modularity and elliptic curves. And yet Mochizuki seems to unable to do or demonstrate any of that.
So, Mochizuki's work is like someone dropping a gigantic and impenetrable proof of exactly and only Fermat's Last Theorem that doesn't apply to anything else. Sure, it would be an interesting (and true!) thing, but without the ability to use it further, it's a curiosity rather than a pillar.
Last time I checked, it was a disagreement over whether an informal proof is sound, which is exactly the problem with informal proofs.
ETA: There might be a misunderstanding about what "formal proof" means. Even a very detailed, precise English-language description of a proof is generally not a formal proof. The bar is essentially: "It could be checked by a machine that follows simple rules." If different interpretations of a "proof" are possible, the "proof" is by definition informal. Informal proofs are valuable because they are strong evidence that there's a corresponding formal proof "underneath" that would establish the theorem's truth, and because they are (usually) much easier to understand.
> “Programming is understanding. If you don't understand what you are doing, you are not programming. You are generating text.”
Perhaps a proof without understanding is just generating numbers.
in medicine they use all kinds of drugs which they don't really understand how they work. anesthetics is a great example
Now they have a valid excuse that the human body is incredibly complex and not yet understood. We don’t have that excuse, because we build all of our software from the ground up. If we don’t understand it, that is our fault.
If it's utility, then why doesn't AI deliver utility? Isn't this an argument for AI?
One can expand this to various kinds of utility. Is knowing mathematical statements are true useful? In the ability to produce proofs useful? Does being able to prove carry over to other kinds of reasoning? Is producing a cadre of math-capable people useful in wartime? Is national prestige of value? There's a pro-AI reading of all these.
It it's beauty not utility, then why isn't math funding over with art funding, along with hills covered with acres of fabric, motorized embalmed animal carcasses, and religious symbols in bottles of urine?
I've been in the software industry for 30 years and I understand that sentiment perfectly on a personal level. However, also having used ChatGPT for the first time this year to help solve a technical problem, I was surprised to learn that feeling didn't go away. In fact, it was because I leaned on my experience and technical understanding to get to the point where I realized I needed help that I decided to try a new tool. I didn't feel any shame or disappointment in myself, rather I felt excited to learn something new that came out of the solution. It sent me on a new path of learning.
Now, that was research and not implementation. While plenty of code options were presented by ChatGPT, I analyzed the solution and educated myself from it. My final fix looked very different from the proposal because I do things a certain way, based on experience and learning from my many mistakes. In this scenario I was the secondary verification. My peers the tertiary. AI made the proposal, humans did the verification and could not have done so without cumulative knowledge and experience.
I have to assume that all fields utilizing AI will remain as they have always been, human education and experience will come first no matter what the tools available, because we are the ones impacted by the data produced by AI. As many math-oriented commenters here have already noted, human verification is a necessity and to do that, you must understand the discipline within which the data is being produced.
Personally, the idea of reaching solutions in math and computing (for example) exponentially beyond human capacity is exciting; I want certain answers before I die! But it still must be human-verified and the solutions should be for humans, not for machines, and not for "time to market". Repositories full of unvetted AI-generated code is bad enough, but once you start engineering structures, spacecraft and medicine strictly with AI, well...
It is precisely NOT about big questions but rather potentially covering and thus cornering away all the "small" questions.
PS: already argued for something similar in recent Ergos related work, see comment history.
Meanwhile, if you put honorable people in charge and trust them to do the right thing, they can figure things out ad hoc.
Unfortunately, mathematics suffers as a bystander. But the good news is that the muggles will always want to keep the eggheads around because they would rather die than think about the negative consequences of having their dreams come true exactly like they wanted.
Priests were interpreting the oracles (at least at a place like Delphi) according to the context of the people asking the questions aka participating in politics of that ancient times.
Subjectivity was a feature and I’m not sure that fits to mathematics though.
I wonder if mathematics as a science field moves more into engineering or if a different branch will emerge that is closer to that because to the point of the article, science is about understanding not just results.
This is a decidedly anti-enlightenment statement.
However given the history of mathematics that's not really out of order. There's a lot of mysticism intertwined with early mathematics and it's only relatively recently that mathematics has taken the beads out of her hair, put on a business suit, and gotten to work.
I mean even Galileo's, the patron saint of skepticism, job was observing the stars so that he could write horoscopes.
And we see the attitude repeated even today. I work in a technical field and whenever I mention that I think type theory is kind of cool the response more often then not is for them to give me the ol' "Thank goodness for people like you, I'm just not a math person."
Would some one with tokens to burn mind checking that statement out and post back. Be sure to use long dashes too.
i.e. You have to be a good engineer to understand the well generated LLM code and a program
If mathematics is human understanding of logical consequences, understanding is the priority. But if AI proves something we can't understand but can utilize, that is a different sort of useful.
We are getting awfully close to "the answer of the universe is 42" and having it not be a joke...
I couldn’t build an internal combustion engine or even a plastic box, so maybe there’s nothing wrong with this approach.
Is direct experience and struggle really the only driver for developing intuition?
This is not the "standard" advice that usually gets peddled but for me it "worked".
[1] https://math.ucr.edu/home/baez/twf.html
As an aspiring amateur of recreational mathematics, it gives me joy to find this recommendation and start reading the series. It reminds me of Martin Gardner's "Mathematical Games" column, and its successor, Douglas Hofstadter's "Metamagical Themas".
For a while now I've been chewing on Baez's paper, "Physics, Topology, Logic and Computation: A Rosetta Stone". I'm intrigued by it. At first baffled, then continued studying related topics, coming back to re-read and appreciate what it's saying. I'm exploring what it means from the computation angle, in the context of Haskell Curry, combinators, lambda calculus, cellular automata, rewriting systems. And more far-fetched thoughts, like Stephen Wolfram, Max Tegmark, or how it may relate to biology such as Michael Levin's research.
On the role of AI in mathematics, it brings up more questions about the use of computers in experimental/exploratory mathematics. Part of the concensus emerging, it seems, is the formalization of programs as proofs, in particular the Lean language ecosystem. It's like the field of software engineering is rediscovering its roots.
The deeper I go the more I want to participate, write and talk about the fun things I'm discovering. Maybe one day write a paper of my own, if I can manage to do it in a respectable rigorous manner. For that I know I still have a lot of basics to learn, cultural etiquette too, and probably lean on something like Lean to be able to demonstrate that the ideas are sound.
The quality of the mathematics is a function of who has authored it?
> Some worry about the accessibility of AI tools. Traditionally, mathematicians have required little more than intuition, training, and a pen and paper to advance their field. If this slow, deliberative process is no longer valued by society, and particularly by research funders, then mathematics could become an elitist activity, only practiced by select organizations that can afford to work with proprietary AI models.
This can be true of anything LLMs do better than existing options. Because the best LLMs require enormous resources to develop, access to them can be very limited. Right now they are priced for market share. What happens when your small law firm attorney, or self-representation, goes up against a large firm with LLM expertise? Can the kid from the poor high school compete with the kid from the rich school with premium LLM access, in mathematics for example?
the poor kid always had disadvantages, had to help the family, while the rich kid could focus on the math, and maybe get into a good math place with family help
So much that has been true for millenia we've changed in the last century or three.
Yes the "truth" (doesn't exist, see Gödels theorem) is discovered in a vast, wild landscape that Mathematicians explore.
But which areas are worth exploring is a critical question. Partly driven by application, partly aesthetic. It's a quest for simple things that are a bit surprising, or that were hard to make the statements so simple.
People get confused by this created/discovered thing. Of course it is discovered. It was there even before you created it. ;-)
It would be great if someone could explain to me how AI improves this situation. Even if AI thinks it’s solved a problem, unless the proof is incredibly efficient and well explained, it will be difficult to verify the correctness. One hallucination in 300 steps of logic is enough to destroy the entire proof.
If you have the LLM generate Lean code, and it compiles, then the proof is correct and you don't need to bother checking its working. (You still need to check that it is proving the theorem you asked it to prove).
It would be interesting to see how Lean is being tested. Pummeling it with inputs from AI proof search engines sounds like a good way to expose hidden bugs, just as pummeling a compiler with randomly generated programs is a good way to expose previously unknown compiler bugs.
But let's assume Lean is perfect: - you have to trust your file system - you have to trust your OS - are you sure your monitor displays "proof is valid" correctly - maybe there's a bug in your CPU hardware? - or background radiation flipped a bit in your RAM - do you trust your brain? Maybe you went to bed last night firmly believing "that proof is wrong" and you woke up this morning with a different belief that "the proof is valid".
It's main utility is in the search step, not the verification step. The search is the bulk of the work and creativity. Separately, as the sibling commenter pointed out, it will likely get better at the verification step as well, with integrations of tools like Lean.
> One hallucination in 300 steps of logic is enough to destroy the entire proof.
The situation with human mathematicians is not much different. Eg, Wiles original proof of Fermat's Last Theorem contained errors found by reviewers, which he later repaired.
In fact, it was Wiles himself who realized there was an error.
> After the announcement, Nick Katz was appointed as one of the referees to review Wiles's manuscript. In the course of his review, he asked Wiles a series of clarifying questions that led Wiles to recognise that the proof contained a gap.
It seems likely Katz played a crucial role here.
Wiles is just a prominent example. In practice, other humans often play this "verification role" for mathematicians, whether via a formal review process or informal discussion.
I hope mathematicians have the self-respect of not establishing rules such as "if a proof is longer than X, break it up in smaller ones, otherwise it'll be outright rejected".
You aren't having the AI check the proof, you interactively work on the same LEAN proof, handing off between the AI assistant and having LEAN check it and provide feedback for both of you when there's a mistake.
(edit: lol didn't realize the sibling comment below is essentially my comment)
It can definitely be a good research assistant though
I had assumed that the recent OpenAI solution to an Erdos problem represented original mathematical thought.
I went looking for more details, and found this Scientific American article which provided some nuance I had previously missed, namely that the mathematicians involved don't think genAI created any really new mathematics - just applied what existed in an intelligent, elegant way:
https://archive.is/oNCSO
So, impressive though the result is, perhaps your claim has more grounds than I had thought.