LW - "AI achieves silver-medal standard solving International Mathematical Olympiad problems" by gjm

The Nonlinear Library

25-07-2024 • 3 min

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "AI achieves silver-medal standard solving International Mathematical Olympiad problems", published by gjm on July 25, 2024 on LessWrong.

Google DeepMind reports on a system for solving mathematical problems that is allegedly able to give complete solutions to four of the six problems on the 2024 IMO, putting it near the top of the silver-medal category. Well, actually, two systems for solving mathematical problems: AlphaProof, which is more general-purpose, and AlphaGeometry, which is specifically for geometry problems. (This is AlphaGeometry 2; they reported earlier this year on a previous version of AlphaGeometry.)

AlphaProof works in the "obvious" way: an LLM generates candidate next steps, which are checked using a formal proof-checking system, in this case Lean (a toy sketch of such a loop appears below). One not-so-obvious thing, though: "The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found."

(That last bit is reminiscent of something from the world of computer go: a couple of years ago someone trained a custom version of KataGo specifically to solve the infamous Igo Hatsuyoron problem 120, starting with ordinary KataGo and feeding it training data containing positions reachable from the problem's starting position. They claim to have laid that problem to rest at last.)

AlphaGeometry is similar but uses something specialized for (I think) Euclidean planar geometry problems in place of Lean. The previous version of AlphaGeometry allegedly already performed at gold-medal IMO standard; they don't say whether that version was already able to solve the 2024 IMO problem that was solved using AlphaGeometry 2.

AlphaProof was able to solve questions 1, 2, and 6 on this year's IMO (two algebra, one number theory). It produces Lean-formalized proofs (an illustrative Lean snippet appears below). AlphaGeometry 2 was able to solve question 4 (plane geometry). It produces proofs in its own notation.

The solutions found by the Alpha... systems are at https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/index.html. (There are links in the top-of-page navbar to solutions to the individual problems.) (If you're curious about the IMO questions or want to try them yourself before looking at the machine-generated proofs, you can find them -- and those for previous years -- at https://www.imo-official.org/problems.aspx.)

One caveat (note: an earlier version of what I wrote failed to notice this and explicitly, and quite wrongly, claimed something different): "First, the problems were manually translated into formal mathematical language for our systems to understand." It feels to me like it shouldn't be so hard to teach an LLM to convert IMO problems into Lean or whatever, but apparently they aren't doing that yet.

Another caveat: "Our systems solved one problem within minutes and took up to three days to solve the others." Later on they say that AlphaGeometry 2 solved the geometry question within 19 seconds, so I guess that was also the one that was done "within minutes". Three days is a lot longer than human IMO contestants are given, but this feels to me like the sort of thing that will predictably improve pretty rapidly.
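To make the generate-and-check idea concrete, here is a minimal sketch of such a loop. All the names in it (propose_next_steps, lean_verifies, the search strategy) are hypothetical stand-ins, not DeepMind's actual code or interfaces:

```python
# Minimal sketch of an LLM-proposes / Lean-checks search loop.
# propose_next_steps and lean_verifies are hypothetical stand-ins,
# not DeepMind's actual AlphaProof implementation.
from typing import List, Optional


def propose_next_steps(partial_proof: str, n: int = 8) -> List[str]:
    """Hypothetical LLM call: sample n candidate tactic lines."""
    raise NotImplementedError("stand-in for a language-model sampler")


def lean_verifies(partial_proof: str) -> str:
    """Hypothetical wrapper around a Lean checker; returns
    'closed' (proof complete), 'open' (valid so far), or 'error'."""
    raise NotImplementedError("stand-in for running the Lean checker")


def search(formal_statement: str, budget: int = 10_000) -> Optional[str]:
    """Generate-and-check: keep only steps the formal checker accepts,
    so any proof returned is machine-verified by construction."""
    frontier = [formal_statement + " := by\n"]
    for _ in range(budget):
        if not frontier:
            return None
        proof = frontier.pop()
        for step in propose_next_steps(proof):
            candidate = proof + "  " + step + "\n"
            status = lean_verifies(candidate)
            if status == "closed":
                return candidate            # complete, checked proof
            if status == "open":
                frontier.append(candidate)  # valid prefix; keep searching
            # 'error': the checker rejected the step, so it is discarded
    return None
```

On this reading, the quoted test-time training detail sits on top of such a loop: proofs found for self-generated variations of the contest problem feed back into whatever model plays the role of propose_next_steps.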
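And for a sense of what a "Lean-formalized proof" looks like at all, here is a deliberately trivial Lean 4 example (nothing to do with the actual IMO problems, which are far harder to state and prove):

```lean
-- A deliberately trivial Lean 4 example, just to show the shape of a
-- formalized statement and proof; the real IMO proofs are far longer.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point of the format is that Lean mechanically checks every step, which is what lets a search loop like the one above trust its own output.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org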