Ask HN: Why did AlphaProof fail at IMO combinatorial problems?
AlphaProof uses a pre-trained language model + MCTS (Monte Carlo tree search) to generate Lean proofs to solve the IMO (International Mathematical Olympiad) algebraic problems (I think).
Why doesn't this work for IMO combinatoric problems?
DeepMind AI solved 100% of the IMO algebra problems using AlphaProof, and geometry problems using AlphaGeometry 2, but none (?) of the combinatorial problems.
I don’t see much commentary on why the failure on combinatorial problems. Why is that?
To try answering myself:
It's because algebraic type problems are typically of the form "Prove algebraic equation E", where E is relatively easy to translate into Lean language.
Whereas combinatoric problem types are of the form "Prove some open-ended expression P" that even if translated into Lean, it's not clear where to start to connect it to axioms or the eventual problem solution.
If that’s an okay way to describe it, then AlphaProof’s language model, likely trained only on Lean statements (?), would have a tough time generating plausible next steps to combinatorial problems for MCTS to explore?
That doesn’t mean the LM + MCTS architecture is bad though. It means the language model needs to be better at generating plausible next steps (sequence of tokens) with maybe a less axiomatic approach to evaluating the fitness of that step (less Lean reliance?).
Yes you can get a 7/7 score for a combinatorics problem in IMO with a good amount of handwaving e.g. instead of formally describing a configuration of monsters in P5 of IMO 2024 you can get away with a few sentences that describes what you're doing. There's actually a decent amount of compression being done by using informal language, so I expect formal proofs of combinatorics problems to be significantly longer than formal proofs of algebra/number theory/geometry problems.
I should say that handwaving is also ok in other topics, but formalizing handwaving in other topics is easier.