Why AIs Get Flummoxed by Some Games: New Research Explains Training Failures


When winning depends on intuiting a mathematical function, AIs come up short. A new paper published in Machine Learning reveals that the training methods powering world-champion game-playing AIs like AlphaGo and AlphaZero fail on an entire category of games, including remarkably simple ones like Nim.


Quick Overview

| Detail | Information |
| --- | --- |
| Research Topic | AI failure modes in impartial games |
| Study Authors | Bei Zhou and Soren Riis |
| Published In | Machine Learning |
| Key Game Studied | Nim (simple matchstick game) |
| Game Category | Impartial games (players share same pieces/rules) |
| Critical Function | Parity function (determines winning positions) |
| Finding | Alpha-style training cannot learn parity functions |
| Implication | Symbolic reasoning remains beyond current methods |

The Problem: Impartial Games

What Are Impartial Games?

Impartial games differ from something like chess, where each player has their own set of pieces. In impartial games, the two players share the same pieces and are bound by the same set of rules. Nim, the game studied, is a canonical example:

  • Players take turns removing matchsticks from a pyramid-shaped board
  • Choose a row, remove anywhere from one matchstick to the entire row
  • Last player with a legal move wins
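The rules above can be sketched in a few lines of Python. This is a hypothetical representation for illustration, not code from the paper: a board is a tuple of row sizes, and a move removes one or more matchsticks from a single row.

```python
# Illustrative sketch of the Nim rules described above (not from the paper).
# A board is a tuple of row sizes; a move removes 1..n sticks from one row.

def legal_moves(board):
    """All (row_index, take) pairs allowed from this position."""
    return [(i, take)
            for i, row in enumerate(board)
            for take in range(1, row + 1)]

def apply_move(board, move):
    """Return the new board after removing `take` sticks from row `i`."""
    i, take = move
    rows = list(board)
    rows[i] -= take
    return tuple(rows)

# A 3-row pyramid: rows of 1, 2, and 3 matchsticks.
board = (1, 2, 3)
print(len(legal_moves(board)))  # 1 + 2 + 3 = 6 possible moves
```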

Nim’s importance stems from a theorem (the Sprague–Grundy theorem) showing that any position in an impartial game can be represented by a configuration of a Nim pyramid. If something applies to Nim, it applies to all impartial games.

The Parity Function

One distinctive feature of impartial games is that at any point, you can evaluate the board and determine which player can force a win by feeding the configuration into a parity function: simple math that tells you whether you’re winning.
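For Nim specifically, this function is the classical nim-sum: the bitwise XOR of the row sizes, which checks the parity of each binary digit across rows. A minimal sketch, based on standard Nim theory rather than code from the paper:

```python
from functools import reduce

def nim_sum(board):
    """XOR (bitwise parity) of all row sizes -- the 'parity function' for Nim."""
    return reduce(lambda a, b: a ^ b, board, 0)

def player_to_move_can_win(board):
    """Standard Nim theory: the player to move can force a win
    if and only if the nim-sum is nonzero."""
    return nim_sum(board) != 0

print(player_to_move_can_win((1, 2, 3)))  # False: 1 ^ 2 ^ 3 == 0, a lost position
print(player_to_move_can_win((1, 2, 4)))  # True:  1 ^ 2 ^ 4 == 7
```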


The Experiment: Can AI Learn Nim by Playing Itself?

Training Method

The researchers asked a simple question: What happens if you take the AlphaGo approach—training an AI purely by playing itself from only the rules—and try to develop a Nim-playing AI? Could it develop a representation of the parity function through self-play?
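To make the setup concrete, here is a drastically simplified sketch of the kind of self-play data such training consumes. This is illustrative only (the paper's actual method uses a neural network and tree search): play a game against yourself with some policy, then label each visited state by whether the player to move there went on to win.

```python
import random

def legal_moves(board):
    return [(i, take) for i, row in enumerate(board)
            for take in range(1, row + 1)]

def apply_move(board, move):
    i, take = move
    rows = list(board)
    rows[i] -= take
    return tuple(rows)

def self_play_game(board):
    """Play one random self-play game; return (state, won_from_here) pairs.
    The last player to move wins, so each state is labeled by whether the
    player to move at that state eventually won under this (random) policy."""
    history = []
    player = 0
    while any(board):
        history.append((board, player))
        board = apply_move(board, random.choice(legal_moves(board)))
        player ^= 1
    winner = player ^ 1  # the player who made the final move
    return [(state, mover == winner) for state, mover in history]

random.seed(0)
data = self_play_game((1, 2, 3))  # the (state, outcome) pairs a value net would fit
```

Alpha-style training fits a value function to exactly these state–outcome correlations, which is why a target like the parity function, invisible in any single correlation, is a natural stress test.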

Results: Catastrophic Failure

| Board Size | Result |
| --- | --- |
| 5 rows | AI improved quickly, still learning after 500 iterations |
| 6 rows | Rate of improvement slowed dramatically |
| 7 rows | Gains stopped entirely by 500 iterations |

For a seven-row Nim board, the trained AI was indistinguishable from a random move generator. When asked to evaluate all potential moves, it rated every one as roughly equivalent, even though three optimal winning moves existed.

The researchers concluded that Nim requires learning the parity function to play effectively, and Alpha-style training is incapable of doing so.
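What makes the failure stark is that perfect play is trivial to express once you have the parity function. A sketch from standard Nim theory (not the paper's code): a move is winning exactly when it leaves the opponent a position with nim-sum zero.

```python
from functools import reduce

def nim_sum(board):
    return reduce(lambda a, b: a ^ b, board, 0)

def optimal_moves(board):
    """Moves that leave the opponent a zero nim-sum, i.e. a lost position."""
    moves = []
    for i, row in enumerate(board):
        for take in range(1, row + 1):
            after = list(board)
            after[i] -= take
            if nim_sum(after) == 0:
                moves.append((i, take))
    return moves

# The full 7-row pyramid already has nim-sum 0: the player to move is lost,
# and any winning chances come only from the opponent's later mistakes.
print(optimal_moves((1, 2, 3, 4, 5, 6, 7)))  # []
print(optimal_moves((1, 2, 4)))  # [(2, 1)]: take one stick from the row of 4
```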


Why This Matters Beyond Nim

Similar Issues in Chess and Go

The researchers found signs that similar problems can crop up in chess-playing AIs:

  • Identified “wrong” chess moves (missing mating attacks, throwing end-games) initially rated highly by the AI’s board evaluator
  • Only deep tree searches (looking multiple moves ahead) prevented these gaffes
  • Chess players have found mating combinations requiring long chains that software often misses

In Nim, by contrast, optimal winning lines must be played out to the end before their value becomes visible, so deep tree search offers no comparable workaround.

The Core Problem: Association vs. Reasoning

“AlphaZero excels at learning through association but fails when a problem requires a form of symbolic reasoning that cannot be implicitly learned from the correlation between game states and outcomes.”
— Zhou and Riis

Even when the rules admit a simple mathematical formula for deciding what to do, Alpha-style training cannot identify it. The result is a “tangible, catastrophic failure mode”.


Implications for AI Development

| Application | Concern |
| --- | --- |
| Math Problems | Often require symbolic reasoning, not just association |
| Scientific Discovery | May need extrapolation from data to general rules |
| AI Safety | Failure modes could exist in critical systems |
| Training Design | Need new approaches beyond self-play |

Lots of people are exploring AI for math problems, which often require the sort of symbolic reasoning involved in extrapolating from examples to general rules like the parity function. While it may not be obvious how to train an AI to do that, it’s useful to know which approaches will clearly not work.


Key Takeaways

| Takeaway | Details |
| --- | --- |
| Impartial games | Category including Nim where players share pieces/rules |
| Parity function | Simple math determines winning positions, but AIs can't learn it |
| Training failure | Alpha-style self-play fails entirely on 7-row Nim |
| Broader relevance | Similar blind spots exist in chess/Go AI |
| Symbolic reasoning | Current methods cannot learn what they can't associate |
