Why AIs Get Flummoxed by Some Games: New Research Explains Training Failures

When winning depends on intuiting a mathematical function, AIs come up short. A new paper published in Machine Learning reveals that the training methods powering world-champion game-playing AIs like AlphaGo and AlphaChess fail entirely on an entire category of games—including remarkably simple ones like Nim .

Quick Overview

Detail	Information
Research Topic	AI failure modes in impartial games
Study Authors	Bei Zhou and Soren Riis
Published In	Machine Learning
Key Game Studied	Nim (simple matchstick game)
Game Category	Impartial games (players share same pieces/rules)
Critical Function	Parity function (determines winning positions)
Finding	Alpha-style training cannot learn parity functions
Implication	Symbolic reasoning remains beyond current methods

The Problem: Impartial Games

What Are Impartial Games?

Impartial games differ from something like chess, where each player has their own set of pieces. In impartial games, the two players share the same pieces and are bound by the same set of rules . Nim, the game studied, is a critical example:

Players take turns removing matchsticks from a pyramid-shaped board
Choose a row, remove anywhere from one item to the entire row
Last player with a legal move wins

Nim’s importance stems from a theorem showing that any position in an impartial game can be represented by a configuration of a Nim pyramid . If something applies to Nim, it applies to all impartial games.

The Parity Function

One distinctive feature of impartial games is that at any point, you can evaluate the board and determine which player has the potential to win by feeding the configuration into a parity function —simple math that tells you whether you’re winning .

The Experiment: Can AI Learn Nim by Playing Itself?

Training Method

The researchers asked a simple question: What happens if you take the AlphaGo approach—training an AI purely by playing itself from only the rules—and try to develop a Nim-playing AI? Could it develop a representation of the parity function through self-play ?

Results: Catastrophic Failure

Board Size	Result
5 rows	AI improved quickly, still learning after 500 iterations
6 rows	Rate of improvement slowed dramatically
7 rows	Gains stopped entirely by 500 iterations

For a seven-row Nim board, the trained AI was indistinguishable from a random move generator . When asked to evaluate all potential moves, it rated every one as roughly equivalent—even though three optimal winning moves existed .

The researchers concluded that Nim requires learning the parity function to play effectively, and Alpha-style training is incapable of doing so .

Why This Matters Beyond Nim

Similar Issues in Chess and Go

The researchers found signs that similar problems can crop up in chess-playing AIs:

Identified “wrong” chess moves (missing mating attacks, throwing end-games) initially rated highly by the AI’s board evaluator
Only deep tree searches (looking multiple moves ahead) prevented these gaffes
Chess players have found mating combinations requiring long chains that software often misses

In Nim, optimal winning branches must be played out to the end to demonstrate their value, making this sort of avoidance much harder .

The Core Problem: Association vs. Reasoning

“AlphaZero excels at learning through association but fails when a problem requires a form of symbolic reasoning that cannot be implicitly learned from the correlation between game states and outcomes.”
— Zhou and Riis

Even when rules enable simple mathematical formulas for deciding what to do, Alpha-style training cannot identify them. The result is a “tangible, catastrophic failure mode” .

Implications for AI Development

Application	Concern
Math Problems	Often require symbolic reasoning, not just association
Scientific Discovery	May need extrapolation from data to general rules
AI Safety	Failure modes could exist in critical systems
Training Design	Need new approaches beyond self-play

Lots of people are exploring AI for math problems, which often require the sort of symbolic reasoning involved in extrapolating from examples to general rules like the parity function. While it may not be obvious how to train an AI to do that, it’s useful to know which approaches will clearly not work .

Key Takeaways

Takeaway	Details
Impartial games	Category including Nim where players share pieces/rules
Parity function	Simple math determines winning positions—but AIs can’t learn it
Training failure	Alpha-style self-play fails entirely on 7-row Nim
Broader relevance	Similar blind spots exist in chess/Go AI
Symbolic reasoning	Current methods cannot learn what they can’t associate