Can a neural network learn to play a solved game perfectly? We built one to find out.
Connect 4, developed by Howard Wexler, is deceptively simple: two players drop chips into a 6×7 grid, racing to align four in a row. The rules take seconds to learn. Mastery is another matter entirely.
Humans make mistakes. We overlook threats, misjudge positions, and fail to see optimal moves. Connect 4 has been mathematically solved since 1988—the first player can always force a win with perfect play. But "perfect play" requires evaluating positions that number in the trillions.
This raises a fundamental question: can we create an AI that plays Connect 4 perfectly? And if so, what does that tell us about machine intelligence versus human reasoning?
Developing a Connect-4 solver demands careful consideration of methodology. We explored multiple approaches before arriving at our final implementation.
Since the Connect 4 grid can be represented as a 2D array, we initially decided to develop a neural network. Each column is represented by an integer (indexed from zero), and the neural network would "pick" the column to drop its chip into. Because the output is a single integer, a properly trained network could make decisions faster than an algorithm that calculates every possible position.
We paired randomly generated networks against a perfect AI, applying slight mutations after every game. The goal: evolve a network that approximates the solved game. Unfortunately, nothing short of perfect play beats a perfect opponent, so the network lost every single game, received no signal it could improve on, and never got better. The only way it could have succeeded was random chance, with near-zero probability.
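A minimal sketch of that idea, assuming PyTorch and illustrative layer sizes (not our exact architecture): the network flattens the 6×7 grid and outputs one score per column, then "picks" the highest-scoring column.

```python
# Sketch only: a tiny policy network that maps a 6x7 board to a column index.
import torch
import torch.nn as nn

class Connect4Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),          # 6x7 board -> 42 inputs
            nn.Linear(42, 64),
            nn.ReLU(),
            nn.Linear(64, 7),      # one score per column
        )

    def forward(self, board):      # board: (batch, 6, 7) tensor of -1 / 0 / +1
        return self.layers(board)

net = Connect4Net()
board = torch.zeros(1, 6, 7)       # empty board
column = net(board).argmax(dim=1)  # "pick" the highest-scoring column
```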
We dynamically generated a tree of all possible moves from the current board state, analyzing what percentage of each branch leads to victory versus defeat. With trillions of possible permutations, we implemented pruning to cut off branches destined for failure early—prioritizing more promising paths.
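One standard way to realize this kind of pruned search is negamax with alpha-beta cutoffs. The sketch below shows the idea rather than our exact code; legal_moves, apply_move, is_win, and is_full are hypothetical helpers for the board mechanics.

```python
# Illustrative negamax search with alpha-beta pruning over the move tree.
# Helpers legal_moves(), apply_move(), is_win(), is_full() are hypothetical.
def negamax(board, depth, alpha, beta, player):
    if is_win(board, -player):           # the previous player just connected four
        return -1.0                       # loss for the side to move
    if is_full(board) or depth == 0:
        return 0.0                        # draw, or search horizon reached
    best = -float("inf")
    for col in legal_moves(board):
        child = apply_move(board, col, player)
        score = -negamax(child, depth - 1, -beta, -alpha, -player)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                 # this branch is already refuted
            break                         # prune the remaining children early
    return best
```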
Using a public dataset from Hugging Face compiled with Pascal Pons' Connect 4 solver (~160 million game board states), we trained the AI to predict win probabilities for both players at each position. Learning from this sample let the network generalize its predictions across the entire game tree.
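A hedged sketch of that supervised step, assuming PyTorch; the boards and labels tensors below are placeholders standing in for batches of encoded positions from the dataset.

```python
# Sketch: learn to predict a win probability for each position.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(42, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),    # probability that the side to move wins
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

boards = torch.randn(256, 6, 7)                  # placeholder encoded positions
labels = torch.randint(0, 2, (256, 1)).float()   # placeholder win/loss targets

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(boards), labels)
    loss.backward()
    optimizer.step()
```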
We chose to make the AI play against mutated versions of itself. This makes it far easier to evaluate win-to-loss ratios; playing against the perfect AI provides no learning signal, since the network would always lose.
Our process of cherry-picking the best-performing AI mimics real-life evolution and fitness. We "breed" the ultimate AI by selecting the best children—like how dog breeders choose dogs with the most desirable traits and breed similar dogs to eventually produce the best possible outcome.
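A condensed, illustrative version of that loop (not our exact code), reusing the Connect4Net sketch above and a hypothetical play_game helper that returns +1, 0, or -1 from the first player's perspective:

```python
# Sketch of the evolutionary loop: self-play tournament, pick the fittest,
# refill the population with mutated copies of it.
import copy
import torch

def mutate(net, sigma=0.05):
    """Return a child with small Gaussian noise added to every weight."""
    child = copy.deepcopy(net)
    with torch.no_grad():
        for p in child.parameters():
            p.add_(torch.randn_like(p) * sigma)
    return child

def evolve(population, generations=50):
    for _ in range(generations):
        # round-robin self-play: fitness = total score against everyone else
        fitness = [sum(play_game(a, b) for b in population if b is not a)
                   for a in population]
        best = population[max(range(len(population)), key=fitness.__getitem__)]
        # cherry-pick the fittest parent and breed the next generation from it
        population = [best] + [mutate(best) for _ in range(len(population) - 1)]
    return population[0]
```

The mutation rate (sigma above) controls how far each child strays from its parent; small values keep the search close to what already works, larger values explore more aggressively.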
A sample of the training output against the perfect algorithm shows the problem: the best fitness never moves.

```
Using device: cuda
GPU: NVIDIA GeForce RTX 4070
============================================================
(against the perfect algo)
============================================================
Initializing population of 40 neural networks...
Population created!
Starting training for 50 generations...
Evaluated 10/40 networks
Evaluated 20/40 networks
Evaluated 30/40 networks
Evaluated 40/40 networks
Gen 0 Best fitness: -12.000 (range: -12.000 → -12.000)
...
Gen 15 Best fitness: -12.000 (range: -12.000 → -12.000)
```
We refined our approach with an intermediate selection phase after the main tournament:
If no model achieves a 100% win rate against a random-move opponent, we take the top three finishers from the tournament instead.
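In code, that fallback rule might look like the sketch below, where ranked_models is the best-to-worst tournament ordering and beats_random_always is a hypothetical check for a 100% win rate against a random mover.

```python
# Sketch of the intermediate selection phase with its fallback rule.
def select_survivors(ranked_models):
    """ranked_models: models ordered best-to-worst from the main tournament."""
    survivors = [m for m in ranked_models if beats_random_always(m)]
    if not survivors:
        # nobody sweeps the random opponent, so keep the top three instead
        survivors = ranked_models[:3]
    return survivors
```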
During testing, our AI beat the perfect AI in 39 moves. However, in a theoretically perfect game where our AI moves first, the win should not arrive before move 41, because a perfect opponent delays defeat for as long as possible.
This discrepancy indicates potential issues in our evaluation system or reveals interesting edge cases in the game tree that warrant further investigation.
The program successfully plays Connect 4 and demonstrates strategic decision-making. However, it has not yet achieved truly "perfect" play, as the move-count anomaly observed during testing shows, and further refinement is needed.
Unlike tree-algorithm approaches, our learning method allows the AI to learn from itself rather than following pre-written mathematical instructions. This gives the AI autonomy and independence—it acts through learned behavior, not rigid programming.
Our selection process mirrors natural evolution: cherry-picking the fittest individuals to produce the next generation. The alterations between generations are determined through mutations, mimicking biological inheritance.
Instead of scoring entire games, we shifted to evaluating individual moves. This provides more granular feedback for the learning process and allows the AI to understand which specific decisions lead to better outcomes.
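One simple way to express that, sketched here with hypothetical solver_score and legal_moves helpers backed by the solver-labelled dataset, is to score a game as the fraction of moves that match the solver's best choice.

```python
# Sketch of move-level scoring: credit each decision, not just the final result.
# solver_score() and legal_moves() are hypothetical helpers.
def score_moves(game_record):
    """game_record: list of (board, chosen_column) pairs for one player."""
    optimal = 0
    for board, column in game_record:
        best = max(solver_score(board, c) for c in legal_moves(board))
        if solver_score(board, column) == best:   # matches the solver's best move
            optimal += 1
    return optimal / len(game_record)             # fraction of optimal decisions
```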
Training on probabilistic movesets from the dataset and implementing more sophisticated evaluation functions could push the AI closer to perfect play. The CNN approach using reinforcement learning shows the most promise.
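As one possible direction rather than a finished implementation, a small convolutional policy/value network over a two-plane board encoding might look like this in PyTorch, with illustrative layer sizes.

```python
# Sketch of a convolutional policy/value head for the 6x7 board
# (one input plane per player); sizes are illustrative only.
import torch
import torch.nn as nn

class Connect4CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.policy = nn.Linear(32 * 6 * 7, 7)   # a score per column
        self.value = nn.Linear(32 * 6 * 7, 1)    # overall position evaluation

    def forward(self, x):                        # x: (batch, 2, 6, 7)
        h = self.conv(x).flatten(1)
        return self.policy(h), torch.tanh(self.value(h))
```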