Can a neural network learn to play a solved game perfectly? We built one to find out.
Connect 4, developed by Howard Wexler, is deceptively simple: two players drop chips into a 6×7 grid, racing to align four in a row. The rules take seconds to learn. Mastery is another matter entirely.
Humans make mistakes. We overlook threats, misjudge positions, and fail to see optimal moves. Connect 4 has been mathematically solved since 1988—the first player can always force a win with perfect play. But "perfect play" requires evaluating positions that number in the trillions.
This raises a fundamental question: can we create an AI that plays Connect 4 perfectly? And if so, what does that tell us about machine intelligence versus human reasoning?
Developing a Connect-4 solver demands careful consideration of methodology. We explored multiple approaches before arriving at our final implementation.
Since the Connect 4 grid can be represented as a 2D array, we initially decided to develop a neural network. Each column is represented by an integer (indexed from zero), and the neural network would "pick" the column to drop its chip into. Because the output is a single integer, a properly trained network could make decisions faster than an algorithm that calculates every possible position.
We paired randomly generated networks against a perfect AI, applying slight mutations after every game. The goal: evolve a network that approximates the solved game. Unfortunately, nothing short of perfect play beats a perfect opponent, so the network lost every single game, received no signal it could improve on, and never got better. The only way it could have succeeded was random chance, with near-zero probability.
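A minimal sketch of that idea, assuming PyTorch and illustrative layer sizes (not our exact architecture): the network flattens the 6×7 grid and outputs one score per column, then "picks" the highest-scoring column.

```python
# Sketch only: a tiny policy network that maps a 6x7 board to a column index.
import torch
import torch.nn as nn

class Connect4Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),          # 6x7 board -> 42 inputs
            nn.Linear(42, 64),
            nn.ReLU(),
            nn.Linear(64, 7),      # one score per column
        )

    def forward(self, board):      # board: (batch, 6, 7) tensor of -1 / 0 / +1
        return self.layers(board)

net = Connect4Net()
board = torch.zeros(1, 6, 7)       # empty board
column = net(board).argmax(dim=1)  # "pick" the highest-scoring column
```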
We dynamically generated a tree of all possible moves from the current board state, analyzing what percentage of each branch leads to victory versus defeat. With trillions of possible permutations, we implemented pruning to cut off branches destined for failure early—prioritizing more promising paths.
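One standard way to realize this kind of pruned search is negamax with alpha-beta cutoffs. The sketch below shows the idea rather than our exact code; legal_moves, apply_move, is_win, and is_full are hypothetical helpers for the board mechanics.

```python
# Illustrative negamax search with alpha-beta pruning over the move tree.
# Helpers legal_moves(), apply_move(), is_win(), is_full() are hypothetical.
def negamax(board, depth, alpha, beta, player):
    if is_win(board, -player):           # the previous player just connected four
        return -1.0                       # loss for the side to move
    if is_full(board) or depth == 0:
        return 0.0                        # draw, or search horizon reached
    best = -float("inf")
    for col in legal_moves(board):
        child = apply_move(board, col, player)
        score = -negamax(child, depth - 1, -beta, -alpha, -player)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                 # this branch is already refuted
            break                         # prune the remaining children early
    return best
```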
Using a public dataset from Hugging Face compiled with Pascal Pons' Connect 4 solver (~160 million game board states), we trained the AI to predict win probabilities for both players at each position. Learning from this sample let the network generalize its predictions across the entire game tree.
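A hedged sketch of that supervised step, assuming PyTorch; the boards and labels tensors below are placeholders standing in for batches of encoded positions from the dataset.

```python
# Sketch: learn to predict a win probability for each position.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(42, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),    # probability that the side to move wins
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

boards = torch.randn(256, 6, 7)                  # placeholder encoded positions
labels = torch.randint(0, 2, (256, 1)).float()   # placeholder win/loss targets

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(boards), labels)
    loss.backward()
    optimizer.step()
```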
We chose to make the AI play against mutated versions of itself. This makes it far easier to evaluate win-to-loss ratios; playing against the perfect AI provides no learning signal, since the network would always lose.
Our process of cherry-picking the best-performing AI mimics real-life evolution and fitness. We "breed" the ultimate AI by selecting the best children—like how dog breeders choose dogs with the most desirable traits and breed similar dogs to eventually produce the best possible outcome.
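A condensed, illustrative version of that loop (not our exact code), reusing the Connect4Net sketch above and a hypothetical play_game helper that returns +1, 0, or -1 from the first player's perspective:

```python
# Sketch of the evolutionary loop: self-play tournament, pick the fittest,
# refill the population with mutated copies of it.
import copy
import torch

def mutate(net, sigma=0.05):
    """Return a child with small Gaussian noise added to every weight."""
    child = copy.deepcopy(net)
    with torch.no_grad():
        for p in child.parameters():
            p.add_(torch.randn_like(p) * sigma)
    return child

def evolve(population, generations=50):
    for _ in range(generations):
        # round-robin self-play: fitness = total score against everyone else
        fitness = [sum(play_game(a, b) for b in population if b is not a)
                   for a in population]
        best = population[max(range(len(population)), key=fitness.__getitem__)]
        # cherry-pick the fittest parent and breed the next generation from it
        population = [best] + [mutate(best) for _ in range(len(population) - 1)]
    return population[0]
```

The mutation rate (sigma above) controls how far each child strays from its parent; small values keep the search close to what already works, larger values explore more aggressively.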
A sample of the training output against the perfect algorithm shows the problem: the best fitness never moves.

```
Using device: cuda
GPU: NVIDIA GeForce RTX 4070
============================================================
(against the perfect algo)
============================================================
Initializing population of 40 neural networks...
Population created!
Starting training for 50 generations...
Evaluated 10/40 networks
Evaluated 20/40 networks
Evaluated 30/40 networks
Evaluated 40/40 networks
Gen 0 Best fitness: -12.000 (range: -12.000 → -12.000)
...
Gen 15 Best fitness: -12.000 (range: -12.000 → -12.000)
```
We refined our approach with an intermediate selection phase after the main tournament:
If no model achieves a 100% win rate against a random-move opponent, we take the top three finishers from the tournament instead.
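In code, that fallback rule might look like the sketch below, where ranked_models is the best-to-worst tournament ordering and beats_random_always is a hypothetical check for a 100% win rate against a random mover.

```python
# Sketch of the intermediate selection phase with its fallback rule.
def select_survivors(ranked_models):
    """ranked_models: models ordered best-to-worst from the main tournament."""
    survivors = [m for m in ranked_models if beats_random_always(m)]
    if not survivors:
        # nobody sweeps the random opponent, so keep the top three instead
        survivors = ranked_models[:3]
    return survivors
```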
During testing, our AI beat the perfect AI in 39 moves. However, in a theoretically perfect game where our AI moves first, the win should not arrive before move 41, because a perfect opponent delays defeat for as long as possible.
This discrepancy indicates potential issues in our evaluation system or reveals interesting edge cases in the game tree that warrant further investigation.
The program successfully plays Connect 4 and demonstrates strategic decision-making. However, it has not yet achieved truly "perfect" play, as the move-count anomaly observed during testing shows, and further refinement is needed.
Unlike tree-algorithm approaches, our learning method allows the AI to learn from itself rather than following pre-written mathematical instructions. This gives the AI autonomy and independence—it acts through learned behavior, not rigid programming.
Our selection process mirrors natural evolution: cherry-picking the fittest individuals to produce the next generation. The alterations between generations are determined through mutations, mimicking biological inheritance.
Instead of scoring entire games, we shifted to evaluating individual moves. This provides more granular feedback for the learning process and allows the AI to understand which specific decisions lead to better outcomes.
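One simple way to express that, sketched here with hypothetical solver_score and legal_moves helpers backed by the solver-labelled dataset, is to score a game as the fraction of moves that match the solver's best choice.

```python
# Sketch of move-level scoring: credit each decision, not just the final result.
# solver_score() and legal_moves() are hypothetical helpers.
def score_moves(game_record):
    """game_record: list of (board, chosen_column) pairs for one player."""
    optimal = 0
    for board, column in game_record:
        best = max(solver_score(board, c) for c in legal_moves(board))
        if solver_score(board, column) == best:   # matches the solver's best move
            optimal += 1
    return optimal / len(game_record)             # fraction of optimal decisions
```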
Training on probabilistic movesets from the dataset and implementing more sophisticated evaluation functions could push the AI closer to perfect play. The CNN approach using reinforcement learning shows the most promise.
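As one possible direction rather than a finished implementation, a small convolutional policy/value network over a two-plane board encoding might look like this in PyTorch, with illustrative layer sizes.

```python
# Sketch of a convolutional policy/value head for the 6x7 board
# (one input plane per player); sizes are illustrative only.
import torch
import torch.nn as nn

class Connect4CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.policy = nn.Linear(32 * 6 * 7, 7)   # a score per column
        self.value = nn.Linear(32 * 6 * 7, 1)    # overall position evaluation

    def forward(self, x):                        # x: (batch, 2, 6, 7)
        h = self.conv(x).flatten(1)
        return self.policy(h), torch.tanh(self.value(h))
```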