Creating a Snake Game AI with Deep Q-Learning: A Step-by-Step Guide
Introduction
Artificial intelligence (AI) can be applied in many fun and interesting ways. One such project is creating an AI that plays the classic Snake game. By using deep Q-learning, a powerful reinforcement learning technique, we can train an AI to steer the snake, avoid collisions with the walls and its own body, and collect food to maximize its score. In this article, we'll break the project down into seven key steps, explaining how the Snake game was modified for training and how the AI was built with PyTorch.
1. Snake Game - Modify the Game to Make It Ready for AI
To begin with, the original Snake game needs to be modified to suit AI training. The main task is to strip down the game logic to focus on the essential aspects that the AI needs to learn from. This means removing the user interface (UI) and instead providing a simple mechanism for the game to take input in terms of moves and return feedback like the current score and whether the game has ended.
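The exact interface depends on your implementation, but a minimal sketch of what the AI-facing game class might look like is shown below. The method names reset() and run() are kept consistent with how the game is called in the training loop later; everything else is an illustrative assumption rather than the article's actual code.
class Game:
    def __init__(self, width=20, height=20):
        self.width = width
        self.height = height
        self.reset()

    def reset(self):
        # Start a new episode: re-place the snake and the food, clear the score
        self.snake = [(self.width // 2, self.height // 2)]
        self.food = (5, 5)   # placeholder; a real implementation would randomize this
        self.score = 0
        self.done = False

    def run(self, move):
        # Apply a one-hot move, advance the game by one step, and report back
        # ... update the snake, check for collisions, check whether food was eaten ...
        reward = 0           # e.g. +10 for eating food, -10 for dying, 0 otherwise
        return reward, self.done, self.score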
2. Snake Game - No UI - Stripping Down to the Essentials for Efficient Training
Once the game is ready for AI training, we no longer need any visuals or player inputs. The game simply returns the necessary information (like snake position, food location, and possible dangers) in a format that the AI can use to make decisions. This stripped-down version allows the AI to focus solely on learning the optimal actions based on the game state.
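The article does not spell out the exact state encoding, but one common choice for Snake (and the assumption used in the sketches below) is a small binary feature vector: immediate danger in each direction, the snake's current heading, and where the food lies relative to the head. For example:
import numpy as np

# Illustrative state encoding (an assumption, not the article's exact get_state):
# 11 binary features describing danger, current direction, and food location.
def encode_state(danger_straight, danger_right, danger_left,
                 dir_left, dir_right, dir_up, dir_down,
                 food_left, food_right, food_up, food_down):
    return np.array([danger_straight, danger_right, danger_left,
                     dir_left, dir_right, dir_up, dir_down,
                     food_left, food_right, food_up, food_down], dtype=np.float32)

# Example: snake heading right, food above and to the right, no immediate danger.
state = encode_state(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0)
print(state.shape)   # (11,) -> state_size would be 11 with this encoding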
3. Artificial Neural Network (ANN) - Building the Brain Behind Our Intelligent Snake
The heart of our AI is the artificial neural network (ANN) that predicts the best move the snake should take at any given time. The network takes the current game state as input and outputs a set of values corresponding to each possible action (moving left, right, up, or down).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ANN(nn.Module):
    def __init__(self, state_size, action_size):
        super(ANN, self).__init__()
        # One hidden layer: state features -> 64 hidden units -> one Q-value per action
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        return self.fc2(x)
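As a quick sanity check, the network can be exercised with a dummy input (state_size=11 and action_size=4 below are assumptions matching the sketches in this article):
net = ANN(state_size=11, action_size=4)
dummy_state = torch.zeros(1, 11)   # a batch containing a single state
q_values = net(dummy_state)
print(q_values.shape)              # torch.Size([1, 4]) -> one Q-value per action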
4. Replay Memory - Leveraging Past Experiences to Enhance Learning
One of the key components of deep Q-learning is the use of "replay memory." The idea is to store past experiences (state, action, reward, next state, done) in a memory buffer. This allows the AI to learn from its past mistakes and successes, rather than only from the most recent state transitions. The ReplayMemory class manages this memory buffer, which is essential for stabilizing the training process.
import random
import numpy as np
import torch

class ReplayMemory:
    def __init__(self, capacity):
        self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        self.capacity = capacity
        self.memory = []

    def push(self, event):
        # Store a (state, action, reward, next_state, done) tuple, dropping the oldest once full
        self.memory.append(event)
        if len(self.memory) > self.capacity:
            del self.memory[0]

    def sample(self, k):
        # Draw a random minibatch and convert each field into a tensor on the device
        experiences = random.sample(self.memory, k=k)
        states = torch.from_numpy(np.vstack([e[0] for e in experiences])).float().to(self.device)
        actions = torch.from_numpy(np.vstack([e[1] for e in experiences])).long().to(self.device)
        rewards = torch.from_numpy(np.vstack([e[2] for e in experiences])).float().to(self.device)
        next_states = torch.from_numpy(np.vstack([e[3] for e in experiences])).float().to(self.device)
        dones = torch.from_numpy(np.vstack([e[4] for e in experiences]).astype(np.uint8)).float().to(self.device)
        return states, actions, rewards, next_states, dones
5. Agent - Designing the Decision-Maker that Powers the Snake
The agent is the core component that interacts with the environment (the game). It makes decisions based on the current state of the game and its past experiences. The agent uses the ANN to predict the best move and stores its experiences in replay memory for later learning. The agent follows the epsilon-greedy strategy, balancing exploration (random moves) and exploitation (choosing the best-known move).
import torch.optim as optim

class Agent:
    def __init__(self, state_size, action_size):
        self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        self.state_size = state_size
        self.action_size = action_size
        # The local network is trained every step; the target network provides stable Q-targets
        self.local_network = ANN(state_size, action_size).to(self.device)
        self.target_network = ANN(state_size, action_size).to(self.device)
        self.optimizer = optim.Adam(self.local_network.parameters(), lr=learning_rate)
        self.memory = ReplayMemory(replay_buffer_size)

    def get_action(self, state, epsilon):
        state = torch.from_numpy(state).float().unsqueeze(0).to(self.device)
        self.local_network.eval()
        with torch.no_grad():
            action_values = self.local_network(state)
        self.local_network.train()
        # Epsilon-greedy: exploit the best-known action with probability 1 - epsilon,
        # otherwise explore with a random one
        if random.random() > epsilon:
            move = torch.argmax(action_values).item()
        else:
            move = random.randint(0, 3)
        return move
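The agent's actual learning step isn't shown here, but the core update it relies on is the standard deep Q-learning target: sample a minibatch from replay memory, compute target Q-values with the target network, and regress the local network toward them. The sketch below illustrates what such a step()/learn() pair could look like; the subclass name DQNAgent, the soft-update factor, and the exact tensor shapes are assumptions, and in the full project this logic would simply live on the Agent class itself (the training loop calls agent.step()).
class DQNAgent(Agent):
    # Sketch only: a standard deep Q-learning update, not the article's exact code.
    # minibatch_size and gamma are the hyperparameters defined in the next section.
    def step(self, state, action, reward, next_state, done):
        # Store the transition, then learn from a random minibatch once enough
        # experiences have been collected
        self.memory.push((state, action, reward, next_state, done))
        if len(self.memory.memory) > minibatch_size:
            self.learn(self.memory.sample(minibatch_size), gamma)

    def learn(self, experiences, gamma):
        states, actions, rewards, next_states, dones = experiences
        # Q-target: r + gamma * max_a' Q_target(s', a'), zeroing the future term on terminal steps
        next_q = self.target_network(next_states).detach().max(1)[0].unsqueeze(1)
        q_targets = rewards + gamma * next_q * (1 - dones)
        # Q-expected: the local network's estimate for the actions actually taken
        q_expected = self.local_network(states).gather(1, actions)
        loss = F.mse_loss(q_expected, q_targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        # Slowly track the local network with the target network (one common choice)
        for t_param, l_param in zip(self.target_network.parameters(), self.local_network.parameters()):
            t_param.data.copy_(0.001 * l_param.data + 0.999 * t_param.data)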
6. Hyperparameters - Fine-Tuning the Settings That Shape Our AI’s Performance
Hyperparameters play a crucial role in how well the AI learns. In this project, several key hyperparameters were defined to control the training process:
- epsilon controls the exploration-exploitation trade-off.
- gamma defines how much the agent values future rewards.
- minibatch_size determines how many experiences the agent uses in each training step.
- learning_rate defines the step size for weight updates.
These hyperparameters were fine-tuned to improve the AI’s performance during training.
# Hyperparameters
number_episodes = 100000          # total training episodes
epsilon_starting_value = 1.0      # start fully exploratory
epsilon_ending_value = 0.001      # exploration floor
learning_rate = 0.01              # step size for the Adam optimizer
gamma = 0.95                      # discount factor for future rewards
7. Train Agent / Run - Bringing It All Together to Train and See Our AI in Action
Once everything is set up, the agent begins training in the game environment. The agent interacts with the environment by taking actions, observing the results, and learning from past experiences. The training process continues for a specified number of episodes, gradually improving the agent's performance.
if __name__ == "__main__":
    game = Game()
    agent = Agent(state_size=state_size, action_size=action_size)
    agent.load()
    max_score = 0
    epsilon = epsilon_starting_value
    for episode in range(0, number_episodes):
        game.reset()
        score = 0
        for t in range(maximum_number_steps_per_episode):
            # Observe the current state, act, and learn from the resulting transition
            state_old = agent.get_state(game)
            action = agent.get_action(state_old, epsilon)
            move = [0, 0, 0, 0]
            move[action] = 1
            reward, done, score = game.run(move)
            state_new = agent.get_state(game)
            agent.step(state_old, action, reward, state_new, done)
            if done:
                break
        max_score = max(max_score, score)
        agent.save_model()
        agent.save_data(max_score, epsilon)
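One detail the loop above leaves implicit is how epsilon moves from epsilon_starting_value down toward epsilon_ending_value. A simple way to decay it once per episode (an assumption, since the exact schedule is not shown here) would be:
# Illustrative epsilon decay, applied at the end of each episode (assumed schedule)
epsilon_decay_value = 0.995   # hypothetical decay factor
epsilon = max(epsilon_ending_value, epsilon * epsilon_decay_value)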
Conclusion
By following these steps, you can create a fully functional Snake game AI using deep Q-learning. The AI gradually learns to steer the snake, collect food, and avoid collisions, ultimately mastering the game.
Code
You can find the full project on GitHub here.