Chapter 11: Neuroevolution
Reading about nature is fine, but if a person walks in the woods and listens carefully, they can learn more than what is in books.
—George Washington Carver
![](/static/55a495d6d6634c6eda35df35fc4d89ff/6d4eb/11_nn_ga_1.png)
The star-nosed mole (Condylura cristata), found mainly in the northeastern United States and eastern Canada, has a unique and highly specialized nasal organ. Evolved over numerous generations, its nose consists of 22 tentacles with over 25,000 minute sensory receptors. Despite being functionally blind, the mole uses these tentacles to create a detailed spatial map of its surroundings. It can navigate its dark underground habitat with astonishing precision and speed, identifying and consuming edible items in a matter of milliseconds.
Congratulations! You’ve made it to the final act of this book. Take a moment to celebrate all that you’ve learned.
![](/static/6f75146ba2d1313d04125c9b3532c372/9ea8b/11_nn_ga_2.png)
Throughout this book, you’ve explored the fundamental principles of interactive physics simulations with p5.js, dived into the complexities of agent and other rule-based behaviors, and dipped your toe into the exciting realm of machine learning. You’ve become a natural!
However, Chapter 10 merely scratched the surface of working with data and neural network–based machine learning—a vast landscape that would require countless sequels to this book to cover comprehensively. My goal was never to go deep into neural networks, but simply to establish the core concepts in preparation for a grand finale, where I find a way to integrate machine learning into the world of animated, interactive p5.js sketches and bring together as many of our new Nature of Code friends as possible for one last hurrah.
The path forward passes through the field of neuroevolution, a style of machine learning that combines the GAs from Chapter 9 with the neural networks from Chapter 10. A neuroevolutionary system uses Darwinian principles to evolve the weights (and in some cases, the structure itself) of a neural network over generations of trial-and-error learning. In this chapter, I’ll demonstrate how to use neuroevolution with a familiar example from the world of gaming. I’ll then finish off by adapting Craig Reynolds’s steering behaviors from Chapter 5 so that they are learned through neuroevolution.
Reinforcement Learning
Neuroevolution shares many similarities with another machine learning methodology that I briefly referenced in Chapter 10, reinforcement learning, which incorporates machine learning into a simulated environment. A neural network–backed agent learns by interacting with the environment and receiving feedback about its decisions in the form of rewards or penalties. It’s a strategy built around observation.
Think of a little mouse running through a maze. If it turns left, it gets a piece of cheese; if it turns right, it receives a little shock. (Don’t worry, this is just a pretend mouse.) Presumably, the mouse will learn over time to turn left. Its biological neural network makes a decision with an outcome (turn left or right) and observes its environment (yum or ouch). If the observation is negative, the network can adjust its weights in order to make a different decision the next time.
In the real world, reinforcement learning is commonly used not for tormenting rodents but rather for developing robots. At time t, the robot performs a task and observes the results. Did it crash into a wall or fall off a table, or is it unharmed? As time goes on, the robot learns to interpret the signals from its environment in the optimal way to accomplish its tasks and avoid harm.
Instead of a mouse or a robot, now think about any of the example objects from earlier in this book (walker, mover, particle, vehicle). Imagine embedding a neural network into one of these objects and using it to calculate a force or another action. The neural network could receive its inputs from the environment (such as distance to an obstacle) and output some kind of decision. Perhaps the network chooses from a set of discrete options (move left or right) or picks a set of continuous values (the magnitude and direction of a steering force).
Is this starting to sound familiar? It’s no different from the way a neural network performed after training in the Chapter 10 examples, receiving inputs and predicting a classification or regression! Actually training one of these objects to make a good decision is where the reinforcement learning process diverges from the supervised learning approach. To better illustrate, let’s start with a hopefully easy-to-understand and possibly familiar scenario, the game Flappy Bird (see Figure 11.1).
The game is deceptively simple. You control a small bird that continually moves horizontally across the screen. With each tap or click, the bird flaps its wings and rises upward. The challenge? A series of vertical pipes spaced apart at irregular intervals emerges from the right. The pipes have gaps, and your primary objective is to navigate the bird safely through these gaps. If you hit a pipe, it’s game over. As you progress, the game’s speed increases, and the more pipes you navigate, the higher your score.
![Figure 11.1: The Flappy Bird game](/static/a703fe8d9796362703d7c168d17aa844/9ea8b/11_nn_ga_3.png)
Suppose you want to automate the gameplay, and instead of a human tapping, a neural network will make the decision of whether to flap. Could machine learning work here? Skipping over the initial data steps in the machine learning life cycle for a moment, let’s think about how to choose a model. What are the inputs and outputs of the neural network?
This is quite the intriguing question because, at least in the case of the inputs, there isn’t a definitive answer. If you don’t know much about the game or don’t want to put your thumb on the scale in terms of identifying which aspects of the game are important, it might make the most sense to have the inputs be all the pixels of the game screen. This approach attempts to feed everything about the game into the model and let the model figure out for itself what matters.
I’ve played Flappy Bird enough that I feel I understand it quite well, however. I can therefore bypass feeding all the pixels to the model and boil down the essence of the game to just a few input data points necessary for making predictions. These data points, often referred to as features in machine learning, represent the distinctive characteristics of the data that are most salient for the prediction. Imagine biting into a mysteriously juicy fruit—features like its taste (sweet!), texture (crisp!), and color (a vibrant red!) help you identify it as an apple. In the case of Flappy Bird, the most crucial features are listed here:
- The y-position of the bird
- The y-velocity of the bird
- The y-position of the next top pipe’s opening
- The y-position of the next bottom pipe’s opening
- The x-distance to the next pipe
These features are illustrated in Figure 11.2.
![Figure 11.2: The Flappy Bird input features for a neural network](/static/cd5f383c6f5b90d77c33e9b0c05e069e/9ea8b/11_nn_ga_4.png)
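To make the list concrete, here’s a minimal sketch of how these five features might be gathered into an input array. The bird and nextPipe objects and their properties are hypothetical placeholders of my own, not part of p5.js or ml5.js:

// A sketch of packaging the five features, assuming hypothetical
// bird and nextPipe objects with these properties
function getInputs(bird, nextPipe) {
  return [
    bird.y,              // y-position of the bird
    bird.velocity,       // y-velocity of the bird
    nextPipe.top,        // y-position of the top pipe's opening
    nextPipe.bottom,     // y-position of the bottom pipe's opening
    nextPipe.x - bird.x, // x-distance to the next pipe
  ];
}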
The neural network will have five inputs, one for each feature, but what about the outputs? Is this a classification problem or a regression problem? This may seem like an odd question to ask in the context of a game like Flappy Bird, but it’s actually quite important and relates to the way the game is controlled. Tapping the screen, pressing a button, or using keyboard controls are all examples of classification. After all, the player has only a discrete set of choices: tap or not; press W, A, S, or D on the keyboard. On the other hand, using an analog controller like a joystick leans toward regression. A joystick can be tilted in varying degrees in any direction, translating to continuous output values for both its horizontal and vertical axes.
For Flappy Bird, the outputs represent a classification decision with only two choices:
- Flap.
- Don’t flap.
This means the network should have two outputs, suggesting an overall network architecture like the one in Figure 11.3.
![Figure 11.3: The neural network for Flappy Bird as ml5.js might design it](/static/576ce33ed18da59b66f8bc314ce045ec/884c3/11_nn_ga_5.png)
I now have all the information necessary to configure a model and let ml5.js build it:
let options = {
  inputs: 5,
  outputs: ["flap", "no flap"],
  task: "classification",
};
let birdBrain = ml5.neuralNetwork(options);
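With the model configured, a sketch could ask it for a decision on every frame of the game. Here’s one possible shape for that, reusing the hypothetical getInputs() helper from earlier. The exact classify() callback signature varies between ml5.js versions, so treat this as a sketch rather than the definitive call:

// Ask the network whether to flap, once per frame
function think(bird, nextPipe) {
  let inputs = getInputs(bird, nextPipe);
  birdBrain.classify(inputs, (results) => {
    // Results arrive sorted by confidence; act on the top label.
    if (results[0].label === "flap") {
      bird.flap(); // flap() is a hypothetical method on the bird
    }
  });
}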
What next? If I were following the steps I laid out in Chapter 10, I’d have to go back to steps 1 and 2 of the machine learning process: data collection and preparation. How exactly would that work here? One idea could be to scour the earth for the greatest Flappy Bird player of all time and record them playing for hours. I could log the input features for every moment of gameplay along with whether the player flapped or not. Feed all that data into the model, train it, and I can see the headlines already: “Artificial Intelligence Bot Defeats Flappy Bird.”
But wait a second; has a computerized agent really learned to play Flappy Bird on its own, or has it simply learned to mirror the gameplay of a human? What if that human missed a key aspect of Flappy Bird strategy? The automated player would never discover it. Not to mention that collecting all that data would be incredibly tedious.
The problem here is that I’ve reverted to a supervised learning scenario like the ones from Chapter 10, but this is supposed to be a section about reinforcement learning. Unlike supervised learning, in which the correct answers are provided by a training dataset, the agent in reinforcement learning learns the answers—the optimal decisions—through trial and error by interacting with the environment and receiving feedback. In the case of Flappy Bird, the agent could receive a positive reward every time it successfully navigates a pipe, but a negative reward if it hits a pipe or the ground. The agent’s goal is to figure out which actions lead to the most cumulative rewards over time.
At the start, the Flappy Bird agent won’t know the best time to flap its wings, leading to many crashes. As it accrues more and more feedback from countless play-throughs, however, it will begin to refine its actions and develop the optimal strategy to navigate the pipes without crashing, maximizing its total reward. This process of learning by doing and optimizing based on feedback is the essence of reinforcement learning.
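To illustrate the idea, here’s a conceptual sketch of how such a reward might be tallied, assuming hypothetical passedPipe() and crashed() checks on the bird. A real reinforcement learning system would feed this signal back into the agent’s policy rather than just keep score:

// A conceptual sketch of accumulating rewards during play
let totalReward = 0;
function updateReward(bird) {
  if (bird.passedPipe()) {
    totalReward += 1; // positive reward for clearing a pipe
  }
  if (bird.crashed()) {
    totalReward -= 1; // penalty for hitting a pipe or the ground
  }
}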
As the chapter goes on, I’ll explore the principles I’m outlining here, but with a twist. Traditional techniques in reinforcement learning involve defining a strategy (called a policy) and a corresponding reward function to provide feedback for adjusting the policy. Instead of going down this road, however, I’m going to turn toward the star of this chapter, neuroevolution.
Evolving Neural Networks Is NEAT!
Instead of using traditional backpropagation, a policy, and a reward function, neuroevolution applies principles of GAs and natural selection to train the weights in a neural network. This technique unleashes many neural networks on a problem at once. Periodically, the best-performing neural networks are “selected,” and their “genes” (the network connection weights) are combined and mutated to create the next generation of networks. Neuroevolution is especially effective in environments where the learning rules aren’t precisely defined or the task is complex, with numerous potential solutions.
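Before diving into the history, here’s a high-level sketch of what one generation of neuroevolution might look like in code. It assumes an array of agents, each with a fitness score and a brain, where the networks expose crossover() and mutate() methods (ml5.js neural networks include such methods, but the details here are my own sketch, not a final implementation):

// One generation: select, combine, and mutate the best networks
function reproduce(agents) {
  // Sort so the best performers come first.
  agents.sort((a, b) => b.fitness - a.fitness);
  let half = Math.floor(agents.length / 2);
  let nextBrains = [];
  for (let i = 0; i < agents.length; i++) {
    // Pick two parents from the top half of the population.
    let parentA = agents[Math.floor(Math.random() * half)];
    let parentB = agents[Math.floor(Math.random() * half)];
    // Combine the parents' weights, then mutate the child's.
    let childBrain = parentA.brain.crossover(parentB.brain);
    childBrain.mutate(0.01); // mutate 1% of the weights
    nextBrains.push(childBrain);
  }
  return nextBrains; // each brain powers a new agent next round
}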
One of the first examples of neuroevolution can be found in the 1994 paper “Genetic Lander: An Experiment in Accurate Neuro-genetic Control” by Edmund Ronald and Marc Schoenauer. In the 1990s, traditional neural network training methods were still nascent, and this work explored an alternative approach. The paper describes how a simulated spacecraft—in a game aptly named Lunar Lander—can learn how to safely descend and land on a surface. Rather than use handcrafted rules or labeled datasets, the researchers opted to use GAs to evolve and train neural networks over multiple generations. And it worked!
In 2002, Kenneth O. Stanley and Risto Miikkulainen expanded on earlier neuroevolutionary approaches with their paper