Friday, September 22, 2006

Evolving a Nim player


In the post Most Human Human I suggested that you could program a computer randomly and actually make the computer do something you didn't explicitly program it to do. I propose here an example of how this could be done.

First I will introduce you to the game of Nim. Nim is an impartial two player game in which you alternate taking matches from one or several piles of matches. The goal of this version of the game is to be the player not taking the last match from the board. In this example I will use the simplest version where you only have one pile of 21 matches and you are forced to take one, two or three matches when it is you turn.

How should we design a scheme that a Nim player can follow which is appropriate for random programming? Every position in this version of Nim can be expressed by a number 1,2,3,...,21. Some of these positions will be loosing and some will be winning. So lets use a scheme where every number is assigned a 1 if it is loosing and 0 if it is a winning position for the player to move. A program could look like the sequence below where the value of a pile with 1 stick is represented by the number to the left, and the value of a pile with 21 sticks is the right most number.

010010010101001100100

A player could use this scheme to know which move to make in every position. The example above tells the player that the initial position of 21 sticks is a winning position as the last number in the sequence is a zero. How does he know how many sticks to take when it is his move. He looks at the scheme and sees that the three following positions are numbered 010. This means that if he takes two sticks he will give the opponent a position with value 1 which he would think was a loosing position. If he have multiple options he chooses randomly between those. Note that this representation is not always consistent, but let evolution take care of that.

Now lets create an initial random population of creatures who's genome are the same as the scheme above. Cross breeding two creatures would be done by choosing a random position in the genome and swap the genes just like nature does it. Below '-' marks the crossoverpoint.

P1: 10011001 - 1010100101001
P2: 01001001 - 0101001100100

gives two possible child creatures.

C1: 10011001 - 0101001100100
C2: 01001001 - 1010100101001

Sometimes we also will change a bit randomly, this is called a mutation and allows new traits to emerge in a stagnant population. When the initial population is in place we can start matching up creatures against each other playing Nim for their life, losers will be replaced by offspring generated from parents chosen randomly from the population. After a few generations we will see that almost all creatures will have the same gene in the left most position. The total gene pool can be described like this

1XXXXXXXXXXXXXXXXXXXX

where the 1 means that every creature now know that when there is only one stick left you have a loosing position, you are forced to take it and therefore loose. The creatures that had a failing gene here would succumb fast to the more sophisticated ones. After running the program for a few more generations you will see that consensus have emerged with regard to a few more genes, like this

1000XXXXXXXXXXXXXXXXX

Now evolution has arrived at the notion that when there are 2,3 or 4 sticks left these positions must be winning as a player faced with these values always have the possibility to leave the next player with the loosing position of one stick left. After running some more generations all the creatures will look the same(except for a few random mutated ones that always will be a minority). The final dominant genome will look like this.

100010001000100010001

So, the computer have randomly, guided by evolution, arrived at a strategy to play a perfect game of Nim. The interesting thing here is that the programmer of the genetic algorithm may very well be unaware of this optimal strategy beforehand. This is different than the algorithms used in chess programs which are designed to search deep, evaluating the board from preset patterns designed by humans. If on the other hand the evaluation function was designed by an evolving population of competing chess programs we maybe could arrive at a chess program who has better understanding of chess than any human.

No comments: