Genetic programming

Question

I recently was browsing Reddit and came across a post linking to a "JavaScript genetic algorithm" example. I've really been fascinated with the concepts of genetic algorithms and programming, however even after some Googling I am still left slightly confused. How does it work?

I suppose the vocabulary terms are confusing me more than anything else. I would appreciate brief examples and perhaps explanations. Just the concept of genetic programming and how I could implement it in my projects and why?

There's a good book by Mat Buckland called "AI Techniques for Game Programming" (http://www.amazon.com/Techniques-Programming-Premier-Press-Development/dp/193184108X) in which half of the book covers genetic algorithms. The title of the book is a bit of a misnomer, it's a book on GAs and Neural Nets. It's a great intro to the topic. — Steven Evers, Jun 04 '11 at 15:18

score 19 · Answer 1 · answered Jun 04 '11 at 16:07

Sounds like you're talking about Genetic Algorithms more-so than Genetic Programming, but here's my contribution to your understanding.

It can be handy to think of GAs in terms of the parts that they are composed of.

So let's say you have some sort of problem. The first thing you need is a way to express what a solution will look like. If you had a travelling salesman problem with cities A, B, C, D, E then you already know what a solution could look like, an array of the names of the cities [B, C, A, D, E].

This is the Gene.

Otherwise known as a potential solution to the problem. Like Steven A. Lowe mentions, bit strings are common way to encode genes, but it's not necessary; it just makes certain things easier. The important part is that you have a way to represent a solution in this array-like fashion.

Now. How do you know if the solution is any good? You need a function that can tell you, and grade the solution. So, again for at TSP you might have a function that measures the distance traveled using the path [B, C, A, D, E]. The 'grade' that you assign could simply be the distance traveled but in more complicated problems you could include things like the cost of travel and other things.

This is the Fitness Function.

So now you can take a potential solution and find out if it's any good. What's next?

Next we need to start our first generation. So we generate a bunch of random solutions. It doesn't matter if they're good or not. This is your initial, or seed, population. You can call this your gene pool.

So you take your initial gene pool and you apply your fitness function to them all and give them all a grade. Now you need to take two of them and make a new population from them - for the next generation. Who do you select? Well you don't want to necessarily select only the fittest of the fit, that can lead to some problems. Instead you need a selection function.

One way to select that is easy to visualize is using a sort of wheel: each gene is a slice on a wheel, and their fitness score indicates how big their slice is (the better the fitness, the bigger the slice). Put a pin pointing to the wheel and give it a spin (ie. generate a random number). The pin points to the first parent. Do it again for the second parent.

Now, you need to create new children. You want to combine the parents to produce a new population. There are various ways to do this, but they're all called the crossover function. You can split them in half and swap the halves between the parents, or do some kind of interleaving. This is very analogous to mammalian parents producing new children -> they both contribute their genes to the new child.

Once you have this new generation, you throw in random, yet rare, mutation to each child. I have often seen mutation rates occur at less than 1%. The mutation function will randomly change something in your encoded gene. If your gene is a bit string, it could swap a bit, if it's a array of cities, it could swap 2 cities in the list. The important part is that it is a relatively rare occurrence and mixes things up.

Repeat this process until a desired number of generations, or until your fitness function produces parents with consistently high fitness scores and you have a solution that is (hopefully, if you've done everything right) optimal.

That was a bit wordy, so let me summarize with a metaphor:

Genes are people : people solve problems
Fitness Functions are grades : People get a grade based on how well they solve a problem
You select 2 people to breed a new population : you give people with better grades better chances of breeding
When the parents breed, they combine to produce children.
You rarely and randomly mutate their children
You grade the children of the new population
Rinse and repeat

Hope this helps.

This is a great explanation. I always thought genetic algorithms are better described as darwinian algorithms or evolutionary algorithms, but "genetic" certainly does describe the mechanics better (if not the overall idea of it). I'll call them Darwinian genetic algorithms. — Steven Lu, Sep 26 '14 at 09:43
@Florian Margaine: Game of life is a cellular automaton, an unrelated concept (starting from the fact that game of life is entirely deterministic, while GA are stochastic). — scrwtp, Sep 26 '14 at 13:22
This is, hands down, the single best explanation of GA I've ever heard. I've seen genetic algorithms mentioned in the past on multiple occasions, usually with offhand explantions, but never really understood what they were until now. Thanks! — Michael, Sep 26 '14 at 15:42
I wish I'd seen this explanation when I first started learning GAs! — Avrohom Yisroel, Nov 29 '17 at 15:49

Steven A. Lowe · Answer 2 · 2011-04-19T21:03:07.680

7

encode a solution to a problem as a bit-string

write a function (called a "fitness" function) that evaluates how 'good' the encoded solution is given a bit-string - the result is usually a number between 0 and 1

randomly generate a bunch of these bit-strings and evaluate their fitness

choose some of the bunch - typically the more-fit ones - and cut them in half and swap halves to make some new bit-strings (crossover)

then sometimes, randomly flip a few bits in some of the new bit-strings (mutation)

repeat until a good solution evolves

why do this: some problems have enormous possible solution spaces, so large that evaluating all possibilities is impractical (c.f. Traveling Salesman Problem)

I highly recommend the book Genetic Algorithms in Search, Optimization, and Machine Learning

edited Apr 19 '11 at 21:03

answered Apr 19 '11 at 20:47

Steven A. Lowe

33,808
2
84
151

An Amazon search on "Genetic Algorithms" got me four pages of stuff. I only looked at the first page, but none of the books there were titled "Genetic Algorithms". Could you provide more detail on the book, such as full title and author? – David Thornley Apr 19 '11 at 20:57
Challenge: restate the answer as a genetic algorithm. [-: – veryfoolish Apr 19 '11 at 20:57
@David link added; published in 1989 so there may be better ones out now but this one explined things well – Steven A. Lowe Apr 19 '11 at 21:04
1

@veryfoolish: first, restate the question as a bounded discrete-space solution – Steven A. Lowe Apr 19 '11 at 21:04
@David Genetic algorithms are also likely to be a chapter or two in a larger book about artificial intelligence. – Barry Brown Jun 05 '11 at 19:48
@veryfoolish StackExchange is already a genetic algorithm, the solvers are just so good we need to give them a new problem every iteration. – Weaver Sep 26 '14 at 11:17

score 6 · Answer 3 · edited Jun 16 '20 at 10:01

Genetic programming is a way of having the computer write programs for you!

Don't think "programs" like MS Word, think of "programs" as the following:

function(x){ return x*2; }

This function (or program), by itself, doesn't have a reason to exist. We are looking for solutions to problems. If you need to find the sum of two numbers, you just open the calculator and do the math. What if someone gave you the following table and asked you to figure out the relationship between result and x and y:

x   y   result
99  1   (3.02)
79  88   2.01 
21  62   5.01 
84  52  (6.58)
12  70   5.54 
67  18   0.73

This data is your "training" data. Your computer will use this data to generate some hypothesis, then you will test it against actual data.

Say you don't know statistics and decide this problem is too hard to figure out on your own, so you'll get the computer to figure it out for you.

Have the computer randomly generate wild guesses

You have the computer generate a million answers, and see if any of them stick (guess...a million times!). Following is an example of a couple of guesses:

function(x,y){ return x+y; } // wrong
function(x,y){ return x/1*1*1*1*1*1+y; } //wrong, silly

You may or may not know this, but functions or programs can also be represented as trees, for example, the second function would be:

(+ (/ x (* 1 (* 1 (* 1 (* 1 (* 1 1)))) y)

You can make it look more like a tree by indenting it like so (btw, look up reverse polish notation and lisp syntax...but you will understand why we represent programs like this shortly):

(+ is at the top with two "leaves" of / and y. / itself has several children, etc.)

This is why you read so much about "trees" in genetic programming. In any case, we plug in the values of x and y into this function and it gives us the WRONG answer. Not surprising though since we randomly generated this.

You now decide to generate a million such solutions. All of them are wrong. However, you notice that some answers are closer to the right answer than others. In other words, some solutions are more "fit" than others. Note that the computer doesn't know what is "right" and "wrong" so you have to provide your own "fitness function." This function gets handed a potential solution, the training data, and is responsible for telling the GP system just how "fit" this solution is. As you can imagine, this function gets run millions upon millions of times.

What makes GP different

Here is what makes genetic programming different from wild guesses. You decide to do another round of million guesses; however, you do it a little more intelligently. You take the top 10% of the guesses (the ones that were closes to the actual values) and make them part of the second generation. You also take many of these solutions (perhaps the same 10%...i don't remember) and decide to "mix them up."

You randomly pick two solutions, randomly pick sub-trees and start swapping them. So part of solution A ends up under solution B and vice versa -- you just "crossed" them. You also take some solutions and simply "mutate" them...take some subtree and 'screw it up' a bit (hey, if the solution is terrible, 'screwing with it for no reason' might actually improve it).

A good way of thinking of this is the following: your mom and dad have certain attributes--hair color, height, likelihood of disease, etc. You, as a child inherit different attributes from both of your parents. If both of your parents were olympic athletes, you will also be a super athlete, right? Well, biologists, sociologists and even historians might take issue with this idea, but computer scientists are not concerned with the morality of eugenics here. They just saw a "system" doing a pretty good job providing solutions, so they decided to model it in software.

If it doesn't actually match biology, but still provides good answers... many computer scientists collectively say "whatever dude, and thanks for the terminology." Also note that all your brothers and sisters and not EXACTLY the same... even through they have same parents. Each person has genes which mutate for whatever reason (please don't show this to a biologist, the point is to understand the motivation behind much of the terminology).

So now we are getting the computer to generate millions of programs and measuring their fitness. The best solutions survive into the next generation. We also "mutate" and do cross-over on the "population" (notice how language of genetics and biology is being used). Once the second generation is created, fitness is again measured. Since this generation has the best solutions from the previous generation AND we crossed over and mutated the best solutions (along with mediocre population -- to keep up diversity), this generation should be at least a little better than the previous generation.

We continue this for a very large number of generations. Each generation (hopefully) provides better and better solutions, until we get the right answer. For example:

(+ (- 2.2 (/ x 11) (* 7 (cos y))))

Well look at this, this is correct!
(I copied this from http://en.wikipedia.org/wiki/Genetic_programming, which also has a graphic representation of this tree)

Odds and Ends

There are some important issues, like how do you decide which "terminals" (+, -, *, /, cos, sin, tan) are available to your GP system, how do you write the fitness function and how does the system handle non-sensical programs such as (1 + cos) or (2 / "hello") (among many others).

It is pretty boring to evolve equations. It gets more interesting if your terminal set looks like the following: (fire, sense enemy, move, ...) and your fitness function measures your health and the number of dead bodies of martial monsters.

I wrote most of this from memory but this is the basic idea. I did some GP in my college years. You should definitely play around with it. Don't worry about understanding all the terminology, just download some free GP systems, run through a few examples to get a feel for it and make up your own interesting examples (find relationships between different data sets, try to hook it up to game APIs, etc.)

score 1 · Answer 4 · edited Sep 26 '14 at 15:40

Survival of the Fittest: Natural Selection with Windows Forms was how I got introduced to Genetic Programming. It's an easy read with code available for download. The downside is that GP requires a means to execute code created at runtime, and at the time the article was written, C# wasn't well suited to this task. That is why the example uses CodeDOM to generate, compile and run code at run-time, which by itself adds another layer of complexity to it.

Things have changed since then with .NET now having an ExpressionTree API of its own, which would probably allow a more elegant GP implementation in C# than the one described in the article. But it's good enough to get an understanding of how GP works.

Here you can download a free ebook on GP, which also includes a very short java code example you might also find interesting.

score -1 · Answer 5 · answered Apr 19 '11 at 20:56

Genetic algorithms and genetic programming are related, but different concepts.

Genetic algorithms (GAs) are search algorithms for complex optimization problems. In a GA, you encode the parameters of a solution to some problem in a "DNA" bitstring, then randomly "breed" these bitstrings: have them reproduce by combining parts of them and apply "survival of the fittest" by deleting all the bitstrings you've got except for the ones that are best at solving your problem.

Genetic programming (GP) is even more complicated: here, you don't represent programs by their DNA (bitstrings), but by parse trees that you breed and select.

Genetic programming

5 Answers5

Have the computer randomly generate wild guesses

What makes GP different

Odds and Ends