One of the more intriguing types of algorithm is the genetic algorithm. Genetic algorithms mimic Darwinian natural selection, where "fitness" selects individuals for survival, breeding, and, hence, adaptive mutation. I covered the background on this in a previous column, and I also showed two Perl implementations, one that bred bytes and another that bred words.
In this article, I cover more advanced material on genetic algorithms in Perl. You might want to go back and read the first article before diving into this one; genetic algorithms have well-defined steps, and some of the code here draws upon code in the earlier article without explaining the details.
Before you begin, you should have Perl 5.6.0 or later installed on your system. The examples might work with earlier versions of Perl and on platforms other than the common UNIX ones (such as Windows), but they have not been tested in those environments, and it might take extra work to get them running.
The previous article showed an example that looked for a small set of dictionary words in the DNA of the individuals, and ranked individuals based on how many dictionary words (the longer, the better) the individual's DNA had.
Here, we begin with this dictionary example and modify it as shown in Listing 1 (download the full source in commands.pl). Now, each word (words in the DNA are separated by spaces) is an instruction to increase or decrease the fitness of the individual. If you make the rules such that it's too hard to gain fitness, your initial population doesn't stand a chance. If you loosen the rules too much, the individual fitness does not rise steadily as time goes on, making the genetic algorithm useless.
It wasn't surprising that the individuals adapted so quickly to these new rules and exploited them to achieve high scores after starting with completely random noise. What was surprising was that the rules were exploited in unexpected ways. For instance, when I made the rule that a numeric instruction would set the fitness to that number, the DNA dropped all other instructions in favor of the numeric one. When I cut the fitness for invalid instructions in an effort to stop the DNA from becoming all-numeric, the individuals simply moved the numeric instruction to the end of the DNA, where it was safe from invalid instructions.
Listing 1. The commands.pl fitness() function
# calculate the fitness of the DNA
sub fitness
{
my $dna = shift @_;
my @words = split ' ', dna_to_words($dna);
my $fitness = 0; # start with 0 fitness
my $max_entry_length = 20; # longest word we accept
# note that the 'words' here are command words or numbers
foreach my $word (@words) # execute all the words as "instructions"
{
if ($word eq 'M') # 'multiply' instruction
{
$fitness *= 2;
}
elsif ($word =~ /^A\D*(\d+)/) # 'amplify' instruction
{
$fitness *= $1
}
elsif ($word =~ /(\d+)/) # if the instruction is a number
{
$fitness += length($1); # increase the fitness depending on the run of digits
}
else
{
$fitness *= 0.80; # the punishment for a 'bad' instruction
}
}
return $fitness;
} # end of fitness()
|
I talk about the DNA and the individuals as if they were alive. In a way, they were. I will never forget their apathy when I gave them oppressive rules, and their exuberance when the rules permitted them to drastically raise their fitness in a way that I had not anticipated. Try to introduce new rules in the fitness() function and see for yourself how the individuals will evolve to survive.
After finishing up the command structure in the previous section, I asked myself what to do next. I could improve the algorithm, adding bells and whistles such as phenotypes or more flexible rules, or I could have more fun with the fitness() function.
In the Resources section, there's a link to MyBeasties, an advanced Perl module for genetic algorithm applications. I couldn't hope to improve on that module's comprehensive array of techniques for implementing the genetic algorithm, but within the framework we've built so far, I could make several fun examples that do not require the more advanced genetic algorithms techniques.
I implemented a sequence of instructions (moving from point A to point B) as the next fitness test. Every individual starts with fitness 1 at point A and can raise its fitness based on how close it gets to point B. Every command "U", "D", "L", or "R" moves up, down, left, or right, respectively. The "B" command moves back. A number after the command specifies how many times it is to run.
While moving, the individual has to follow a path. Stepping off the path gives the individual a fitness proportional to how far it got while on the path. See Listing 2; the full source is in motion.pl. Individuals regularly went nine steps on the path with a 110-byte DNA; it seems that a longer DNA allowed them to go further on the path as a matter of course. Without the numeric parameters, the individuals actually seemed to do better, perhaps because mutations are less likely to damage the DNA when instructions are simpler. Human DNA has only 4 basic building blocks (commonly written as G, A, T, and C), so this is not a far-fetched theory.
Note the way commands are implemented, with a flexible motion stack and a "back" command that is invoked in a manner consistent with the rest of the instructions (as a member of the %instructions hash). Also, use of the
Math::Complex module made simulating two-dimensional motion a trivial task. Adding more instructions would also be trivial. (Can you implement a "repeat last instruction" function?) Also note the drastic fitness increase for an instruction that follows the path.
Listing 2. The motion.pl fitness() function
# calculate the fitness of the DNA
sub fitness
{
my $dna = shift @_;
my @words = split ' ', dna_to_words($dna);
my $fitness = 2; # start with a small fitness
my $location = Math::Complex->make(0,0); # starting coordinates
# "ideal" coordinates
my $goal = Math::Complex->make(10,10);
# the path to the goal the DNA instructions must follow
my @path = split ' ', 'U U U R R R U U U R R R U U U R R R U R';
my $path_followed = 1;
# keep track of motion stack (array)
my @motion_stack;
# set up motion instructions
my $instructions = {
U => Math::Complex->make( 0, 1),
D => Math::Complex->make( 0,-1),
L => Math::Complex->make(-1, 0),
R => Math::Complex->make( 1, 0),
B => # move back
sub {
my $location = shift @_;
my $motion_stack = shift @_;
my $instructions = shift @_;
my $old_motion = pop @$motion_stack;
$location += $instructions->{$old_motion} if defined $old_motion;
return $location;
},
};
# note that the 'words' here are command words or numbers
foreach my $word (@words) # execute all the words as "instructions"
{
# only handle legitimate instructions (they can be embedded in bigger words)
my ($motion, $mag) = $word =~ m/([A-Z])(\d*)/;
if ($motion && exists $instructions->{$motion})
{
$mag = 1 unless $mag; # always run at least once
my $instruction = $instructions->{$motion};
foreach (1..$mag)
{
if (ref $instruction eq 'Math::Complex')
{
$location += $instruction; # use the motion vector
push @motion_stack, $motion;
}
elsif (ref $instruction eq 'CODE')
{
$location = $instruction->($location, \@motion_stack, $instructions);
# use the subroutine
}
}
}
}
# now see if the individual followed the necessary path
# (if he didn't, he "fell off the cliff")
foreach my $path_instruction (@path)
{
my $instruction = shift @motion_stack;
# get the actually executed instruction
if (defined $instruction && $instruction eq $path_instruction)
{
# increase fitness, this individual followed the proper path
$fitness *= 2;
$path_followed = 1; # the individual has not strayed
}
else
{
# increase fitness a little, so the individual is rewarded for
# having followed the path this far
$fitness++;
$path_followed = 0; # the individual has strayed, though
last;
}
}
if ($location == $goal && $path_followed)
{
# no point continuing, the individual found the target
die "Individual [@words] reached the target!"; }
return $fitness;
} # end of fitness()
|
In the original article, we used the DNA as a source of words and rewarded the individual's fitness for those words that were in our dictionary, depending on how long they were. Listing 3 (full source is in words2.pl) uses the original idea but employs approximate matching through the String::Approx module (which you may have to install). The idea is to reward not only for exact matches, but also for approximate matches (as a percentage). It would be best to reward on a sliding scale, but that would increase the complexity of the algorithm significantly.
This rewrite of the original proved very rewarding, routinely yielding three-letter words within 40 generations. The approximate match ensured that potentially successful DNA patterns were rewarded accordingly. Basing the reward not only on a match, but also on the length of the word fostered longer words in the DNA.
The fitness() function is similar to the one from the original example, but the reward structure is different and the loop is actually simplified. Notice how exact matches are significantly more valuable than approximate matches.
Listing 3. The words2.pl fitness() function
# this is a closure block!
{
# private static variable @dictionary in closure for fitness() only
my @dictionary;
# calculate the fitness of the DNA
sub fitness
{
my $dna = shift @_;
my @words = split ' ', dna_to_words($dna);
my $fitness = 0; # start with 0 fitness
my $max_entry_length = 20; # longest word we accept
my $precision = 90; # matching precision, in percent
my $imprecision = 100 - 90;
die "Can't use a negative imprecision (your precision must
be less than 100 but it's $precision)"
if $imprecision < 0;
# you can use any word list at the end of the program
# do the @dictionary initialization just once
unless (@dictionary)
{
@dictionary = <DATA>;
foreach (@dictionary)
{
chomp;
}
# eliminate words over $max_entry_length letters, and uppercase them
@dictionary = grep { length($_) > 1 && length($_) < $max_entry_length }
map { uc } @dictionary;
}
# there is no easy way to avoid this exhaustive check of the dictionary
# without complicating this example too much
foreach my $word (@words)
{
next unless length $ word > 1; # don't use single letters
# note we use "S#%", we don't want insert/append similarities,
# only replacement similarities
my @matches = amatch($word, "$imprecision%", @dictionary);
my @precise_matches = grep { $word eq $_ } @dictionary;
$fitness += scalar @matches;
$fitness += (10 ** length $_) foreach @precise_matches;
# reward longer words significantly more
}
return $fitness;
} # end of fitness()
}
|
I hope running the examples in this article is as fun for you as it was for me. Don't be afraid to play with the parameters and with the fitness() function. Be sure to examine MyBeasties (see the Resources section) if you are interested in using genetic algorithms in your own applications. Since speed is often a concern for genetic algorithm implementations, and thus they must be heavily optimized in most cases, there are few viable general-purpose genetic algorithm toolkits in any language. MyBeasties is a good example of a useful and fairly fast toolkit.
The field of genetic algorithms applications has unlimited potential. If quantum supercomputers are ever developed, genetic algorithms will suddenly become not only feasible, but preferable as an approach to solving certain problems. Quantum computing's premise of simultaneous evaluation of many potential solutions to a problem seems destined to fit in with genetic algorithms' premise of evolutionary computing with its large sample populations.
- Read Ted's other Perl articles in the "Cultured Perl" series on developerWorks.
- Download the code listings mentioned in this article:
- This article builds on my first article on genetic algorithms.
- Go to CPAN for all the Perl modules you could ever want.
- Go to Perl.com for more Perl information and related resources.
-
MyBeasties is a comprehensive set of Perl modules for genetic algorithm programming. They are much more advanced than the examples in this article as far as building a genetic algorithms framework goes, but perhaps they are too complex for the beginner Perl programmer.
-
String::Approx is a handy module for matching strings approximately. It is very fast.
- For a detailed overview of some of the interesting computational hurdles facing those involved in genome sequencing, read "Computational challenges in structural and functional genomics" (IBM Systems Journal 40:2, 2001)
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, three-tier client-server database architectures, UNIX system administration, CORBA, and project management.



