Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Cultured Perl: Genetic algorithms, the next generation

Advanced examples of genetic algorithms with Perl

Teodor Zlatanov (tzz@iglou.com), Programmer, Gold Software Systems
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, three-tier client-server database architectures, UNIX system administration, CORBA, and project management.

Summary:  Ted takes you another step towards working with genetic algorithms in Perl. He picks up where his first column on genetic algorithms left off, working with a listing that looks for a set of dictionary words in an individual's DNA.

Date:  01 Oct 2002
Level:  Introductory
Also available in:   Russian  Japanese

Activity:  6539 views
Comments:  

One of the more intriguing types of algorithm is the genetic algorithm. Genetic algorithms mimic Darwinian natural selection, where "fitness" selects individuals for survival, breeding, and, hence, adaptive mutation. I covered the background on this in a previous column, and I also showed two Perl implementations, one that bred bytes and another that bred words.

In this article, I cover more advanced material on genetic algorithms in Perl. You might want to go back and read the first article before diving into this one; genetic algorithms have well-defined steps, and some of the code here draws upon code in the earlier article without explaining the details.

Before you begin, you should have Perl 5.6.0 or later installed on your system. The examples might work with earlier versions of Perl and on platforms other than the common UNIX ones (such as Windows), but they have not been tested in those environments, and it might take extra work to get them running.

Words again

The previous article showed an example that looked for a small set of dictionary words in the DNA of the individuals, and ranked individuals based on how many dictionary words (the longer, the better) the individual's DNA had.

Here, we begin with this dictionary example and modify it as shown in Listing 1 (download the full source in commands.pl). Now, each word (words in the DNA are separated by spaces) is an instruction to increase or decrease the fitness of the individual. If you make the rules such that it's too hard to gain fitness, your initial population doesn't stand a chance. If you loosen the rules too much, the individual fitness does not rise steadily as time goes on, making the genetic algorithm useless.

It wasn't surprising that the individuals adapted so quickly to these new rules and exploited them to achieve high scores after starting with completely random noise. What was surprising was that the rules were exploited in unexpected ways. For instance, when I made the rule that a numeric instruction would set the fitness to that number, the DNA dropped all other instructions in favor of the numeric one. When I cut the fitness for invalid instructions in an effort to stop the DNA from becoming all-numeric, the individuals simply moved the numeric instruction to the end of the DNA, where it was safe from invalid instructions.


Listing 1. The commands.pl fitness() function
		
# calculate the fitness of the DNA
sub fitness
{
 my $dna = shift @_;  
 my @words = split ' ', dna_to_words($dna);
 my $fitness = 0;			# start with 0 fitness
 my $max_entry_length = 20;		# longest word we accept
 
 # note that the 'words' here are command words or numbers
 
 foreach my $word (@words)		# execute all the words as "instructions"
 {
  if ($word eq 'M')	# 'multiply' instruction
  {
   $fitness *= 2;
  }
  elsif ($word =~ /^A\D*(\d+)/)	# 'amplify' instruction
  {
   $fitness *= $1
  }
  elsif ($word =~ /(\d+)/)	      # if the instruction is a number
  {
   $fitness += length($1); # increase the fitness depending on the run of digits
  }
  else
  {
   $fitness *= 0.80;	      # the punishment for a 'bad' instruction
  }
 }
 
 return $fitness;
}					# end of fitness()

I talk about the DNA and the individuals as if they were alive. In a way, they were. I will never forget their apathy when I gave them oppressive rules, and their exuberance when the rules permitted them to drastically raise their fitness in a way that I had not anticipated. Try to introduce new rules in the fitness() function and see for yourself how the individuals will evolve to survive.


Harder puzzles

After finishing up the command structure in the previous section, I asked myself what to do next. I could improve the algorithm, adding bells and whistles such as phenotypes or more flexible rules, or I could have more fun with the fitness() function.

In the Resources section, there's a link to MyBeasties, an advanced Perl module for genetic algorithm applications. I couldn't hope to improve on that module's comprehensive array of techniques for implementing the genetic algorithm, but within the framework we've built so far, I could make several fun examples that do not require the more advanced genetic algorithms techniques.

I implemented a sequence of instructions (moving from point A to point B) as the next fitness test. Every individual starts with fitness 1 at point A and can raise its fitness based on how close it gets to point B. Every command "U", "D", "L", or "R" moves up, down, left, or right, respectively. The "B" command moves back. A number after the command specifies how many times it is to run.

While moving, the individual has to follow a path. Stepping off the path gives the individual a fitness proportional to how far it got while on the path. See Listing 2; the full source is in motion.pl. Individuals regularly went nine steps on the path with a 110-byte DNA; it seems that a longer DNA allowed them to go further on the path as a matter of course. Without the numeric parameters, the individuals actually seemed to do better, perhaps because mutations are less likely to damage the DNA when instructions are simpler. Human DNA has only 4 basic building blocks (commonly written as G, A, T, and C), so this is not a far-fetched theory.

Note the way commands are implemented, with a flexible motion stack and a "back" command that is invoked in a manner consistent with the rest of the instructions (as a member of the %instructions hash). Also, use of the Math::Complex module made simulating two-dimensional motion a trivial task. Adding more instructions would also be trivial. (Can you implement a "repeat last instruction" function?) Also note the drastic fitness increase for an instruction that follows the path.


Listing 2. The motion.pl fitness() function
		
# calculate the fitness of the DNA
sub fitness
{
 my $dna = shift @_;  
 my @words = split ' ', dna_to_words($dna);

 my $fitness  = 2;			# start with a small fitness
 my $location = Math::Complex->make(0,0); # starting coordinates
 # "ideal" coordinates
 my $goal     = Math::Complex->make(10,10);

 # the path to the goal the DNA instructions must follow
 my @path = split ' ', 'U U U R R R U U U R R R U U U R R R U R';
 my $path_followed = 1;
 
 # keep track of motion stack (array)
 my @motion_stack;
 
 # set up motion instructions
 my $instructions = {
		     U  => Math::Complex->make( 0, 1),
		     D  => Math::Complex->make( 0,-1),
		     L  => Math::Complex->make(-1, 0),
		     R  => Math::Complex->make( 1, 0),
		     B =>		# move back
		     sub {
		      my $location = shift @_;
		      my $motion_stack = shift @_;
		      my $instructions = shift @_;
		      my $old_motion = pop @$motion_stack;
		      $location += $instructions->{$old_motion} if defined $old_motion;
		      return $location;
		     },
		    };
 
 # note that the 'words' here are command words or numbers
 
 foreach my $word (@words)		# execute all the words as "instructions"
 {
  # only handle legitimate instructions (they can be embedded in bigger words)
  my ($motion, $mag) = $word =~ m/([A-Z])(\d*)/;
  if ($motion && exists $instructions->{$motion})
  {
   $mag = 1 unless $mag;		# always run at least once
   my $instruction = $instructions->{$motion};
   foreach (1..$mag)
   {
    if (ref $instruction eq 'Math::Complex')
    {
     $location += $instruction;		# use the motion vector
     push @motion_stack, $motion;
    }
    elsif (ref $instruction eq 'CODE')
    {
     $location = $instruction->($location, \@motion_stack, $instructions);
       # use the subroutine
    }
   }
  }
 }

 # now see if the individual followed the necessary path
   # (if he didn't, he "fell off the cliff")
 foreach my $path_instruction (@path)
 {
  my $instruction = shift @motion_stack;
    # get the actually executed instruction
  if (defined $instruction && $instruction eq $path_instruction)
  {
   # increase fitness, this individual followed the proper path
   $fitness *= 2;
   $path_followed = 1;			# the individual has not strayed
  }
  else
  {
   # increase fitness a little, so the individual is rewarded for
     # having followed the path this far
   $fitness++;
   $path_followed = 0;	# the individual has strayed, though
   last;
  }
 }

 if ($location == $goal && $path_followed)
 {
  # no point continuing, the individual found the target
  die "Individual [@words] reached the target!"; }
 
 return $fitness;
}					# end of fitness()
	


Back to words

In the original article, we used the DNA as a source of words and rewarded the individual's fitness for those words that were in our dictionary, depending on how long they were. Listing 3 (full source is in words2.pl) uses the original idea but employs approximate matching through the String::Approx module (which you may have to install). The idea is to reward not only for exact matches, but also for approximate matches (as a percentage). It would be best to reward on a sliding scale, but that would increase the complexity of the algorithm significantly.

This rewrite of the original proved very rewarding, routinely yielding three-letter words within 40 generations. The approximate match ensured that potentially successful DNA patterns were rewarded accordingly. Basing the reward not only on a match, but also on the length of the word fostered longer words in the DNA.

The fitness() function is similar to the one from the original example, but the reward structure is different and the loop is actually simplified. Notice how exact matches are significantly more valuable than approximate matches.


Listing 3. The words2.pl fitness() function
		
# this is a closure block!
{                               
 # private static variable @dictionary in closure for fitness() only
 my @dictionary;
 # calculate the fitness of the DNA
 sub fitness
 {
  my $dna = shift @_;  
  my @words = split ' ', dna_to_words($dna);
  my $fitness = 0;			# start with 0 fitness
  my $max_entry_length = 20;		# longest word we accept
  my $precision = 90;			# matching precision, in percent
  my $imprecision = 100 - 90;

  die "Can't use a negative imprecision (your precision must
                       be less than 100 but it's $precision)"
   if $imprecision < 0;

  # you can use any word list at the end of the program
  # do the @dictionary initialization just once
  unless (@dictionary)
  {
   @dictionary = <DATA>;
   foreach (@dictionary)
   {
    chomp;
   }   

   # eliminate words over $max_entry_length letters, and uppercase them
   @dictionary = grep { length($_) > 1  && length($_) < $max_entry_length }
    map { uc } @dictionary;
  }

  # there is no easy way to avoid this exhaustive check of the dictionary
  # without complicating this example too much
  foreach my $word (@words)
  {
   next unless length $ word > 1;	# don't use single letters
   # note we use "S#%", we don't want insert/append similarities,
     # only replacement similarities
   my @matches = amatch($word, "$imprecision%", @dictionary); 
   my @precise_matches = grep { $word eq $_ } @dictionary;
   $fitness += scalar @matches;
   $fitness += (10 ** length $_) foreach @precise_matches;
     # reward longer words significantly more
  }
  return $fitness;
 } # end of fitness()
}
  


Conclusion

I hope running the examples in this article is as fun for you as it was for me. Don't be afraid to play with the parameters and with the fitness() function. Be sure to examine MyBeasties (see the Resources section) if you are interested in using genetic algorithms in your own applications. Since speed is often a concern for genetic algorithm implementations, and thus they must be heavily optimized in most cases, there are few viable general-purpose genetic algorithm toolkits in any language. MyBeasties is a good example of a useful and fairly fast toolkit.

The field of genetic algorithms applications has unlimited potential. If quantum supercomputers are ever developed, genetic algorithms will suddenly become not only feasible, but preferable as an approach to solving certain problems. Quantum computing's premise of simultaneous evaluation of many potential solutions to a problem seems destined to fit in with genetic algorithms' premise of evolutionary computing with its large sample populations.


Resources

  • Read Ted's other Perl articles in the "Cultured Perl" series on developerWorks.

  • Download the code listings mentioned in this article:


  • This article builds on my first article on genetic algorithms.



  • Go to CPAN for all the Perl modules you could ever want.



  • Go to Perl.com for more Perl information and related resources.



  • MyBeasties is a comprehensive set of Perl modules for genetic algorithm programming. They are much more advanced than the examples in this article as far as building a genetic algorithms framework goes, but perhaps they are too complex for the beginner Perl programmer.



  • String::Approx is a handy module for matching strings approximately. It is very fast.



  • For a detailed overview of some of the interesting computational hurdles facing those involved in genome sequencing, read "Computational challenges in structural and functional genomics" (IBM Systems Journal 40:2, 2001)



About the author

Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, three-tier client-server database architectures, UNIX system administration, CORBA, and project management.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux, Open source
ArticleID=11251
ArticleTitle=Cultured Perl: Genetic algorithms, the next generation
publish-date=10012002
author1-email=tzz@iglou.com
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers