
Cultured Perl: Tied variables

Examples of tying scalar, array, and hash variables through CPAN modules

Teodor Zlatanov (tzz@iglou.com), Programmer, Gold Software Systems
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, three-tier client-server database architectures, UNIX system administration, CORBA, and project management.

Summary:  Ted explains the basics of tying variables, using concrete examples of CPAN modules through their usage and implementation. He covers scalar, array, and hash variables.

Date:  01 Jan 2003
Level:  Introductory

Before we get started, you must have Perl 5.005 or newer installed on your system (see Resources for a link on where to acquire it). Preferably, your system should be a recent (2000 or later) mainstream UNIX (Linux, Solaris, BSD, etc.) installation, but other operating systems may work as well. The examples may work with earlier versions of Perl and UNIX, or with other operating systems -- but any failure on their part to function under such conditions should be considered an exercise for the reader to solve.

Tied variables are an essential tool for any Perl programmer. It is easiest to reuse existing code that uses the Tie::Scalar, Tie::Array, or Tie::Hash interfaces, but an understanding of the magic behind the scenes will prove useful, regardless of whether you want to write your own variations on that theme or just optimize your usage of tied variables.

Let's look at the three main types of tied variables: scalars, arrays, and hashes. The complexity of tied filehandles makes them a more advanced topic.

For each CPAN module mentioned in this article, you can look at the implementation with the CPAN interface. Type "cpan" or "perl -MCPAN -eshell" at the UNIX prompt, and you'll get a secondary prompt. Type "look Tie::Scalar::Timeout" (without a trailing comma), for instance, to look at the Tie::Scalar::Timeout module, and you'll be able to see the contents of that module.

What is the meaning of "tying" a variable? The verb "tie" is used here as a synonym for "bind." Tying a variable is basically the binding of functions to the internal triggers for reading and writing that variable. This means that you, the programmer, can tell Perl to do something extra when a variable is used. From this simple premise, the tying interface has evolved into an object-oriented methodology in Perl, hiding OOP complexity behind a procedural interface.

Tying scalars

Scalar variables are simple and indispensable. They hold only one piece of data: a string, a number, the undefined value, or a reference to another variable. The "$" in front of a variable tells Perl to treat it as a scalar. It is a simple matter to use a scalar variable:


Listing 1. Normal scalars
				
my $a = 'Hello';
$a = 'there';
$a = 89.2;

It's just as easy to use a tied scalar variable. For instance, let's take the example from the wonderful Tie::Scalar::Timeout module:


Listing 2. Tied scalars
				
use Tie::Scalar::Timeout;

tie my $k, 'Tie::Scalar::Timeout', EXPIRES => '+2s';

$k = 123;
sleep(3);
# $k is now undef

The first part, where the tie() function is invoked, tells Perl that the variable $k is tied to the Tie::Scalar::Timeout package. Behind the scenes, Perl calls the TIESCALAR() method in the Tie::Scalar::Timeout module (essentially, this is like invoking new() on a regular object). TIESCALAR() returns an object of type Tie::Scalar::Timeout, which Perl attaches to $k rather than assigning to it; the same object is the return value of tie() and can be retrieved later with tied($k).
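To make the relationship between tie(), TIESCALAR(), and the hidden object concrete, here is a minimal sketch. The Demo::Counter package is hypothetical (it is not a CPAN module) and simply counts how often the variable is read.

```perl
use strict;
use warnings;

# Hypothetical package for illustration; not a CPAN module.
{
    package Demo::Counter;

    # Called by tie(); plays the role of new() and must return an object.
    sub TIESCALAR {
        my ($class, $start) = @_;
        return bless { VALUE => $start, READS => 0 }, $class;
    }

    # Called on every read of the tied variable.
    sub FETCH {
        my ($self) = @_;
        $self->{READS}++;
        return $self->{VALUE};
    }

    # Called on every assignment to the tied variable.
    sub STORE {
        my ($self, $value) = @_;
        $self->{VALUE} = $value;
    }
}

my $object = tie my $k, 'Demo::Counter', 10;   # tie() returns the object
my $first  = $k;                               # one read, one FETCH() call
my $second = $k;                               # a second read

print ref($object), "\n";                      # the class named in tie()
print tied($k) == $object ? "same object\n" : "different object\n";
print "reads so far: $object->{READS}\n";
```

The extra arguments after the class name in tie() are passed straight to TIESCALAR(), which is how options such as EXPIRES reach Tie::Scalar::Timeout in Listing 2.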

The particular parameters given to Tie::Scalar::Timeout in the example ensure that it will expire after two seconds. The module provides other options, such as an expiration after a certain number of reads. Every time you read from the $k variable created in the above example, you invoke the FETCH() method in the Tie::Scalar::Timeout module:


Listing 3. How Tie::Scalar::Timeout does it internally
				
sub FETCH {
        my $self = shift;

        # if num_uses isn't set or set to a negative value, it won't
        # influence the expiry process

        if (($self->{NUM_USES} == 0) ||
           (time >= $self->{EXPIRY_TIME})) {
                # policy can be a coderef or a plain value
                return &{ $self->{POLICY} } if ref($self->{POLICY}) eq 'CODE';
                return $self->{POLICY};
        }
        $self->{NUM_USES}-- if $self->{NUM_USES} > 0;
        return $self->{VALUE};
}

Every time you write to a tied scalar variable, you invoke its STORE() method. Scalars also have UNTIE() and DESTROY() methods, which are normally not needed.
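The order in which Perl calls these methods can be seen with a small tracing class. This is only a sketch: Demo::Traced is a hypothetical package that records each call in an array.

```perl
use strict;
use warnings;

my @log;    # records which methods Perl called, in order

# Hypothetical package for illustration only.
{
    package Demo::Traced;

    sub TIESCALAR {
        my ($class) = @_;
        push @log, 'TIESCALAR';
        return bless { VALUE => undef }, $class;
    }

    sub STORE {
        my ($self, $value) = @_;
        push @log, "STORE($value)";
        $self->{VALUE} = $value;
    }

    sub FETCH {
        my ($self) = @_;
        push @log, 'FETCH';
        return $self->{VALUE};
    }

    sub UNTIE   { push @log, 'UNTIE' }     # called by untie()
    sub DESTROY { push @log, 'DESTROY' }   # called when the object is freed
}

tie my $k, 'Demo::Traced';
$k = 123;          # write: STORE(123)
my $seen = $k;     # read:  FETCH
untie $k;          # UNTIE, then DESTROY since nothing else holds the object

print join(' ', @log), "\n";
```

UNTIE() and DESTROY() are the natural place for cleanup such as closing a file, which is why most simple tied scalars can leave them out.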

Note that a tied scalar variable, and any tied variable for that matter, needs to store its actual data somewhere. With Tie::Scalar::Timeout the data is stored in $self->{VALUE}, since the object behind the scalar variable we know as $k is actually a blessed hash reference. Perl hides this layer of complexity from us, creating a sort of encapsulation very similar to what exists in OOP.

The meaning of the code above is that every time the value of the $k variable is requested, it could change. Thus, if you want your own Schroedinger box, just use the Tie::Scalar::Timeout module with a random timeout between 0 and 100, and read the value at 50 seconds. Assuming a good random number generator, you will get either 1 or undef based on the timeout. We assume that the time spent executing the instructions in the program is negligible, but there is a slight bias introduced by that time. Yes, we could just see if rand(100) is above 50, but where's the fun in that?


Listing 4. Schroedinger's timeout
				
use Tie::Scalar::Timeout;

# the timeout will be a whole number of seconds between 0 and 99
my $random_timeout = int rand 100;

tie my $k, 'Tie::Scalar::Timeout', VALUE => 1, EXPIRES => "+${random_timeout}s";

sleep(50);

print 'The timeout ', 
      ($k) ? 'did not happen' : 'happened', 
      "\n";

Getting a cat and shooting it if the timeout happens is left as an exercise for the reader.


Tying arrays

Arrays are trickier than scalars. They are sequential collections of scalars, so extra functionality is needed. A tied array class has TIEARRAY() (the constructor, like new() for tied arrays), FETCH()/STORE() (same idea as with tied scalars, but with extra parameters), FETCHSIZE()/STORESIZE() (for array size management), and UNTIE() and DESTROY(), which can be used to close a file, for instance, or flush the output.

For tied arrays, FETCH() and STORE() need an extra parameter compared to their equivalents in tied scalars. The extra parameter is the index in the array. FETCHSIZE() and STORESIZE() are what's used when you invoke scalar(@ARRAY) and $#ARRAY = x, respectively.
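As a rough illustration of these four methods, the following sketch ties an array to a hypothetical Demo::UpperArray class that upper-cases everything stored in it. Only single-element assignments are used, because list assignment and functions such as push() need further methods.

```perl
use strict;
use warnings;

# Hypothetical package: an array that stores every element in upper case.
{
    package Demo::UpperArray;

    sub TIEARRAY {
        my ($class) = @_;
        return bless { DATA => [] }, $class;
    }

    # The extra parameter after $self is the array index.
    sub FETCH {
        my ($self, $index) = @_;
        return $self->{DATA}[$index];
    }

    sub STORE {
        my ($self, $index, $value) = @_;
        $self->{DATA}[$index] = uc $value;
    }

    # scalar(@array) ends up here.
    sub FETCHSIZE {
        my ($self) = @_;
        return scalar @{ $self->{DATA} };
    }

    # $#array = x ends up here, with the new size x + 1.
    sub STORESIZE {
        my ($self, $size) = @_;
        $#{ $self->{DATA} } = $size - 1;
    }
}

tie my @words, 'Demo::UpperArray';
$words[0] = 'hello';
$words[1] = 'there';
$words[2] = 'world';

print scalar(@words), "\n";   # FETCHSIZE: 3
$#words = 1;                  # STORESIZE(2)
print "@words\n";             # HELLO THERE
```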

The DELETE() and EXISTS() functions need to be implemented if the corresponding Perl delete() and exists() functions are needed.

There are also the POP(), PUSH(), SHIFT(), UNSHIFT(), SPLICE(), and EXTEND() functions (the first five correspond to the Perl functions of the same name in lowercase), but normally a module writer would inherit from Tie::StdArray and get those methods already implemented.

POP(), for instance, could be implemented as a FETCH() of the last element, followed by STORESIZE(FETCHSIZE()-1) (setting the size of the array to be one less, effectively removing the last element). Of course, if you're implementing POP() yourself, you either know exactly what you're doing, or you don't exactly know what you're doing.
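Spelled out, that description of POP() might look like the sketch below. Demo::SlowPop is a hypothetical class that inherits everything else from Tie::StdArray (which lives in the core Tie::Array module) and overrides only POP().

```perl
use strict;
use warnings;
use Tie::Array;    # also defines Tie::StdArray

# Hypothetical subclass: only POP() is overridden, written the long way.
{
    package Demo::SlowPop;
    our @ISA = ('Tie::StdArray');

    sub POP {
        my ($self) = @_;
        my $size = $self->FETCHSIZE();
        return if $size == 0;                  # nothing to pop
        my $last = $self->FETCH($size - 1);    # read the last element
        $self->STORESIZE($size - 1);           # shrink the array by one
        return $last;
    }
}

tie my @stack, 'Demo::SlowPop';
push @stack, 'a', 'b', 'c';    # inherited PUSH()
my $top = pop @stack;          # our POP()

print "$top, ", scalar(@stack), " left\n";    # c, 2 left
```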

If you want to write your own tied array, make sure to inherit from Tie::StdArray (see "perldoc Tie::Array"). All the functions are already defined for you, and you just need to override the ones you want to modify -- no need to reinvent the wheel. As a side note, tied arrays happen to be the most complex tied variable type, and the least often implemented on CPAN by my count. Hashes are not nearly as difficult. (If you are interested in the implementation particulars, look at the Tie::CharArray source code.)

As an example of tied arrays, we'll look at the CPAN Tie::CharArray module. That module allows programmers to treat a string as an array of letters, either as numeric codes or as one-character strings. An example from the documentation:


Listing 5. A string as an array
				
use Tie::CharArray;
my $foobar = 'a string';

tie my @foo, 'Tie::CharArray', $foobar;
$foo[0] = 'A';    # $foobar = 'A string'
push @foo, '!';   # $foobar = 'A string!'

This should be painfully familiar to C/C++/Java programmers.

Note that if the tie line in the above example were written as

tie my @foo, 'Tie::CharArray', 'a string';

then it would fail with the message "Modification of a read-only value attempted." Indeed, 'a string' is a constant string you can't modify. The @foo array uses the string passed to it directly, modifying the original string if a value is assigned to it.

Tie::CharArray is in fact a very good way to forget about the particulars of the Perl substr() or pack()/unpack() or split() functions. If you need to modify the characters at position 5 through 28 in a string, you could use substr() or pack()/unpack() or split(). You could also just write:


Listing 6. Addressing individual letters
				
use Tie::CharArray qw/chars/;
my $f = "jello is yellow";
my $chars = chars $f;
foreach (5..28)
{
 $chars->[$_] = "!";
}

Which one you prefer (built-in string manipulations or Tie::CharArray) is a matter of taste, but it's hard to argue against the readability of Listing 6.
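For comparison, here is a sketch of the built-in route, using only core Perl. The range is deliberately kept inside the string (the first five characters) so that the result is easy to check by eye.

```perl
use strict;
use warnings;

my $f = "jello is yellow";

# Replace the first five characters with '!' using 4-argument substr().
substr($f, 0, 5, '!' x 5);
print "$f\n";    # !!!!! is yellow

# The same idea with split() and join(), one character at a time.
my @chars = split //, "jello is yellow";
$chars[$_] = '!' for 0 .. 4;
my $g = join '', @chars;
print "$g\n";    # !!!!! is yellow
```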


Tying hashes

Now we get to the good stuff. Tied hashes are easier to write than tied arrays, and more useful.

Tied hashes implement the TIEHASH() constructor, the FETCH()/STORE() access methods, the EXISTS()/DELETE() methods which act exactly like exists() and delete() in Perl, CLEAR() to clear the hash, and FIRSTKEY()/NEXTKEY() to iterate through the hash. You can just inherit from the Tie::StdHash package (in the Tie::Hash perldoc), which defines every method you will need, so you just need to override the ones you want.
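A short sketch of the inherit-and-override approach: the hypothetical Demo::LowerKeys class below folds every key to lower case and leaves the constructor, CLEAR(), and the iterators to Tie::StdHash, which stores its data in the object hash itself.

```perl
use strict;
use warnings;
use Tie::Hash;     # also defines Tie::StdHash

# Hypothetical subclass: keys are folded to lower case on every access.
{
    package Demo::LowerKeys;
    our @ISA = ('Tie::StdHash');

    sub STORE {
        my ($self, $key, $value) = @_;
        $self->{ lc($key) } = $value;
    }

    sub FETCH {
        my ($self, $key) = @_;
        return $self->{ lc($key) };
    }

    sub EXISTS {
        my ($self, $key) = @_;
        return exists $self->{ lc($key) };
    }

    sub DELETE {
        my ($self, $key) = @_;
        return delete $self->{ lc($key) };
    }
}

tie my %color, 'Demo::LowerKeys';
$color{Sky}   = 'blue';
$color{GRASS} = 'green';

my $found = exists $color{Grass} ? 'yes' : 'no';
print "$color{sky}\n";                       # blue
print "$found\n";                            # yes
print join(',', sort keys %color), "\n";     # grass,sky
```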

We'll see the exact implementation of a tied hash in my Tie::Hash::TwoWay module. The module maintains two hashes internally, and automatically creates a reverse mapping in the second one when the first one gets data. For instance, if you assign the value ["Fido"] with key "dog" and the value ["Fido"] with key "friend" to a Tie::Hash::TwoWay tied hash (values are given in an array reference; anything else is wrapped in one, as Listing 8 shows), there will suddenly be a key "Fido" in that same Tie::Hash::TwoWay tied hash, whose value is a hash reference with the keys "dog" and "friend". See the example from the documentation:


Listing 7. Tie::Hash::TwoWay usage
				
use Tie::Hash::TwoWay;
tie my %hash, 'Tie::Hash::TwoWay';

my %list = (
            Asimov => ['novelist', 'scientist'],
            King => ['novelist', 'horror'],
           );

foreach (keys %list)                  # these are the primary keys of the hash
{
 $hash{$_} = $list{$_};
}

# these will all print 'yes'
print 'yes' if exists $hash{scientist};
print 'yes' if exists $hash{novelist}->{Asimov};
print 'yes' if exists $hash{novelist}->{King};
print 'yes' if exists $hash{King}->{novelist};

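Tie::Hash::TwoWay inherits from Tie::StdHash (defined in the Tie::Hash module) and overrides the STORE(), FETCH(), EXISTS(), DELETE(), CLEAR(), FIRSTKEY(), and NEXTKEY() methods. In addition, it defines a secondary_keys() method to get just the reverse mapping keys. The forward mapping is stored in $self->{PRIMARY} and the reverse mapping in $self->{SECONDARY}. The module defines numeric constants named PRIMARY (1) and SECONDARY (0), but Perl treats a bareword inside a hash subscript as a string, so as the listings below are written, the keys are the strings 'PRIMARY' and 'SECONDARY' rather than 1 and 0. Either way, the symbolic names make the code more readable in my opinion.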
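One Perl detail is worth keeping in mind when reading subscripts such as $self->{PRIMARY}: a simple bareword inside hash braces is quoted as a string, even when a constant of the same name exists. The sketch below uses a throwaway hash to show the difference; the constant value 1 is taken from the description above.

```perl
use strict;
use warnings;
use constant PRIMARY => 1;

my %h;
$h{PRIMARY}   = 'bareword';      # key is the string 'PRIMARY'
$h{+PRIMARY}  = 'unary plus';    # key is the constant's value, 1
$h{PRIMARY()} = 'call';          # same key, 1 (overwrites the line above)

print join(',', sort keys %h), "\n";    # 1,PRIMARY
```

Since every method in the module spells the subscript the same way, the listings are self-consistent whichever key ends up being used.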

The following is the code of Tie::Hash::TwoWay, as it is in the module, except for the tying/inheritance/initialization preamble and the documentation. The listings are broken up by function. Remember we inherit from Tie::StdHash, so we don't need to define the TIEHASH() method for instance. We only redefine the methods whose behavior we need to modify.


Listing 8. The STORE() function
				
sub STORE
{
 my ($self, $key, $value) = @_;
 my $val_array_ref;

 if (ref $value eq 'ARRAY')		# array refs can be recognized
 {
  $val_array_ref = $value;
 }
 else			# everything else gets converted to array refs
 {
  $val_array_ref = [ $value ];
 }

 # add the values in the passed array to the primary and secondary hashes
 foreach my $value (@$val_array_ref)
 {
  $self->{SECONDARY}->{$value}->{$key} = 1;
  $self->{PRIMARY}->{$key}->{$value} = 1;
 }

 return 1;
}

The STORE() function creates an entry in both the primary and the secondary hash (normal and reverse mapping). Array references are used directly; anything else is treated as a single scalar value (and wrapped in an array reference).


Listing 9. The FETCH() function
				
# return the primary or secondary key, in that order (duplicate keys
# are not detected here)
sub FETCH
{
 my ($self, $key) = @_;
 
 exists $self->{PRIMARY}->{$key} &&
  return $self->{PRIMARY}->{$key};
 
 exists $self->{SECONDARY}->{$key} &&
  return $self->{SECONDARY}->{$key};
 
 return undef;
}

The FETCH() function retrieves the value for a key from either the primary or the secondary hash, where the primary hash is given preference. The short-circuit form (statement1) && (statement2) is a common Perl idiom: the second statement runs only if the first one is true.


Listing 10. The EXISTS() function
				
# return the primary or secondary key existence, in that order
# (duplicate keys are not detected here)
sub EXISTS
{
 my ($self, $key) = @_;
 
 return undef unless (exists $self->{PRIMARY} &&
                      exists $self->{SECONDARY});
 
 return (exists $self->{PRIMARY}->{$key} ||
         exists $self->{SECONDARY}->{$key});
}

The EXISTS() function checks the existence of a key in the forward and reverse mappings, in that order.


Listing 11. The DELETE() function
				
# delete the primary or secondary key, in that order (duplicate keys
# are not detected here)
sub DELETE
{
 my ($self, $key) = @_;
 
 return undef unless (exists $self->{PRIMARY} &&
                      exists $self->{SECONDARY});
 
 # make sure to delete reverse associations as well
 if (exists $self->{PRIMARY}->{$key})
 {
  
  foreach (keys %{$self->{SECONDARY}})
  {
   delete $self->{SECONDARY}->{$_}->{$key};
   delete $self->{SECONDARY}->{$_}
    unless scalar keys %{$self->{SECONDARY}->{$_}};
  }
  
  return delete $self->{PRIMARY}->{$key};
 }

 if (exists $self->{SECONDARY}->{$key})
 {

  foreach (keys %{$self->{PRIMARY}})
  {
   delete $self->{PRIMARY}->{$_}->{$key};
   delete $self->{PRIMARY}->{$_}
    unless scalar keys %{$self->{PRIMARY}->{$_}};
  }

  return delete $self->{SECONDARY}->{$key};
 }

}

Deletions are a little more complex, because we also want to delete the reverse mapping for the key we're deleting. Thus, if we associate a->b, we want to remove the b->a (secondary) mapping as well, and remove the inner hash of that mapping if it becomes empty.
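The bookkeeping is easier to follow on plain nested hashes with no tie involved. The following sketch mirrors the primary-key branch of DELETE(); the delete_primary() helper and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Forward and reverse mappings kept as plain nested hashes.
my %primary   = (dog  => { Fido => 1 },
                 friend => { Fido => 1, Rex => 1 });
my %secondary = (Fido => { dog => 1, friend => 1 },
                 Rex  => { friend => 1 });

# Hypothetical helper: delete a primary key and every reverse link to it.
sub delete_primary {
    my ($key) = @_;
    foreach my $value (keys %secondary) {
        delete $secondary{$value}{$key};
        # drop the reverse entry entirely once nothing points back to it
        delete $secondary{$value} unless keys %{ $secondary{$value} };
    }
    return delete $primary{$key};
}

delete_primary('friend');
print join(',', sort keys %primary), "\n";      # dog
print join(',', sort keys %secondary), "\n";    # Fido
```

"Rex" disappears from the reverse mapping because "friend" was the only primary key that pointed to it, while "Fido" survives through "dog".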


Listing 12. The CLEAR() function
				
sub CLEAR
{
 my ($self, $key) = @_;

 %$self = ();                           # clear the whole hash

 return 1;
}

Clearing the internal hash is very simple, and we can keep using this object because of the auto-vivification in STORE().


Listing 13. The FIRSTKEY() and NEXTKEY() functions
				

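sub FIRSTKEY
{
 my ($self) = @_;

 return undef unless (exists $self->{PRIMARY} &&
                      exists $self->{SECONDARY});

 # reset the each() iterator so a new iteration starts from the beginning,
 # even if an earlier one was abandoned midway
 keys %{$self->{PRIMARY}};

 return each %{$self->{PRIMARY}};
}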

sub NEXTKEY
{
 my ($self, $lastkey) = @_;

 return undef unless (exists $self->{PRIMARY} &&
                      exists $self->{SECONDARY});

 return each %{$self->{PRIMARY}};
}

The iterators FIRSTKEY() and NEXTKEY() seem complex, but in reality they just let the each() Perl function do all the work.
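It helps to know that each() keeps one iterator per hash, and that calling keys() on the hash resets it. A small sketch on an ordinary hash shows the effect; this is what Tie::StdHash relies on in its own FIRSTKEY().

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

my $first = each %h;     # scalar context: returns just the next key
my $count = keys %h;     # calling keys() resets the each() iterator
my $again = each %h;     # so this starts over from the first key

print $first eq $again ? "iterator was reset\n" : "iterator continued\n";
```

Without the reset, a loop that exits early leaves the iterator in the middle of the hash, and the next iteration picks up from there.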


Listing 14. The secondary_keys() function
				
sub secondary_keys
{
 my ($self) = @_;
 
 return undef unless (exists $self->{PRIMARY} &&
                      exists $self->{SECONDARY});
 
 return keys %{$self->{SECONDARY}};
} 

Because the regular keys() iteration of FIRSTKEY() and NEXTKEY() is only over the primary keys, the secondary_keys() function is provided for the secondary mapping.


Conclusion

Unfortunately, tied filehandles are too complex to be covered in this article. They do offer fascinating possibilities: for instance, writing to or reading from a filehandle could write to or read from a database directly, or closing the file could send an e-mail.

Tying disk (file) databases to a hash is a well-covered topic. You can read the "perldoc DB_File" documentation, see the relevant chapters in the Programming Perl Third Edition book, or read online through one of the many tutorials (see the Resources section for links to some of these sources).

From what we have done here, though, you can see that tied variables are very useful, and I encourage you to look through the many CPAN modules that deal with tied variables. You will almost certainly find a module that suits your needs.


Resources

