Before we get started, you must have Perl 5.005 or newer installed on your system (see Resources for a link on where to acquire it). Preferably, your system should be a recent (2000 or later) mainstream UNIX (Linux, Solaris, BSD, etc.) installation, but other operating systems may work as well. The examples may work with earlier versions of Perl and UNIX, or with other operating systems -- but any failure on their part to function under such conditions should be considered an exercise for the reader to solve.
Tied variables are an essential tool for any Perl programmer. It is
easiest to reuse existing code that uses the Tie::Scalar, Tie::Array, or
Tie::Hash interfaces, but an understanding of the magic behind the scenes
will prove useful, regardless of whether you want to write your own variations on that theme or just optimize your usage of tied variables.
Let's look at the three main types of tied variables: scalars, arrays, and hashes. The complexity of tied filehandles makes them a more advanced topic.
For each CPAN module mentioned in this article, you can look at the
implementation with the CPAN interface. Type "cpan" or "perl -MCPAN -eshell" at the UNIX prompt, and you'll get a secondary prompt. At that prompt, type
"look Tie::Scalar::Timeout", for instance, and you'll be dropped into a
shell in the unpacked Tie::Scalar::Timeout distribution, where you can
read the source of that module.
What is the meaning of "tying" a variable? The verb "tie" is used here as a synonym for "bind." Tying a variable is basically the binding of functions to the internal triggers for reading and writing that variable. This means that you, the programmer, can tell Perl to do something extra when a variable is used. From this simple premise, the tying interface has evolved into an object-oriented methodology in Perl, hiding OOP complexity behind a procedural interface.
Scalar variables are simple and indispensable. They hold only one piece of data: a string, a number, the undefined value, or a reference to another variable. The "$" in front of a variable tells Perl to treat it as a scalar. It is a simple matter to use a scalar variable:
Listing 1. Normal scalars
my $a = 'Hello';
$a = 'there';
$a = 89.2;
It's just as easy to use a tied scalar variable. For instance, let's
take the example from the wonderful Tie::Scalar::Timeout module:
Listing 2. Tied scalars
use Tie::Scalar::Timeout;
tie my $k, 'Tie::Scalar::Timeout', EXPIRES => '+2s';
$k = 123;
sleep(3);
# $k is now undef
The first part, where the tie() function is invoked, is how we tell Perl
that the variable $k is actually tied to the Tie::Scalar::Timeout package.
Behind the scenes, Perl runs the TIESCALAR() function in the
Tie::Scalar::Timeout module (essentially, this is like invoking new() on a
regular object). TIESCALAR() returns an object of type
Tie::Scalar::Timeout, which Perl attaches to $k internally. The object is
not the value of $k, but you can get at it with tied($k).
The particular parameters given to Tie::Scalar::Timeout in the example
ensure that it will expire after two seconds. The module provides other
options, such as an expiration after a certain number of reads.
Every time you read from the $k variable created in the above example, you
invoke the FETCH() method in the Tie::Scalar::Timeout module:
Listing 3. How Tie::Scalar::Timeout does it internally
sub FETCH {
    my $self = shift;
    # if num_uses isn't set or set to a negative value, it won't
    # influence the expiry process
    if (($self->{NUM_USES} == 0) ||
        (time >= $self->{EXPIRY_TIME})) {
        # policy can be a coderef or a plain value
        return &{ $self->{POLICY} } if ref($self->{POLICY}) eq 'CODE';
        return $self->{POLICY};
    }
    $self->{NUM_USES}-- if $self->{NUM_USES} > 0;
    return $self->{VALUE};
}
Every time you write to a tied scalar variable, you invoke its STORE()
method. Scalars also have UNTIE() and DESTROY() methods, which are
normally not needed.
Note that a tied scalar variable, and any tied variable for
that matter, needs to store its actual data somewhere. With
Tie::Scalar::Timeout the data is stored in $self->{VALUE}, since the
object behind the scalar variable we know as $k is actually a hash
reference. Perl hides this layer of complexity from us, creating a sort of
encapsulation very similar to what exists in OOP.
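To make the mechanics concrete, here is a minimal sketch of a tied scalar written from scratch. The package name My::CountedScalar and its READS field are made up for illustration; only tie(), tied(), and the TIESCALAR()/FETCH()/STORE() method names come from Perl itself.

```perl
package My::CountedScalar;    # hypothetical package, for illustration only
use strict;
use warnings;

# Called by tie(); plays the role of new()
sub TIESCALAR {
    my ($class, $initial) = @_;
    return bless { VALUE => $initial, READS => 0 }, $class;
}

# Called every time the tied variable is read
sub FETCH {
    my ($self) = @_;
    $self->{READS}++;
    return $self->{VALUE};
}

# Called every time the tied variable is assigned to
sub STORE {
    my ($self, $value) = @_;
    $self->{VALUE} = $value;
    return $value;
}

package main;

tie my $k, 'My::CountedScalar', 'Hello';
$k = 'there';                       # goes through STORE()
my $copy   = $k;                    # goes through FETCH()
my $length = length $k;             # goes through FETCH() again
my $reads  = tied($k)->{READS};     # tied() hands back the hidden object
print "value: $copy, reads so far: $reads\n";
```

The program never touches the hash directly except through tied(); everything else looks like ordinary scalar use.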
The meaning of the code above is that every time the value of the $k
variable is requested, it could change. Thus, if you want your own
Schroedinger box, just use the Tie::Scalar::Timeout module with a random
timeout between 0 and 100, and read the value at 50 seconds. Assuming a
good random number generator, you will get either 1 or undef based on the
timeout. We assume that the time spent executing the instructions in the
program is negligible, but there is a slight bias introduced by that time.
Yes, we could just see if rand(100) is above 50, but where's the fun in that?
Listing 4. Schroedinger's timeout
use Tie::Scalar::Timeout;
# rand 100 returns a fractional number: at least 0 and less than 100
my $random_timeout = rand 100;
tie my $k, 'Tie::Scalar::Timeout', VALUE => 1, EXPIRES => "+${random_timeout}s";
sleep(50);
print 'The timeout ',
      ($k) ? 'did not happen' : 'happened',
      "\n";
Getting a cat and shooting it if the timeout happens is left as an exercise for the reader.
Arrays are trickier than scalars. They are sequential collections of
scalars, so extra functionality is needed. Tied arrays implement TIEARRAY()
(the constructor, like new() for tied arrays), FETCH()/STORE() (same idea as
with tied scalars, but with extra parameters), FETCHSIZE()/STORESIZE()
(for array size management), and UNTIE() and DESTROY(), which can be used
to close a file, for instance, or flush the output.
For tied arrays, FETCH() and STORE() need an extra parameter compared
to their equivalents in tied scalars. The extra parameter is the index in
the array. FETCHSIZE() and STORESIZE() are what's used when you invoke
scalar(@ARRAY) and $#ARRAY = x, respectively.
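As a rough illustration of when Perl calls the size methods, the following sketch subclasses Tie::StdArray and records each call. The package name My::TracedArray and its @log variable are invented for this example.

```perl
package My::TracedArray;      # hypothetical package, for illustration only
use strict;
use warnings;
require Tie::Array;           # Tie::StdArray is defined in Tie/Array.pm
our @ISA = ('Tie::StdArray');

our @log;                     # records which size methods Perl called

sub FETCHSIZE {
    my ($self) = @_;
    push @log, 'FETCHSIZE';
    return $self->SUPER::FETCHSIZE();
}

sub STORESIZE {
    my ($self, $count) = @_;
    push @log, "STORESIZE $count";
    return $self->SUPER::STORESIZE($count);
}

package main;

tie my @a, 'My::TracedArray';
@a = (10, 20, 30);
my $size = scalar(@a);        # triggers FETCHSIZE()
$#a = 4;                      # triggers STORESIZE(5): a count, not the last index
my $new_size = scalar(@a);
print "$_\n" for @My::TracedArray::log;
```

Note that STORESIZE() receives the new number of elements, which is one more than the index assigned to $#a.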
The DELETE() and EXISTS() functions need to be implemented if the
corresponding Perl delete() and exists() functions are needed.
There are also the POP(), PUSH(), SHIFT(), UNSHIFT(), SPLICE(), and
EXTEND() functions (the first five correspond to the Perl functions of the
same name in lowercase), but normally a module writer would inherit from
Tie::StdArray and get those methods already implemented.
POP(), for instance, could be implemented as a FETCH() of the last
element, followed by STORESIZE(FETCHSIZE()-1) (setting the size of the
array to be one less, effectively removing the last element). Of course,
if you're implementing POP() yourself, you either know exactly what you're
doing, or you don't exactly know what you're doing.
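The POP() recipe just described can be sketched as follows. This is only an illustration of the idea, not how any particular CPAN module does it; the package name My::PopArray is made up, and the class leans on Tie::StdArray for everything except POP().

```perl
package My::PopArray;         # hypothetical package, for illustration only
use strict;
use warnings;
require Tie::Array;           # Tie::StdArray is defined in Tie/Array.pm
our @ISA = ('Tie::StdArray');

# POP() built from FETCH(), FETCHSIZE(), and STORESIZE()
sub POP {
    my ($self) = @_;
    my $size = $self->FETCHSIZE();
    return undef if $size == 0;          # nothing to pop
    my $last = $self->FETCH($size - 1);  # read the last element
    $self->STORESIZE($size - 1);         # shrink the array by one
    return $last;
}

package main;

tie my @stack, 'My::PopArray';
@stack = (1, 2, 3);
my $top = pop @stack;         # calls My::PopArray::POP()
print "popped $top, ", scalar(@stack), " elements left\n";
```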
If you want to write your own tied array, make sure to inherit from
Tie::StdArray (see "perldoc Tie::Array"). All the functions are already
defined for you, and you just need to override the ones you want to modify
-- no need to reinvent the wheel. As a side note, tied arrays happen to
be the most complex tied variable type, and the least often implemented on
CPAN by my count. Hashes are not nearly as difficult. (If you are
interested in the implementation particulars, look at the Tie::CharArray
source code.)
As an example of tied arrays, we'll look at the CPAN Tie::CharArray module. That module allows programmers to treat a string as an array of letters, either as numeric codes or as one-character strings. An example from the documentation:
Listing 5. A string as an array
use Tie::CharArray;
my $foobar = 'a string';
tie my @foo, 'Tie::CharArray', $foobar;
$foo[0] = 'A';     # $foobar = 'A string'
push @foo, '!';    # $foobar = 'A string!'
This should be painfully familiar to C/C++/Java programmers.
Note that if, in the above example, line 3 were written as
tie my @foo, 'Tie::CharArray', 'a string';
then it would fail with the message "Modification of a read-only value
attempted." Indeed, 'a string' is a constant string you can't modify.
The @foo array uses the string passed to it directly, modifying the
original string if a value is assigned to it.
Tie::CharArray is in fact a very good way to forget about the
particulars of the Perl substr() or pack()/unpack() or split() functions.
If you need to modify the characters at position 5 through 28 in a string,
you could use substr() or pack()/unpack() or split(). You could also just
write:
Listing 6. Addressing individual letters
use Tie::CharArray qw/chars/;
# the string needs at least 29 characters for positions 5..28 to exist
my $f = "jello is yellow, and so are bananas";
my $chars = chars $f;
foreach (5..28)
{
    $chars->[$_] = "!";
}
Which one you prefer (built-in string manipulations or Tie::CharArray) is
a matter of taste, but it's hard to argue against the readability of
Listing 6.
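For comparison, here is a minimal sketch of the built-in route, using the four-argument form of substr(). The sample string is made up and is simply long enough for positions 5 through 28 to exist.

```perl
use strict;
use warnings;

my $f = 'jello is yellow, and so are bananas';   # 35 characters

# Replace 24 characters starting at position 5, i.e. positions 5..28
substr($f, 5, 24, '!' x 24);

print "$f\n";
```

The length argument is a count of characters, so covering positions 5 through 28 means 28 - 5 + 1 = 24 of them.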
Now we get to the good stuff. Tied hashes are easier to write than tied arrays, and more useful.
Tied hashes implement the TIEHASH() constructor, the FETCH()/STORE()
access methods, the EXISTS()/DELETE() methods which act exactly like
exists() and delete() in Perl, CLEAR() to clear the hash, and
FIRSTKEY()/NEXTKEY() to iterate through the hash. You can just inherit
from the Tie::StdHash package (in the Tie::Hash perldoc), which defines
every method you will need, so you just need to override the ones you
want.
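As a small sketch of that approach, the following class inherits from Tie::StdHash and overrides only the four methods that take a key, so that keys are folded to lowercase. The package name My::LowerCaseHash is invented for this example; iteration and clearing are inherited unchanged.

```perl
package My::LowerCaseHash;    # hypothetical package, for illustration only
use strict;
use warnings;
require Tie::Hash;            # Tie::StdHash is defined in Tie/Hash.pm
our @ISA = ('Tie::StdHash');

# Override only the methods that take a key; everything else is inherited
sub STORE {
    my ($self, $key, $value) = @_;
    return $self->SUPER::STORE(lc($key), $value);
}

sub FETCH {
    my ($self, $key) = @_;
    return $self->SUPER::FETCH(lc($key));
}

sub EXISTS {
    my ($self, $key) = @_;
    return $self->SUPER::EXISTS(lc($key));
}

sub DELETE {
    my ($self, $key) = @_;
    return $self->SUPER::DELETE(lc($key));
}

package main;

tie my %h, 'My::LowerCaseHash';
$h{Dog} = 'Fido';
my $name  = $h{DOG};          # same entry, whatever the case
my @keys  = keys %h;          # inherited FIRSTKEY()/NEXTKEY()
my $found = exists $h{dOg};
print "name: $name, keys: @keys\n";
```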
We'll see the exact implementation of a tied hash in my
Tie::Hash::TwoWay module. The module maintains two hashes internally, and
automatically creates a reverse mapping in the second one when the first
one gets data. For instance, if you assign the value ["Fido"] with key
"dog" and the value ["Fido"] with key "friend" to a Tie::Hash::TwoWay tied
hash (values are normally passed as an array reference), there will
suddenly be a key "Fido" in that same Tie::Hash::TwoWay tied hash, whose
value is a hash reference with the keys "dog" and "friend". See the
example from the documentation:
Listing 7. Tie::Hash::TwoWay usage
use Tie::Hash::TwoWay;
tie my %hash, 'Tie::Hash::TwoWay';
my %list = (
    Asimov => ['novelist', 'scientist'],
    King   => ['novelist', 'horror'],
);
foreach (keys %list)    # these are the primary keys of the hash
{
    $hash{$_} = $list{$_};
}
# these will all print 'yes'
print 'yes' if exists $hash{scientist};
print 'yes' if exists $hash{novelist}->{Asimov};
print 'yes' if exists $hash{novelist}->{King};
print 'yes' if exists $hash{King}->{novelist};
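Tie::Hash::TwoWay inherits from the Tie::StdHash module and overrides
the STORE(), FETCH(), EXISTS(), DELETE(), CLEAR(), FIRSTKEY(), and
NEXTKEY() methods. In addition, it defines a secondary_keys() method to
get just the reverse mapping keys. The forward mapping is kept under the
PRIMARY slot of the object and the reverse mapping under the SECONDARY
slot; the module defines these as symbolic names for the numeric constants
1 and 0, which makes them more readable in my opinion. Be aware, though,
that Perl treats a bare identifier inside hash-subscript braces as a
string, so $self->{PRIMARY} as written in the listings below addresses the
key 'PRIMARY' rather than $self->{1}.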
The following is the code of Tie::Hash::TwoWay, as it is in the
module, except for the tying/inheritance/initialization preamble and the
documentation. The listings are broken up by function. Remember we
inherit from Tie::StdHash, so we don't need to define the TIEHASH()
method for instance. We only redefine the methods whose behavior we need
to modify.
Listing 8. The STORE() function
sub STORE
{
    my ($self, $key, $value) = @_;
    my $val_array_ref;
    if (ref $value eq 'ARRAY')    # array refs can be recognized
    {
        $val_array_ref = $value;
    }
    else    # everything else gets converted to array refs
    {
        $val_array_ref = [ $value ];
    }
    # add the values in the passed array to the primary and secondary hashes
    foreach my $value (@$val_array_ref)
    {
        $self->{SECONDARY}->{$value}->{$key} = 1;
        $self->{PRIMARY}->{$key}->{$value} = 1;
    }
    return 1;
}
The STORE() function creates an entry in both the primary and the
secondary hash (normal and reverse mapping). Array references are used
directly; anything else is treated as a single scalar value (and wrapped
in an array reference).
Listing 9. The FETCH() function
# return the primary or secondary key, in that order (duplicate keys
# are not detected here)
sub FETCH
{
    my ($self, $key) = @_;
    exists $self->{PRIMARY}->{$key} &&
        return $self->{PRIMARY}->{$key};
    exists $self->{SECONDARY}->{$key} &&
        return $self->{SECONDARY}->{$key};
    return undef;
}
The FETCH() function retrieves the value for a key from either the primary
or the secondary hash, where the primary hash is given preference. The
short-circuit form (statement1) && (statement2) is a common Perl idiom:
the second statement runs only if the first one is true.
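To show that the idiom is only shorthand, here is a small sketch with the same lookup written both ways. The two hashes and the function names lookup_short() and lookup_long() are invented for this example and stand in for the object's primary and secondary mappings.

```perl
use strict;
use warnings;

# Stand-ins for the primary and secondary mappings (made-up data)
my %primary   = (dog  => 'Fido');
my %secondary = (Fido => 'dog');

# Short-circuit style: return only if the test on the left is true
sub lookup_short {
    my ($key) = @_;
    exists $primary{$key}   && return $primary{$key};
    exists $secondary{$key} && return $secondary{$key};
    return undef;
}

# The same logic spelled out with if statements
sub lookup_long {
    my ($key) = @_;
    if (exists $primary{$key}) {
        return $primary{$key};
    }
    if (exists $secondary{$key}) {
        return $secondary{$key};
    }
    return undef;
}

for my $key (qw(dog Fido cat)) {
    my $short = lookup_short($key);
    my $long  = lookup_long($key);
    printf "%-4s => %s / %s\n", $key,
        defined $short ? $short : 'undef',
        defined $long  ? $long  : 'undef';
}
```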
Listing 10. The EXISTS() function
# return the primary or secondary key existence, in that order
# (duplicate keys are not detected here)
sub EXISTS
{
    my ($self, $key) = @_;
    return undef unless (exists $self->{PRIMARY} &&
                         exists $self->{SECONDARY});
    return (exists $self->{PRIMARY}->{$key} ||
            exists $self->{SECONDARY}->{$key});
}
The EXISTS() function checks the existence of a key in the forward and
reverse mappings, in that order.
Listing 11. The DELETE() function
# delete the primary or secondary key, in that order (duplicate keys
# are not detected here)
sub DELETE
{
    my ($self, $key) = @_;
    return undef unless (exists $self->{PRIMARY} &&
                         exists $self->{SECONDARY});
    # make sure to delete reverse associations as well
    if (exists $self->{PRIMARY}->{$key})
    {
        foreach (keys %{$self->{SECONDARY}})
        {
            delete $self->{SECONDARY}->{$_}->{$key};
            delete $self->{SECONDARY}->{$_}
                unless scalar keys %{$self->{SECONDARY}->{$_}};
        }
        return delete $self->{PRIMARY}->{$key};
    }
    if (exists $self->{SECONDARY}->{$key})
    {
        foreach (keys %{$self->{PRIMARY}})
        {
            delete $self->{PRIMARY}->{$_}->{$key};
            delete $self->{PRIMARY}->{$_}
                unless scalar keys %{$self->{PRIMARY}->{$_}};
        }
        return delete $self->{SECONDARY}->{$key};
    }
}
Deletions are a little more complex, because we also want to delete the reverse mapping for the key we're deleting. Thus, if we associate a->b, we want to remove the b->a (secondary) mapping as well, and remove the entry on the other side altogether if nothing maps to it any more.
Listing 12. The CLEAR() function
sub CLEAR
{
    my ($self, $key) = @_;
    %$self = ();    # clear the whole hash
    return 1;
}
Clearing the internal hash is very simple, and we can keep using this
object because of the auto-vivification in STORE().
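The auto-vivification mentioned here can be seen on a plain hash reference. The sketch below uses a made-up $self with the same two slots and shows that, after emptying it, the kind of nested assignment STORE() performs recreates the inner hashes on demand.

```perl
use strict;
use warnings;

# A stand-in for the object's internal hash (made-up data)
my $self = {
    PRIMARY   => { dog  => { Fido => 1 } },
    SECONDARY => { Fido => { dog  => 1 } },
};

%$self = ();                                 # what CLEAR() does
my $was_empty = !exists $self->{PRIMARY};    # both slots are gone now

# The same kind of assignment STORE() makes; Perl creates the
# intermediate hashes automatically (auto-vivification)
$self->{SECONDARY}->{Rex}->{dog} = 1;
$self->{PRIMARY}->{dog}->{Rex}   = 1;

my @primary_keys = keys %{ $self->{PRIMARY} };
print "primary keys after re-use: @primary_keys\n";
```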
Listing 13. The FIRSTKEY() and NEXTKEY() functions
sub FIRSTKEY
{
    my ($self) = @_;
    return undef unless (exists $self->{PRIMARY} &&
                         exists $self->{SECONDARY});
    return each %{$self->{PRIMARY}};
}
sub NEXTKEY
{
    my ($self, $lastkey) = @_;
    return undef unless (exists $self->{PRIMARY} &&
                         exists $self->{SECONDARY});
    return each %{$self->{PRIMARY}};
}
The iterators FIRSTKEY() and NEXTKEY() seem complex, but in reality they
just let the each() Perl function do all the work.
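One detail worth knowing when you lean on each() this way: each() remembers its position in the underlying hash, so a FIRSTKEY() that does not reset that position first can resume in the middle if an earlier iteration was abandoned. The following is a sketch of a variation that resets the iterator with keys() before starting; it is not the module's code, and the package name My::IterHash and its DATA slot are invented for the example.

```perl
package My::IterHash;         # hypothetical package, for illustration only
use strict;
use warnings;

sub TIEHASH {
    my ($class) = @_;
    return bless { DATA => {} }, $class;
}

sub STORE {
    my ($self, $key, $value) = @_;
    $self->{DATA}->{$key} = $value;
    return $value;
}

sub FETCH {
    my ($self, $key) = @_;
    return $self->{DATA}->{$key};
}

sub FIRSTKEY {
    my ($self) = @_;
    keys %{ $self->{DATA} };                  # resets the each() iterator
    return scalar each %{ $self->{DATA} };
}

sub NEXTKEY {
    my ($self, $lastkey) = @_;
    return scalar each %{ $self->{DATA} };    # continue where we left off
}

package main;

tie my %h, 'My::IterHash';
@h{qw(a b c)} = (1, 2, 3);

my $first = each %h;          # start an iteration, then abandon it
my @all   = sort keys %h;     # FIRSTKEY() resets, so nothing is skipped
print "all keys: @all\n";
```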
Listing 14. The secondary_keys() function
sub secondary_keys
{
    my ($self) = @_;
    return undef unless (exists $self->{PRIMARY} &&
                         exists $self->{SECONDARY});
    return keys %{$self->{SECONDARY}};
}
Because the regular keys() iteration of FIRSTKEY() and NEXTKEY() is only
over the primary keys, the secondary_keys() function is provided for the
secondary mapping.
Unfortunately, tied filehandles are too complex to be covered in this article. They do offer fascinating possibilities: for instance, writing to or reading from a filehandle could go directly to a database, or closing the file could send an e-mail.
Tying disk (file) databases to a hash is a well-covered topic. You can read the "perldoc DB_File" documentation, see the relevant chapters in Programming Perl, Third Edition, or read one of the many online tutorials (see the Resources section for links to some of these sources).
From what we have done here, though, you can see that tied variables are very useful, and I encourage you to look through the many CPAN modules that deal with tied variables. You will almost certainly find a module that suits your needs.
- Read Ted's other Perl articles in the "Cultured Perl" series on developerWorks.
- To follow along with this article, you should have Perl 5.005 or higher. Find Perl source code and tons more at the "Comprehensive Perl Archive Network", better known as CPAN.
- Also at CPAN, you will find all the Perl modules you could ever want.
- Go to Perl.com for more Perl information and related resources.
- To find out more about the tie interface, read the perltie perldoc page.
- If you are interested in learning more about tied variables, check out Programming Perl, Third Edition, by Larry Wall, Tom Christiansen, and Jon Orwant (O'Reilly & Associates, 2000). It's the best guide to Perl today, up-to-date with 5.005 and 5.6.0. Chapter 14 covers tied variables and is a great resource.
- The perldoc DB_File documentation is also a great resource.
- Modules:
- Find more resources for Linux developers in the developerWorks Linux zone.
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, three-tier client-server database architectures, UNIX system administration, CORBA, and project management.