
Cultured Perl: Perl 5.6 for C and Java programmers

How the new Perl 5.6 features stack up against C/C++/Java

Teodor Zlatanov (tzz@bu.edu), Programmer, Northern Light, Inc.
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, 3-tier client-server database architectures, UNIX system administration, CORBA, and project management. Contact Teodor at tzz@bu.edu.

Summary:  Ted Zlatanov explains some of the peculiarities in Perl 5.6 for C and Java programmers, who may actually be pleasantly surprised by some familiar features hailing from sources other than Perl, like operator ambiguity, multiple ways of doing the same thing, punctuation, regular expressions, and variable mechanism. All of them put variety and power at your fingertips. The point is, Perl isn't too far from anyone's familiar territory and may be useful to even C and Java programmers at some point. So here's your opportunity to enhance your Perl 5.6 skills.

Date:  01 Jan 2001
Level:  Introductory

Perl often bewilders even experienced programmers, primarily because it allegedly makes it too easy to write obfuscated code. But the confusion regarding Perl's structure, features, and philosophy is inevitable given that it's such a rich and powerful language, and that it was designed from the start to allow for more than one way to do the same thing.

Here we're going to look at some of the more confusing features of Perl 5.6, comparing and contrasting them to the corresponding C/C++/Java features. We'll concentrate on the principles in Larry Wall's paper "Natural Language Principles in Perl" (see the Resources later in this article), because they distinguish Perl from C, C++, and Java most readily. The exact mechanics of Perl's syntax are better learned from the "perldoc perlsyn" manual page and from Programming Perl, the best guide to Perl today (see Resources).




Interpreter mechanics

Novice Perl programmers notice right away that there seems to be no compilation. A Perl script is run immediately by the Perl interpreter ("perl" on UNIX systems, "perl.exe" on DOS/Windows systems, neither on MacOS systems). You can try it yourself: type the name of your Perl interpreter, or run it on a MacOS system, and you can start giving it expressions to evaluate right away. On most systems, the end-of-file (Control-D on UNIX) key sequence has to be used to indicate the end of user input. So, on a UNIX system, the following will print the result of "5+6":




> perl
(Perl is waiting for user input here, because no script name is given)
print 5+6

You press Control-D here
11  

Here you see that Perl ran through the one-line script, and evaluated the line with the effect of printing "11" to your screen.

The Perl interpreter has many options. The "-e" flag, for instance, tells it to execute its command-line argument as a script, so the command perl -e'print 5+6' (note the quotes around the print command) is equivalent to the small program above. The "-i" flag allows editing of files in place, sort of like running them through a filter. The "-n" and "-p" switches cause the interpreter to act mostly like an input-output filter, with actions specified by the programmer. The "-w" switch (highly recommended) turns warnings on and is similar to the "-Wall" switch of C/C++ compilers, except that "-w" is also active during program execution.


Speed and benchmarking

People often compare Perl to C or C++ and complain that Perl is not fast enough. This is sometimes true, but I recommend that you use the Benchmark module (perldoc Benchmark) before you decide that a C or C++ program will be faster, because that is not always the case. Also, Perl is very good at linking to C/C++ code and libraries, and built-in Perl functions such as sort and print are usually nearly as fast as the equivalent C code. Again, benchmark before you decide what's better.
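
As a minimal sketch of what such a benchmark might look like, the script below compares Perl's built-in sort against a deliberately naive hand-written sort. The data set, the iteration count, and the names bubble_sort and builtin_sort are assumptions made for illustration; only cmpthese comes from the Benchmark module itself.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 100 pseudo-random integers.
my @data = map { int(rand(1000)) } 1 .. 100;

# A deliberately naive hand-written sort, standing in for "roll your own" code.
sub bubble_sort {
    my @list = @_;
    for my $i (0 .. $#list) {
        for my $j (0 .. $#list - $i - 1) {
            @list[$j, $j + 1] = @list[$j + 1, $j] if $list[$j] > $list[$j + 1];
        }
    }
    return @list;
}

# The same job done with Perl's built-in sort.
sub builtin_sort {
    my @list = sort { $a <=> $b } @_;
    return @list;
}

# Run each candidate 200 times and print a comparison table.
cmpthese(200, {
    bubble  => sub { my @sorted = bubble_sort(@data) },
    builtin => sub { my @sorted = builtin_sort(@data) },
});
```

The exact figures depend on the machine and the Perl build, so treat the table as a relative comparison only.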

Remember that premature optimization is the root of all evil. If you write a working prototype in Perl, and then rewrite the program in another language, that's fine. Prototypes are meant to be quickly developed and easily thrown away.

Compared to Java, Perl does quite well, but benchmarks are still recommended. Java (unlike Perl) is very good at threading, so algorithms that benefit from threads are better written in Java. But Perl's Tk GUI toolkit compares favorably to Java's Swing GUI libraries, and Java code can always be linked to a Perl program, and vice versa. So sometimes, you can even get the best of both worlds!


Exceptions, compilation and documentation

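Perl has exceptions through CPAN modules or through the built-in eval() function. Eval evaluates a code block or a string as if it were a part of the program running inside a try/catch block in C++ or Java. If the evaluated code dies, execution continues after the eval and the error is left in the $@ variable.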
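
A minimal sketch of the eval approach follows. The safe_divide function is a hypothetical helper invented for this illustration; eval, die, and $@ are the built-in pieces.

```perl
use strict;
use warnings;

# Hypothetical helper that throws an exception on bad input.
sub safe_divide {
    my ($x, $y) = @_;
    die "division by zero\n" if $y == 0;
    return $x / $y;
}

# The eval block plays the role of "try"; it returns undef if the code dies.
my $result = eval { safe_divide(10, 0) };

# Copy $@ right away, because a later eval would overwrite it.
my $error = $@;

# Checking the saved error plays the role of "catch".
if ($error) {
    print "caught: $error";     # prints "caught: division by zero"
}
```

Copying $@ into a lexical variable immediately is a common precaution, not a requirement of the language.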

Perl does compile scripts before they are run, but not in the way that a C/C++/Java programmer would think of it. It is closest to the Java byte-compilation process in design and effect. The "perldoc perlrun" and "perldoc perlcc" manual pages have more information on compilation.

Documentation can be embedded inside Perl programs with the POD format. This is more general than the Javadoc format, which is best suited to API documentation, but more specific than C/C++/Java comments, which are so general that no embedded markup or sections are allowed.
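
As a rough illustration, the script below mixes POD sections with code. The greet function and the wording of the sections are made up for this example; the "=head1" and "=cut" directives are standard POD.

```perl
use strict;
use warnings;

=head1 NAME

greet - hypothetical example function documented with POD

=head1 SYNOPSIS

    print greet("world");

=cut

# The interpreter skips everything between "=head1" and "=cut" above.
sub greet {
    my ($name) = @_;
    return "Hello, $name\n";
}

print greet("world");           # prints "Hello, world"
```

Running perldoc on the file shows the documentation, while running perl on it executes only the code.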

Perl programs are very loosely structured, even compared to C, C++, or Java. BEGIN blocks, for instance, will be executed first but can be specified many times throughout the program. Namespaces can begin and end anywhere. Definitions, variables, and function bodies can occur anywhere and Perl will do its best to accommodate such madness.
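
The BEGIN behavior can be seen in a small sketch like the one below. The @order array is just a hypothetical log used to record the sequence of execution.

```perl
use strict;
use warnings;

# Hypothetical log of the order in which the pieces run.
our @order;

# This ordinary statement appears first in the file but runs last.
push @order, "main";

# BEGIN blocks run while the program is still being compiled, in file order.
BEGIN { push @order, "first BEGIN" }
BEGIN { push @order, "second BEGIN" }

print join(", ", @order), "\n"; # prints "first BEGIN, second BEGIN, main"
```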

Because of the loose structure, embedded comments, and overall ambiguity of the language when it is convenient, writing Perl is more like writing a letter in English than writing in any other programming language.


Language ambiguity

Perl tolerates ambiguity better than C/C++/Java. Commas, for example, can separate function parameters or chain expressions within a single statement:

print 'Hello', ' ', 'there.', "\n";     # prints "Hello there.\n"
foreach (1..10)
{
 my $i;
 $i = $_ * 2, print "$i\n";             # print evens from 2 to 20
} 

Perl disambiguates as best it can, though sometimes it's hard to do (in this respect Perl is much like English).

Another common ambiguity in Perl is that a variable will often be implicitly used. For example, "print" by itself will print the contents of the $_ variable. This makes sense when you realize that the $_ variable is the default for most operations when they are ambiguous. For instance:

$_ = "hello";
s/hello/hi/;                            # $_ is "hi" now
print;                                  # prints "hi"  

Note how straightforward the code becomes when default variables are used. Ambiguity makes it possible to shorten expressions, both in Perl and in English.


There's more than one way to do it (TMTOWTDI)

Every language has its idioms. In C, a for() loop is the best way to iterate over a range of numbers. In Java, static methods should be invoked with the class name instead of an instance name.

Perl has at least two ways to do anything. The TMTOWTDI principle is fundamental to the language, and diversity is not only tolerated but actively encouraged in the Perl community.

Let's take a look at an example of printing an array. All the expressions do the same thing.

print foreach @array;

foreach (@array) {print};

map {print} @array;

print @array;  

The way to understand the code above is the way to understand all Perl code. Don't worry about the right way to do it -- there's more than one right way. Think about the different approaches, and what they teach you about the language.

By the way, just because there's more than one way to do it doesn't mean there's no wrong way to do it :) There are always more ways to write bad code than there are to write good code. Make your code legible, use Perl's built-in functions instead of writing your own, and document obscure ways of doing obvious things.


Regular expression mayhem

Regular expressions are scary to the uninitiated. They look like a mish-mash of characters and exclamations. Many of us believe that regular expressions were actually invented by Kalahari bushmen who infiltrated computer science programs throughout the world's universities years ago.

Perl's regular expression heritage comes from shell scripting and the awk/grep tools. The language's capabilities, however, far exceed its original models.

Basic regular expressions are easy to write, but somewhat hard to read. For example, "con\w+" will match "contra" and "contrary" but not "pro" or "con". With Perl 5.6.0, however, regular expressions were put on steroids. Unicode character class specifiers, arbitrary code execution inside a pattern, flag toggles, conditional expressions, and many other features were added to the regular expression engine.
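
A quick way to check the basic pattern above is to run it over the four words mentioned. This is only a sketch; the @words and @matched names are chosen for the illustration.

```perl
use strict;
use warnings;

my @words = qw(contra contrary pro con);

# \w+ needs at least one word character after "con", so "con" alone fails.
my @matched = grep { /con\w+/ } @words;

print "@matched\n";             # prints "contra contrary"
```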

The best advice for the beginner is: learn the basics of regular expressions (see Resources or the "perldoc perlre" manual page), but stay away from the advanced features for a little while. Regular expressions are tricky beasts. They are significantly harder to read than all other Perl code, because they are usually written tightly and without comments (comments are not only possible, they are also highly recommended for anyone writing production code).
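
One way to comment a pattern is the /x modifier, which makes the regular expression engine ignore whitespace and "#" comments inside the pattern. The sketch below assumes a simple year-month-day date string as input; the variable names are illustrative.

```perl
use strict;
use warnings;

# With /x, whitespace and "#" comments inside the pattern are ignored.
my $date_re = qr/
    (\d{4})     # four-digit year
    -
    (\d{2})     # two-digit month
    -
    (\d{2})     # two-digit day
/x;

my ($year, $month, $day) = "2001-01-31" =~ $date_re;

print "$year $month $day\n";    # prints "2001 01 31"
```

The same pattern written tightly would be /(\d{4})-(\d{2})-(\d{2})/, which matches identically but gives the reader no hints.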

Regular expressions are available for C/C++/Java as external packages, but Perl is by far the best tool available today to do regular expression searches and substitutions. In rare cases, it may be slower than a pure C approach, but Perl should be the first tool considered for purely regular expression-oriented problems.


Scalars, arrays, and hashes: my oh my!

Unlike variables in C, C++, or Java, Perl's variables are auto-instantiated and typed by name: the punctuation character that begins the name determines the kind of variable. This is jarring to new Perl programmers, but very useful once understood.

I recommend the "use strict" pragma in all production code. Among other things, it ensures that variables are declared before they are used. This avoids bugs caused by typos, both common and annoying.

Without "use strict" you can encounter the following kinds of problems:

$i = 5;
print $j;                               # typo: meant to print $i

The programmer made a typo: he hit j when he wanted i. Perl considers that fine, and prints nothing, which is the value of $j. Sometimes auto-instantiation is nice, but in my experience it is best turned off with "use strict" for all code that is to be shared with others.
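
To see what "use strict" does with the same typo, the sketch below compiles the buggy snippet inside a string eval so that the compile-time error can be captured and printed instead of stopping the script. The $ok and $error names are illustrative.

```perl
use strict;
use warnings;

# Compile the buggy snippet in a string eval so the failure can be inspected.
my $ok = eval q{
    use strict;
    my $i = 5;
    print $j;                   # typo: $j was never declared
    1;
};

# Save the compile error before anything else can overwrite $@.
my $error = $@;

print "strict refused to compile it: $error" unless $ok;
```

In a normal script the same error would simply stop compilation before any code runs, which is the point: the typo is caught before the program can misbehave.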

Perl variables are either scalars, arrays, or hashes (there are more, but you will rarely encounter them directly). They can also be references, which are just scalars. Scalar names begin with "$", array names begin with "@", and hash names begin with "%".

Scalars are the regular Joes. They hold a single value, which will be a number, a string, or a reference. Perl will convert between strings and numbers as necessary, which is, to say the least, surprising to new Perl programmers. Take a look at this, for instance:

$i = "hi there";
print 1+$i;                             # prints 1

The scalar $i contains the string "hi there" which has the numeric value 0. Thus, 1 + "hi there" yields 1.

Don't think of it as strings vs. numbers. There is only a scalar in memory, and it contains a scalar value. The value can be a number in numeric context (addition), or it can be a string in string context (printing). But there is still only one value.
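
The same idea in a small sketch: one scalar, used first in numeric context and then in string context. The variable names are chosen for the illustration.

```perl
use strict;
use warnings;

my $value = "10";

my $sum    = $value + 5;        # numeric context: addition yields 15
my $joined = $value . 5;        # string context: concatenation yields "105"

print "$sum $joined\n";         # prints "15 105"
```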

Undefined scalars contain the "undef" value. You shouldn't compare anything to undef, the way you would compare things to null in C/C++/Java. Instead, use the defined() function, like so:

$i = "hi there";
print $i if defined $i;                 # prints "hi there"
undef $i;                               # set $i to be undef
print $i if defined $i;                 # prints nothing  

Arrays are lists of scalars. They automatically resize as needed, much like the Vector class in Java. C and C++ arrays are fixed in size and have no built-in resizable equivalent, but there are many libraries such as the STL that provide similar functionality. An interesting property of arrays is that in scalar context they yield the number of elements in the array:

@a = ("hi there", "nowhere");
print scalar @a;                        # prints 2
push @a, "hello";                       # add "hello" at the end
print scalar @a;                        # prints 3  

Hashes are like arrays, but the scalars are not ordered by position. They are indexed by another scalar (a unique key). For example, a list of names indexed by social security number (a fairly unique key) can be a hash. Insertion of keys in the hash expands the hash automatically. Hashes are similar to the Java HashMap and Hashtable classes.
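
A short sketch of a hash in use follows. The keys and names are made-up placeholder data standing in for the identifier-to-name example above.

```perl
use strict;
use warnings;

# Hypothetical data: names indexed by a unique identifier.
my %name_for = (
    "id-001" => "Alice",
    "id-002" => "Bob",
);

# Inserting a new key expands the hash automatically.
$name_for{"id-003"} = "Carol";

my $count = scalar keys %name_for;
print "$count entries\n";       # prints "3 entries"

# Hashes are unordered, so sort the keys for predictable output.
foreach my $id (sort keys %name_for) {
    print "$id: $name_for{$id}\n";
}

# exists() checks for a key without creating it.
print "id-999 is unknown\n" unless exists $name_for{"id-999"};
```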

References are held inside scalars, and they can point to anything. Thus, it is possible to have an array of hashes or a hash of arrays or a hash of hashes or an array of arrays (N-dimensional array). There are several ways to access reference contents, either by explicit dereferencing or with the "->" operator. See the "perldoc perlref" manual page for further information, as this is a pretty broad topic.
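
The two access styles can be sketched as follows, using a hash of arrays and an array of arrays. The data and variable names are invented for the example.

```perl
use strict;
use warnings;

# A hash of arrays: each value is a reference to an anonymous array.
my %languages = (
    compiled => [ "C", "C++" ],
    bytecode => [ "Java" ],
);

# A reference to the hash itself, held in an ordinary scalar.
my $ref = \%languages;

my $arrow    = $ref->{compiled}[1];         # "->" operator: "C++"
my $explicit = ${$ref}{bytecode}[0];        # explicit dereference: "Java"
my $count    = scalar @{ $ref->{compiled} };    # array size via deref: 2

# An array of arrays behaves like a two-dimensional array.
my @grid = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );
my $cell = $grid[1][2];                     # row 1, column 2: 6

print "$arrow $explicit $count $cell\n";    # prints "C++ Java 2 6"
```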

C and C++ have no built-in resizable arrays or hashes. This forces programmers to jump through many hoops when they want them, including using external libraries such as the STL.

Java has built-in types that correspond to arrays and hashes, but they are not as implicit in the language itself. Iterating over the values of a hash, for example, takes about three times as much typing in Java as it does in Perl:

import java.util.Enumeration;
import java.util.Hashtable;
Hashtable hi = new Hashtable();
// fill in hi's values
// we can use an Iterator, still a lot of typing
// (the loop variable is named "e" because "enum" is a reserved word since Java 5)
for (Enumeration e = hi.elements();
     e.hasMoreElements();)
{
 Object o = e.nextElement();
 // do something with o
}  

# note that this even includes the definition and initialization of
# the hash, and still is more compact than the Java code!

%hash = ( a => "hi", b => "hello" );    # parentheses, not braces: braces would create a hash reference

foreach (values %hash)
{
 # do something with $_
} 


The missing pieces

Perl lacks many of C, C++, and Java's features. It is a different language, after all. Some of those features directly conflict with each other, like Java's single inheritance model and C++'s multiple inheritance model, for example. In such cases, it's obviously not possible to have both, and Perl comes up with its own way of doing things.

Because Perl programs can be linked to C libraries (and in fact, this is how much of Perl's functionality is implemented), there is almost nothing that C or C++ code can do that Perl cannot accomplish by linking. Here we'll try to limit our discussion to the language's built-in functionality, without external linking.

Compared to C and C++, Perl sometimes lacks execution speed. This can be a problem, but is often easily overcome with good programming and good use of Perl's built-in functionality.

Perl also lacks direct use of C and C++ libraries. Their constants and functionality have to be adapted into Perl through modules and various bindings, which can delay development and slow down execution. This is less of a problem lately, with a huge number of bindings released on CPAN to date.

As a programmer skill, Perl is not as well accepted as C and C++. It is a young language, growing in popularity but not yet universally spoken. It is installed on most UNIX systems, however, and there are few operating systems to which Perl has not been ported.

Perl supports single or multiple inheritance hierarchies, encapsulation, and polymorphism, but only through external modules or programmer agreement. In other words, the language itself does not enforce strict OOP rules; it is up to the programmers to abide by the rules. This can be good and bad, depending on the programmers and on the project.
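
A minimal sketch of the "programmer agreement" style of OOP is shown below. The Animal, Swimmer, and Duck classes are hypothetical; bless and the @ISA array are the built-in mechanisms.

```perl
use strict;
use warnings;

package Animal;
# Constructor: bless ties a plain hash to the class name.
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}
sub name  { my $self = shift; return $self->{name} }
sub speak { my $self = shift; return $self->name . " makes a sound" }

package Swimmer;
sub swim { my $self = shift; return $self->name . " swims" }

package Duck;
# Multiple inheritance: method lookup searches both parent classes.
our @ISA = ('Animal', 'Swimmer');
# Polymorphism: Duck overrides Animal's speak.
sub speak { my $self = shift; return $self->name . " quacks" }

package main;

my $duck = Duck->new(name => "Quackers");
print $duck->speak, "\n";       # prints "Quackers quacks"
print $duck->swim, "\n";        # prints "Quackers swims"

# Encapsulation is only a convention: nothing stops direct access.
print $duck->{name}, "\n";      # prints "Quackers"
```

The last line is exactly the kind of access a stricter language would forbid; in Perl it is up to the programmers to agree not to do it.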

Perl's threads and Unicode support lag significantly behind Java, and a little behind C/C++. Java was designed to support threads and Unicode from the start, while C/C++ have had many more years than Perl to get it right, and more need for those features. Threads and Unicode support are still at the experimental stage in Perl, but this should change with the next stable release after 5.6.0.


The best of Perl

For the C/C++/Java programmer, Perl is invaluable for what it does better than those languages. Regular expressions, for instance, are trivial in Perl but quite hard to do in C, C++, or Java. Implicit function arguments, loose syntax, and liberal program structures add to Perl's charm.

Perl is not for everyone. It requires readiness to adapt, acceptance of all its faults, and of course a useful application. Don't use Perl just because it's cool; use Perl because it is the better tool. Use C, C++, or Java when they are better. A good programmer always has several tools at the ready.

Perl has some small deficiencies, which are always being ironed out by its tireless developers. Depending on the need for threads, Unicode support, or strict OOP practices, you may want to consider a language other than Perl to accommodate those specific needs.

Perl is a generic language. It is a flexible language that can act as glue between many disparate modules. It can implement any procedural or functional algorithm. Perl cuts the development cycle significantly, because there is less code to do common things, such as iterating over hash elements. But most importantly, programming in Perl is fun and always a learning experience.


Resources

