Secure Web site access with Perl

Write a Perl script to automate Web-based logins

Perl and its LWP module make it a breeze to automate Web site access; too bad the breeze becomes a storm when the Web site requires a username and password for access. Fortunately, you can use Perl modules to calm the storm. Learn how to find, install, and use the WWW::Mechanize and Crypt::SSLeay modules in a Perl script that automates logging into a secure Web site in this article by Bret Swedeen.

Bret H Swedeen (swedeen@us.ibm.com), GWA IT Architect, IBM

Bret Swedeen joined IBM in 1997 as a Principal Knowledge Architect. A Certified Netware Engineer and Certified Lotus Professional System Administrator and Application Developer, Bret also wrote articles for the Lotus Notes Advisor, Database Advisor, LAN Times, and LAN Magazine. In 1997 he authored the Lotus Notes 4.5 Administrator's Guide from Sybex publishing. Currently Bret develops custom monitoring tools for IBM's collaborative computing environment.



25 April 2006

I admit that I'm not the most seasoned Perl programmer. Fortunately, when I get stuck I know I can turn to numerous books, magazine articles, Web sites, newsgroups, and mailing lists for help. Despite everything at my disposal, however, one piece of critical information always remained elusive. No matter where I looked, I never found a good solution for using Perl and LWP to fetch a Web page from a secure site.

After much pain and suffering, I finally wrote a script to automate the login procedure myself. Along the way I had noticed others struggling with similar problems. Even with some variation, the basic question I saw again and again was this: How in the world do I send my username and password to a Web site using Perl? Having finally found a solution myself, I can hopefully answer that question once and for all.

Perl on Windows?

Surprise! My examples are not for Perl on variants of UNIX®; all my pain and suffering was eked out on the Windows platform. Since I use Perl on Windows®, my explanations and examples are based on Perl 5.8.x from ActiveState® (see Resources). Your mileage might vary if you use a version of Perl from elsewhere (I have no idea what that version might be, but I had to point it out). That said, most of my discussion should apply regardless of platform.

Hard stuff first

If you plan to communicate with a secure Web site, your session URL will start with https rather than http. Unfortunately, the LWP (Library for WWW in Perl) module doesn't support HTTPS on its own. To establish communication over a secure HTTP session you'll need to install a module called Crypt::SSLeay. This module is easily found at CPAN (see Resources), but CPAN distributes it as source code that has to be compiled against the OpenSSL libraries, and since I develop on Windows that doesn't really help me.

Nearly all Perl programmers on Windows use Perl from ActiveState. The package has been compiled and installs like other Windows applications. The best part of Perl from ActiveState is the Perl Package Manager (PPM). Simply type ppm at the C:\Perl\Bin prompt and ppm starts. From there, you can search for any Perl module already compiled for Windows and install it in a snap. Unfortunately, many of the modules in the default ActiveState repositories are very old or simply not available, which is the case with Crypt::SSLeay. Try a search for Crypt::SSLeay from the ppm prompt and you get a nice little error message: No matches for 'Crypt::SSLeay'; see 'help search'.

But don't despair -- Crypt::SSLeay already compiled for Windows does exist. You just need to look in a different module repository.

Finding and installing Crypt::SSLeay

I have no idea why Crypt::SSLeay isn't available from ActiveState. I do know that you can find it in a Canadian repository and install the module from the ppm prompt. Instead of typing install Crypt::SSLeay you need to type:

install http://theoryx5.uwinnipeg.ca/ppms/Crypt-SSLeay.ppd

Add TheoryX to your repository list

If you're interested, you can browse the available Perl packages compiled specifically for Win32 at the TheoryX repository. Just open a Web browser, type in http://theoryx5.uwinnipeg.ca/, and follow the link to ppm repositories for Win32 ActivePerl 8xx and 6xx. If you think you might want to install more modules from this repository, add it to your repository list at the ppm prompt with the rep add command. Type help at the ppm prompt for more repository command information.

Type the command correctly and the installation takes off without a hitch. Make a typographical mistake, however, and you get another error message: Error: Failed to download <your typographical error here>

Crypt::SSLeay installs everything you need automatically with the exception of two DLLs. During installation you'll be prompted to add libeay32.dll and ssleay32.dll. Answer yes when prompted; you need both of these files.

With that you have the hardest part out of the way (finding and installing Crypt::SSLeay for Windows, that is) and you're ready to start writing code.
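Before writing any login code, it can help to confirm that Perl actually sees the newly installed module. The following is a minimal sketch rather than part of the original scripts: module_available is a hypothetical helper name, and the final check assumes LWP::UserAgent is already present in your Perl installation.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Crypt::SSLeay -> Crypt/SSLeay.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the two modules the HTTPS step depends on.
for my $module (qw(LWP::UserAgent Crypt::SSLeay)) {
    printf "%-15s %s\n", $module,
        module_available($module) ? "found" : "NOT found";
}

# When LWP is present, ask it directly whether it can handle https URLs.
if (module_available('LWP::UserAgent')) {
    my $ua = LWP::UserAgent->new;
    print "https supported: ",
        ($ua->is_protocol_supported('https') ? "yes" : "no"), "\n";
}
```

If the last line reports "no", the installation of Crypt::SSLeay (or its two DLLs) probably did not complete, and it is worth repeating that step before going further.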


Make your life easier

Sending a username and password to a secure site is the next hurdle. While you can achieve this goal using just LWP, it seems more intuitive to write a script that interacts with a page the way you would with a regular browser, or at least as close to that as possible.

I got my next break after I wrote some scripts and posted snippets of them on the listserv libwww@perl.org, looking for help. Someone wrote back to me and said "Hey, it would be a lot easier if you just used WWW::Mechanize." So off I went to CPAN (again) to investigate their advice.

One quick read of the documentation and the mystery of logging onto a secure Web site was solved. The WWW::Mechanize module allows you to interact with a Web site much like you would with a Web browser. It allows you to follow links and fill out forms. The module was exactly what I needed, and you need it too. Here's how to get it.

  1. Put aside your code and open a command window (you know, the one that takes you back to the good old days of DOS).
  2. Change to your C:\Perl\bin directory and type ppm. The Perl Package Manager starts and leaves you at the ppm prompt ppm>.
  3. At the ppm prompt type search WWW::Mechanize. The search returns a couple of matches. You want the one that simply says WWW::Mechanize (in my search that is the first match in the list).
  4. To install the module, type install 1 (if your search associates WWW::Mechanize to a different number, enter that number instead of 1).

WWW::Mechanize in action

Once the installation is complete, head over to CPAN and read the documentation for the WWW::Mechanize module (see Resources). You'll also find some great code snippets and useful cookbook examples with the online documentation. To get you started, I've written a quick WWW::Mechanize example. The script in Listing 1 searches CPAN for the WWW::Mechanize module and dumps the resulting page to a file named out.htm.

Listing 1. Using WWW::Mechanize
1. #!c:\perl\bin\perl
2. use strict;
3. use WWW::Mechanize;
4. my $url = "http://www.cpan.org";
5. my $searchstring = "WWW::Mechanize";
6. my $outfile = "out.htm";
7. my $mech = WWW::Mechanize->new();
8. $mech->get($url);
9. $mech->follow_link(text => "CPAN modules, distributions, and authors", n => 1);
10. $mech->form_name('f');
11. $mech->field(query => "$searchstring");
12. $mech->click();
13. my $output_page = $mech->content();
14. open(OUTFILE, ">$outfile") or die "Cannot open $outfile: $!";
15. print OUTFILE "$output_page";
16. close(OUTFILE);

The script is straightforward and probably self-explanatory, but here is a quick run-down of each line:

  • Lines 2 and 3 are the all-important use statements. use strict forces you to declare all variables and reduces the risk of Perl mistaking your intentions when using subroutines (which do not exist in the above example). use WWW::Mechanize loads the module you installed previously.
  • Line 4 assigns the URL used later in the script to $url. Want to go to a different Web site? Start by changing $url.
  • Line 5 is what gets searched for at the declared URL.
  • Line 6 assigns a filename to the final output file.
  • Lines 7 and 8 create a new instance of WWW::Mechanize and then call the get method for that instance using the URL previously assigned.
  • Line 9 assumes the page was received and follows a known link on that page (obviously you can put more error checking here but, for now, I just want to demonstrate how to retrieve a page). The linked page is then retrieved. Because I previously followed these steps with a standard browser, I know that this next page provides a search field in a form named "f".
  • Line 10 references the form named "f" on the page.
  • Line 11 assigns the form field query the search string I want to search for.
  • Line 12 is the virtual button click, as if you were interacting with the page yourself.
  • Lines 13, 14, 15, and 16 assign the content of the returned page to $output_page, open a simple output file, write the contents to the file, and close the file.
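Listing 1 assumes every step succeeds. As a rough sketch of the extra error checking mentioned for Line 9, the version below stops with a message as soon as a step fails. The names save_page and search_cpan are hypothetical, the link text and form name are copied from Listing 1 and may not match the live site, and the UTF-8 output layer is an assumption about the page content.

```perl
use strict;
use warnings;

# Write $content to $path, dying with the system error on failure.
# The UTF-8 layer is an assumption; adjust it to the page's encoding.
sub save_page {
    my ($path, $content) = @_;
    open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!\n";
    print {$out} $content or die "Cannot write to $path: $!\n";
    close($out)           or die "Cannot close $path: $!\n";
    return length $content;
}

# Repeat the steps of Listing 1, checking the result of each one.
# $mech is a WWW::Mechanize object created with autocheck => 0.
sub search_cpan {
    my ($mech, $searchstring) = @_;

    $mech->get("http://www.cpan.org");
    $mech->success
        or die "GET failed: ", $mech->response->status_line, "\n";

    $mech->follow_link(text => "CPAN modules, distributions, and authors", n => 1)
        or die "Expected link not found on the start page\n";

    $mech->form_name('f')
        or die "No form named 'f' on the search page\n";
    $mech->field(query => $searchstring);
    $mech->click();
    $mech->success
        or die "Search failed: ", $mech->response->status_line, "\n";

    return $mech->content();
}

# Usage: perl search.pl WWW::Mechanize
if (@ARGV) {
    require WWW::Mechanize;   # loaded only when a search is actually run
    # autocheck => 0 reports failures through return values, checked above
    my $mech  = WWW::Mechanize->new(autocheck => 0);
    my $chars = save_page("out.htm", search_cpan($mech, $ARGV[0]));
    print "Wrote $chars characters to out.htm\n";
}
```

Turning autocheck off is a deliberate choice here: it keeps the failure handling visible in the script instead of relying on the module's own default, which differs between WWW::Mechanize releases.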

That's it for the basic usage of WWW::Mechanize; now let's move on to using it with a secure Web site.


Find a secure site and log in

In Listing 2 you see a script example where I've tried to log into a Web-based e-mail account at Yahoo!® mail. Test this script out for yourself and see how it runs. (Obviously, you'll need a Web-based e-mail account for this test.)

Listing 2. Logging into a secure site
1. #!c:\perl\bin\perl
2. use strict;
3. use WWW::Mechanize;
4. use HTTP::Cookies;
5. my $outfile = "out.htm";
6. my $url = "https://mail.yahoo.com/";
7. my $username = "your_email_username_here";
8. my $password = "your_account_password_here";
9. my $mech = WWW::Mechanize->new();
10. $mech->cookie_jar(HTTP::Cookies->new());
11. $mech->get($url);
12. $mech->form_name('login_form');
13. $mech->field(login => $username);
14. $mech->field(passwd => $password);
15. $mech->click();
16. my $output_page = $mech->content();
17. open(OUTFILE, ">$outfile") or die "Cannot open $outfile: $!";
18. print OUTFILE "$output_page";
19. close(OUTFILE);

Notice that most of the script is the same as the first one shown in Listing 1; the differences are as follows:

  • Line 4 tells the script to use cookies. Secure sites use cookies for authentication purposes. Exactly how the cookie process works is beyond the scope of this article. For now just know that you need cookie support to log into a secure Web site.
  • Line 6 is the URL to the secure Web site.
  • Lines 7 and 8 are the username and password for the Yahoo! mail account. Obviously, I didn't include my real username and password. You can easily substitute your account information in these lines so the script works for you.
  • Line 10 creates a new cookie instance for the previously created WWW::Mechanize instance.
  • Line 12 selects the form named login_form on the page that the previously defined URL lands on.
  • Lines 13 and 14 set the login and passwd form fields to the username and password values previously defined. The rest of the script is the same as the one in Listing 1.

Keep in mind that I found the form name and the fields login and passwd by browsing to mail.yahoo.com and examining the source of the HTML page that URL lands on.
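Reading the page source by hand works, but WWW::Mechanize can also list the forms it finds on a page. The sketch below is one possible way to do that, not the method used for the scripts in this article: describe_forms and the file name list_forms.pl are hypothetical, while forms, attr, inputs, name, and type are existing WWW::Mechanize and HTML::Form methods.

```perl
use strict;
use warnings;

# Summarize forms as "form_name -> field:type field:type ..." strings.
# Expects the HTML::Form objects that $mech->forms returns.
sub describe_forms {
    my @forms = @_;
    my @summary;
    for my $form (@forms) {
        my $name = $form->attr('name');
        $name = '(unnamed)' unless defined $name;
        # Skip inputs without a name; they are never submitted as fields.
        my @fields = map  { $_->name . ':' . $_->type }
                     grep { defined $_->name } $form->inputs;
        push @summary, "$name -> @fields";
    }
    return @summary;
}

# Usage: perl list_forms.pl https://mail.yahoo.com/
if (@ARGV) {
    require WWW::Mechanize;   # loaded only when a URL is supplied
    my $mech = WWW::Mechanize->new(autocheck => 0);
    $mech->get($ARGV[0]);
    $mech->success
        or die "GET failed: ", $mech->response->status_line, "\n";
    print "$_\n" for describe_forms($mech->forms);
}
```

The form name printed before the arrow is what goes into form_name, and each field name before a colon is a candidate for a field call.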


Murphy's Law: it doesn't work

Does my script work? Of course not! Rarely ever do I get my Perl scripts to work on the first try; however, the failure is an opportunity to do a little troubleshooting.

A great place to start troubleshooting is with the communication between the script and the Web site. To better understand what goes on behind the scenes, add the following debug line after Line 4 in the Yahoo! mail script of Listing 2:

use LWP::Debug qw(+);

Once you add the statement, launch the script again.

The debug statement sends a flood of information to your screen. Take your time and try to understand everything on the screen. In the case of the Yahoo! mail script, I saw lots of positive information on the screen; however, the most important information came at the end:

Listing 3. Debug output
Line 1:  LWP::UserAgent::send_request: GET 
    https://login.yahoo.com/config/verify?.done=
    http%3a//us.####.mail.yahoo.com/ym/login%3f.rand=###############
Line 2:  LWP::UserAgent::_need_proxy: Not proxied
Line 3:  LWP::Protocol::http::request: ()
Line 4:  LWP::Protocol::collect: read 508 bytes
Line 5:  LWP::UserAgent::request: Simple response: OK

I've numbered the output lines in Listing 3 so they are easier to talk about; in normal circumstances such lines are not numbered. Also, I used number signs (#) instead of the actual numbers returned to protect my Yahoo! account.

Note: For better viewing, Line 1 is split onto multiple lines. It actually appears as a single line.
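Later libwww-perl releases mark LWP::Debug as deprecated, so on a newer installation the statement above may print nothing. A rough equivalent, assuming an LWP::UserAgent recent enough to offer add_handler, is to register handlers that log each request and response. The names trace_traffic and trace.pl are hypothetical, and the output format is my own rather than the one shown in Listing 3.

```perl
use strict;
use warnings;

# Register handlers that log one line per request and per response.
# $ua may be an LWP::UserAgent or a WWW::Mechanize object (a subclass);
# $log is an optional code ref that defaults to printing on STDERR.
sub trace_traffic {
    my ($ua, $log) = @_;
    $log ||= sub { print STDERR @_, "\n" };

    # A request_send handler must return nothing: a returned response
    # object would be used instead of contacting the server.
    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        $log->('>> ' . $request->method . ' ' . $request->uri);
        return;
    });
    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        $log->('<< ' . $response->status_line);
        return;
    });
    return $ua;
}

# Usage: perl trace.pl https://mail.yahoo.com/
if (@ARGV) {
    require LWP::UserAgent;   # loaded only when a URL is supplied
    my $ua = trace_traffic(LWP::UserAgent->new);
    $ua->get($ARGV[0]);
}
```

In the login script, calling trace_traffic($mech) right after the WWW::Mechanize object is created should log every hop, including redirects such as the one to login.yahoo.com.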

Troubleshooting the output

So what do you see in the debug output? Line 1 suggests Yahoo! is trying to verify something and then redirect the script to another location. Also, note the positive OK response in Line 5, which indicates I got something back.

Right or wrong, the script writes its output to a simple HTML file before it quits. The debug output indicates something was returned. Therefore, it's time to open the output file in a text editor and have a look.

Sure enough, the output file contains some basic HTML code and a very telling message.

Listing 4. Error page output
<body>
If you are seeing this page, your browser settings prevent you
from automatically redirecting to a new URL. 
<p>
Please 
  <a href="http://us.f319.mail.yahoo.com/ym/login?.rand=###############">click here</a>
  to continue.

The script failed to redirect for some reason, but it is given an option to continue with the click here reference. At this point I see a very quick and easy solution to the problem.


A quick and easy solution

First, the page returned to the script made no reference to a failed login. The page simply said it could not redirect the browser and to click here to continue. So, I simply add one more line of code to the script after Line 15, and the click here option takes my script to its final destination, as shown in Listing 5.

Listing 5. The final script
1. #!c:\perl\bin\perl
2. use strict;
3. use WWW::Mechanize;
4. use HTTP::Cookies;
5. my $outfile = "out.htm";
6. my $url = "https://mail.yahoo.com/";
7. my $username = "your_email_username_here";
8. my $password = "your_account_password_here";
9. my $mech = WWW::Mechanize->new();
10. $mech->cookie_jar(HTTP::Cookies->new());
11. $mech->get($url);
12. $mech->form_name('login_form');
13. $mech->field(login => $username);
14. $mech->field(passwd => $password);
15. $mech->click();
16. $mech->follow_link(text => "click here", n => 1);	
17. my $output_page = $mech->content();
18. open(OUTFILE, ">$outfile") or die "Cannot open $outfile: $!";
19. print OUTFILE "$output_page";
20. close(OUTFILE);

As long as the script continues to retrieve a redirect page, and Yahoo! mail continues to supply the same redirect failure message, my quick and easy solution does the trick. Obviously, not all secure Web sites respond like Yahoo! does. Be prepared to do a little detective work of your own to get your login script working.
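Because the fix depends on the redirect page appearing, a slightly more defensive variant follows the link only when it is actually present. This is a sketch under that assumption: follow_continue_link and follow.pl are hypothetical names, while find_link, url_abs, get, and success are existing WWW::Mechanize methods.

```perl
use strict;
use warnings;

# Follow a continuation link such as "click here" if the current page has one.
# Returns 1 if the link was found and fetched, 0 otherwise.
# $mech is a WWW::Mechanize object that has already retrieved a page.
sub follow_continue_link {
    my ($mech, $text) = @_;
    $text = "click here" unless defined $text;

    # find_link returns undef when no link with this text exists
    my $link = $mech->find_link(text => $text, n => 1) or return 0;

    $mech->get($link->url_abs);
    return $mech->success ? 1 : 0;
}

# Usage: perl follow.pl <URL of a page that may contain a "click here" link>
if (@ARGV) {
    require WWW::Mechanize;   # loaded only when a URL is supplied
    my $mech = WWW::Mechanize->new(autocheck => 0);
    $mech->get($ARGV[0]);
    print follow_continue_link($mech) ? "Link followed\n" : "No link followed\n";
    print "Now at ", $mech->uri, "\n";
}
```

In Listing 5, a call to follow_continue_link($mech) could stand in for Line 16, so the script still runs if the site one day redirects without the intermediate page.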


Secure log-in checklist

I've shown you a Perl script that solves the mystery of logging into a secure Web site. To summarize, here is a checklist of must-haves for building successful, secure Web site login scripts with Perl:

  • Start with Crypt::SSLeay: Logging into a secure site is usually done over HTTPS. You need this module to make it possible. You can find it already compiled for Windows from the TheoryX server in Canada (see Resources).
  • Add WWW::Mechanize: Make your life easier and use this module, which allows you to write code that mimics Web site interaction by easily following links and filling out forms (a critical part of logging into a secure site).
  • Use cookies: Secure Web transactions use cookies. You need to turn them on with a use statement to get them to work automatically in your script.
  • Enable debugging: When things aren't working as expected, enable debugging with the statement use LWP::Debug qw(+); in your script. It sends a flood of information to your screen; however, if you are patient, the output is very helpful.
  • Make an output file: Dump the final output, or the output after each page retrieval point in your script, to a simple HTML file and examine it. The contents of the file provide a clear picture of what the script got in return for its "get" requests.

Apply this checklist to your scripts and you'll soon be automating access to secure Web sites with Perl.

Resources

Get products and technologies

  • ActiveState: Download Perl for Windows for free.
  • CPAN: Find (almost) all the Perl modules you could ever want, including WWW::Mechanize.
  • TheoryX: Try this repository for hard-to-find Perl modules such as Crypt::SSLeay.
