I admit that I'm not the most seasoned Perl programmer. Fortunately, when I get stuck I know I can turn to numerous books, magazine articles, Web sites, newsgroups, and mailing lists for help. Despite everything at my disposal, however, one piece of critical information always remained elusive. No matter where I looked, I never found a good solution for using Perl and LWP to fetch a Web page from a secure site.
After much pain and suffering, I finally wrote a script to automate the login procedure myself. Along the way I had noticed others struggling with similar problems. Even with some variation, the basic question I saw again and again was this: How in the world do I send my username and password to a Web site using Perl? Having finally found a solution myself, I can hopefully answer that question once and for all.
Hard stuff first
If you plan to communicate with a secure Web site, your session URL will start with HTTPS rather than HTTP. Unfortunately, the LWP (Library for WWW in Perl) module doesn't support HTTPS on its own. To establish communication over a secure HTTP session you'll need to install a module called Crypt::SSLeay. This module is easily found at CPAN (see Resources), but CPAN supplies source code, and since I develop on Windows, where I need a copy that is already compiled, that doesn't really help me.
Nearly all Perl programmers on Windows use Perl from ActiveState. The package comes precompiled and installs like other Windows applications. The best part of Perl from ActiveState is the Perl Package Manager (PPM). Simply type ppm at the C:\Perl\Bin prompt and ppm starts. From there, you can search for any Perl module already compiled for Windows and install it in a snap. Unfortunately, many of the modules in the default ActiveState repositories are very old, and others are simply not available, which is the case with Crypt::SSLeay. Try a search for Crypt::SSLeay from the ppm prompt and you get a nice little error message:
No matches for 'Crypt::SSLeay'; see 'help search'.
But don't despair -- Crypt::SSLeay already compiled for Windows does exist. You just need to look in a different module repository.
Finding and installing Crypt::SSLeay
I have no idea why Crypt::SSLeay isn't available from ActiveState. I do know that you can find it in a Canadian repository and install the module from the ppm prompt. Instead of typing install Crypt::SSLeay, you need to type install followed by the location of the module in that repository (see Resources).
Type the command correctly and the installation takes off without a hitch. Make a typographical mistake, however, and you get another error message:
Error: Failed to download <your typographical error here>
Crypt::SSLeay installs everything you need automatically with the exception of two DLLs. During installation you'll be prompted to add libeay32.dll and ssleay32.dll. Answer yes when prompted; you need both of these files.
With that you have the hardest part out of the way (finding and installing Crypt::SSLeay for Windows, that is) and you're ready to start writing code.
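Before writing any login code, it can help to confirm that Perl actually finds the modules you just installed. The following is a minimal sketch, not part of the original scripts: the helper name module_available is my own, and the list of modules to check is only an example.

```perl
use strict;
use warnings;

# Return 1 if this Perl can load the named module, 0 otherwise.
# (module_available is a hypothetical helper name used for illustration.)
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Crypt::SSLeay -> Crypt/SSLeay.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the modules this article depends on for HTTPS.
for my $module ('LWP::UserAgent', 'Crypt::SSLeay') {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'found' : 'MISSING';
}
```

If Crypt::SSLeay shows up as MISSING, go back to the ppm prompt and repeat the install step before going any further.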
Make your life easier
Sending a username and password to a secure site is the next hurdle. While you can achieve this goal using just LWP, it seems more intuitive to write a script that interacts with a page the way you would with a regular browser, or at least as close to that as possible.
I got my next break after I wrote some scripts and posted snippets of them on the listserv email@example.com, looking for help. Someone wrote back to me and said "Hey, it would be a lot easier if you just used WWW::Mechanize." So off I went to CPAN (again) to investigate their advice.
One quick read of the documentation and the mystery of logging onto a secure Web site was solved. The WWW::Mechanize module allows you to interact with a Web site much like you would with a Web browser. It allows you to follow links and fill out forms. The module was exactly what I needed, and you need it too. Here's how to get it.
- Put aside your code and open a command window (you know, the one that takes you back to the good old days of DOS).
- Change to your C:\Perl\bin directory and type ppm. The Perl Package Manager starts and leaves you at the ppm prompt.
- At the ppm prompt type search WWW::Mechanize. The search returns a couple of matches. You want the one that simply says WWW::Mechanize (in my search that is the first match in the list).
- To install the module, type install 1 (if your search associates WWW::Mechanize with a different number, enter that number instead of 1).
WWW::Mechanize in action
Once the installation is complete, head over to CPAN and read the documentation for the WWW::Mechanize module (see Resources). You'll also find some great code snippets and useful cookbook examples with the online documentation. To get you started, I've written a quick WWW::Mechanize example. The script in Listing 1 searches CPAN for the WWW::Mechanize module and dumps the resulting page to a file named out.htm.
Listing 1. Using WWW::Mechanize
1. #!c:\\perl\\bin
2. use strict;
3. use WWW::Mechanize;
4. my $url = "http://www.cpan.org";
5. my $searchstring = "WWW::Mechanize";
6. my $outfile = "out.htm";
7. my $mech = WWW::Mechanize->new();
8. $mech->get($url);
9. $mech->follow_link(text => "CPAN modules, distributions, and authors", n => 1);
10. $mech->form_name('f');
11. $mech->field(query => "$searchstring");
12. $mech->click();
13. my $output_page = $mech->content();
14. open(OUTFILE, ">$outfile") or die "Cannot open $outfile: $!";
15. print OUTFILE "$output_page";
16. close(OUTFILE);
The script is straightforward and probably self-explanatory, but here is a quick run-down of each line:
- Lines 2 and 3 are the all-important use statements. use strict forces you to declare all variables and reduces the risk of Perl mistaking your intentions when you use subroutines (which do not exist in the above example). use WWW::Mechanize allows you to use the module previously installed.
- Line 4 assigns the URL used later in the script to $url. Want to go to a different Web site? Start by changing $url.
- Line 5 is what gets searched for at the declared URL.
- Line 6 assigns a filename to the final output file.
- Lines 7 and 8 create a new instance of WWW::Mechanize and then call the get method for that instance using the URL previously assigned.
- Line 9 assumes the page was received and follows a known link on that page (obviously you can put more error checking here but, for now, I just want to demonstrate how to retrieve a page). The linked page is retrieved. Since I previously followed these steps with a standard browser, I know that my next page provides a search field in a form named "f".
- Line 10 selects the form named "f" on the page.
- Line 11 assigns the search string to the form field query.
- Line 12 is the virtual button click, as if you were interacting with the page yourself.
- Lines 13, 14, 15, and 16 assign the content of the returned page to $output_page, open a simple output file, write the contents to the file, and close the file.
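The walkthrough mentions that more error checking could go around the page retrieval and the link. Here is one way that might look, as a sketch rather than a drop-in replacement for Listing 1. The helper name checked_get and the idea of passing the URL and link text on the command line are my own assumptions; autocheck, success, response, find_link, and url_abs are documented WWW::Mechanize features.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch $url and return (1, status line) on success or (0, status line) on failure.
# (checked_get is a hypothetical helper name.)
sub checked_get {
    my ($mech, $url) = @_;
    $mech->get($url);
    return ($mech->success ? 1 : 0, $mech->response->status_line);
}

# autocheck => 0 keeps the module from dying on errors,
# so the script can report them in its own words.
my $mech = WWW::Mechanize->new(autocheck => 0);

if (@ARGV == 2) {
    my ($url, $link_text) = @ARGV;

    my ($ok, $status) = checked_get($mech, $url);
    die "GET $url failed: $status\n" unless $ok;

    # find_link returns undef when nothing matches, so test before following.
    my $link = $mech->find_link(text => $link_text)
        or die "No link labelled '$link_text' on $url\n";

    ($ok, $status) = checked_get($mech, $link->url_abs);
    die "Following '$link_text' failed: $status\n" unless $ok;

    print "Now at ", $mech->uri, "\n";
}
else {
    print "usage: perl $0 <url> <link text>\n";
}
```

Run it with the URL and link text from Listing 1 as the two arguments. A failure at either step then stops the script with a message that names the step, instead of silently writing an unexpected page to the output file.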
That's it for the basic usage of WWW::Mechanize; now let's move on to using it with a secure Web site.
Find a secure site and log in
In Listing 2 you see a script example where I've tried to log into a Web-based e-mail account at Yahoo!® mail. Test this script out for yourself and see how it runs. (Obviously, you'll need a Web-based e-mail account for this test.)
Listing 2. Logging into a secure site
1. #!c:\\perl\\bin
2. use strict;
3. use WWW::Mechanize;
4. use HTTP::Cookies;
5. my $outfile = "out.htm";
6. my $url = "https://mail.yahoo.com/";
7. my $username = "your_email_username_here";
8. my $password = "your_account_password_here";
9. my $mech = WWW::Mechanize->new();
10. $mech->cookie_jar(HTTP::Cookies->new());
11. $mech->get($url);
12. $mech->form_name('login_form');
13. $mech->field(login => $username);
14. $mech->field(passwd => $password);
15. $mech->click();
16. my $output_page = $mech->content();
17. open(OUTFILE, ">$outfile") or die "Cannot open $outfile: $!";
18. print OUTFILE "$output_page";
19. close(OUTFILE);
Notice that most of the script is the same as the first one shown in Listing 1; the differences are as follows:
- Line 6 is the URL to the secure Web site.
- Lines 7 and 8 are the username and password for the Yahoo! mail account. Obviously, I didn't include my real username and password. You can easily substitute your account information in these lines so the script works for you.
- Line 10 creates a new cookie jar for the previously created WWW::Mechanize instance.
- Line 12 selects the form by the name it has on the page that the URL lands on.
- Lines 13 and 14 set the login and passwd fields to the username and password values previously defined.
The rest of the script is the same as the one in Listing 1. Keep in mind that I found the form name and the fields login and passwd by browsing to mail.yahoo.com and examining the source of the HTML page that URL lands on.
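Reading the page source by eye works, but a few lines of Perl can list the form names and field names for you. The sketch below uses the HTML::Form module (a dependency of WWW::Mechanize) on a saved copy of a page. The mark-up in $html is made up for illustration and is not Yahoo!'s real login page, and describe_forms is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical mark-up standing in for the saved source of a login page.
my $html = <<'HTML';
<html><body>
<form name="login_form" method="post" action="/login">
  <input type="text"     name="login">
  <input type="password" name="passwd">
  <input type="submit"   name="save" value="Sign In">
</form>
</body></html>
HTML

# Return one line per form: the form name, then "field(type)" for each named input.
sub describe_forms {
    my ($source, $base_url) = @_;
    my @lines;
    for my $form (HTML::Form->parse($source, $base_url)) {
        my $name = $form->attr('name');
        $name = '(unnamed)' unless defined $name;
        my @fields = map  { $_->name . '(' . $_->type . ')' }
                     grep { defined $_->name } $form->inputs;
        push @lines, "$name: @fields";
    }
    return @lines;
}

# The base URL is a placeholder; it is only used to resolve relative form actions.
print "$_\n" for describe_forms($html, 'https://www.example.com/');
```

The names it prints are the ones to pass to form_name and field in your own script.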
Murphy's Law: it doesn't work
Does my script work? Of course not! Rarely ever do I get my Perl scripts to work on the first try; however, the failure is an opportunity to do a little troubleshooting.
A great place to start troubleshooting is with the communication between the script and the Web site. To better understand what goes on behind the scenes, add the following debug line after Line 4 in the Yahoo! mail script of Listing 2:
use LWP::Debug qw(+);
Once you add the statement, launch the script again.
The debug statement sends a flood of information to your screen. Take your time and try to understand everything on the screen. In the case of the Yahoo! mail script, I saw lots of positive information on the screen; however, the most important information came at the end:
Listing 3. Debug output
Line 1: LWP::UserAgent::send_request: GET https://login.yahoo.com/config/verify?.done=
        http%3a//us.####.mail.yahoo.com/ym/login%3f.rand=###############
Line 2: LWP::UserAgent::_need_proxy: Not proxied
Line 3: LWP::Protocol::http::request: ()
Line 4: LWP::Protocol::collect: read 508 bytes
Line 5: LWP::UserAgent::request: Simple response: OK
I've numbered the output lines in Listing 3 so they are easier to talk about; in normal circumstances such lines are not numbered. Also, I used number signs (#) instead of the actual numbers returned to protect my Yahoo! account.
Note: For better viewing, Line 1 is split onto multiple lines. It actually appears as a single line.
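If use LWP::Debug qw(+); prints nothing on your installation, that may be because newer libwww-perl releases, as far as I can tell, keep LWP::Debug only for backward compatibility. On those releases a similar trace can be produced with the add_handler method of LWP::UserAgent, which WWW::Mechanize inherits. The sketch below assumes your LWP provides add_handler; the helper name trace_traffic and the arrow prefixes are my own.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Attach tracing handlers to an LWP::UserAgent (or a WWW::Mechanize, which is a subclass).
# Each request and each response is summarized on the filehandle $out.
# (trace_traffic is a hypothetical helper name.)
sub trace_traffic {
    my ($ua, $out) = @_;
    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        print {$out} "--> ", $request->method, " ", $request->uri, "\n";
        return;    # returning nothing lets the request go ahead as usual
    });
    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        print {$out} "<-- ", $response->status_line, "\n";
        return;
    });
    return $ua;
}

# Any request made through $ua from here on prints its trace lines to STDERR.
my $ua = trace_traffic(LWP::UserAgent->new, \*STDERR);
```

In the Yahoo! mail script you would call trace_traffic($mech, \*STDERR) right after the WWW::Mechanize instance is created. Redirects then appear as additional request and response pairs in the trace.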
Troubleshooting the output
So what do you see in the debug output? Line 1 suggests Yahoo! is trying to confirm something and then redirect the script to another location. Also, note the positive OK response at the end, which indicates I got something back.
Right or wrong, the script writes its output to a simple HTML file before it quits. The debug output indicates something was returned. Therefore, it's time to open the output file in a text editor and have a look.
Sure enough, the output file contains some basic HTML code and a very telling message.
Listing 4. Error page output
<body>
If you are seeing this page, your browser settings prevent you from
automatically redirecting to a new URL.
<p>
Please <a href="http://us.f319.mail.yahoo.com/ym/login?.rand=###############">click here</a> to continue.
The script failed to redirect for some reason, but the page gives it an option to continue with the click here link. At this point I see a very quick and easy solution to the problem.
A quick and easy solution
First, the page returned to the script made no reference to a failed login. The page simply said it could not redirect the browser and to click here to continue. So, I simply add one more line of code to the script after Line 15, and the click here option takes my script to its final destination, as shown in Listing 5.
Listing 5. The final script
1. #!c:\\perl\\bin
2. use strict;
3. use WWW::Mechanize;
4. use HTTP::Cookies;
5. my $outfile = "out.htm";
6. my $url = "https://mail.yahoo.com/";
7. my $username = "your_email_username_here";
8. my $password = "your_account_password_here";
9. my $mech = WWW::Mechanize->new();
10. $mech->cookie_jar(HTTP::Cookies->new());
11. $mech->get($url);
12. $mech->form_name('login_form');
13. $mech->field(login => $username);
14. $mech->field(passwd => $password);
15. $mech->click();
16. $mech->follow_link(text => "click here", n => 1);
17. my $output_page = $mech->content();
18. open(OUTFILE, ">$outfile") or die "Cannot open $outfile: $!";
19. print OUTFILE "$output_page";
20. close(OUTFILE);
As long as the script continues to retrieve a redirect page, and Yahoo! mail continues to supply the same redirect failure message, my quick and easy solution does the trick. Obviously, not all secure Web sites respond like Yahoo! does. Be prepared to do a little detective work of your own to get your login script working.
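Because the fix depends on the click here link being present, it may be worth checking for the link before following it. The sketch below tries that idea on a local stand-in for the page in Listing 4, so it runs without touching Yahoo!. The helper name interstitial_target, the temporary file, and the www.example.com address are all placeholders of mine.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Return the absolute URL of the first link on the current page whose text
# is exactly $text, or undef if there is no such link.
# (interstitial_target is a hypothetical helper name.)
sub interstitial_target {
    my ($mech, $text) = @_;
    my $link = $mech->find_link(text => $text);
    return $link ? $link->url_abs->as_string : undef;
}

# Local stand-in for the redirect page; the href is a placeholder address.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} <<'HTML';
<html><body>
If you are seeing this page, your browser settings prevent you from
automatically redirecting to a new URL.
<p>Please <a href="http://www.example.com/continue">click here</a> to continue.
</body></html>
HTML
close($fh) or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new(autocheck => 0);
$mech->get(URI::file->new_abs($path));    # load the saved page through a file: URL

my $target = interstitial_target($mech, 'click here');
print defined $target
    ? "Would continue to $target\n"
    : "No click here link on this page\n";
```

In the final script, the extra line after the click could then follow the link only when interstitial_target returns a URL. That way the script keeps working if the site ever stops sending the redirect page.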
Secure log-in checklist
I've shown you a Perl script that solves the mystery of logging into a secure Web site. To summarize, here is a checklist of must-haves for building successful, secure Web site login scripts with Perl:
- Start with Crypt::SSLeay: Logging into a secure site is usually done over HTTPS. You need this module to make it possible. You can find it already compiled for Windows from the TheoryX server in Canada (see Resources).
- Add WWW::Mechanize: Make your life easier and use this module, which allows you to write code that mimics Web site interaction by easily following links and filling out forms (a critical part of logging into a secure site).
- Load the modules with a use statement to get them to work automatically in your script.
- Enable debugging: When things aren't working as expected, enable debugging with the statement use LWP::Debug qw(+); It sends a flood of information to your screen; however, if you are patient, the output is very helpful.
- Make an output file: Dump the final output, or the output after each page retrieval point in your script, to a simple HTML file and examine it. The contents of the file provide a clear picture of what the script got in return for its "get" requests.
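The last item on the checklist, dumping the output after each page retrieval point, can be wrapped in a small helper so that every step gets its own file. This is only a sketch: the helper name dump_page and the step01_, step02_ naming scheme are my own choices, and the page content is stand-in text.

```perl
use strict;
use warnings;

my $step = 0;    # counts the snapshots written so far

# Write $content to a numbered file such as step01_start.htm and return the file name.
# (dump_page is a hypothetical helper name.)
sub dump_page {
    my ($label, $content) = @_;
    my $file = sprintf "step%02d_%s.htm", ++$step, $label;
    open(my $out, '>', $file) or die "Cannot write $file: $!";
    print {$out} $content;
    close($out) or die "Cannot close $file: $!";
    return $file;
}

# Stand-in content; in a real script pass $mech->content() after each get() or click().
print "Wrote ", dump_page('start',       "<html><body>first page</body></html>"),  "\n";
print "Wrote ", dump_page('after_login', "<html><body>second page</body></html>"), "\n";
```

Opening the numbered files in order shows exactly where a login sequence went off course.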
Apply this checklist to your scripts and you'll soon be automating access to secure Web sites with Perl.
Resources
- Cultured Perl: Debugging Perl with ease (Teodor Zlatanov, developerWorks, November 2000): Read about techniques for debugging Perl.
- Cultured Perl (Teodor Zlatanov, developerWorks): Get more information on Perl programming from this column.
- Perl.com: Check out the O'Reilly Network's source for Perl information and related resources.
- developerWorks Web architecture zone: Find articles and tutorials on various Web technologies.
Get products and technologies
- ActiveState: Download Perl for Windows for free.
- CPAN: Find (almost) all the Perl modules you could ever want, including WWW::Mechanize.
- TheoryX: Try this repository for hard-to-find Perl modules such as Crypt::SSLeay.
- firstname.lastname@example.org: Exchange programming tips with other Perl programmers.
- developerWorks blogs: Get involved in the developerWorks community.