Managing ezines with JavaMail and XSLT, Part 2

Use XML and XSLT to automatically produce both plain text and HTML newsletters

In the conclusion of his series, Benoît Marchal demonstrates how to automate e-mail publishing chores with Java and XML. This concrete application of XML and XSLT describes an e-mail newsletter (e-zine) publishing application that outputs both HTML and plain text e-mail messages. Five reusable code samples include a Java program to send e-mails using JavaMail, an XSLT style sheet to convert the DocBook sample introduced in Part 1 to HTML, a Java configuration handler (in the form of a SAX ContentHandler), and the Java code that puts it all together in a multistepped transformation.

Benoit Marchal, Consultant, Pineapplesoft

Benoît Marchal is a consultant and writer based in Namur, Belgium. He wrote both XML by Example and Applied XML Solutions. He is a columnist for Gamelan.
Ben learned first hand about e-zine publishing when he launched Pineapplesoft Link in 1998. You can subscribe to his e-zine and find details on his latest projects at www.marchal.com.



27 October 2010 (First published 01 April 2001)

27 Oct 2010 - In response to a reader comment, fixed the broken link to the download file (see Download) and added an entry for a SAX tutorial (see Resources).

This is the second part of this series on how to automate e-zine publishing using JavaMail and XSLT. In Part 1, you learned how to convert DocBook XML documents to a text format that satisfies the stringent requirements of e-mail publishing. This required three steps:

  1. Using XSLT to convert DocBook into an intermediate text markup language
  2. Using a custom-made Java application to reformat the text markup language into plain text
  3. Cleaning up the text with the help of various SAX filters

One of the main benefits of this approach is that it breaks down a complex process into a number of discrete steps. The XSLT style sheet buys you tremendous flexibility: If you ever decide to publish documents written to a vocabulary other than DocBook, only the style sheet must change. Furthermore, the SAX filters help to modularize the conversion component, which makes it more readable and easier to maintain.

In Part 2, I describe how to wrap this process with JavaMail, the standard Java language e-mail API, to send the e-zine over the wire. In the process, I'll revisit SAX event handling. This conclusion illustrates how to wrap XSLT processing in a larger application. Figure 1 from Part 1 illustrates how the pieces all fit together.

Figure 1. How the components of the solution interact

JavaMail 101

Before going any further, let's review JavaMail. If you are already familiar with JavaMail, you may want to skip to the next section.

SMTP host

Newcomers to JavaMail are often confused by the SMTP host (also known as the SMTP relay, mailhost, or outgoing mail server).

The concept is simple: Your ISP provides you with an e-mail server that your e-mail client (Eudora, Outlook, or Netscape) uses to send e-mails. Because JavaMail replaces the e-mail client in this application, it too needs access to the e-mail server.

Review the configuration of your e-mail client to determine what your SMTP host is. To be sure, check with your ISP or system administrator.

As a final word of warning, you cannot directly access the SMTP host on some corporate networks. In those cases, because JavaMail absolutely requires the SMTP host, your only hope is to cajole your system administrator into giving you special access.

E-mail remains one of the most popular Internet applications. Although we tend to associate e-mail with e-mail clients such as Eudora, Outlook, or Netscape, many applications can send or retrieve e-mails automatically. Just remember the last time you bought a product online. Chances are that within minutes of completing the order you received a confirmation e-mail. It was sent automatically by the electronic shop. No human was involved, and it did not require an e-mail client such as Eudora.

Recognizing that Java applications often need to send or receive e-mail, Sun developed JavaMail as a standard API for e-mail services. Through JavaMail, Java applications can send e-mails or check a mailbox directly (without going through an e-mail client). Note that JavaMail is a separate download from the Sun JDK (see Resources).

The SendMessage.java listing in Listing 1 is a simple application to illustrate JavaMail. Notice that it imports the javax.mail and javax.mail.internet packages.

To send an e-mail, the first step is to request a Session object through Session.getDefaultInstance(). Session.getDefaultInstance() takes a Properties object that must contain at least the mail.smtp.host property. This property must point to your SMTP host or nothing will work (see the sidebar SMTP host).

Next it creates a Message and sets its various properties, including the addresses of sender and recipients (note that there may be more than one recipient but there is only one sender), subject, date, and the body of the message.

The last step is to use Transport.send() to send the message over the wire. SendMessage.java creates only simple text e-mails. In the next sections, I explain how to create so-called multipart e-mails.

Listing 1. SendMessage.java
package com.psol.xslist;

import java.util.*;
import javax.mail.*;
import javax.mail.internet.*;

public class SendMessage
{
   public static final void main(String[] args)
   {
      try
      {
         if(args.length < 5)
         {
            System.out.println("java com.psol.xslist.SendMessage" +
               " from@domain.com to@domain.com mailhost.domain.com" +
               " subject \"mail content\"");
            return;
         }
         // Tell JavaMail which SMTP host relays the message
         Properties props = System.getProperties();
         props.put("mail.smtp.host",args[2]);
         Session session = Session.getDefaultInstance(props);
         // Build the message: one sender, one or more recipients
         Message message = new MimeMessage(session);
         InternetAddress from = new InternetAddress(args[0]);
         // parse() accepts a comma-separated list of addresses
         InternetAddress[] to = InternetAddress.parse(args[1]);
         message.setFrom(from);
         message.setRecipients(Message.RecipientType.TO,to);
         message.setSubject(args[3]);
         message.setSentDate(new Date());
         message.setText(args[4]);
         // Hand the message to the SMTP host for delivery
         Transport.send(message);
      }
      catch(MessagingException e)
      {
         System.err.println(e.getMessage());
      }
   }
}

Multipart e-mails and configuration files

multipart/alternative

multipart/alternative e-mails are a powerful concept defined by the Internet e-mail standard. These e-mails include several copies of the message, typically in different formats, so that the e-mail client can pick the best one to present to the user.

HTML-capable e-mail clients use an HTML version if one is present, while text-based clients use the text version. This concept is useful for e-zine publishing: Sending both the HTML and text version of the e-zine as a multipart/alternative e-mail ensures compatibility with all e-mail clients.

That's the theory, at least. In practice, most text-based e-mail clients predate the introduction of multipart/alternative and will present both the text and HTML versions to the user. Unfortunately, HTML appears as unreadable garbage to most subscribers. Smart e-zine publishers therefore make sure that the text body appears first in the e-mail.
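To make the structure concrete, here is a minimal sketch that assembles a multipart/alternative body by hand, using only the JDK. It illustrates the wire format and is not a description of how JavaMail works internally. The class name MultipartSketch, the build() method, and the boundary string are made up for this example. Note that the text part is written first, as recommended above.

```java
// Hypothetical illustration: lays out the MIME structure of a
// multipart/alternative e-mail as a plain string.
class MultipartSketch
{
   static final String CRLF = "\r\n";

   // Returns the MIME headers followed by both alternatives.
   // The plainest version comes first, the richest one last.
   static String build(String text, String html, String boundary)
   {
      StringBuilder sb = new StringBuilder();
      sb.append("MIME-Version: 1.0").append(CRLF);
      sb.append("Content-Type: multipart/alternative; boundary=\"")
        .append(boundary).append("\"").append(CRLF).append(CRLF);
      appendPart(sb, boundary, "text/plain", text);
      appendPart(sb, boundary, "text/html", html);
      // The closing delimiter carries two extra hyphens
      sb.append("--").append(boundary).append("--").append(CRLF);
      return sb.toString();
   }

   // Writes one body part: delimiter line, part header, blank line, content
   private static void appendPart(StringBuilder sb, String boundary,
                                  String contentType, String content)
   {
      sb.append("--").append(boundary).append(CRLF);
      sb.append("Content-Type: ").append(contentType)
        .append(CRLF).append(CRLF);
      sb.append(content).append(CRLF);
   }

   public static void main(String[] args)
   {
      System.out.print(build("Hello, subscribers!",
                             "<p>Hello, <b>subscribers</b>!</p>",
                             "example-boundary-42"));
   }
}
```

A client that understands multipart/alternative picks one of the parts between the boundary lines; a client that does not simply shows everything, which is why the order of the parts matters.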

Armed with JavaMail and the text formatter introduced in Part 1, you are ready to send the e-zine. You could update setText() in SendMessage.java to use the result of the text transformation introduced in Part 1, but you can do better: you can send multipart/alternative e-mails (see the sidebar multipart/alternative).

I choose to store the e-mail information in an XML configuration file such as config.xml in Listing 2. The structure of this file is as follows:

  • cfg:email is the root of the configuration file. It has one attribute, smtp, which holds the SMTP host. You must change this parameter before testing (recall the sidebar SMTP host).
  • cfg:header contains the sender's and recipient's addresses, as well as the e-mail subject as attributes (respectively, from, to and subject). For testing, make sure to use your own e-mail address.
  • cfg:body points to the source XML document in its source attribute.
  • cfg:text and cfg:part, which are enclosed in cfg:body, control how to create the various body parts. cfg:text uses the text conversion introduced in Part 1 to create a text body part, whereas cfg:part applies a simple XSLT style sheet to create the HTML version of the body.

All these elements are in the http://www.psol.com/xns/xslist/config namespace. Remember that although an XML namespace is in the form of a URL, it is used only as an identifier. In other words, if you point your browser at that namespace, you won't find any Web site.

Finally, note that the configuration file contains the address of only one recipient. That is because most people distribute mailing lists through special servers, such as Topica or SparkLIST (see Resources), which manage subscription and unsubscription. Sending the e-zine to the mailing-list server will dispatch it to all subscribers.

Be sure to change the smtp, from, and to attributes to your own values before trying this sample code.

Listing 2. config.xml for a hypothetical mailing
<?xml version="1.0"?>
<cfg:email xmlns:cfg="http://www.psol.com/xns/xslist/config"
           smtp="mailandnews.com">
   <cfg:header from="username@mailandnews.com"
               to="nobody@example.com"
               subject="XSL -- First Step in Learning XML"/>
   <cfg:body source="article.xml">
      <cfg:text styleSheet="text.xsl" contentType="text/plain"/>
      <cfg:part styleSheet="html.xsl" contentType="text/html"/>
   </cfg:body>
</cfg:email>

Reading the configuration file

Processing the configuration file is done by ConfigHandler.java in Listing 3. It inherits from a SAX DefaultHandler and implements the interface with the parser. As promised in Part 1, this section reviews SAX event handling. Feel free to skip over this review if you are familiar with SAX event handling.

SAX is an event-based API, meaning that the parser sends events -- similar to AWT events -- to your application as it progresses through the XML file. Your application can either ignore events or process them. Most event handlers process at least the following events:

  • startElement(), which the parser calls when it reads a start tag
  • endElement(), which the parser calls when it hits an end tag
  • characters(), which the parser calls when it encounters character data. ConfigHandler.java is remarkable in that it ignores characters() events

SAX defines many other events, but those are the main ones. See Resources for more details on other SAX events.

To understand SAX, you must remember that an XML document is a hierarchy of elements. In other words, it's a tree. The parser reads the tree and describes it, through events, to your application. By calling startElement(), the parser notifies the application that it has found a new branch. endElement() marks the end of the current branch.
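The following sketch makes this tree walk visible. It is an illustration only: the class SaxTrace and its trace() method are hypothetical names, and the parser is obtained through JAXP (see Resources) rather than through the Xerces class used elsewhere in this article. Each start tag is logged and indented one level deeper than its parent, and each end tag closes the branch.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that logs the branches the parser reports
class SaxTrace extends DefaultHandler
{
   private final StringBuilder log = new StringBuilder();
   private int depth = 0;

   private void indent()
   {
      for(int i = 0;i < depth;i++)
         log.append("  ");
   }

   @Override
   public void startElement(String uri,String local,
                            String qName,Attributes atts)
   {
      // A new branch begins: log it, then go one level deeper
      indent();
      log.append("start ").append(qName).append('\n');
      depth++;
   }

   @Override
   public void endElement(String uri,String local,String qName)
   {
      // The current branch ends: come back up one level
      depth--;
      indent();
      log.append("end ").append(qName).append('\n');
   }

   // Parses the XML text and returns the event log
   static String trace(String xml)
   {
      try
      {
         SaxTrace handler = new SaxTrace();
         SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)),handler);
         return handler.log.toString();
      }
      catch(Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.print(trace("<body><text/><part/></body>"));
   }
}
```

Note that an empty element such as <text/> still produces both a start and an end event.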

The difficulty in writing SAX handlers is that the code to process a specific element (say cfg:body) is split over several events (such as startElement() and endElement()).

Furthermore the parser provides no contextual information, so it is up to the application to match the events with each other. ConfigHandler.java uses the state variable to track what has been read in the configuration.
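As a rough illustration of state tracking, the sketch below reads a configuration shaped like Listing 2 and collects the body parts. It is not ConfigHandler.java: the class StateSketch, its State enum, and the parse() helper are hypothetical, and the handler only records what it finds instead of building a message. The point is that cfg:text and cfg:part are honored only while the handler knows it is inside cfg:body.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers where it is in the document
class StateSketch extends DefaultHandler
{
   static final String NS = "http://www.psol.com/xns/xslist/config";

   // The parser does not say where we are, so we keep track
   private enum State { START, EMAIL, BODY }

   private State state = State.START;
   private final List<String> parts = new ArrayList<>();

   @Override
   public void startElement(String uri,String local,
                            String qName,Attributes atts)
   {
      if(!NS.equals(uri))
         return;   // not a configuration element
      switch(state)
      {
         case START:
            if(local.equals("email")) state = State.EMAIL;
            break;
         case EMAIL:
            if(local.equals("body")) state = State.BODY;
            break;
         case BODY:
            // Body parts count only inside cfg:body
            if(local.equals("text") || local.equals("part"))
               parts.add(atts.getValue("","contentType") +
                  " via " + atts.getValue("","styleSheet"));
            break;
      }
   }

   @Override
   public void endElement(String uri,String local,String qName)
   {
      if(!NS.equals(uri))
         return;
      if(state == State.BODY && local.equals("body"))
         state = State.EMAIL;
      else if(state == State.EMAIL && local.equals("email"))
         state = State.START;
   }

   // Parses the configuration text and returns the parts found
   static List<String> parse(String xml)
   {
      try
      {
         SAXParserFactory factory = SAXParserFactory.newInstance();
         factory.setNamespaceAware(true);
         StateSketch handler = new StateSketch();
         factory.newSAXParser()
            .parse(new InputSource(new StringReader(xml)),handler);
         return handler.parts;
      }
      catch(Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      String xml = "<cfg:email xmlns:cfg='" + NS + "'>" +
         "<cfg:body source='article.xml'>" +
         "<cfg:text styleSheet='text.xsl' contentType='text/plain'/>" +
         "<cfg:part styleSheet='html.xsl' contentType='text/html'/>" +
         "</cfg:body></cfg:email>";
      for(String part : parse(xml))
         System.out.println(part);
   }
}
```

A cfg:part element that appears outside cfg:body is silently skipped, because the state at that moment is not BODY.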

As it progresses through the document, ConfigHandler.java builds a multipart Message and creates the different versions of the message (text and HTML) by applying the appropriate style sheet. At the end of the document, ConfigHandler.java sends the message.

This mechanism proves very flexible because you can create as many versions of the body as you want by using different style sheets. If you want a third version, say in WML (Wireless Markup Language), you only need to provide one extra style sheet and adapt the configuration file accordingly.


Putting it all together

config.xml (in Listing 2) references the html.xsl style sheet shown in Listing 4. The style sheet html.xsl is a classic example of a style sheet that converts article.xml, introduced in Part 1, to HTML.

Listing 4. html.xsl converts article.xml (from Part 1 of this series) to HTML
<?xml version="1.0"?>
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0">

<xsl:output method="html"/>

<xsl:template match="/">
<html>
   <head>
      <title><xsl:value-of
         select="article/articleinfo/title"/></title>
   </head>
   <xsl:apply-templates/>
</html>
</xsl:template>

<xsl:template match="article">
<body>
   <xsl:apply-templates/>
</body>
</xsl:template>

<xsl:template match="articleinfo/title">
   <h1><xsl:apply-templates/></h1>
</xsl:template>

<xsl:template match="sect1/title">
   <h2><xsl:apply-templates/></h2>
</xsl:template>

<xsl:template match="ulink">
   <a href="{@url}"><xsl:apply-templates/></a>
</xsl:template>

<xsl:template match="emphasis">
   <b><xsl:apply-templates/></b>
</xsl:template>

<xsl:template match="para">
   <p><xsl:apply-templates/></p>
</xsl:template>

<xsl:template match="author">
   <p>by <xsl:value-of select="firstname"/>
   <xsl:text> </xsl:text>
   <xsl:value-of select="surname"/></p>
</xsl:template>

</xsl:stylesheet>
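If you want to experiment with a style sheet like this one outside the full application, the sketch below applies a cut-down version of it through the JAXP transformation API (TrAX, see Resources). It is a minimal example under stated assumptions: the class ApplyStyleSheet, its transform() method, and the two-template style sheet held in the XSL constant are invented for illustration, and the XSLT processor is whichever one your JDK provides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies a style sheet to a document in memory
class ApplyStyleSheet
{
   // Two templates borrowed from html.xsl, enough for a quick test
   static final String XSL =
      "<xsl:stylesheet version='1.0'" +
      " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
      "<xsl:output method='html' indent='no'/>" +
      "<xsl:template match='para'>" +
      "<p><xsl:apply-templates/></p></xsl:template>" +
      "<xsl:template match='emphasis'>" +
      "<b><xsl:apply-templates/></b></xsl:template>" +
      "</xsl:stylesheet>";

   // Returns the result of applying the style sheet xsl to xml
   static String transform(String xsl,String xml)
   {
      try
      {
         Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
         StringWriter out = new StringWriter();
         transformer.transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(out));
         return out.toString();
      }
      catch(Exception e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      String xml = "<article><para>Hi <emphasis>there</emphasis>" +
                   "</para></article>";
      System.out.println(transform(XSL,xml));
   }
}
```

Swapping in a different style sheet string is all it takes to produce another version of the same document, which is the flexibility the configuration file relies on.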

The last missing piece is the main() method, which resides in XsList.java (Listing 5). Because ConfigHandler.java is responsible for creating and sending the e-mail, main() is very simple: It creates a SAX parser, registers a ConfigHandler instance as the event handler, and launches parsing by calling parse().

Of course, many things happen as the parser decodes the document. All along, the parser sends events to ConfigHandler.java, where all the heavy-duty processing (such as creating and sending messages) takes place. When parse() returns, two style sheets have been applied and an e-mail has been sent!

Listing 5. XsList.java
package com.psol.xslist;

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class XsList
{
   protected static final String
      PARSER_NAME = "org.apache.xerces.parsers.SAXParser";

   public static void main(String[] args)
   {
      try
      {
         if(args.length < 1)
         {
            System.out.println("java com.psol.xslist.XsList input.xml");
            return;
         }
         ConfigHandler configHandler = new ConfigHandler();
         XMLReader parser =
            XMLReaderFactory.createXMLReader(PARSER_NAME);
         parser.setFeature("http://xml.org/sax/features/namespaces",
                           true);
         parser.setContentHandler(configHandler);
         parser.parse(args[0]);
      }
      catch(IOException e)
      {
         System.err.println(e.getMessage());
      }
      catch(SAXException e)
      {
         System.err.println(e.getMessage());
      }
   }
}
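Listing 5 names the Xerces parser class explicitly, so Xerces must be on the class path. As a hedged alternative, JAXP (see Resources) can hand out whichever parser the platform provides. The sketch below assumes a JDK that bundles a JAXP parser. The class JaxpReaderSketch and its counting handler are illustrative stand-ins, not replacements for XsList or ConfigHandler. It also shows that the handler matches elements on the namespace URI, so the prefix chosen in the document does not matter.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: obtains a namespace-aware XMLReader via JAXP
class JaxpReaderSketch
{
   static final String NS = "http://www.psol.com/xns/xslist/config";

   // Counts the elements that belong to the configuration namespace
   static int countConfigElements(String xml)
   {
      final int[] count = { 0 };
      try
      {
         SAXParserFactory factory = SAXParserFactory.newInstance();
         // Same effect as turning on the SAX namespaces feature
         factory.setNamespaceAware(true);
         XMLReader reader = factory.newSAXParser().getXMLReader();
         reader.setContentHandler(new DefaultHandler()
         {
            @Override
            public void startElement(String uri,String local,
                                     String qName,Attributes atts)
            {
               // Compare the namespace URI, never the prefix
               if(NS.equals(uri))
                  count[0]++;
            }
         });
         reader.parse(new InputSource(new StringReader(xml)));
      }
      catch(Exception e)
      {
         throw new IllegalStateException(e);
      }
      return count[0];
   }

   public static void main(String[] args)
   {
      String xml = "<x:email xmlns:x='" + NS + "'>" +
                   "<other/><x:body/></x:email>";
      System.out.println(countConfigElements(xml));
   }
}
```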

Your turn now

As I prepared this article, my goal was threefold. I wanted to build an XML application with no browser. I believe using XML in a browser makes about as much sense as using Java for applets. I hope this article makes the point strongly that useful XML doesn't need a browser.

I also wanted to demonstrate how the combination of XSLT and SAX event handling is greater than the sum of its parts. In Part 1, you saw how XSLT is used to simplify the markup that is being fed to the SAX handler.

Finally, because I know most programmers learn more by studying examples, I wanted to show you a complete application. Incidentally, if you like this approach, you'll love my book Applied XML Solutions (see Resources) which packs in eight more examples.

Now it's your turn. This application serves a specific niche, e-zine publishing, but the XML techniques it demonstrates (XSLT, DocBook, SAX filters, JavaMail and more) are not specific to e-zine publishing. Study the code and see how it can help in your own niche.


Download

Description                    Name                   Size
Source code for this article   x-xmlist1-xslist.zip   1451 KB

Resources

  • You can download the source code for this project, including an ANT build file.
  • Dr. Ralph Wilson has conducted a survey of e-mail clients. He reports on their support for HTML e-mails.
  • If you need a mailing-list server contractor, check out two popular vendors, Topica and SparkLIST, which manage lists, including subscription and unsubscription. See also David Strom's Web Informant, which includes his most recent comparison of mailing-list services.
  • JAXP, the Java API for XML, integrates SAX (XML parsing) and TrAX (XSLT transforming).
  • Understanding SAX (Nicholas Chase, developerWorks, July 2003): Examine the use of the Simple API for XML version 2.0.x, or SAX 2.0.x in this tutorial as you learn to retrieve, manipulate, and output XML data with SAX.
  • For more details on SAX, turn to David Megginson's Web site. Megginson is the maintainer of the SAX API.
  • JavaMail is the standard Java API for e-mail. (It's a separate download from the JDK.)
  • The first installment in this two-part series discusses transformation from XML to text. It also describes the architecture of this application.
  • If you like this, there are eight more quality examples in Applied XML Solutions from the author of this article.
  • For a basic introduction to XML programming in Java, follow the developerWorks tutorial of the same name.
