
Scala and XML

XML processing made easy


Level: Intermediate

Michael Galpin (mike.sr@gmail.com), Software architect, eBay

22 Apr 2008

Scala is a popular new programming language that runs on the Java™ Virtual Machine (JVM). Scala compiles to bytecode, so it can use Java libraries directly. Its syntax, however, makes it a powerful alternative to Java code in certain scenarios. One of those scenarios is XML processing. Scala lets you navigate and process parsed XML in several ways. It also has first-class support for XML built into the language, so you don't need to assemble strings of XML or programmatically build DOM trees. In this article, you will see these aspects of Scala in action and see how Scala can make working with XML a joy.

This article uses version 2.6.1 of the Scala programming language. Scala is a young, fast-moving language, so check for the latest release. Knowledge of Scala is not assumed; instead, this article introduces some of Scala's syntax and idioms along the way. Scala requires a Java virtual machine. This article uses JDK 1.6.0_04, but Scala only requires 1.5 or higher. Familiarity with Java programming is assumed, even though no Java code is written in this article.

Parsing XML

Frequently used acronyms
  • API: application programming interface
  • DOM: Document Object Model
  • HTTP: Hypertext Transfer Protocol
  • JSON: JavaScript Object Notation
  • SAX: Simple API for XML
  • StAX: Streaming API For XML
  • XML: Extensible Markup Language

You will start by examining how to parse XML with Scala. Like most programming languages, Scala gives you multiple options for parsing XML, and they are the familiar ones: InfoSet/DOM-based representations, push (SAX) or pull (StAX) events, or data binding similar to Java Architecture for XML Binding (JAXB). You will explore the DOM-style manipulation, as it demonstrates many of the benefits of Scala's syntax. Before getting into that, you need to decide what XML you will parse and what you want to do with it. For that, you need a sample application.





Sample application: FriendFeed

FriendFeed is one of the popular new Web services of 2008. It lets users aggregate their activity on other services, such as various blog services, instant messaging services, YouTube, Flickr, and Twitter. It then creates a single data feed from this aggregation. You can do this on an individual basis, that is, get the aggregated activity for a given person. Even more interesting, though probably not as useful, is the FriendFeed public feed. This aggregates all public activities across all FriendFeed users. FriendFeed provides an API for accessing both individual feeds as well as the public feed. You will write an application for accessing and parsing the public feed.





Leveraging Java libraries

The first thing you need to do is access the FriendFeed public feed. The URL for it is http://friendfeed.com/api/feed/public. By default, it displays the data in the JSON format, and shows the 30 latest entries. To change the format to XML, add the query string parameter format=xml. To change the number of entries to 100 (for example), add the query string parameter num=100. Now you just need to access this URL. That is easy to do in Java code, and thus it is easy to do in Scala code. Take a look at the code to access the FriendFeed public feed in Listing 1.


Listing 1. Accessing the FriendFeed
                
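object FriendFeed {
  // Java classes are imported directly; braces group several names from one package
  import java.net.{URLConnection, URL}
  // The underscore imports everything in the scala.xml package
  import scala.xml._
  // Fetches the public feed over HTTP and parses it into an XML element
  def friendFeed():Elem = {
    val url = new URL("http://friendfeed.com/api/feed/public?format=xml&num=100")
    val conn = url.openConnection
    // The last expression is the method's result; no return keyword is needed
    XML.load(conn.getInputStream)
  }
}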

Notice the first thing you do here is import two core Java classes. Scala does not bother with its own APIs for doing things like opening HTTP connections because it can simply leverage Java's APIs for this. Notice that Scala provides some shortcuts for importing more than one class from the same package. The next line imports Scala's core XML classes. The underscore is like an asterisk in Java, i.e., it imports all of the classes in the scala.xml package.

So you use Java's APIs to open an HTTP connection to FriendFeed, and then the XML object from Scala to parse the response. There are several interesting things to notice here. First, XML is a Scala object, that is, a singleton. Scala does not have static methods, fields, or initializers. Instead, you can define something as an object (instead of a class), and it becomes a singleton instance. You call methods on a singleton object much as you would call static methods in Java, which is what happens in the XML.load statement. Notice that even though this is a method on a Scala object, it takes a Java object (java.io.InputStream) as a parameter. This shows the close relationship between Scala and Java. Finally, notice that there is no return statement. Return statements are optional in Scala. If there is none, the value of the last expression in the method is returned (if its type does not match the declared return type, the compiler reports an error). The method in Listing 1 can be accessed very simply, as shown in Listing 2.


Listing 2. Accessing the friendFeed method
                
val feedXml = friendFeed

Notice that you do not have to put parentheses on the call to the friendFeed method. You also make use of Scala's type inference: you do not have to declare the type of feedXml because it is inferred from the return type of the friendFeed method. Look again at Listing 1 and see how it also makes use of these syntactic shortcuts. One last thing to notice is that your parsed XML is declared as a val, so the reference cannot be reassigned. The scala.xml nodes themselves are also immutable (like a string in Java code), as is common in Scala. There are numerous advantages to having XML as an immutable object, but it can be tricky to get used to if you are accustomed to using the appendChild APIs in DOM. Now that you have parsed the XML from FriendFeed, you can start to slice and dice it using Scala.
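
As a rough illustration of the points made about Listings 1 and 2 (a singleton object, a Java type as a parameter, an implicit result, and inferred types), here is a minimal sketch. The names StreamUtil and countBytes are hypothetical and are not part of the article's sample code.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical helper, defined as a singleton object rather than a class.
object StreamUtil {
  // Takes a plain Java InputStream and counts the bytes it yields.
  def countBytes(in: InputStream): Int = {
    var count = 0
    while (in.read() != -1) count += 1
    count // the last expression is the result; no return keyword
  }

  def main(args: Array[String]): Unit = {
    // The types of both vals are inferred from their right-hand sides.
    val bytes = "<feed/>".getBytes("UTF-8")
    val total = StreamUtil.countBytes(new ByteArrayInputStream(bytes))
    println(total)
  }
}
```

The call StreamUtil.countBytes(...) reads much like a static method call in Java, even though StreamUtil is an ordinary object.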





Navigation and pattern matching

Many programming languages represent XML as a DOM tree. This has many advantages, but it can be laborious to programmatically traverse a tree to extract data from the XML document. Java technology has libraries that leverage XPath syntax. Scala takes a similar approach, but it has some advantages. Scala has many functional language aspects. There are no built-in operators (like + or *) in Scala. Instead, symbols like + or * are the names of methods that do things like ordinary numerical addition and multiplication. This also means that you can define operators (since they are actually just methods) on any type. It is more flexible than operator overloading in languages like C++, because you are not limited to a fixed set of operator symbols. In the case of XPath, you can use certain parts of XPath syntax directly in Scala, as it just gets translated into a method call.
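
To make the "operators are just methods" idea concrete, here is a small sketch. The Money class is hypothetical and exists only for illustration; it has nothing to do with the FriendFeed sample.

```scala
object OperatorDemo {
  // Hypothetical type used only to show symbolic method names.
  case class Money(cents: Long) {
    // `+` and `*` are ordinary methods whose names happen to be symbols.
    def +(other: Money): Money = Money(cents + other.cents)
    def *(factor: Int): Money = Money(cents * factor)
  }

  def main(args: Array[String]): Unit = {
    val a = Money(150)
    val b = Money(250)
    println(a + b)       // infix form
    println(a.+(b))      // the same call written as a regular method call
    println((a + b) * 2) // symbolic methods chain like any others
  }
}
```

The \ used on XML elements later in this article works the same way: it is a method with a symbolic name, called in infix position.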

With that in mind, let's take a look at what the XML from FriendFeed looks like. Listing 3 shows a sample.


Listing 3. Sample FriendFeed XML
                
<feed>
    <entry>
        <updated>2008-03-26T05:06:36Z</updated>
        <service>
            <profileUrl>http://twitter.com/karlerikson</profileUrl>
            <id>twitter</id>
            <name>Twitter</name>
        </service>
        <title>Listening to Panic at the Disco on Kimmel</title>
        <link>http://twitter.com/karlerikson/statuses/777188586</link>
        <published>2008-03-26T05:06:36Z</published>
        <id>f18ebf10-06be-98e2-6059-fa78fa44584b</id>
        <user>
            <profileUrl>http://friendfeed.com/karlerikson</profileUrl>
            <nickname>karlerikson</nickname>
            <id>f294a86c-e6f3-11dc-8203-003048343a40</id>
            <name>Karl Erikson</name>
        </user>
    </entry>
    <entry>
        <updated>2008-03-26T05:06:35Z</updated>
        <service>
            <profileUrl>http://twitter.com/asfaq</profileUrl>
            <id>twitter</id>
            <name>Twitter</name>
        </service>
        <title>@ceetee lol</title>
        <link>http://twitter.com/asfaq/statuses/777188582</link>
        <published>2008-03-26T05:06:35Z</published>
        <id>d4099bb0-8186-5aa1-ce1f-672246c0fe9c</id>
        <user>
            <profileUrl>http://friendfeed.com/asfaq</profileUrl>
            <nickname>asfaq</nickname>
            <id>41e24568-ee6b-11dc-a88d-003048343a40</id>
            <name>Asfaq</name>
        </user>
    </entry>
    <entry>
        <updated>2008-03-26T05:06:31Z</updated>
        <service>
            <profileUrl>http://twitter.com/chrisjlee</profileUrl>
            <id>twitter</id>
            <name>Twitter</name>
        </service>
        <title>sleep..</title>
        <link>http://twitter.com/chrisjlee/statuses/777188561</link>
        <published>2008-03-26T05:06:31Z</published>
        <id>8c4ec232-3ad5-28e1-16c0-00a428294c9c</id>
        <user>
            <profileUrl>http://friendfeed.com/chrisjlee</profileUrl>
            <nickname>chrisjlee</nickname>
            <id>5af39ad4-53b6-45d8-ae25-ef7c50fe9568</id>
            <name>Chris</name>
        </user>
    </entry>
    <entry>
        <updated>2008-03-26T05:06:49Z</updated>
        <service>
            <profileUrl>
                http://www.google.com/reader/shared/09566745492004297397
            </profileUrl>
            <id>googlereader</id>
            <name>Google Reader</name>
        </service>
        <title>Poketo First Editions Show!!</title>
        <link>
            http://www.poketo.com/blog/2008/03/24/poketo-first-editions-show/
        </link>
        <published>2008-03-26T05:06:49Z</published>
        <id>4caefceb-d71c-59c9-8199-45c5adbc60f2</id>
        <user>
            <profileUrl>http://friendfeed.com/misterjt</profileUrl>
            <nickname>misterjt</nickname>
            <id>e745cc8a-f9e4-11dc-a477-003048343a40</id>
            <name>Jason Toney</name>
        </user>
    </entry>
</feed>

For your application, you will first get a list of users for a given service. So you will start by filtering the feed down to just the service you are interested in. Take a look at Listing 4 to see how you do this in Scala.


Listing 4. Filtering the feed based on service
                
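// Queue is Scala's mutable queue; the scala.xml._ import from Listing 1 is also needed
import scala.collection.mutable.Queue

def filterFeed(feed:Elem, feedId:String):Seq[Node] = {
   var results = new Queue[Node]()
   feed\"entry" foreach{(entry) =>
     // Compare this entry's service id against the requested one
     if (search((entry\"service"\"id").last, feedId)){
       results += (entry\"user"\"nickname").last
     }
   }
   return results
 }
 
 // Name is capitalized so the pattern compares against its value
 // instead of binding a new variable
 def search(p:Node, Name:String):Boolean = p match {
   case <id>{Text(Name)}</id> => true
   case _ => false
 }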

Your function, filterFeed, takes an XML element (feed) and the ID of a service. First, it creates a Queue of XML Nodes called results. Queue is parameterized, like a List or Map in Java. Scala uses square brackets to denote generic types, instead of the angle brackets used in Java programming. The line feed\"entry" is an XPath-like expression. The backslash is actually a method available on scala.xml.Elem. It returns all of the child nodes with the given name, that is, all of the <entry> elements in the feed. The result is an instance of the class scala.xml.NodeSeq, which extends Seq[Node]. Because it is a Seq, it has a foreach method that takes a closure as a parameter.

The notation (entry) => ... indicates a closure that takes a single parameter denoted as entry. In the closure, you once again use an XPath-like expression, entry\"service"\"id", to extract the ID of the service from the entry Node. Because the expression returns a sequence of nodes, last picks a single Node from it. Pass this to a function called search to compare it against the feed ID passed into the method. You will look at the body of that function shortly. If there is a match, you add the nickname of the user who created the entry to the results queue. Notice how the queue object has what looks like an operator, +=. Once again, this is really just a method on the queue object. You also use the XPath-like syntax of Scala one more time to extract the user's nickname Node.
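
The projection syntax described above can be tried on an inline document without any network access. The following is a minimal sketch, assuming Scala 2 with the scala.xml classes on the classpath (they ship as the separate scala-xml module since Scala 2.11). The object name ProjectionDemo and its helper methods are hypothetical, and the sample data is a cut-down version of Listing 3.

```scala
import scala.xml._

object ProjectionDemo {
  // A trimmed-down sample shaped like the feed in Listing 3.
  val feed: Elem =
    <feed>
      <entry>
        <service><id>twitter</id></service>
        <user><nickname>karlerikson</nickname></user>
      </entry>
      <entry>
        <service><id>googlereader</id></service>
        <user><nickname>misterjt</nickname></user>
      </entry>
    </feed>

  // `\` selects direct children by name; chaining it walks down the tree.
  def serviceIds(f: Elem): Seq[String] =
    (f \ "entry" \ "service" \ "id").map(_.text)

  // `\\` looks at the element itself and all of its descendants.
  def nicknames(f: Elem): Seq[String] =
    (f \\ "nickname").map(_.text)

  def main(args: Array[String]): Unit = {
    serviceIds(feed).foreach(println)
    nicknames(feed).foreach(println)
  }
}
```

Each projection returns a NodeSeq, so the usual sequence methods such as map, foreach, and last are available on the result.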

Now look at the search function. This function uses one of the most powerful features of Scala, its pattern matching. In this case, you compare the input node against a node named id that has a single child text node equal to the Name string passed in to the function. The parameter is capitalized on purpose: a capitalized identifier in a pattern is compared by value, whereas a lowercase one would be bound as a new variable that matches any text. If there is a match, the function returns true. The syntax case _ matches everything else. The _ is again used as a wildcard in Scala. A statement like case _ is similar to a default clause in a switch statement in Java or C++ code. This is a pretty simple example of the power of pattern matching in Scala. Next you'll see how to build an XML structure.
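
The following sketch extends the idea behind search to show how the different kinds of pattern behave side by side. It assumes Scala 2 with scala.xml on the classpath; MatchDemo and describe are hypothetical names, not part of the article's code.

```scala
import scala.xml._

object MatchDemo {
  // Classifies a node by its shape. `Wanted` is capitalized, so the first
  // pattern compares the text against its value; the lowercase `other`
  // in the second pattern binds whatever text is found.
  def describe(n: Node, Wanted: String): String = n match {
    case <id>{Text(Wanted)}</id> => "id matches " + Wanted
    case <id>{Text(other)}</id>  => "id is " + other
    case _                       => "not a simple id element"
  }

  def main(args: Array[String]): Unit = {
    println(describe(<id>twitter</id>, "twitter"))
    println(describe(<id>flickr</id>, "twitter"))
    println(describe(<name>Twitter</name>, "twitter"))
  }
}
```

Cases are tried from top to bottom, so the more specific pattern has to come before the more general ones.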





Using pattern matching to build XML

In your application, you want to build a new XML structure of all of the user nicknames you extracted from the FriendFeed public feed. There are a number of ways to do this, but here you will see how you can once again use pattern matching for it. Take a look at the function shown in Listing 5.


Listing 5. Builder function that uses pattern matching
                
// Returns a copy of the UserList with newEntry appended after the existing children (ch)
def add(p:Node, newEntry:Node ):Node = p match {
   case <UserList>{ ch @ _* }</UserList> => 
     <UserList>{ ch }{ newEntry }</UserList>
}

This pattern matches a UserList element with any number of children. The function then returns a new UserList element with the same children, plus an additional child that follows the existing children. This is functionally equivalent to the appendChild idiom from the DOM specification. It is fundamentally different, however, because the original Node is not changed (it cannot be, as it is immutable). Instead, a new Node is created and returned. This could use considerably more memory than the equivalent DOM operation. Let's take a look at some alternatives for building XML structures using Scala.
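
The immutability point can be checked directly. The sketch below reuses the shape of the add function from Listing 5 and adds a fallback case (an assumption, so that non-matching nodes are returned unchanged instead of raising a MatchError). It assumes Scala 2 with scala.xml on the classpath; AppendDemo and addAll are hypothetical names.

```scala
import scala.xml._

object AppendDemo {
  // Same pattern as Listing 5, plus a fallback for any other node.
  def add(p: Node, newEntry: Node): Node = p match {
    case <UserList>{ch @ _*}</UserList> => <UserList>{ch}{newEntry}</UserList>
    case other => other
  }

  // Builds a list by folding: each step produces a new node,
  // and the starting node is never modified.
  def addAll(entries: Seq[Node]): Node = {
    val empty: Node = <UserList/>
    entries.foldLeft(empty)(add)
  }

  def main(args: Array[String]): Unit = {
    val empty: Node = <UserList/>
    val one = add(empty, <nickname>asfaq</nickname>)
    val two = add(one, <nickname>chrisjlee</nickname>)
    println((empty \ "nickname").length) // the original is still empty
    println(two)
  }
}
```

Because every call returns a fresh node, earlier versions such as empty and one stay valid and can be shared freely.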





Creating XML

Scala's natural XML syntax is best appreciated when creating new XML documents. For your first example of this, you will take the UserList structure you created and wrap it in a node for the associated service. Listing 6 shows the code.


Listing 6. Creating Service Results
                
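def results(name:String, cnt:Int, elements:NodeSeq):Any = {
   // Only build the Service element when there is at least one user
   if (cnt > 0){
     return <Service id={name}>{elements}</Service>
   }
   // Otherwise the if expression yields the Unit value, ()
 }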

With Scala's native support for XML, you can use a template-style syntax to insert dynamic data into an XML structure. In this case, you set the id attribute using the name string that is passed in. You take the sequence of passed-in elements and make them child nodes of the Service element that you are creating. However, notice that you only do this if the cnt parameter is greater than 0. If cnt is 0, the function returns nothing useful. You can get away with this in Scala by declaring that the function returns Any. The Any class sits at the root of Scala's type hierarchy, similar to java.lang.Object. Scala has no void type, but there is an equivalent type called Unit, whose only value is written (). Because Unit is a subtype of Any, a function can sometimes return an object and other times return nothing.
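
Listing 6 relies on Any to cover both outcomes. One alternative, shown here only as a sketch and not as the article's approach, is to make the "no result" case explicit with Option from the standard library. It assumes Scala 2 with scala.xml on the classpath; ResultsDemo is a hypothetical name.

```scala
import scala.xml._

object ResultsDemo {
  // Variant of Listing 6: Some(element) when there are users, None otherwise.
  def results(name: String, cnt: Int, elements: NodeSeq): Option[Elem] =
    if (cnt > 0) Some(<Service id={name}>{elements}</Service>) else None

  def main(args: Array[String]): Unit = {
    val users = <UserList><nickname>asfaq</nickname></UserList>
    // foreach on an Option runs only when a value is present.
    results("twitter", 1, users).foreach(println)
    results("flickr", 0, NodeSeq.Empty).foreach(println)
  }
}
```

With this signature the caller can tell from the type alone that a result may be missing, at the cost of unwrapping the Option before use.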

As you can see, mixing dynamic data into Scala's XML syntax can be very powerful. For another example, you will create a stats XML document that describes the number of occurrences of each service in the feed. Its code is shown in Listing 7.


Listing 7. Creating Stats XML
                
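// HashMap and Queue here are the mutable collections
import scala.collection.mutable.{HashMap, Queue}

def stats(map:HashMap[String,Int]):Node = {
   var nodes = new Queue[Node]()
   // Each entry is a (service name, count) pair; _1 and _2 read its two parts
   map.foreach{(nvPair) =>
     nodes += <Service id={nvPair._1} cnt={nvPair._2.toString}/>
   }
   return <Stats>{nodes}</Stats>
}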

Your function expects a HashMap whose keys are the names of the services and whose values are the number of times each service occurs in the FriendFeed. The function iterates over the HashMap, using the familiar foreach-closure style. It creates a new Node from each name/value pair in the HashMap, and adds this Node to a Queue of Nodes. You then create your Stats structure and simply drop in the Queue of Nodes as dynamic data that is evaluated into an XML structure. Now that you have all of your functions, you just need something to drive the program so that you can test it.
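
The same document can also be produced without a mutable Queue by mapping each pair straight to an element inside the XML literal. This is a sketch under a few assumptions: Scala 2 with scala.xml on the classpath, an immutable Map as input, and entries sorted by service name so the output order is predictable (Listing 7 simply uses the HashMap's iteration order). StatsDemo is a hypothetical name.

```scala
import scala.xml._

object StatsDemo {
  // Maps each (service name, count) pair to a Service element.
  def stats(counts: Map[String, Int]): Elem =
    <Stats>{
      counts.toSeq.sortBy(_._1).map { case (id, cnt) =>
        <Service id={id} cnt={cnt.toString}/>
      }
    }</Stats>

  def main(args: Array[String]): Unit = {
    println(stats(Map("twitter" -> 12, "blog" -> 5, "flickr" -> 0)))
  }
}
```

A sequence of nodes embedded in braces is spliced in as children, which is why the mapped result can be dropped into the Stats element directly.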





Running and testing

Before you can run the program, you need to add code to drive it. You create a main method, as you would in Java programming, as shown in Listing 8.


Listing 8. FriendFeed main
                
def main(args:Array[String]) = {
    val feedXml = friendFeed
    var map = new HashMap[String,Int]
    args.foreach{(serviceName) =>
      val filteredEntries = filterFeed(feedXml, serviceName)
      var users:Node = <UserList/>
      filteredEntries.foreach{(user) =>
        users = add(users, user)
      }
      map += serviceName -> filteredEntries.length
      println(results(serviceName,filteredEntries.length,users))
    }
    println(stats(map))
}

This method fetches the FriendFeed public feed. It takes command-line parameters that determine which services to find users and calculate stats for. Notice how the syntax is very similar to Java syntax. The main function takes an array of Strings called args. The program creates the HashMap for the stats document, as well as a UserList document for each service. It then prints out each UserList as well as the stats document. To run the program, compile it with scalac FriendFeed.scala and then run it with scala FriendFeed, as in Listing 9.


Listing 9. Running the program
                
$ scalac FriendFeed.scala
$ scala FriendFeed googlereader flickr delicious twitter blog
<Service id="twitter"><UserList><nickname>ntamaoki</nickname>
<nickname>terrazi</nickname><nickname>ntamaoki</nickname>
<nickname>terrazi</nickname><nickname>ntamaoki</nickname>
<nickname>parodi</nickname><nickname>trevor</nickname>
<nickname>cindy</nickname><nickname>christinelu</nickname>
<nickname>clint</nickname><nickname>savvyauntie</nickname>
<nickname>44gi</nickname></UserList></Service>
<Service id="blog"><UserList><nickname>nechipor</nickname>
<nickname>mdolla</nickname><nickname>kyhpudding</nickname>
<nickname>hanayuu</nickname><nickname>hanayuu</nickname>
</UserList></Service><Stats><Service cnt="12" id="twitter">
</Service><Service cnt="0" id="delicious"></Service><Service 
cnt="0" id="flickr"></Service><Service cnt="0" id="googlereader">
</Service><Service cnt="5" id="blog"></Service></Stats>

You can of course pick different service names as command-line arguments. Scala also has pretty-printer classes for printing XML with the right spacing, indentation, and formatting. There are also XML writers for writing back to streams, such as files. All of the mundane things you could want to do with XML are supported in Scala, along with the more exotic things that are unique to Scala.
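
As a brief sketch of the pretty-printing mentioned above, scala.xml.PrettyPrinter takes a maximum line width and an indentation step and formats a node into a String. This assumes Scala 2 with scala.xml on the classpath; PrettyDemo is a hypothetical name, and the width of 80 and step of 2 are arbitrary choices.

```scala
import scala.xml._

object PrettyDemo {
  // Formats a node with lines up to 80 characters wide and 2-space indentation.
  def pretty(n: Node): String = new PrettyPrinter(80, 2).format(n)

  def main(args: Array[String]): Unit = {
    val users =
      <UserList><nickname>asfaq</nickname><nickname>chrisjlee</nickname></UserList>
    println(pretty(users))
  }
}
```

Applied to the output of Listing 9, this would make the Service and Stats documents considerably easier to read than the single-line form.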





Summary

Scala is viewed by many as an evolutionary step for Java programming. XML has become such an important part of technology that it is only natural to use a programming language that has XML support built in to its syntax. That is exactly what you get with Scala. It makes many complex things easier. Look at all of the things done in this article and imagine how many more lines of Java code might be needed to do the same thing.






Download

Description            Name                     Size   Download method
Article example code   friendfeed.example.zip   1KB    HTTP



About the author


Michael Galpin has developed Web applications since the late 1990s. He holds a degree in mathematics from the California Institute of Technology and is an architect at eBay in San Jose, CA.



