Level: Intermediate
Michael Galpin (mike.sr@gmail.com), Software architect, eBay
22 Apr 2008

Scala is a popular new programming language that runs on the
Java™ Virtual Machine (JVM). Scala compiles into bytecode, so it can
use Java libraries directly. Its syntax, however, makes it a powerful
alternative to Java code in certain scenarios. One of those scenarios is XML
processing. Scala lets you navigate and process parsed XML in several ways. It also
has first-class support for XML built right into the language, so you don't need to build strings of XML or programmatically construct DOM trees. In this article, you will see these aspects of Scala in action and see how Scala can make working with XML a joy.
This article uses version 2.6.1 of the Scala programming language. Scala is a young,
fast-moving language, so check for the latest release. No prior knowledge of
Scala is assumed; instead, this article introduces some of Scala's syntax
and idioms along the way. Scala requires a Java virtual machine. This article uses JDK 1.6.0_04, but Scala
requires only version 1.5 or higher. Familiarity with Java programming is assumed, even though no Java code is written in this article.
Parsing XML
Frequently used acronyms
- API: application programming interface
- DOM: Document Object Model
- HTTP: Hypertext Transfer Protocol
- JSON: JavaScript Object Notation
- SAX: Simple API for XML
- StAX: Streaming API for XML
- XML: Extensible Markup Language
You will start off by examining how to parse XML with Scala. Like most programming
languages, Scala gives you multiple options for parsing XML, and they are the familiar
ones: InfoSet/DOM-based representations, push (SAX) or pull (StAX) events, or
data binding similar to Java Architecture for XML Binding (JAXB). This article explores
tree-based (DOM-style) manipulation, as it demonstrates many of the benefits of Scala's syntax.
Before getting into that, you need to decide what XML you will parse and what you want to do with it. For that, you need a sample application.
Sample application: FriendFeed
FriendFeed is one of the popular new Web services of 2008. It lets users aggregate
their activity on other services, such as blogs, instant messaging
services, YouTube, Flickr, and Twitter, into a single data feed. You can do this on an individual basis, that is, get the aggregated activity for a given person. Even more interesting, though probably less useful, is the FriendFeed public feed, which aggregates the public activity of all FriendFeed users. FriendFeed provides an API for accessing both individual feeds and the public feed. You will write an application that accesses and parses the public feed.
Leveraging Java libraries
The first thing you need to do is access the FriendFeed public feed at
http://friendfeed.com/api/feed/public. By
default, it returns the 30 latest entries in JSON format. To
get XML instead, add the query string parameter format=xml. To change the number of entries to 100 (for example), add
the query string parameter num=100. Now you just need to
access this URL. That is easy to do in Java code, and thus it is easy to do in Scala
code. Listing 1 shows the code to access the FriendFeed public feed.
Listing 1. Accessing the FriendFeed
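object FriendFeed {
  // Grouped import of two java.net classes, plus all of scala.xml
  import java.net.{URLConnection, URL}
  import scala.xml._

  // Opens the public feed as XML and parses the response into an Elem
  def friendFeed():Elem = {
    val url = new URL("http://friendfeed.com/api/feed/public?format=xml&num=100")
    val conn = url.openConnection
    XML.load(conn.getInputStream)  // last expression is the return value
  }
}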
Notice that the first thing you do here is import two core Java classes. Scala does not bother with its own APIs for things like opening HTTP connections, because it can simply use Java's. Scala also lets you import several classes from the same package in one statement by listing them in braces, as in java.net.{URLConnection, URL}. The next line imports Scala's core XML classes. The underscore works like the asterisk in a Java import; that is, it imports all of the classes in the scala.xml package.
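To see the grouped import form in isolation, here is a minimal sketch. The object name ImportDemo and its methods are hypothetical; feedUrl only builds a java.net.URL for the feed address used in this article and never opens a connection. The sketch also shows a renaming import, which uses the same braces syntax.

```scala
// Grouped import: two java.net classes in a single statement
import java.net.{URI, URL}
// Renaming import: java.util.List is visible here as JList
import java.util.{List => JList, ArrayList}

object ImportDemo {
  // Builds the public-feed URL from the article; no connection is opened
  def feedUrl(num: Int): URL =
    new URI("http://friendfeed.com/api/feed/public?format=xml&num=" + num).toURL

  // Uses the renamed java.util.List directly from Scala
  def javaList(items: String*): JList[String] = {
    val list = new ArrayList[String]()
    items.foreach(item => list.add(item))
    list
  }
}
```

Calling ImportDemo.feedUrl(100).getQuery returns the query string format=xml&num=100, showing that the Java classes behave exactly as they would in Java code.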
So you use Java's APIs to open an HTTP connection to FriendFeed, and then you use Scala's
XML object to parse the response. There are several interesting things to notice here.
First, XML is a Scala object, that is, a singleton. Scala does not have
static methods, fields, or initializers. Instead, you declare such members in an object
(rather than a class), and Scala creates a single instance of it. You call
methods on a singleton object much as you would call static methods in Java, which is what
is going on with the XML.load statement. Notice that even
though this is a method on a Scala object, it takes a Java object (java.io.InputStream) as a parameter, which shows how closely
Scala and Java interoperate. Finally, notice that there is no return statement.
Return statements are optional in Scala. Without one, the method returns the value of
the last expression it evaluates; if that value's type does not conform to the declared
return type, the compiler reports an error. The method in Listing 1 can now be called very simply, as shown in Listing 2.
Listing 2. Accessing the friendFeed method
val feedXml = friendFeed
Notice that you do not have to put parentheses on the call to the friendFeed method. You also make use of Scala's type inference: you
do not have to declare the type of feedXml, because it is
inferred from the return type (Elem) of the friendFeed method. Look
again at Listing 1 and see how it also uses these syntactic
shortcuts. One last thing to notice is that your parsed XML object is declared as a val.
A val cannot be reassigned, and the scala.xml node it refers to is itself immutable (like a String in Java code), as is common in Scala.
Treating XML as immutable has real advantages (for example, a parsed document can be shared safely between threads), but it can take some getting used to
if you are accustomed to the appendChild APIs in DOM. Now that you have parsed the XML from
FriendFeed, you can start to slice and dice it using Scala.
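Because the live FriendFeed API may no longer be reachable, the following sketch parses a small inline excerpt of the feed with XML.loadString instead of XML.load. It assumes a recent Scala 2 release with the separate scala-xml module on the classpath (Scala 2.6.1 shipped scala.xml in the standard library); the names ParseDemo, sampleFeed, and withExtraEntry are hypothetical. The last method illustrates the immutability point: "adding" a child produces a new Elem and leaves the original untouched.

```scala
import scala.xml.{Elem, XML}

object ParseDemo {
  // A trimmed-down excerpt of the public feed, used instead of the live URL
  val sampleFeed: String =
    """<feed>
      |  <entry>
      |    <service><id>twitter</id></service>
      |    <user><nickname>karlerikson</nickname></user>
      |  </entry>
      |</feed>""".stripMargin

  // XML.loadString parses a String much as XML.load parses an InputStream
  def parse(): Elem = XML.loadString(sampleFeed)

  // "Appending" builds a new Elem; the Elem passed in is not modified
  def withExtraEntry(feed: Elem): Elem =
    feed.copy(child = feed.child :+ <entry/>)
}
```

Writing val feedXml = ParseDemo.parse() gives feedXml the inferred type Elem, just as in Listing 2, and feedXml still has one entry after withExtraEntry(feedXml) returns a document with two.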
Navigation and pattern matching
Many programming languages represent XML as a DOM tree. This has many advantages, but
programmatically traversing a tree to extract data from an XML
document can be laborious. Java technology has libraries that support XPath expressions. Scala takes a similar approach, with some advantages. Scala has no built-in operators in the Java sense: symbols like + or * are simply method names, and the methods they name perform ordinary numerical addition and multiplication. This also means that you can define operators (since they are really just methods) on any type you write, which is more general than operator overloading in languages like C++. In the case of XPath, you can use parts of its syntax almost directly in Scala: the \ and \\ methods play the roles of XPath's / and // steps, and each use simply translates into a method call.
With that in mind, let's take a look at what the XML from FriendFeed looks like. Listing 3 shows a sample.
Listing 3. Sample FriendFeed XML
<feed>
  <entry>
    <updated>2008-03-26T05:06:36Z</updated>
    <service>
      <profileUrl>http://twitter.com/karlerikson</profileUrl>
      <id>twitter</id>
      <name>Twitter</name>
    </service>
    <title>Listening to Panic at the Disco on Kimmel</title>
    <link>http://twitter.com/karlerikson/statuses/777188586</link>
    <published>2008-03-26T05:06:36Z</published>
    <id>f18ebf10-06be-98e2-6059-fa78fa44584b</id>
    <user>
      <profileUrl>http://friendfeed.com/karlerikson</profileUrl>
      <nickname>karlerikson</nickname>
      <id>f294a86c-e6f3-11dc-8203-003048343a40</id>
      <name>Karl Erikson</name>
    </user>
  </entry>
  <entry>
    <updated>2008-03-26T05:06:35Z</updated>
    <service>
      <profileUrl>http://twitter.com/asfaq</profileUrl>
      <id>twitter</id>
      <name>Twitter</name>
    </service>
    <title>@ceetee lol</title>
    <link>http://twitter.com/asfaq/statuses/777188582</link>
    <published>2008-03-26T05:06:35Z</published>
    <id>d4099bb0-8186-5aa1-ce1f-672246c0fe9c</id>
    <user>
      <profileUrl>http://friendfeed.com/asfaq</profileUrl>
      <nickname>asfaq</nickname>
      <id>41e24568-ee6b-11dc-a88d-003048343a40</id>
      <name>Asfaq</name>
    </user>
  </entry>
  <entry>
    <updated>2008-03-26T05:06:31Z</updated>
    <service>
      <profileUrl>http://twitter.com/chrisjlee</profileUrl>
      <id>twitter</id>
      <name>Twitter</name>
    </service>
    <title>sleep..</title>
    <link>http://twitter.com/chrisjlee/statuses/777188561</link>
    <published>2008-03-26T05:06:31Z</published>
    <id>8c4ec232-3ad5-28e1-16c0-00a428294c9c</id>
    <user>
      <profileUrl>http://friendfeed.com/chrisjlee</profileUrl>
      <nickname>chrisjlee</nickname>
      <id>5af39ad4-53b6-45d8-ae25-ef7c50fe9568</id>
      <name>Chris</name>
    </user>
  </entry>
  <entry>
    <updated>2008-03-26T05:06:49Z</updated>
    <service>
      <profileUrl>http://www.google.com/reader/shared/09566745492004297397</profileUrl>
      <id>googlereader</id>
      <name>Google Reader</name>
    </service>
    <title>Poketo First Editions Show!!</title>
    <link>http://www.poketo.com/blog/2008/03/24/poketo-first-editions-show/</link>
    <published>2008-03-26T05:06:49Z</published>
    <id>4caefceb-d71c-59c9-8199-45c5adbc60f2</id>
    <user>
      <profileUrl>http://friendfeed.com/misterjt</profileUrl>
      <nickname>misterjt</nickname>
      <id>e745cc8a-f9e4-11dc-a477-003048343a40</id>
      <name>Jason Toney</name>
    </user>
  </entry>
</feed>
For your application, you will first get the list of users who posted through a given service. So you
will start by filtering the feed down to just the service you are interested in. Listing 4
shows how to do this in Scala.
Listing 4. Filtering the feed based on service
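// Assumed imports: a mutable Queue and the scala.xml classes
import scala.collection.mutable.Queue
import scala.xml._

// Collects the <nickname> node of every entry posted through service feedId
def filterFeed(feed:Elem, feedId:String):Seq[Node] = {
  val results = new Queue[Node]()
  (feed \ "entry") foreach { (entry) =>
    if (search((entry \ "service" \ "id").last, feedId)) {
      results += (entry \ "user" \ "nickname").last
    }
  }
  results
}

// True if p is an <id> element whose only child is the text Name
def search(p:Node, Name:String):Boolean = p match {
  case <id>{Text(Name)}</id> => true
  case _ => false
}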
Your function, filterFeed, takes an XML element (feed)
and the ID of a service. It first creates a Queue of XML Nodes called results. Queue is
parameterized, like a List or Map in Java; Scala uses square brackets to denote a generic
type, instead of the angle brackets used in Java programming. The expression feed \ "entry" is XPath-like. The backslash is actually a
method that scala.xml.Elem inherits from scala.xml.NodeSeq. It returns all of the
child nodes with the given name, that is, all of the <entry> elements in the feed, as an instance of the class scala.xml.NodeSeq. This class extends Seq[Node], so it has a foreach method that takes a closure as a parameter.
The notation (entry) => ... denotes a closure that takes a single parameter named entry. Inside the closure, you again use an XPath-like expression, entry \ "service" \ "id", to extract the service ID from the entry Node, taking the last match. You pass this to a function called search, which compares it against the feed ID passed into filterFeed; you will look at the body of that function shortly. If there is a match, you add the nickname of the user who created the entry to the results queue. Notice that the queue has what looks like an operator, +=. Once again, this is really just a method on the queue object. You also use Scala's XPath-like syntax one more time to extract the user's nickname Node.
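The following sketch runs the same kind of navigation against an inline XML literal trimmed down from Listing 3, so you can try \ and \\ without network access. It assumes a Scala 2 release with the scala-xml module; NavigationDemo and its field names are hypothetical.

```scala
import scala.xml._

object NavigationDemo {
  // Two entries from Listing 3, reduced to the elements used here
  val feed: Elem =
    <feed>
      <entry>
        <service><id>twitter</id></service>
        <user><nickname>karlerikson</nickname></user>
      </entry>
      <entry>
        <service><id>googlereader</id></service>
        <user><nickname>misterjt</nickname></user>
      </entry>
    </feed>

  // "\" selects direct children by name, like the XPath step /entry
  val entries: NodeSeq = feed \ "entry"

  // Chained "\" calls walk down one level at a time
  val serviceIds: Seq[String] = (feed \ "entry" \ "service" \ "id").map(_.text)

  // "\\" searches all descendants, like XPath //nickname
  val nicknames: Seq[String] = (feed \\ "nickname").map(_.text)
}
```

Each expression is a plain method call on a NodeSeq, so the results can be passed straight to collection methods such as map, as shown for serviceIds and nicknames.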
Now look at the search function. It uses one of the most powerful features
of Scala: pattern matching. Here you compare the input node against a node
named id whose only child is a text node equal to the Name string passed in to the function.
Because Name starts with an uppercase letter, Scala treats it in the pattern as an existing value to compare against, rather than as a new variable to bind.
If there is a match, the function returns true. The syntax case
_ matches everything else; the _ is again being used as a
wildcard in Scala. A clause like case _ is similar to the
default clause in a switch statement in Java or C++ code. This is a simple example of
the power of pattern matching in Scala. Next you'll see how to build an XML structure.
Using pattern matching to build XML
In your application, you want to build a new XML structure of all of the user nicknames
you extracted from the FriendFeed public feed. There are a number of ways to do this,
but here you will see how to use pattern matching once again. Take a
look at the function shown in Listing 5.
Listing 5. Builder function that uses pattern matching
def add(p:Node, newEntry:Node ):Node = p match {
  case <UserList>{ ch @ _* }</UserList> =>
    <UserList>{ ch }{ newEntry }</UserList>
}
This pattern matches a UserList element with any number of children (including none) and binds them to ch. It then returns a new UserList element with the same children, plus newEntry appended after them. This is functionally equivalent to the appendChild idiom from the DOM specification, but it is fundamentally different: the original Node is not changed (it cannot be, because it is immutable). Instead, a new Node is created and returned. This can use considerably more memory than the equivalent DOM operation. Let's take a look at some alternative ways to build XML structures using Scala.
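A sketch of how Listing 5's add function might be driven, assuming the scala-xml module; BuilderDemo and userList are hypothetical names. Folding over a list of nicknames makes the cost visible: each call builds a new UserList holding references to all existing children, so building n entries this way does work proportional to n² in total. For long lists, collecting the nodes first and wrapping them once, as Listing 7 does, avoids that.

```scala
import scala.xml._

object BuilderDemo {
  // Listing 5's builder: match any <UserList> and return a new one
  // with newEntry appended after the existing children
  def add(p: Node, newEntry: Node): Node = p match {
    case <UserList>{ ch @ _* }</UserList> => <UserList>{ ch }{ newEntry }</UserList>
  }

  // Folds the builder over the nicknames; every step creates a new node
  def userList(nicknames: Seq[String]): Node =
    nicknames.foldLeft(<UserList/>: Node) { (list, nick) =>
      add(list, <nickname>{nick}</nickname>)
    }
}
```

Because the match has only one case, passing any element other than <UserList> to add throws a scala.MatchError.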
Creating XML
Scala's native XML syntax is best appreciated when creating new XML documents. For
your first example, you will take the UserList structures you created and wrap
each one in a node for the associated service. Listing 6 shows the code.
Listing 6. Creating Service Results
def results(name:String, cnt:Int, elements:NodeSeq):Any = {
  if (cnt > 0){
    return <Service id={name}>{elements}</Service>
  }
}
With Scala's native support for XML, you can use a template-style syntax to insert
dynamic data into an XML structure. In this case, you set the id attribute from the
name string that is passed in, and you make the passed-in sequence of elements the
child nodes of the Service element you are creating. However, notice that you
do this only if the cnt parameter is greater than 0. If cnt is 0, the if expression has no
else branch, so the function returns the Unit value () rather than an XML node. You can get away with this in Scala by declaring that the function returns Any.
Any is the root of Scala's type hierarchy, playing a role similar to java.lang.Object. Scala has no void type, but it has an equivalent type called Unit. Because Unit is a subtype of Any, a function declared to return Any can return an object sometimes and "nothing" (the Unit value) at other times.
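Returning Any works, but the caller then has to inspect the result at runtime, and printing the Unit value produces (). A common alternative, shown here as a hedged sketch rather than the article's approach, is to return Option[Elem]. ResultsDemo and render are hypothetical names, and the scala-xml module is assumed.

```scala
import scala.xml._

object ResultsDemo {
  // Option makes the "no results" case explicit in the return type
  def results(name: String, cnt: Int, elements: NodeSeq): Option[Elem] =
    if (cnt > 0) Some(<Service id={name}>{elements}</Service>) else None

  // The caller decides what the empty case means; here, an empty string
  def render(name: String, cnt: Int, elements: NodeSeq): String =
    results(name, cnt, elements).map(_.toString).getOrElse("")
}
```

With this signature the compiler, rather than a runtime check, tells callers that a service with no entries yields no element.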
As you can see, mixing dynamic data into Scala's XML syntax can be very powerful. For
another example, you will create a stats XML document that describes the
number of occurrences of each service in the feed. The code is shown in Listing 7.
Listing 7. Creating Stats XML
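// Assumed imports: mutable HashMap and Queue, plus the scala.xml classes
import scala.collection.mutable.{HashMap, Queue}
import scala.xml._

// Builds <Stats> with one <Service id=... cnt=.../> per map entry
def stats(map:HashMap[String,Int]):Node = {
  val nodes = new Queue[Node]()
  map.foreach{(nvPair) =>
    nodes += <Service id={nvPair._1} cnt={nvPair._2.toString}/>
  }
  <Stats>{nodes}</Stats>
}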
Your function expects a HashMap whose keys are service names and whose values count how many times each service occurs in the feed. The function iterates over the HashMap using the familiar foreach-with-closure style, creates a new Node from each name/value pair, and adds that Node to a Queue of Nodes. You then create your Stats structure and simply drop in the Queue of Nodes as dynamic data, which is evaluated into an XML structure. Now that you have all of your functions, you just need something to drive the program so that you can test it.
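As a variation on Listing 7, the following sketch builds the same <Stats> shape from an immutable Map by mapping each pair straight to a node, so no intermediate Queue is needed. Entries are sorted by service name to make the output order predictable, since hash maps do not guarantee an iteration order. StatsDemo is a hypothetical name, and the scala-xml module is assumed.

```scala
import scala.xml._

object StatsDemo {
  // One <Service id=... cnt=.../> per map entry, sorted by service name
  def stats(counts: Map[String, Int]): Node =
    <Stats>{
      counts.toSeq.sortBy(_._1).map { case (id, cnt) =>
        <Service id={id} cnt={cnt.toString}/>
      }
    }</Stats>
}
```

Attribute values in an XML literal must be strings (or node sequences), which is why the count is converted with toString here, just as in Listing 7.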
Running and testing
Before you can run the program, you need to add code to drive it. You create a main method, as you would in Java programming, as shown in Listing 8.
Listing 8. FriendFeed main
def main(args:Array[String]) = {
  val feedXml = friendFeed
  var map = new HashMap[String,Int]
  args.foreach{(serviceName) =>
    val filteredEntries = filterFeed(feedXml, serviceName)
    var users:Node = <UserList/>
    filteredEntries.foreach{(user) =>
      users = add(users, user)
    }
    map += serviceName -> filteredEntries.length
    println(results(serviceName,filteredEntries.length,users))
  }
  println(stats(map))
}
This method drives the FriendFeed application. It takes command-line arguments that specify
which services to find users for and calculate stats on. Notice how similar the syntax is
to Java syntax: the main function takes an array of Strings called args. The program
creates the HashMap for the stats document, as well as a UserList document for each
service. It then prints each UserList, wrapped in its Service element, followed by the stats document. To run the
program, compile it with scalac FriendFeed.scala and then run scala FriendFeed followed by the service names, as in Listing 9.
Listing 9. Running the program
$ scalac FriendFeed.scala
$ scala FriendFeed googlereader flickr delicious twitter blog
<Service id="twitter"><UserList><nickname>ntamaoki</nickname>
<nickname>terrazi</nickname><nickname>ntamaoki</nickname>
<nickname>terrazi</nickname><nickname>ntamaoki</nickname>
<nickname>parodi</nickname><nickname>trevor</nickname>
<nickname>cindy</nickname><nickname>christinelu</nickname>
<nickname>clint</nickname><nickname>savvyauntie</nickname>
<nickname>44gi</nickname></UserList></Service>
<Service id="blog"><UserList><nickname>nechipor</nickname>
<nickname>mdolla</nickname><nickname>kyhpudding</nickname>
<nickname>hanayuu</nickname><nickname>hanayuu</nickname>
</UserList></Service><Stats><Service cnt="12" id="twitter">
</Service><Service cnt="0" id="delicious"></Service><Service
cnt="0" id="flickr"></Service><Service cnt="0" id="googlereader">
</Service><Service cnt="5" id="blog"></Service></Stats>
You can, of course, pass different service names as command-line arguments.
Scala also has pretty-printer classes for printing XML with proper indentation and
line breaks, and XML writers for writing it back to streams, such as files. All of the mundane things you might want to do with XML in Scala are supported, along with the more exotic things that are unique to Scala.
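A minimal sketch of the pretty-printing and writing support mentioned above, using scala.xml.PrettyPrinter and XML.save. PrettyDemo, stats, and saveTo are hypothetical names; the code assumes the scala-xml module, and the file path passed to saveTo is up to the caller.

```scala
import scala.xml._

object PrettyDemo {
  // A compact stats document shaped like the one printed in Listing 9
  val stats: Elem =
    <Stats><Service cnt="12" id="twitter"/><Service cnt="5" id="blog"/></Stats>

  // PrettyPrinter(width, step): wrap at 80 columns, indent nested elements by 2
  val pretty: String = new PrettyPrinter(80, 2).format(stats)

  // Writes the document to a file, UTF-8 encoded, with an XML declaration
  def saveTo(path: String): Unit = XML.save(path, stats, "UTF-8", true, null)
}
```

PrettyPrinter only changes layout, so parsing its output again yields the same elements and attributes.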
Summary
Scala is viewed by many as an evolutionary step for Java programming. XML has become
such an important part of technology that it is only natural to use a programming
language with XML support built into its syntax. That is exactly what you get with
Scala. It makes many complex things easier. Look back at everything done in this
article and imagine how many more lines of Java code might be needed to do the same thing.
Download

| Description | Name | Size | Download method |
|---|---|---|---|
| Article example code | friendfeed.example.zip | 1KB | HTTP |
About the author
Michael Galpin has developed Web applications since the late 1990s. He holds a degree in mathematics from the California Institute of Technology and is an architect at eBay in San Jose, CA.