Classify content with XQuery

Categorize unstructured and semi-structured content

With the rapid growth of semi-structured and unstructured data (XML) comes the need to categorize and classify content to make querying easier, faster, and more relevant. In this article, try several techniques using XQuery to automatically tag XML documents with content categorization based on the analysis of their content and structure.


James R. Fuller (jim.fuller@webcomposite.com), Technical Director, Webcomposite s.r.o.

James Fuller has been a professional developer for more than 15 years, working with several software blue-chip companies in both his native USA and the UK. He has co-written a few technology-related books and regularly speaks and writes articles focusing on XML technologies. He is a founding committee member for XML Prague and was in the gang responsible for EXSLT. He spends most of his time working with XML databases and XQuery. You can reach James at jim.fuller@webcomposite.com.



22 March 2011


Content classification is any process that enriches data to organize it in a manner that makes the data easier to search, archive, manage, and integrate into other processes. By generating such metadata, you can in turn derive more value from existing content.

Frequently used acronyms

  • API: Application programming interface
  • FLWOR: For, Let, Where, Order by, and Return
  • HTML: Hypertext Markup Language
  • HTTP: Hypertext Transfer Protocol
  • NASA: National Aeronautics and Space Administration (USA)
  • SQL: Structured Query Language
  • URL: Uniform Resource Locator
  • XML: Extensible Markup Language

One of the big issues with classification is that people make mistakes and arrive at different classifications based on their own logic. When you define a classification system, take into account all stakeholders' opinions and try to apply a consistent approach to categorizing data despite the many challenges. For example, people in one department might not be aware of what metadata is important to people in another. In addition, training people to understand and consistently apply a classification can be time-consuming.

As ever-increasing volumes of data (which some call the digital landfill) are generated, it becomes nearly impossible to classify data manually. You must turn to automated methods of analyzing content across a wide variety of formats and inputs.

Automating classification provides many benefits:

  • You save money.
  • You save time.
  • The classification provides consistency by offering a common mechanism through which metadata is added.
  • Organizations derive greater value from existing content.

Installing and running the code examples

I wrote the code examples in this article for use with the eXist XML Database or the Zorba XQuery processor. To run them with eXist, you need to have the database installed; otherwise, use the Zorba XQuery processor, which is available through an online sandbox.

Install the eXist XML Database

To install the eXist XML Database, perform the following steps:

  1. Download and extract the example code.
  2. Upload the extracted code directory into the database collection—for example, /db/content-classification.
  3. Run the code examples in a browser.

Using the Zorba XQuery processor

Alternatively, you can use the online version of the Zorba XQuery processor to run the code examples by performing the following steps:

  1. Download and extract the example code.
  2. Cut and paste the code example into the Zorba XQuery processor online sandbox at http://try.zorba-xquery.com/.
  3. Click Execute to run the code.

The differences between the eXist and Zorba examples are small. Sharp-eyed readers, however, will notice one difference in their respective use of the EXPath HTTP Client library: Zorba has this library built in by default, and the eXist database does not, which is why I supply a stand-alone http-client.xqm XQuery library designed specifically for use with eXist. The examples in this article use the EXPath HTTP Client library to access remote data and web services. In the second half of this article, you integrate more advanced processing using the Yahoo! Query Language (YQL) and AlchemyAPI tools.
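All the examples index into the sequence that http:send-request() returns. As a minimal sketch (against an arbitrary URL), the first item is the http:response element carrying the status and headers, and any items after it are the response body, which is why the examples in this article select item [2]:

xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

(: Minimal sketch of the EXPath HTTP Client return value: item 1 is the
   http:response element (status, headers); items 2+ are the body parts. :)
let $request := <http:request href="http://www.example.com/" method="get"/>
let $result  := http:send-request($request)
return
    <check status="{$result[1]/@status}">{ $result[2] }</check>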

Note: Be aware that you might be required to sign up to receive an API key before using these services.


Simple classification with XQuery

This first part of the article shows how you can use pure XQuery to start classifying content.

Text analytics: Defining word frequency in unstructured context

The term text analytics (or text mining) refers to a set of machine learning and linguistic techniques for extracting and modeling metadata from textual sources. Text analytics applies natural language processing (NLP) and analytical methods to textual content and extracts useful metadata, such as:

  • Language type. Analysis of character encoding, words, and content style can easily determine with high confidence which language textual data is in.
  • Keywords. Text analysis can extract a set of keywords that characterize the document.
  • Common entities. Algorithms that scan text for common patterns, such as email addresses, phone numbers, and people and place names, are useful for named entity extraction.
  • Semantic relationships. A wide variety of approaches is available for scanning content in the hope of gleaning more compelling and deeper insights.

One such case of text mining is determining the frequency of words contained within a document, the assumption being that the more often a word is used, the more relevant this word is to the entire document.

The most common words could be construed as document keywords, but be aware that the term keywords is usually applied to the output of more sophisticated algorithms, which go further than simply computing word frequency. For example, keyword analysis typically cross-references common words with synonym lookup tables and can also analyze the distance between words to help determine a word's importance in the context of the entire document.

In any text analysis, the first step is to generate a corpus from the textual content, with the subsequent analysis being applied to the corpus. One of the reasons for generating a corpus is to normalize text and remove anything that isn't relevant.

Listing 1 shows an XQuery program that consumes an HTML page (using the EXPath HTTP Client library) and extracts its textual content. Because the case of a word does not matter for this analysis, the corpus is built from the content with everything lowercased.

Listing 1. XQuery program that generates a word frequency list
xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

let $content-url     := 'http://en.wikipedia.org/wiki/Asteroid_impact_avoidance'
let $content-request :=
        <http:request href="{$content-url}" method="get" follow-redirect="true"/>
(: http:send-request() returns the response element first and the body second :)
let $content  := fn:string-join(http:send-request($content-request)[2], ' ')
(: normalize: tokenize on non-word characters and lowercase every token :)
let $corpus   := for $w in tokenize($content, '\W+') return lower-case($w)
let $wordList := distinct-values($corpus)
return
<words> {
    for $w in $wordList
    let $freq := count($corpus[. eq $w])
    order by $freq descending
    return <word word="{$w}" frequency="{$freq}"/>
}</words>

The next step is to derive all the unique words from the corpus. A FLWOR expression then processes each unique word, computes its frequency (by counting its occurrences in the corpus, which retains every token), and outputs a <word/> element.

Note: I use the same web URL (http://en.wikipedia.org/wiki/Asteroid_impact_avoidance) as the text source for all examples in the article to illustrate how effective each approach is.

The result of running the program in Listing 1 is an XML document that has a <word/> element containing the frequency and word, ordered by the most frequent words contained in the Wikipedia page on asteroid impact avoidance. Listing 2 shows the list.

Listing 2. Word frequency list
<words>
<word word="the" frequency="377"/>
<word word="of" frequency="236"/>
<word word="a" frequency="193"/>
<word word="to" frequency="167"/>
<word word="and" frequency="141"/>
<word word="in" frequency="124"/>
<word word="earth" frequency="121"/>
<word word="â" frequency="109"/>
<word word="asteroid" frequency="102"/>
....
</words>

As you can see, the analysis returned a lot of words, with many of the most frequent words being irrelevant by dint of their common usage in the English language. You can fix this by defining a few simple rules to reduce the amount of noise, such as removing all words of three letters or fewer and removing any words with a frequency of 3 or lower.

Listing 3 shows the same code with logic added that tests for word string length and frequency.

Listing 3. Amended XQuery program that generates a word frequency list
xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

let $content-url     := 'http://en.wikipedia.org/wiki/Asteroid_impact_avoidance'
let $content-request :=
        <http:request href="{$content-url}" method="get" follow-redirect="true"/>
let $response := http:send-request($content-request)[2]
let $content  := fn:string-join($response, ' ')
let $corpus   := for $w in tokenize($content, '\W+') return lower-case($w)
let $wordList := distinct-values($corpus)
return
<words> {
    for $w in $wordList
    let $freq := count($corpus[. eq $w])
    order by $freq descending
    return
        (: keep only words longer than three characters occurring more than three times :)
        if (string-length($w) gt 3 and $freq gt 3) then
            <word word="{$w}" frequency="{$freq}"/>
        else
            ()
}</words>

With different datasets, you might have to adjust these thresholds or add further rules, but as Listing 4 shows, even these minimal settings omit a lot of the noise, leaving a much more relevant set of terms.

Listing 4. Revised word frequency list
<words>
<word word="earth" frequency="121"/>
<word word="asteroid" frequency="102"/>
<word word="impact" frequency="58"/>
<word word="near" frequency="56"/>
<word word="with" frequency="55"/>
<word word="that" frequency="53"/>
<word word="space" frequency="49"/>
<word word="nasa" frequency="43"/>
<word word="object" frequency="36"/>
<word word="from" frequency="34"/>
<word word="this" frequency="32"/>
...
</words>

Clearly, this approach has limitations. But it's a good start and shows you that with a small amount of XQuery, it's possible to get a basic set of keywords characterizing a document's textual content.
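One refinement worth trying before moving on is an explicit stop-word list, the simplest form of the lookup tables mentioned earlier. The sketch below is illustrative: $text stands in for the fetched page content, and the $stop-words sequence is deliberately tiny, not a complete list.

xquery version "1.0";
(: Stop-word sketch: filter the frequency list against a small,
   illustrative stop-word list instead of length/frequency rules alone. :)
let $text       := 'the earth and the asteroid that may hit the earth'
let $corpus     := for $w in tokenize($text, '\W+') return lower-case($w)
let $stop-words := ('the', 'of', 'and', 'that', 'this', 'with', 'from', 'may')
return
<words> {
    for $w in distinct-values($corpus)
    let $freq := count($corpus[. eq $w])
    where not($w = $stop-words)
    order by $freq descending
    return <word word="{$w}" frequency="{$freq}"/>
}</words>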

Adding structure to word frequency

Metadata decisions

One of the first decisions that you typically need to make when generating metadata is whether to keep metadata inside your content XML or store it in a separate meta-document. After deciding where you want to store metadata, you also need to decide in what format to encode metadata. There are many ways to mark up metadata—for example:

  • Darwin Information Typing Architecture. Defines a keyword element that is a good match for your purpose.
  • Microformats. You can use the rel-tag to annotate keyword elements directly in your document.
  • The rest. Many Semantic Web markup formats are available, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL).

Adopting an existing specific markup language should be encouraged over designing your own. But make sure that you choose a format that maintains simplicity while giving your metadata as much flexibility as possible.
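As a simple illustration of the two storage options (the element names are hypothetical placeholders, not taken from any of the formats above), inline metadata and a separate meta-document might look like this:

<!-- Option 1: metadata embedded inline in the content document. -->
<article>
    <meta>
        <keyword>asteroid</keyword>
        <keyword>impact</keyword>
    </meta>
    <body>...</body>
</article>

<!-- Option 2: a separate meta-document that points at the content. -->
<meta about="/db/content-classification/article.xml">
    <keyword>asteroid</keyword>
    <keyword>impact</keyword>
</meta>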

Textual analysis applied to semi-structured documents like HTML or XML provides limited insights if it completely ignores structure. But what if you could make deeper inferences by weighting the importance of textual analysis by relating it to element structure?

In terms of HTML, wouldn't it be nice if you could somehow score words based on where they appear inside a nested structure? For example:

  • Words that appear in <title> elements are more important.
  • Words that appear in <noscript> or <script> elements are less important.
  • Words that appear in <h1> and <h2> elements are more important.

To achieve this, add a fitness attribute to each <word/> element. The value of this attribute records whether the word shows up in any of these specific elements. Listing 5 shows the added logic, which checks whether the word is contained in any of the elements deemed important.

Listing 5. Add fitness to the XQuery program that generates your word frequency list
xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

let $content-url     := 'http://en.wikipedia.org/wiki/Asteroid_impact_avoidance'
let $content-request :=
        <http:request href="{$content-url}" method="get" follow-redirect="true"/>
let $response := http:send-request($content-request)[2]
let $content  := fn:string-join($response, ' ')
let $corpus   := for $w in tokenize($content, '\W+') return lower-case($w)
let $wordList := distinct-values($corpus)
return
<words> {
    for $w in $wordList

    (: score the word by the most important element it appears in :)
    let $fitness :=
        if      ($response//*:title[contains(lower-case(.), $w)])    then 5
        else if ($response//*:h1[contains(lower-case(.), $w)])       then 4
        else if ($response//*:h2[contains(lower-case(.), $w)])       then 3
        else if ($response//*:h3[contains(lower-case(.), $w)])       then 2
        else if ($response//*:noscript[contains(lower-case(.), $w)]) then -2
        else if ($response//*:script[contains(lower-case(.), $w)])   then -1
        else 1

    let $freq := count($corpus[. eq $w])
    order by $freq descending
    return
        if ($freq gt 4 and string-length($w) gt 3) then
            <word word="{$w}" frequency="{$freq}" fitness="{$fitness}"/>
        else ()
}</words>

Now, you have a second metric that you can use to gather more information about the importance of a word:

  • <word word="asteroid" frequency="102" fitness="5"/> appeared in the <title> element.
  • <word word="deflect" frequency="11" fitness="3"/> appeared in an <h2> element.
  • <word word="false" frequency="7" fitness="-1"/> appeared in a <script> element, so you give it a negative fitness.

This fitness metric is simplistic: it might just happen that an important word also appears in a <script> section, or a word that appears in a <title> element might not be as important to the body of the document as you assume. You can make additional improvements for scoring documents and generating more appropriate keywords; one possibility is sketched below.
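For example, you could fold the two metrics into a single score and order by that instead. The sketch below is an assumption-labeled illustration, not code from the article's download; the sample <word/> elements stand in for the output of Listing 5.

xquery version "1.0";
(: Combined-score sketch: multiply frequency by fitness so structurally
   important words rise and <script>/<noscript> words sink. The sample
   input stands in for the output of Listing 5. :)
let $words :=
    (<word word="asteroid" frequency="102" fitness="5"/>,
     <word word="earth"    frequency="121" fitness="1"/>,
     <word word="false"    frequency="7"   fitness="-1"/>)
return
<words> {
    for $w in $words
    let $score := xs:integer($w/@frequency) * xs:integer($w/@fitness)
    order by $score descending
    return <word word="{$w/@word}" score="{$score}"/>
}</words>

With a combined score like this in hand, let's move on to integrating some heavier-duty tools for performing text analysis.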


Textual analysis using web services

Many commercial and open source tools are available that perform natural language processing (NLP). Here are some of the most popular open source packages:

  • GATE. A natural language processing and engineering tool.
  • Apache Unstructured Information Management Architecture. Originally developed by IBM.
  • RapidMiner. Data and text mining software.
  • Carrot2. Text and search results framework (with clustering).

In addition, several web services provide useful textual analysis. The second half of this article focuses on how to use these services in your XQuery files. You use the EXPath HTTP Client library to access them.

Keyword extraction using YQL

YQL is a SQL-like language that lets you query data across a range of Yahoo! web services. Yahoo! used to expose its data and services through a suite of separate web services, each with its own endpoints and methods; now, these services are accessible through a single interface: YQL.

With YQL, you can now access data across the Internet through one simple language, eliminating the need to learn how to call different APIs. One such service is search.termextract, which extracts common terms from a set of textual content. You can try it out through the browser by using the online YQL console:

http://developer.yahoo.com/yql/console/
?q=select%20*%20from%20search.termextract%20where%20
context%3D%22Italian%20sculptors%20and%20painters%20of
%20the%20renaissance%20favored%20the%20Virgin%20Mary%20for%20inspiration%22

The operative YQL statement selects from a table called search.termextract, with the analyzed text supplied through the context variable:

select * from search.termextract where context=

Click Test to generate the resultant XML, which contains a <query/> element with the results and some diagnostics, as in Listing 6.

Listing 6. YQL result
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="5" yahoo:created="2010-12-05T14:36:25Z" yahoo:lang="en-US">
    <diagnostics>
         <publiclyCallable>true</publiclyCallable>
         <user-time>14</user-time>
         <service-time>11</service-time>
         <build-version>9962</build-version>
    </diagnostics> 
    <results>
         <Result xmlns="urn:yahoo:cate">italian sculptors</Result>
         <Result xmlns="urn:yahoo:cate">virgin mary</Result>
         <Result xmlns="urn:yahoo:cate">painters</Result>
         <Result xmlns="urn:yahoo:cate">renaissance</Result>
         <Result xmlns="urn:yahoo:cate">inspiration</Result>
    </results>
</query>

As it's easy to use the EXPath HTTP Client library from within XQuery, let's use it to access the YQL web service within your own content classification processes. Listing 7 shows how you can call this web service from within XQuery.

Listing 7. Access the YQL web service from XQuery
xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

let $content-url     := 'http://en.wikipedia.org/wiki/Asteroid_impact_avoidance'
let $content-request :=
        <http:request href="{$content-url}" method="get" follow-redirect="true"/>
let $response := http:send-request($content-request)[2]
(: use only the title and the first few paragraphs to keep the GET request small :)
let $content  :=
    fn:string-join(subsequence(($response//*:title, $response//*:p), 1, 10), ' ')
(: build the YQL statement and encode it for use in a URL;
   note that ampersands in XQuery string literals must be escaped as &amp; :)
let $query :=
    fn:encode-for-uri(
        fn:concat("select * from search.termextract where context='", $content, "'"))
let $yahoo-url := 'http://query.yahooapis.com/v1/public/yql?diagnostics=true&amp;q='
let $term-extraction-url     := fn:concat($yahoo-url, $query)
let $term-extraction-request := <http:request href="{$term-extraction-url}" method="get"/>
return
    http:send-request($term-extraction-request)[2]

The above XQuery code takes care to encode your query string using the fn:encode-for-uri() function.
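For example, this small illustration shows what the function does (everything except unreserved characters is percent-encoded; the output comment is abbreviated to the interesting part):

(: Illustration of fn:encode-for-uri() on a YQL statement :)
fn:encode-for-uri("select * from search.termextract where context='impact'")
(: yields:
   select%20%2A%20from%20search.termextract%20where%20context%3D%27impact%27 :)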

YQL analysis generates a much higher-quality set of terms, as Listing 8 shows.

Listing 8. YQL term results
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="20"
yahoo:created="2010-12-05T20:14:37Z" yahoo:lang="en-US">
<diagnostics>
       <publiclyCallable>true</publiclyCallable>
       <url execution-time="433"
       >http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction
       </url>
       <javascript execution-time="436" instructions-used="0"
              table-name="search.termextract"/>
       <user-time>437</user-time>
       <service-time>433</service-time>
       <build-version>9962</build-version>
</diagnostics>
<results>
       <Result xmlns="urn:yahoo:cate">tertiary extinction event</Result>
       <Result xmlns="urn:yahoo:cate">shoemaker levy 9</Result>
       <Result xmlns="urn:yahoo:cate">spaceguard survey</Result>
       <Result xmlns="urn:yahoo:cate">near earth objects</Result>
       <Result xmlns="urn:yahoo:cate">period comet</Result>
       <Result xmlns="urn:yahoo:cate">nasa report</Result>
       <Result xmlns="urn:yahoo:cate">extinction level event</Result>
       <Result xmlns="urn:yahoo:cate">deep impact probe</Result>
       <Result xmlns="urn:yahoo:cate">inner solar system</Result>
       <Result xmlns="urn:yahoo:cate">mitigation strategies</Result>
       <Result xmlns="urn:yahoo:cate">65 million years</Result>
       <Result xmlns="urn:yahoo:cate">material composition</Result>
       <Result xmlns="urn:yahoo:cate">impact winter</Result>
       <Result xmlns="urn:yahoo:cate">chicxulub crater</Result>
       <Result xmlns="urn:yahoo:cate">impact speed</Result>
       <Result xmlns="urn:yahoo:cate">catastrophic impact</Result>
       <Result xmlns="urn:yahoo:cate">catastrophic damage</Result>
       <Result xmlns="urn:yahoo:cate">planetary defense</Result>
       <Result xmlns="urn:yahoo:cate">impact events</Result>
       <Result xmlns="urn:yahoo:cate">astronomical events</Result>
</results>
</query>

YQL also has limitations. For example, you must ensure that the content passed to YQL does not exceed request size limits: because these requests are sent as HTTP GET requests, the content travels in the URL and must be correctly encoded.
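A hedged sketch of one way to stay within such limits follows; the 1,500-character cap is an illustrative figure, not a documented YQL limit.

xquery version "1.0";
(: Cap the extracted text before building the GET request so the encoded
   URL stays short. $content stands in for the extracted page text. :)
let $content := 'imagine a very long run of extracted paragraph text here ...'
let $capped  := substring($content, 1, 1500)
return
    fn:encode-for-uri(
        fn:concat("select * from search.termextract where context='", $capped, "'"))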


Textual analysis with AlchemyAPI

AlchemyAPI is a company that provides an interesting set of content analysis tools. All of the company's tools are available as a suite of web services. In this article, you use its term and named entity extraction services to perform text analysis.

Keyword extraction with Alchemy

AlchemyAPI provides a web service for extracting topic keywords from any publicly accessible web page. Using a straightforward HTTP GET request, you access the AlchemyAPI web service and instruct it to retrieve and analyze a particular URL. As a bonus, AlchemyAPI URL processing calls automatically fetch the desired web page and normalize and clean it (removing ads, navigation links, and other unimportant content) before extracting topic keywords. Listing 9 shows how this is done.

Listing 9. URL for accessing the AlchemyAPI topic-extraction web service
http://access.alchemyapi.com/calls/url/URLGetRankedKeywords?
apikey=PLACE_YOUR_APIKEY_HERE&
    url=http://en.wikipedia.org/wiki/Asteroid_impact_avoidance

AlchemyAPI requires two URL parameters:

  • A URL on which to make the analysis
  • An apikey, which is required for any call made on the web service

You can obtain an AlchemyAPI apikey through a registration form from the AlchemyAPI site.

As AlchemyAPI gets the URL for you, calling the web service from XQuery is slightly simpler than the previous examples invoking YQL. Listing 10 shows the code.

Listing 10. XQuery generating keywords using AlchemyAPI
xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

let $url    := 'http://en.wikipedia.org/wiki/Asteroid_impact_avoidance'
let $apikey := 'PLACE_YOUR_APIKEY_HERE'
let $alchemy-uri := 'http://access.alchemyapi.com/calls/url/URLGetRankedKeywords?'
(: ampersands in XQuery string literals must be escaped as &amp; :)
let $href   := fn:concat($alchemy-uri, 'apikey=', $apikey, '&amp;url=', $url)
let $content-request := <http:request href="{$href}" method="get" follow-redirect="true"/>
return
    http:send-request($content-request)[2]

Listing 11 shows the result containing keywords for your test web page.

Listing 11. Result from the topic-extraction web service
<results>
     <status>OK</status>
     <usage>By accessing AlchemyAPI or using information 
     generated by AlchemyAPI, you are agreeing to be bound by 
     the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html</usage>
     <url>http://en.wikipedia.org/wiki/Asteroid_impact_avoidance</url>
     <language>english</language>
     <keywords>
          <keyword>
               <text>asteroid</text>
               <relevance>0.983321</relevance>
          </keyword>
          <keyword>
               <text>NASA</text>
               <relevance>0.376168</relevance>
          </keyword>
          <keyword>
               <text>comet</text>
               <relevance>0.370371</relevance>
          </keyword>
          <keyword>
               <text>near-earth object</text>
               <relevance>0.363529</relevance>
          </keyword>
          <keyword>
               <text>survey program</text>
               <relevance>0.3417</relevance>
          </keyword>
     .... more keywords ....
     </keywords>
</results>

Because the keywords come with a relevance score (and contain far more relevant results), the output from the AlchemyAPI web service is of higher quality than YQL's.
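A natural next step, sketched here under the assumption that the response has the shape shown in Listing 11, is to keep only keywords above some relevance threshold; the 0.3 cutoff is illustrative, not a recommended value.

xquery version "1.0";
(: Relevance-filter sketch: $results stands in for the AlchemyAPI
   response shown in Listing 11. :)
let $results :=
    <results>
        <keywords>
            <keyword><text>asteroid</text><relevance>0.983321</relevance></keyword>
            <keyword><text>NASA</text><relevance>0.376168</relevance></keyword>
            <keyword><text>survey program</text><relevance>0.3417</relevance></keyword>
            <keyword><text>minor note</text><relevance>0.12</relevance></keyword>
        </keywords>
    </results>
return
<keywords> {
    for $k in $results//keyword[xs:double(relevance) ge 0.3]
    order by xs:double($k/relevance) descending
    return <keyword text="{$k/text}" relevance="{$k/relevance}"/>
}</keywords>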

Entity extraction with AlchemyAPI

You can step up a level in sophistication by using the AlchemyAPI named entity extraction web service, which is capable of identifying people, companies, organizations, cities, geographic features, and other typed entities within your content. Some heavy-duty NLP occurs here to extract entities with meaning.

As with the topic keyword web service, all you have to do is supply an apikey and URL that contains the content you want to analyze, as in Listing 12.

Listing 12. URL for accessing the AlchemyAPI named entity extraction web service
http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities?
    apikey=PLACE_YOUR_APIKEY_HERE&
    url=http://en.wikipedia.org/wiki/Asteroid_impact_avoidance

You do exactly the same thing in terms of calling the web service from XQuery, as Listing 13 shows.

Listing 13. XQuery generating entities using AlchemyAPI
xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

let $url    := 'http://en.wikipedia.org/wiki/Asteroid_impact_avoidance'
let $apikey := 'PLACE_YOUR_APIKEY_HERE'
let $alchemy-uri := 'http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities?'
(: ampersands in XQuery string literals must be escaped as &amp; :)
let $href   := fn:concat($alchemy-uri, 'apikey=', $apikey, '&amp;url=', $url)
let $content-request := <http:request href="{$href}" method="get" follow-redirect="true"/>
return
    http:send-request($content-request)[2]

The result of the textual analysis is quite lengthy and, as Listing 14 shows, compelling.

Listing 14. Result from named entity extraction web service
<results>
    <status>OK</status>
    <usage>By accessing AlchemyAPI or using information generated by AlchemyAPI, 
    you are agreeing to be bound by the AlchemyAPI Terms of 
    Use: http://www.alchemyapi.com/company/terms.html</usage>
    <url>http://en.wikipedia.org/wiki/Asteroid_impact_avoidance</url>
    <language>english</language>
    <entities>
         <entity>
              <type>GeographicFeature</type>
              <relevance>0.667231</relevance>
              <count>44</count>
              <text>Earth</text>
         </entity>
         <entity>
              <type>Organization</type>
              <relevance>0.472053</relevance>
              <count>25</count>
              <text>NASA</text>
              <disambiguated>
                   <name>NASA</name>
                   <subType>Company</subType>
                   <subType>GovernmentAgency</subType>
                   <subType>AirportOperator</subType>
                   <subType>AwardPresentingOrganization</subType>
                   <subType>SoftwareDeveloper</subType>
                   <subType>SpaceAgency</subType>
                   <subType>SpacecraftManufacturer</subType>
                   <geo>38.88305555555556 -77.01638888888888</geo>
                   <website>http://www.nasa.gov/home/index.html</website>
                   <dbpedia>http://dbpedia.org/resource/NASA</dbpedia>
                   <umbel>http://umbel.org/umbel/ne/wikipedia/NASA</umbel>
                   <yago>http://mpii.de/yago/resource/NASA</yago>
              </disambiguated>
         </entity>

         .... entities ....
    </entities>
</results>

The AlchemyAPI named entity extraction web service has identified all kinds of things. For example, it knows that:

  • Earth is a geographical feature.
  • NASA is an organization and provides several related links.
  • The United States is a country.
  • Representative George E. Brown is a person, further identified as a politician.

In this sense, textual mining almost seems magical with respect to what can be gleaned from the content, but it's best to keep an eye on the relevance scoring. No system is 100 percent accurate, and you will find that certain content responds better to textual analysis than other content.


Conclusion

This article covers a number of techniques for beginning to classify your own documents. The first attempts were focused on how to build your own XQuery text-mining techniques based on determining word frequencies. I then showed you how to integrate powerful external web services, provided by Yahoo! and AlchemyAPI, for text analysis.

Clearly, the text analysis that the web services provided was of higher quality, but even with the primitive word frequency examples, it's possible to use pure XQuery to draw useful inferences from your data.

All the methods presented have some limitations. For example, only one document was analyzed. Performing textual analysis across a set of related documents can result in higher-quality categorization, as you can cross-reference from a larger corpus and glean deeper relations between documents. Overall, I hope that this article has shown you how powerful XQuery is for automating content categorization, and I would love to hear your feedback on your own attempts to apply XQuery in the same manner.
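As a closing sketch of that multi-document idea (the second URL is simply an illustrative related page, and the thresholds mirror Listing 3), you could build one corpus across several pages and count frequencies over the combined token stream:

xquery version "1.0";
import module namespace http = "http://expath.org/ns/http-client";

(: Multi-document sketch: merge tokens from several related pages into
   one corpus, then compute frequencies over the combined sequence. :)
let $urls := ('http://en.wikipedia.org/wiki/Asteroid_impact_avoidance',
              'http://en.wikipedia.org/wiki/Impact_event')
let $corpus :=
    for $u in $urls
    let $body := http:send-request(
                     <http:request href="{$u}" method="get" follow-redirect="true"/>)[2]
    for $w in tokenize(fn:string-join($body, ' '), '\W+')
    return lower-case($w)
return
<words> {
    for $w in distinct-values($corpus)
    let $freq := count($corpus[. eq $w])
    where string-length($w) gt 3 and $freq gt 3
    order by $freq descending
    return <word word="{$w}" frequency="{$freq}"/>
}</words>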


Download

Description: Sample scripts for this article
Name: content_catigorisation_src.zip
Size: 20KB

