Before explaining the buzzword in the title of this article, it's worth explaining another buzzword: blogosphere.
The term blogosphere is used by journalists and computer geeks alike. It refers to the subset of Web pages whose owners (hereafter, bloggers) share their ideas, thoughts, passions, and random musings, along with links to other Web pages. The root word, blog, is a shortened form of the expression "Web log."
Some Web sites actually enable people who are not tech savvy to host their own, albeit modest, blogs. WordPress enables bloggers who are not software developers by trade to create some reasonably sophisticated blogs through the use of widgets, themes, and templates. The result has been an explosion of bloggers and blogging in general. As of last year, blogherald.com reported close to 200 million blogs worldwide. As of this writing, the blogosphere is the primary source of information about news and events that occur in many countries.
With so much unique information contained in the blogosphere, one is tempted to ask: Where is this information cataloged, tracked, tagged, and made available for search?
Enter Technorati. In its own words: "Technorati collects, organizes, and distributes the global online conversation." You can think of it as a Google specifically for the blogosphere. Or, as Time magazine puts it: "If Google is the Web's reference library, Technorati is becoming its coffee house." (See Resources for links to Technorati Media and the Time article.)
You can visit Technorati at http://technorati.com. You'll see a pretty search bar in shamrock-green at the top that reads Search the blogosphere... inside. Click in that box and type Obama. Then click the magnifying glass next to it. You'll quickly see the featured blog articles that discuss the President of the United States.
Feel free to search the blogosphere all you want using the Technorati Web page. However, as a Web application developer, you might want to automate that search or enable your Web page visitors to view information retrieved from the blogosphere based on their own search criteria.
To make that happen, use the Technorati API. Like many APIs on the Internet, the Technorati API uses REST.
REST is an acronym for Representational State Transfer. The full explanation of everything entailed in a proper REST definition is outside of the scope of this article; however, it is available elsewhere on IBM developerWorks (see the links provided in Resources). For the subject covered here, it is sufficient to state that REST enables developers to access information and resources using a simple HTTP invocation.
Think of REST this way: To obtain domain-specific data, you simply point a URL to a specific location. For the purposes of this article, that's really all it is. You can also think of it as a simplified Web service, but if you say that too loudly around the wrong people, you might find yourself in the middle of a debate.
In reference to the subject at hand, the Technorati API is a REST service that enables users to point to a specific URL and retrieve a variety of articles from the blogosphere that meet the criteria specified in the URL. This enables you, as a developer, to accept input within a Web application and dynamically query the blogosphere based on that input using a simple URL that encodes the input into a format the API understands.
Getting started: A simple example
Consider the example in Listing 1:
Listing 1. A simple search
http://api.technorati.com/search?key=xxxx&query=Obama
This is a fairly simple URL with only two request parameters.
Note that the actual Technorati API function is the word that follows the final slash (search). This indicates, unsurprisingly, that this REST invocation will perform a search against the blogosphere.
The first parameter is the key. The actual key used varies from user to user and is not really the xxxx character string. To obtain the key that you will use, you need to register with Technorati and request a key. Fortunately, this is easy and free. Unfortunately, this means that you cannot simply copy and paste the URLs from this article into a browser and see the results. You have to substitute your own key for this xxxx string.
The second request parameter is the actual query. Just like in the manual example, the search uses the keyword Obama.
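If you want to assemble this URL in code rather than by hand, the following is a minimal Python sketch. The names `API_BASE` and `build_search_url` are hypothetical and chosen only for illustration; the endpoint and the two parameters (`key` and `query`) are the ones shown in Listing 1. It only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Base URL taken from Listing 1; the constant name is illustrative.
API_BASE = "http://api.technorati.com"


def build_search_url(api_key: str, query: str) -> str:
    """Return the search URL for the given API key and query keyword."""
    # urlencode escapes the values, so spaces or '&' in the query
    # cannot break the URL.
    params = urlencode({"key": api_key, "query": query})
    return f"{API_BASE}/search?{params}"


if __name__ == "__main__":
    # "xxxx" is the placeholder from the article, not a working key.
    print(build_search_url("xxxx", "Obama"))
    print(build_search_url("xxxx", "Barack Obama"))
```

Building the query string with `urlencode` rather than string concatenation matters as soon as the keyword comes from user input, because a space or an ampersand in the keyword would otherwise change the meaning of the URL.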
After you substitute your own key for the xxxx string, you can then plug that URL into a Web browser and see what results are returned. How the results are displayed will vary depending on your Web browser brand and version. Whatever appears on the screen, it's best to right-click the page and select View Source to view the actual XML that is returned.
While the actual contents will also vary based on when your query is executed, the results should resemble Listing 2.
Listing 2. Output from a simple search (partial output)
<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Technorati API version 1.0 /search" -->
<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN"
"http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
<document>
<result>
<query>Obama</query>
<querycount>2270581</querycount>
<rankingstart></rankingstart>
</result>
<item>
<weblog>
<name>Critica Pura</name>
<url>http://criticapura.com</url>
<rssurl>http://criticapura.com/feed/</rssurl>
<atomurl></atomurl>
<inboundblogs>7</inboundblogs>
<inboundlinks>10</inboundlinks>
<lastupdate>2009-06-21 17:13:23 GMT</lastupdate>
</weblog>
<title>Jib Jab Obama</title>
<excerpt>Try JibJab Sendables</excerpt>
<created>2009-06-21 17:13:23 GMT</created>
<permalink>http://criticapura.com/2009/06/jib-jab-obama/</permalink>
</item>
...
Interestingly enough, the first query result as of this writing is a foreign language blog entry (at least, foreign to those who speak English).
The result element provides metadata about the query results. The query child provides the actual query keyword. The querycount child provides the number of articles from the blogosphere that matched the query.
Many item elements follow the result element. Each item element corresponds to a blog article that matched the search criteria.
The weblog element provides information about the blog itself. This is information about the entire blog as opposed to just the article that matched the criteria. Table 1 describes the weblog child elements.
Table 1. weblog child elements
| Element | Description |
|---|---|
| name | Actual name of the blog itself |
| url | URL of the blog |
| rssurl | URL of the Really Simple Syndication (RSS) feed for that blog |
| atomurl | URL of the Atom feed for that blog |
| inboundblogs | Number of blogs that link to that blog |
| inboundlinks | Number of external sites that link back to that blog |
| lastupdate | Date and time the blog was last updated |
The elements described in Table 2 are children of item as opposed to weblog. These children refer to the article itself.
Table 2. item child elements
| Element | Description |
|---|---|
| title | Actual title of the blog article |
| excerpt | Synopsis of the blog article |
| created | Date and time the article was written |
| permalink | URL for the blog article |
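To show how the elements in Tables 1 and 2 might be read programmatically, here is a small Python sketch that uses the standard library's ElementTree parser. The sample string is a trimmed copy of Listing 2; the closing `</document>` and `</tapi>` tags are an assumption, because the listing is cut off before them. The function name `parse_search` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Listing 2. The XML declaration and DOCTYPE are omitted,
# and the closing tags are assumed (the listing is truncated).
SAMPLE = """<tapi version="1.0">
<document>
<result>
<query>Obama</query>
<querycount>2270581</querycount>
<rankingstart></rankingstart>
</result>
<item>
<weblog>
<name>Critica Pura</name>
<url>http://criticapura.com</url>
<inboundblogs>7</inboundblogs>
<inboundlinks>10</inboundlinks>
</weblog>
<title>Jib Jab Obama</title>
<excerpt>Try JibJab Sendables</excerpt>
<created>2009-06-21 17:13:23 GMT</created>
<permalink>http://criticapura.com/2009/06/jib-jab-obama/</permalink>
</item>
</document>
</tapi>"""


def parse_search(xml_text: str) -> dict:
    """Extract the query metadata and one record per matching article."""
    document = ET.fromstring(xml_text).find("document")
    result = document.find("result")
    articles = []
    for item in document.findall("item"):
        articles.append({
            "blog": item.findtext("weblog/name"),   # child of weblog
            "title": item.findtext("title"),        # children of item
            "created": item.findtext("created"),
            "permalink": item.findtext("permalink"),
        })
    return {
        "query": result.findtext("query"),
        # An empty querycount element is treated as zero.
        "count": int(result.findtext("querycount") or 0),
        "articles": articles,
    }


if __name__ == "__main__":
    parsed = parse_search(SAMPLE)
    print(parsed["query"], parsed["count"])
    for article in parsed["articles"]:
        print(article["blog"], "|", article["title"], "|", article["permalink"])
```

The path `weblog/name` reflects the distinction the tables draw: blog-level data lives under weblog, while article-level data sits directly under item.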
Basic Technorati API functions
Although the Technorati API provides a powerful search function, it's worth noting that the API also provides other functions you might find useful as well.
The cosmos function is not at all intuitively named. It allows you to search for blogs linking to a base URL. Suppose, for example, you want to find all blogs that link back to a blog found at the following URL: http://nicole-rensmann.bookola.de/blog. For that, you would use the following REST invocation: http://api.technorati.com/cosmos?key=xxxx&url=http://nicole-rensmann.bookola.de/blog.
If you plug that into your browser (allowing for the usual substitution with the key), you should get something similar to Listing 3.
Listing 3. Output from a cosmos function (abbreviated)
<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Technorati API version 1.0" -->
<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN"
"http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
<document>
<result>
<url>http://nicole-rensmann.bookola.de/blog</url>
<weblog>
<name>Nicole Rensmanns kleine Welt</name>
<url>http://nicole-rensmann.bookola.de/blog</url>
<rssurl>http://nicole-rensmann.bookola.de/blog/?feed=rss2</rssurl>
<atomurl>http://nicole-rensmann.bookola.de/blog/?feed=atom</atomurl>
<inboundblogs>6</inboundblogs>
<inboundlinks>6</inboundlinks>
<lastupdate>2009-06-21 17:10:52 GMT</lastupdate>
<rank>575630</rank>
</weblog>
<inboundlinks>7</inboundlinks>
<rankingstart>1</rankingstart>
</result>
<item>
<weblog>
<name>Das Datenschutz-Blog</name>
<url>http://www.datenschutzbeauftragter-online.de</url>
<rssurl>http://www.datenschutzbeauftragter-online.de/feed/</rssurl>
<atomurl>http://www.datenschutzbeauftragter-online.de/feed/atom/</atomurl>
<inboundblogs>83</inboundblogs>
<inboundlinks>343</inboundlinks>
<lastupdate>2009-06-20 07:22:20 GMT</lastupdate>
</weblog>
<nearestpermalink>http://www.datenschutzbeauftragter-online.de</nearestpermalink>
<title>Uberblick zum Thema Netzsperren</title>
<excerpt>der Ursula von der Leyen Sachliche Debatte uber das Thema</excerpt>
<linkcreated>2009-05-11 04:20:01 GMT</linkcreated>
<linkurl>http://nicole-rensmann.bookola.de/blog/?p=3293</linkurl>
</item>
...
The output in XML format looks strikingly similar to what you saw in Listing 2, with some notable exceptions. The weblog element inside result provides information about the blog that receives the inbound links, that is, the blog you queried for. Note that the url child element directly corresponds to the url request parameter.
Once again, you see several item elements. Each one of these item elements contains information about the blog linking back to the blog you queried for.
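The following Python sketch builds a cosmos URL and lists the linking blogs from a response shaped like Listing 3. Two assumptions apply. First, the sketch percent-encodes the url value, which is standard practice for query strings; the invocation shown in this article passes it unencoded, and this sketch assumes the service accepts the encoded form. Second, the closing tags in the sample are assumed, because the listing is truncated. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_cosmos_url(api_key: str, target_url: str) -> str:
    """Return a cosmos URL asking which blogs link to target_url."""
    # urlencode escapes ':' and '/' (and any '?' or '&') in target_url,
    # so a target with its own query string stays a single parameter.
    params = urlencode({"key": api_key, "url": target_url})
    return "http://api.technorati.com/cosmos?" + params


# Trimmed copy of Listing 3; closing tags are assumed.
SAMPLE = """<tapi version="1.0">
<document>
<result>
<url>http://nicole-rensmann.bookola.de/blog</url>
<inboundlinks>7</inboundlinks>
</result>
<item>
<weblog>
<name>Das Datenschutz-Blog</name>
<url>http://www.datenschutzbeauftragter-online.de</url>
</weblog>
<title>Uberblick zum Thema Netzsperren</title>
<linkcreated>2009-05-11 04:20:01 GMT</linkcreated>
<linkurl>http://nicole-rensmann.bookola.de/blog/?p=3293</linkurl>
</item>
</document>
</tapi>"""


def linking_blogs(xml_text: str) -> list:
    """Return (linking blog name, linked-to URL) pairs, one per item."""
    document = ET.fromstring(xml_text).find("document")
    return [
        (item.findtext("weblog/name"), item.findtext("linkurl"))
        for item in document.findall("item")
    ]


if __name__ == "__main__":
    print(build_cosmos_url("xxxx", "http://nicole-rensmann.bookola.de/blog"))
    for name, link in linking_blogs(SAMPLE):
        print(name, "->", link)
```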
The tag function allows you to search for blog articles with a particular tag. Technorati uses tags to categorize blog articles. Blog article authors are allowed to place one or more tags associating their articles with particular subject matter.
To search the blogosphere for articles about fishing, you use the following URL: http://api.technorati.com/tag?key=xxxx&tag=fishing. Again, you need to use your own API key in lieu of the xxxx in the URL. If you plug that into your browser, you should see something similar to Listing 4.
Listing 4. Output from a tag function (abbreviated)
<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Technorati API version 1.0" -->
<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN"
"http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
<document>
<result>
<query>fishing</query>
<postsmatched>43655</postsmatched>
<blogsmatched></blogsmatched>
<start>1</start>
<limit>20</limit>
<querytime>3.126</querytime>
</result>
<item>
<weblog>
<name>Travel and Leisure Articles</name>
<url>http://www.toptravelarticles.com</url>
<rssurl>http://www.toptravelarticles.com/feed</rssurl>
<atomurl>http://www.toptravelarticles.com/feed/atom</atomurl>
<inboundlinks>40</inboundlinks>
<inboundblogs>19</inboundblogs>
<lastupdate>2009-06-21 17:06:01</lastupdate>
<hasphoto></hasphoto>
</weblog>
<title>Visiting Ghana?</title>
<excerpt>If you want to experience the culture up close</excerpt>
<created>2009-06-21 17:06:01</created>
<postupdate>2009-06-21 17:06:01</postupdate>
<permalink>http://www.toptravelarticles.com/visiting-ghana.html</permalink>
</item>
...
Again, the output resembles what you saw before with other Technorati API functions. The basic difference is that, in this case, you see blog articles tagged with fishing.
One particularly interesting Technorati API function is toptags. The toptags function displays the most popular tags in use when the function is executed. Plug the following URL (making the usual key substitution) into your browser: http://api.technorati.com/toptags?key=xxxx. You will see something similar to Listing 5.
Listing 5. Output from a toptags function (abbreviated)
<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Technorati API version 1.0 /topptags" -->
<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN"
"http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
<document>
<result>
<limit>20</limit>
</result>
<item>
<tag>Weblog</tag>
<posts>9578863</posts>
</item>
<item>
<tag>Life</tag>
<posts>7355121</posts>
</item>
<item>
<tag>News</tag>
<posts>4638644</posts>
</item>
...
The output here is easy to parse. Each tag is listed, and the number of blog articles containing that tag is listed in the next element.
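As an illustration of how little code the toptags output needs, here is a minimal Python sketch that turns a response shaped like Listing 5 into a list of tag and post-count pairs. The sample contains only the three items visible in the listing, with closing tags assumed; `parse_toptags` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The three items visible in Listing 5; closing tags are assumed.
SAMPLE = """<tapi version="1.0">
<document>
<result>
<limit>20</limit>
</result>
<item>
<tag>Weblog</tag>
<posts>9578863</posts>
</item>
<item>
<tag>Life</tag>
<posts>7355121</posts>
</item>
<item>
<tag>News</tag>
<posts>4638644</posts>
</item>
</document>
</tapi>"""


def parse_toptags(xml_text: str) -> list:
    """Return (tag, post count) pairs in the order the API lists them."""
    document = ET.fromstring(xml_text).find("document")
    return [
        (item.findtext("tag"), int(item.findtext("posts")))
        for item in document.findall("item")
    ]


if __name__ == "__main__":
    for tag, posts in parse_toptags(SAMPLE):
        print(f"{tag}: {posts} posts")
```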
Technorati is a Web site that maintains information about articles published in the blogosphere. Using Technorati you can query for blog articles based on a specific set of criteria.
In compliance with unwritten rules of the information superhighway, Technorati also provides an API so people can programmatically search for blog articles based on a specific set of criteria. The API operates using a REST invocation.
Using the Technorati REST API enables Web application developers to automate blog searches. Developers can implement it so that their Web application users can search the blogosphere for articles that match their own specific interests.
Learn
- Technorati API Documentation: Learn more about the Technorati API.
- RESTful Web services: The basics (Alex Rodriguez, developerWorks, November 2008): Learn about the basic principles of REST in an excellent overview.
- Technorati tags: In this blog, get a good overview and practical introduction to the basics of Technorati tags and how to use them in your own blog posts.
- How Many Blogs Are There? Is Someone Still Counting? (Anne Helmond, the Blog Herald, November 2008): Read about the number of blogs on the Internet.
- Technorati Media: Read about Technorati, founded as the first blog search engine, and what it offers.
- Searchlight for the Blogosphere (Jeremy Caplan, Time, December 2006): Read the article that compares Technorati to a coffee house for the Web where you can find out who is saying what.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
Get products and technologies
- IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- XML zone discussion forums: Participate in any of several XML-related discussions.
- developerWorks blogs: Check out these blogs and get involved in the developerWorks community.





