Using the Twitter Search API

Create automated tweet searches

Twitter is undoubtedly one of the most recent and successful examples of social networking to appear on the World Wide Web. Twitter also has its own search engine, which enables users to search for "tweets" by keyword or category, and an API that facilitates programmatic searches, acts as a REST service, and returns results in Atom format. Discover the basics of using the Twitter Search API.

Brian M. Carey, Information Systems Consultant, Triangle Information Solutions

Brian Carey is an information systems consultant who specializes in the architecture, design, and implementation of Java enterprise applications. You can follow Brian on Twitter at http://twitter.com/brianmcarey, and his tweets are publicly available.



04 August 2009


Twitter is a Web-based social media site that lets you communicate with followers through short messages known as tweets, which you post through the Twitter GUI. Tweets are limited to a maximum of 140 characters, a limitation based on the state of mobile devices at the time Twitter was developed. It is a welcome restriction all the same, as it prevents unnecessary spam and verbal clutter within a single tweet.

Frequently used acronyms

  • API: Application programming interface
  • GUI: Graphical user interface
  • HTTP: Hypertext Transfer Protocol
  • REST: Representational State Transfer
  • RSS: Really Simple Syndication
  • URL: Uniform Resource Locator

Now that you are familiar with Twitter, it's time to move to the next level by familiarizing yourself with Twitter Search.

As I said, Twitter is an online service filled with tweets, or brief statements that users make to their followers. With that in mind, wouldn't it be great if you could find a bunch of tweets related to a specific subject?

Good news. With Twitter Search, you can. You can search by keywords, topic, author, language, and a variety of other criteria. Head on over to http://search.twitter.com to see this in action. Type a keyword that you want to search for (for example, Java™), and then click Search. Voilà! A series of tweets, newest to oldest, appears on the screen.

But how can you search by topic instead of by keyword? Keep in mind that tweets specific to a particular topic contain the topic name preceded by the hash symbol/pound sign (#). For example, a Star Trek enthusiast might tweet something about the new movie and within that tweet include #startrek to let people know that this particular tweet is about Star Trek.

To search for tweets by topic, simply include the topic name (including the hash symbol/pound sign) in your keyword search. Following the previous example, simply go to the Twitter Search page, type #startrek, then click Search. You will see a list of tweets specific to Star Trek.

The Twitter Search API

The Twitter Search page is great for manual searches as a user. But wouldn't it be great if you, as an outstanding software developer, could programmatically search for tweets based on topic or keyword?

More fabulous news: You can.

Like many other great Web applications, Twitter Search provides a REST API so that you can search for tweets in an automated fashion. Before delving too fully into the API, however, it is probably best to cover the REST concept first for those unfamiliar with it.

What is REST?

REST, for purposes of this article, enables developers to access information and resources using a simple HTTP invocation. Think of REST this way: You can obtain domain-specific data simply by pointing a URL to a specific location. You can also think of it as a simplified Web service, but if you say that too loudly around the wrong people, you might find yourself in the middle of a debate.

So, the Twitter Search API is a REST service that enables users to point to a specific URL and retrieve a variety of tweets that meet the criteria specified in the URL. This enables you, as a developer, to accept input within a Web application and dynamically query Twitter based on that input, using a simple URL that encodes the input into a format that the API understands.

Getting started: A simple example

Consider the example in Listing 1.

Listing 1. A simple example of a Twitter search
http://search.twitter.com/search.atom?q=java

This query is very easy to parse. The domain is intuitive: search.twitter.com. This is where the Search API resides. After the first slash is the service that you are executing—in this case, the word search. It might seem peculiar to even require the word search here, as the word is already in the domain name, but this is an attempt by the good folks at Twitter to keep things consistent with the basic Twitter API, which uses numerous functions.

Following search is an .atom extension. This simply means that the results of the search will be returned in Atom format. Additional formats that you can use are RSS (.rss) and JavaScript Object Notation (JSON—.json).

Next comes the only request parameter, q, which is short for query. Following that is the value of that parameter: in this case, java.

To summarize, the URL in Listing 1 tells the Twitter Search API to search for all recent tweets containing the word java (the search is case-insensitive) and return the results in Atom format.
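
The anatomy of that URL can be sketched in a few lines of Python. Treat this as an illustration only: the helper name build_search_url and the SUPPORTED_FORMATS set are assumptions made for this example and are not part of the Twitter API. The host, the service name, the three extensions, and the q parameter come from Listing 1 and the surrounding text.

```python
from urllib.parse import urlencode

# Host and service name as they appear in Listing 1.
BASE = "http://search.twitter.com/search"
# Result formats named in the text: Atom, RSS, and JSON.
SUPPORTED_FORMATS = {"atom", "rss", "json"}


def build_search_url(query, fmt="atom"):
    """Return a search URL for `query` in the given format (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # urlencode percent-encodes the value so it is safe inside a URL.
    return f"{BASE}.{fmt}?{urlencode({'q': query})}"


print(build_search_url("java"))
```

Called with java and the default format, the helper reproduces the address in Listing 1. Passing "rss" or "json" as the second argument only changes the extension.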

Parsing the output

Now, point your browser to the address in Listing 1. What actually appears on your screen will vary depending on which browser you use and which version it is. To keep things consistent, view the page source instead: right-click the screen, and then click View Source. You should see something similar to Listing 2.

Listing 2. Output from a simple search (partial output)
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:google="http://base.google.com/ns/1.0" xml:lang="en-US" 
xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" 
xmlns="http://www.w3.org/2005/Atom" 
xmlns:twitter="http://api.twitter.com/">
  <id>tag:search.twitter.com,2005:search/java</id>
  <link type="text/html" rel="alternate" 
href="http://search.twitter.com/search?q=java"/>
  <link type="application/atom+xml" rel="self" 
href="http://search.twitter.com/search.atom?q=java"/>
  <title>java - Twitter Search</title>
  <link type="application/opensearchdescription+xml" 
rel="search" href="http://search.twitter.com/opensearch.xml"/>
  <link type="application/atom+xml" rel="refresh" 
href="http://search.twitter.com/search.atom?q=java&amp;since_id=1990561514"/>
  <twitter:warning>since_id removed for pagination.</twitter:warning>
  <updated>2009-06-01T12:11:26Z</updated>
  <openSearch:itemsPerPage>15</openSearch:itemsPerPage>
  <link type="application/atom+xml" rel="next" 
href="http://search.twitter.com/search.atom?max_id=1990561514&amp;page=2&amp;q=java"/>
  <entry>
    <id>tag:search.twitter.com,2005:1990561514</id>
    <published>2009-06-01T12:11:26Z</published>
    <link type="text/html" rel="alternate" 
href="http://twitter.com/GailR/statuses/1990561514"/>
    <title>D/L latest upgrade for Google's Chrome 
Browser &amp; like it. Faster, esp w Java</title>
    <content type="html">D/L latest upgrade for Google's Chrome 
Browser &amp; like it. Faster, esp w <b>Java</b></content>
    <updated>2009-06-01T12:11:26Z</updated>
    <twitter:source><a href="http://twitter.com/">web</a></twitter:source>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>GailR (Gail R)</name>
      <uri>http://twitter.com/GailR</uri>
    </author>
  </entry>
...

Note: Your output will look totally different in content but identical in structure. This is because I ran my search at a completely different time than you are running yours, so the recent tweets for me will be different than the recent tweets for you. Recall that the default search sorts tweets from newest to oldest.

Here's a breakdown of the code:

  • Note that the root element is feed. This is standard according to the Atom specification (see Resources for links to more information about Atom). The namespace that Twitter uses is http://www.w3.org/2005/Atom, as specified as an attribute in the root element.
  • The title element provides a synopsis of the query—useful if you are simply parsing the output but might not have been the one who created the query.
  • The link elements provide the URL for the query itself. You can plug those into your browser and get the same results.
  • The entry stanza represents a tweet. Although for the sake of brevity only one is listed, in reality, there will be many of these in your output. Notice that title and content are the same in actual content. This is because tweets have no titles, so it makes sense that the title is the actual tweet itself.

    Recall that Atom is designed for article-type documents, which usually have a headline, then a main body. Because that is not the case with tweets, the two elements contain identical content.

  • The id element is required by Atom and is a globally unique identifier (GUID) for this particular tweet. All tweets across the universe of Twitter will have unique IDs so they can be referenced individually.
  • The published and updated dates and times are also identical. This makes sense, because the tweet was never updated.
  • The first link element provides a link to this single tweet. Paste http://twitter.com/GailR/statuses/1990561514 into a browser, and you'll see that same tweet.
  • The source element (specified as twitter:source because it is in the twitter namespace) provides a link to the source of the information that the API returned. In this case, the source is Twitter itself, and the source is encoded as a URL link.
  • The lang element (specified as twitter:lang because it is in the twitter namespace) provides a reference to the language used in this tweet. In this case, the tweet is in English, so the International Organization for Standardization (ISO) 639-1 code of en is used.
  • The author stanza provides information about the Twitter user.
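
To make the breakdown above concrete, here is a minimal sketch that pulls those same elements out of a result using Python's standard xml.etree.ElementTree module. The SAMPLE string is a trimmed, single-entry copy modeled on Listing 2 and not live output, and the helper name parse_tweets and the dictionary keys are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the root element in Listing 2.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "twitter": "http://api.twitter.com/",
}

# Trimmed sample modeled on Listing 2 (one entry only, not live data).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/">
  <title>java - Twitter Search</title>
  <entry>
    <id>tag:search.twitter.com,2005:1990561514</id>
    <published>2009-06-01T12:11:26Z</published>
    <link type="text/html" rel="alternate" href="http://twitter.com/GailR/statuses/1990561514"/>
    <title>D/L latest upgrade for Google's Chrome Browser &amp; like it. Faster, esp w Java</title>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>GailR (Gail R)</name>
      <uri>http://twitter.com/GailR</uri>
    </author>
  </entry>
</feed>"""


def parse_tweets(atom_text):
    """Return one dict per entry element in an Atom result (hypothetical helper)."""
    root = ET.fromstring(atom_text.encode("utf-8"))
    tweets = []
    for entry in root.findall("atom:entry", NS):
        tweets.append({
            "id": entry.findtext("atom:id", namespaces=NS),
            "text": entry.findtext("atom:title", namespaces=NS),
            "published": entry.findtext("atom:published", namespaces=NS),
            "lang": entry.findtext("twitter:lang", namespaces=NS),
            "author": entry.findtext("atom:author/atom:name", namespaces=NS),
            # The rel="alternate" link points at the single tweet.
            "url": entry.find("atom:link[@rel='alternate']", NS).get("href"),
        })
    return tweets


for tweet in parse_tweets(SAMPLE):
    print(tweet["author"], "-", tweet["text"])
```

Because title and content carry the same tweet, the sketch reads only title. The namespace map is what lets the same code reach both the standard Atom elements and the twitter:lang extension.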

Crafting more complex searches

The example provided so far is fairly rudimentary: It is a search for one word only. However, the Twitter Search API provides a powerful set of criteria parameters and supports complex queries.

Suppose, for example, you want to search for tweets directed to a specific user. In tweet language, users direct their tweets to a specific user by prepending an at sign (@) to the user's screen name (for example, @johnqpublic). For searching purposes, you can disregard the @ sign and instead put the to: prefix in front of the screen name. See Listing 3.

Listing 3. Searching for tweets directed to a specific user
http://search.twitter.com/search.atom?q=to%3Ajohnqpublic

Note the %3A in the middle of the URL, just in front of the user's name: that is the URL encoding for a colon (:). It follows the to prefix, so you can read it as follows: to:johnqpublic.

If you want to search for tweets from a particular user instead of to a particular user, simply substitute the word from for to in Listing 3.

If you want to search for a specific topic, you simply need to encode the hash symbol/pound sign for the URL. That code is %23, so an API search for #startrek would look like Listing 4.

Listing 4. Searching for tweets by topic
http://search.twitter.com/search.atom?q=%23startrek
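
You do not have to remember codes such as %3A and %23 yourself. As a rough sketch, Python's standard urllib.parse.quote function produces them from the readable form of the query. The helper name encode_query is an assumption made for this example.

```python
from urllib.parse import quote

BASE = "http://search.twitter.com/search.atom?q="


def encode_query(raw):
    """Percent-encode a raw query such as 'to:johnqpublic' (hypothetical helper)."""
    # safe="" tells quote() to encode every reserved character,
    # including ':' (%3A) and '#' (%23).
    return quote(raw, safe="")


print(BASE + encode_query("to:johnqpublic"))    # the URL in Listing 3
print(BASE + encode_query("from:johnqpublic"))  # the from variant
print(BASE + encode_query("#startrek"))         # the URL in Listing 4
```

Keeping the readable query in your own code and encoding it only at the last moment makes the searches easier to review and change.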

Like so many other search engines, Twitter Search lets you use AND and OR within your searches. You do this by placing +AND+ or +OR+ between the query values. For an example, see Listing 5, which returns all recent tweets containing either #startrek or #americanidol.

Listing 5. Searching for tweets containing either "#startrek" or "#americanidol"
http://search.twitter.com/search.atom?q=%23startrek+OR+%23americanidol
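
The plus signs in Listing 5 are simply encoded spaces, so one way to build such a query is to join the terms with the word OR and let the standard urllib.parse.quote_plus function do the encoding. This is a sketch under that assumption, and the helper name any_of is hypothetical.

```python
from urllib.parse import quote_plus


def any_of(*terms):
    """Join terms with OR and encode the result for the q parameter (hypothetical helper)."""
    # quote_plus turns each space into '+' and each '#' into '%23'.
    return quote_plus(" OR ".join(terms))


url = "http://search.twitter.com/search.atom?q=" + any_of("#startrek", "#americanidol")
print(url)
```

With the two topics from the text, the sketch prints the address shown in Listing 5. Swapping the joining word for AND gives the other form.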

You also have the option of specifying the lang parameter so that your query only returns results in a specific language. The value of the lang parameter must match one of the language codes included in the ISO 639-1 specification. See an example in Listing 6.

Listing 6. Searching for Star Trek tweets in English
http://search.twitter.com/search.atom?lang=en&q=%23startrek

You can also restrict the search results based on date. Use the parameters since and until to return tweets no older than a certain date or no later than a certain date, respectively. An example is found in Listing 7.

Listing 7. Searching for Star Trek tweets since 1 May 2009
http://search.twitter.com/search.atom?q=%23startrek&since=2009-05-01
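
When a search combines several of these parameters, assembling the query string by hand gets error-prone. The following sketch collects the optional lang, since, and until parameters described above into one function. The function name search_url and its keyword arguments are assumptions made for this example, and the dates are formatted as YYYY-MM-DD to match Listing 7.

```python
from datetime import date
from urllib.parse import urlencode


def search_url(query, lang=None, since=None, until=None):
    """Build a search URL with optional lang, since, and until (hypothetical helper)."""
    params = {"q": query}
    if lang is not None:
        params["lang"] = lang  # ISO 639-1 code, such as "en"
    if since is not None:
        params["since"] = since.isoformat()  # date formatted as YYYY-MM-DD
    if until is not None:
        params["until"] = until.isoformat()
    # urlencode percent-encodes each value and joins the pairs with '&'.
    return "http://search.twitter.com/search.atom?" + urlencode(params)


print(search_url("#startrek", since=date(2009, 5, 1)))
```

The call above yields the address in Listing 7. Adding lang="en" to the same call restricts those results to English as well.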

Conclusion

Twitter is a social networking phenomenon that facilitates microblogging among interested parties. It has skyrocketed in popularity over just the past year as everyone from postal workers to celebrities finds themselves tweeting on a regular basis.

In compliance with unwritten rules of the information superhighway, Twitter also provides a Search utility so that people can search for tweets based on a specific set of criteria. The search utility enables searches to be performed either manually using a Web page (the same way many people use Google, for example) or through a REST invocation.

Using the Twitter Search API enables Web application developers to automate Twitter searches. Developers can use it to display up-to-the-minute, content-specific tweets within their own (or their client's) Web applications. It is an outstanding utility for those interested in gleaning even more domain-specific information from the Internet.

Resources

Learn

Get products and technologies

  • The Twitter site: Explore the Twitter service. Try it and be connected with friends, family, and co-workers as you exchange short messages about what you do.
  • IBM product evaluation versions: Get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
