The reason people go for an operating system like Linux is the sum of its parts—its total usefulness. It is stable, affordable, fast, and runs on all kinds of hardware. It is also extremely flexible right out of the box, largely because of its powerful command-line interface (CLI), or shell.
This article puts two command-line tools, GNU Wget and cURL, in the spotlight. You'll learn how to use them to send status updates to the social networking site Twitter without a Twitter desktop application, and how to follow feeds from both Twitter and FriendFeed right from the command line.
Need API details? Both Twitter and FriendFeed offer an API that is easily accessible through a Representational State Transfer (REST)-ful interface. This article does not delve into the specifics of API use.
The history of GNU Wget
GNU Wget is a flexible piece of software that retrieves data (such as files, MP3s, and images) from servers. Its non-interactive, robust, and recursive nature makes it extremely versatile; it is mostly used to crawl Web sites for content or to download HTML files for offline reading. (With the --convert-links option, Wget adjusts the links in downloaded HTML pages to support offline reading.)
For example, to retrieve the page found at a particular URL, use this command:
wget http://wikipedia.org/
This command downloads the Wikipedia home page found at that URL onto your computer with the file name index.html, because that's the page GNU Wget found. The tool doesn't follow any links found on that page, but it's easy enough to have it do so:
wget -r http://wikipedia.org/
In this command, the -r switch tells GNU Wget to recursively follow the links it finds on that page, so the tool crawls the site (by default, up to five levels deep; the -l option changes that limit). You wouldn't want to use this switch for a site like Wikipedia, however, because you could end up downloading a huge chunk of the site for local access, which could take a very long time depending on available bandwidth. But you get the point.
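If you do want a recursive download, you can keep it bounded. The following is a minimal sketch rather than part of the original walkthrough: mirror_site is a hypothetical helper name, and the default depth of 1 and the one-second pause are arbitrary choices. The switches themselves (-r, -l, --no-parent, --wait, --convert-links) are standard GNU Wget options.

```shell
#!/bin/bash
# Sketch: a gentler recursive download than a bare `wget -r`.
# mirror_site is a hypothetical helper; the wget options are real:
#   -r               recursive retrieval
#   -l DEPTH         maximum recursion depth (wget's default is 5)
#   --no-parent      never ascend above the starting directory
#   --wait=1         pause one second between retrievals
#   --convert-links  rewrite links in saved HTML for offline reading
mirror_site() {
  local url="$1" depth="${2:-1}"
  if [ -z "$url" ]; then
    echo "usage: mirror_site URL [DEPTH]" >&2
    return 1
  fi
  wget -r -l "$depth" --no-parent --wait=1 --convert-links "$url"
}
```

For example, mirror_site http://wikipedia.org/ 1 stops one link away from the starting page and pauses a second between requests.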
The history of cURL
Client URL (cURL) fills a different niche from GNU Wget: it was originally designed to feed currency exchange rates into Internet Relay Chat (IRC) environments. cURL is a power tool for performing URL manipulations and for transferring files with URL syntax, which means that you can transfer most types of files over HTTP, HTTPS, FTP, FTPS, and many other protocols.
The cURL application is mostly used for Web scraping and for automating Web site interactions such as form submissions (using either GET or POST requests). For example, running curl with nothing but a URL as its argument outputs the result of the request to your terminal window. In essence, cURL does the same thing in this case as your browser, except that your browser renders the result while cURL just prints whatever it receives, which in many cases is HTML but can be anything.
Note: To see the request that cURL makes, add the -v switch (for verbose output). cURL still performs the request, but it also prints the HTTP request headers it sends and the response headers it receives while fetching the page.
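When you only care about the headers, you can filter that verbose output. A minimal sketch, with show_headers as a hypothetical helper name: cURL writes its verbose information to standard error and marks the headers it sends with "> " and the headers it receives with "< ", so the function merges standard error into the pipe for grep and throws the page body away with -o /dev/null.

```shell
#!/bin/bash
# Sketch: print only the header lines from cURL's verbose output.
# -v  sends request ("> ") and response ("< ") headers to stderr
# -s  hides the progress meter
# -o /dev/null discards the response body
# show_headers is a hypothetical helper name.
show_headers() {
  curl -s -v -o /dev/null "$1" 2>&1 | grep '^[<>] '
}
```

Usage: show_headers http://wikipedia.org/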
With that background out of the way, let's move on to more ambitious tasks.
Adding a tweet using GNU Wget and cURL
Twitter is a social networking and micro-blogging service that allows you to answer the question, "What are you doing?" by sending short text messages (up to 140 characters), called tweets, to your friends, or followers. To help you better understand the power of GNU Wget and cURL, let's start by using them to add tweets to the Twitter timeline. There are a couple of ways to add tweets: you can use either the Web site or a client application such as GtkTwitter, Spaz, or twhirl (which is actually an Adobe® AIR application).
You can script your own full-fledged Twitter client, which in turn would make it possible to automate tasks like twittering your current system usage or availability (for example, with a message such as "server@servername is currently experiencing heavy load"). You could also script an automated notification system. The possibilities are endless.
To see how this technology works, from the command line, type:
wget --keep-session-cookies --http-user=youremail --http-password=yourpassw \
  --post-data="status=hello from the linux commandline" \
  http://twitter.com:80/statuses/update.xml
This code might look a bit daunting if you haven't used the command-line interface much. But don't worry: It's actually logical in format. Let's look at the elements of the command:
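- wget runs the GNU Wget application.
- --keep-session-cookies tells Wget to also save session cookies, which are normally kept only in memory, when cookies are written to a file with --save-cookies. This is useful on sites that require you to log in before you can access other pages.
- --http-user is your user name (youremail in the example).
- --http-password is your password.
- --post-data is the data sent to Twitter in the body of an HTTP POST request; Twitter performs an action based on it.
- status= tells Twitter that this is a status update; the text after it is the tweet itself.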
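One detail the command above glosses over: Wget sends the --post-data string exactly as given and expects special characters in it to be percent-encoded already, so an &, = or + inside a tweet would be misread. The sketch below encodes the status first. urlencode and tweet_wget are hypothetical helper names, and the encoder leaves only letters, digits, and the characters . ~ _ - unescaped.

```shell
#!/bin/bash
# Sketch: percent-encode a status before handing it to --post-data.
# urlencode and tweet_wget are hypothetical helper names.

urlencode() {
  local LC_ALL=C                # work byte by byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;                  # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;   # anything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Post a status with the same options as the command above.
# Usage: tweet_wget EMAIL PASSWORD "status text"
tweet_wget() {
  wget --keep-session-cookies --http-user="$1" --http-password="$2" \
    --post-data="status=$(urlencode "$3")" \
    http://twitter.com:80/statuses/update.xml
}
```

With this helper, "hello from the linux commandline" goes out as hello%20from%20the%20linux%20commandline.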
You can perform the same task using cURL. To do so, type:
curl -u youremail:yourpassw -d status="text" http://twitter.com/statuses/update.xml
This command does basically the same thing as the previous wget command, but with a slightly different and friendlier syntax. The main difference between the two applications in this case is how they behave by default.
Run as shown here, GNU Wget saves the server's response to a file called update.xml on your local machine. That file can be useful, but it's hardly necessary. In contrast, cURL writes the response to standard output (stdout).
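If you want the two tools to behave alike, each has a switch for it. In this sketch (tweet_curl and tweet_wget_stdout are hypothetical helper names), cURL's --data-urlencode percent-encodes the status text for you, while Wget's -O - writes the response to stdout instead of saving update.xml, and -q silences Wget's own progress messages. The Wget variant still expects a status that is already URL-encoded.

```shell
#!/bin/bash
# Sketch: post a status with either tool and print the response on stdout.
# tweet_curl and tweet_wget_stdout are hypothetical helper names.

# --data-urlencode encodes everything after the first "=".
tweet_curl() {
  curl -s -u "$1:$2" --data-urlencode "status=$3" \
    http://twitter.com/statuses/update.xml
}

# -O - writes the response to stdout; -q turns off Wget's messages.
# The status ($3) must already be percent-encoded.
tweet_wget_stdout() {
  wget -q -O - --http-user="$1" --http-password="$2" \
    --post-data="status=$3" \
    http://twitter.com:80/statuses/update.xml
}
```

For example: tweet_curl youremail yourpassw "hello from the linux commandline"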
Finding the Twitter public timeline
Before you can access the Twitter public timeline, you must find it. In other words, you must find the endpoint you'll be using to access the public feed on Twitter. (See Resources later in this article for links to information about the Twitter API.) The most common and easiest-to-use endpoint is the public timeline, which you can access from http://twitter.com/statuses/public_timeline.rss. The endpoint for the FriendFeed public timeline resides in the Google code repository (again, see Resources below for a link).
The FriendFeed API takes simple GET and POST requests. For simplicity, you'll work with its public endpoint as well, which is available at http://friendfeed.com/api/feed/public. You'll work with the XML later.
Accessing the Twitter public timeline
So, now that you have the Twitter public timeline endpoint, how do you access it?
Type the following address in your browser, or better yet, use curl from the command line:
curl http://twitter.com/statuses/public_timeline.rss
Now, you might have noticed from the result, and from the way the endpoint is constructed, that you are looking at RSS-formatted output. The API documentation shows that other formats are available as well: by changing the file name extension to either .xml or .json, you change the output format.
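Because only the extension changes, a small helper can pick the format for you. This is a sketch: timeline_url and public_timeline are hypothetical names, and the three accepted formats are the ones mentioned above.

```shell
#!/bin/bash
# Sketch: select the public timeline's output format by extension.
# timeline_url and public_timeline are hypothetical helpers.
timeline_url() {
  case "$1" in
    rss|xml|json)
      printf 'http://twitter.com/statuses/public_timeline.%s\n' "$1" ;;
    *)
      echo "timeline_url: unsupported format '$1' (use rss, xml, or json)" >&2
      return 1 ;;
  esac
}

# Fetch the timeline in the requested format (RSS if none is given).
public_timeline() {
  local url
  url="$(timeline_url "${1:-rss}")" || return 1
  curl -s "$url"
}
```

Usage: public_timeline json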
By piping the output through the grep command, you can filter the result and retrieve just the elements you want:
curl http://twitter.com/statuses/public_timeline.xml | grep 'text'
Examine the output: you need whatever is between the <text> tags. To get rid of the tags surrounding the tweets, use the sed command. (Details on this amazing tool are beyond the scope of this article.)
curl http://twitter.com/statuses/public_timeline.xml | sed -ne '/<text/s/<\/*text>//gp'
Now, to get rid of the progress meter, which adds unnecessary information to the timeline, add the -s (silent) switch:
curl -s http://twitter.com/statuses/public_timeline.xml | sed -ne '/<text/s/<\/*text>//gp'
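The one-line sed command leaves the indentation in front of each tweet, and XML escapes characters such as & and < as entities. The following sketch, built around a hypothetical extract_tweets function, also trims leading whitespace and decodes &lt;, &gt;, &quot;, and &amp;. It decodes &amp; last so that text such as &amp;lt; is not decoded twice, and, like the command above, it assumes each <text> element sits on a single line.

```shell
#!/bin/bash
# Sketch: pull tweet text out of the XML public timeline on stdin.
# extract_tweets is a hypothetical helper name. For each line containing
# <text>: drop the tags, trim leading whitespace, decode common entities
# (&amp; last), then print the line.
extract_tweets() {
  sed -n '/<text>/{
s/<\/*text>//g
s/^[[:space:]]*//
s/&lt;/</g
s/&gt;/>/g
s/&quot;/"/g
s/&amp;/\&/g
p
}'
}
```

Usage: curl -s http://twitter.com/statuses/public_timeline.xml | extract_tweets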
Finding the FriendFeed public timeline
You've used cURL to get the public timeline for Twitter. Now, you want to do the same thing for FriendFeed. In this case, the FriendFeed API endpoint for the public feed is http://friendfeed.com/api/feed/public?format=xml. However, following the public feed for FriendFeed is like following water drops in a river, so narrow the scope a bit to just your friends' feeds.
Look again at the API documentation. It takes a bit of searching, but you're looking for the home feed, which is http://friendfeed.com/api/feed/home. Of course, this feed requires authentication: feed/home needs to know who you are before it can show your friends' entries. Luckily, cURL makes this easy with its -u (user) option, which takes a user:password pair.
But FriendFeed doesn't authenticate API requests with your user name and password. Instead, it requires your nickname and a remote key, which you can find by logging in at http://friendfeed.com/account/api.
With your nickname and remote key pair, issue the command:
curl -u "nickname:key" http://friendfeed.com/api/feed/home
where nickname:key is your nickname and key.
By default, the output is in JavaScript Object Notation (JSON). To get XML, you must add the format parameter. Because this is a GET request, you can simply append it to the end of the URL:
curl -u "nickname:key" "http://friendfeed.com/api/feed/home?format=xml"
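To avoid typing your remote key into every command, you could keep the credentials in shell variables. A minimal sketch, assuming the hypothetical variable names FF_NICKNAME and FF_REMOTE_KEY and the hypothetical function ff_home. Note that the URL is quoted: an unquoted ? is a shell wildcard and could match a file name in the current directory.

```shell
#!/bin/bash
# Sketch: fetch your FriendFeed home feed using stored credentials.
# FF_NICKNAME and FF_REMOTE_KEY are hypothetical variable names; set them
# to the nickname and remote key from your FriendFeed API page.
ff_home() {
  local format="${1:-xml}"   # for example xml or rss
  if [ -z "${FF_NICKNAME:-}" ] || [ -z "${FF_REMOTE_KEY:-}" ]; then
    echo "ff_home: set FF_NICKNAME and FF_REMOTE_KEY first" >&2
    return 1
  fi
  # Quoting keeps the shell from treating "?" as a glob character.
  curl -s -u "$FF_NICKNAME:$FF_REMOTE_KEY" \
    "http://friendfeed.com/api/feed/home?format=$format"
}
```

For example, run export FF_NICKNAME=nickname FF_REMOTE_KEY=key once, then ff_home rss whenever you want the feed.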
Parsing the output
So, from parsing the Twitter feed, you know that you need to pipe the output through sed to get a real, legible result. XML is easy enough to read, and after examining the result, you conclude that you need to extract everything between certain tags. However, there's a snag: the XML doesn't contain any newline or carriage-return characters, so it's just one big long string. Because sed works on one line at a time, a line-based pattern like the one you used for Twitter can't pick out the individual entries.
How would you parse it, then? You must choose a different output format. The available formats are JSON, XML, RSS, and Atom. For this example, go for RSS, because it's the cleanest and contains the line feeds you need.
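If you would rather stay with XML, another option is to give it the line breaks it lacks before parsing. This sketch is not from the original article: split_tags and extract_tag are hypothetical names, it assumes the elements you want carry no attributes and contain no nested tags, and you should check the actual element names in your feed's output before relying on it.

```shell
#!/bin/bash
# Sketch: make single-line XML usable by line-oriented tools.
# split_tags inserts a newline between adjacent tags ("><"); a
# backslash-newline in the sed replacement is the POSIX way to do that.
# extract_tag TAG prints the text inside <TAG>...</TAG> on each line.
# Both names are hypothetical; TAG must be a plain element name.
split_tags() {
  sed 's/></>\
</g'
}

extract_tag() {
  local tag="$1"
  split_tags | sed -n "s/.*<$tag>\(.*\)<\/$tag>.*/\1/p"
}
```

To use it, pipe the output of the format=xml curl command into extract_tag followed by the element name you found in the feed.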
Examine the result from the RSS feed. You need whatever is between the <ff:body> tags, so pipe the output through a modified sed command:
curl -s -u "nickname:key" "http://friendfeed.com/api/feed/home?format=rss" | sed -ne '/<ff:body/s/<\/*ff:body>//gp'
There you have it! All the entries from your FriendFeed.
Putting it together
Running the commands from the command line by hand is not really the way to follow the feeds.
After all, you can do that by pressing the F5 key on the sites themselves. So, to stay as close to the command line as possible, write a shell script. You could of course use Python, Perl, or any other scripting language available on the platform, but sticking with the shell provides a fitting end to the example.
You script the Twitter stream by creating a script aptly named lintweet. Of course, you're free to use whatever name you choose. Listing 1 shows this script.
Listing 1. Lintweet.sh
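#!/bin/bash
# Poll the Twitter public timeline every 10 seconds; press Ctrl+C to stop.
while :
do
  curl -s http://twitter.com/statuses/public_timeline.xml | sed -ne '/<text/s/<\/*text>//gp'
  sleep 10
done
exit
Next, make this script executable (for example, with chmod +x lintweet). Then, run it using the command:
./lintweet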
Every 10 seconds, the window is updated with the latest tweets. Twitter's terms of service (TOS) don't limit the rate at which the public feed is hit, so you could update every second by changing sleep to 1. But you should always be nice to servers, so leave it set at 10. (There really isn't much you could follow with sleep set to 1 anyway, because the result would be a fast-flowing river of updates.)
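Listing 1 prints the whole public timeline on every pass, so tweets that are still in the timeline can scroll by again. The sketch below, an extension rather than part of the original script, remembers what it has already printed and optionally shows only tweets that contain a keyword. print_new and lintweet_loop are hypothetical names; the script needs bash 4 or later for associative arrays, and it feeds print_new through process substitution rather than a pipe so that the record of seen tweets survives from one poll to the next.

```shell
#!/bin/bash
# Sketch of an extended lintweet (bash 4+). print_new and lintweet_loop
# are hypothetical names.

declare -A SEEN=()   # tweet texts already printed in this session

# Read tweet texts on stdin; print each one only the first time it
# appears, and only if it contains KEYWORD (case-insensitive) when given.
print_new() {
  local keyword="${1:-}" line
  while IFS= read -r line; do
    line="${line#"${line%%[![:space:]]*}"}"   # strip leading whitespace
    [ -z "$line" ] && continue
    [ -n "${SEEN[$line]:-}" ] && continue
    SEEN[$line]=1
    if [ -z "$keyword" ] || [[ "${line,,}" == *"${keyword,,}"* ]]; then
      printf '%s\n' "$line"
    fi
  done
}

# Poll the public timeline forever; stop with Ctrl+C.
# Usage: lintweet_loop [KEYWORD] [SECONDS]
lintweet_loop() {
  local keyword="${1:-}" interval="${2:-10}"
  while :; do
    # Process substitution keeps print_new in the current shell,
    # so SEEN is not lost in a pipeline subshell.
    print_new "$keyword" < <(curl -s http://twitter.com/statuses/public_timeline.xml \
      | sed -ne '/<text/s/<\/*text>//gp')
    sleep "$interval"
  done
}
```

For example, adding lintweet_loop linux 10 at the end of the file follows only tweets that mention Linux, checking every 10 seconds.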
Where to go from here
Now you know how to use two tools available on most Linux distributions—cURL and GNU Wget—to retrieve tweets from the Linux command line. You can also follow feeds from Twitter and FriendFeed manually and by using a simple shell script.
You can extend the shell script by filtering for certain keywords, so that it shows only the status updates that contain certain words or phrases. Or you can redirect the script's output to a file to keep an archive of Twitter and FriendFeed updates. If you're running Mac OS X, you can even hook your script up to a notification system like Growl (see Resources) so that new updates pop up on screen. The possibilities are endless.
Resources
- Learn more at the cURL site and GNU Wget site.
- Find more information about Twitter and FriendFeed.
Get more information and a tutorial
- In "Create fancy on-screen displays with Ghosd and Perl" (developerWorks, February 2007), learn how to set up a notification system.
- In the developerWorks Linux zone, find more resources for Linux developers (including developers who are new to Linux), and scan our most popular articles and tutorials.
- See all Linux tips and Linux tutorials on developerWorks.
- Stay current with developerWorks technical events and Webcasts.
Get products and technologies
- Download the Twitter API.
- Download the FriendFeed API.
- With IBM trial software, available for download directly from developerWorks, build your next development project on Linux.
- Get involved in the developerWorks community through blogs, forums, podcasts, and spaces.