Conversing through the Internet with cURL and libcurl

Using libcurl with C and Python

Development of applications that rely on application-layer protocols like HTTP and FTP is not overly complex, but it's also not trivial. Further, the protocol is rarely the focus of the application: in the majority of cases, what's built above the protocol is what's actually important. That's what makes libcurl so interesting, because it lets you focus on the application rather than on the protocol plumbing beneath it. Few applications implement their own TCP/IP stack; in the same vein, reusing an existing protocol library shortens your development schedule and increases the reliability of your application.

This article begins with a short introduction to application-layer protocols, then jumps into cURL, libcurl, and an exploration of their use.

Web protocols

Building applications today is considerably different from the recent past. Today, applications are expected to communicate over a network or the Internet, to present a network API or an interface for human consumption, and to be flexible through user scripting. Modern applications commonly export a Web interface using HTTP and provide notification of alarms through Simple Mail Transfer Protocol (SMTP). These protocols allow you to point a Web browser at the device for configuration or status (HTTP) and to receive standard e-mail from the device in your typical e-mail client (SMTP).

These Web services are typically built on top of the socket layer of the networking stack (see Figure 1). The socket layer implements an API that originated in the Berkeley Software Distribution (BSD) operating system and abstracts the details of the underlying transport and networking-layer protocols.

Figure 1. Networking stack and libcurl

Web services occur in protocol conversations between a client and a server. In the context of HTTP, the server is the end device and the client is the browser at the endpoint. For SMTP, the server is the mail gateway or endpoint user, and the client is the end device. In some cases, the protocol conversation occurs in two steps (request and response), but in others, there's substantially more traffic to negotiate and communicate. This negotiation can create a considerable amount of complexity, which can be abstracted through an API, such as libcurl.
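To make the two-step request/response conversation concrete, here is a minimal sketch of what happens at the socket layer when no library hides it. It is an illustration only, not part of the original article: the HelloHandler class, the raw_http_get function, and the demo function are hypothetical names, and the "end device" is simulated by Python's built-in http.server running on the loopback interface.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an end device that serves a status page."""

    def do_GET(self):
        body = b"status: ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress per-request logging to keep the example quiet


def raw_http_get(host, port, path="/"):
    """Send one HTTP/1.0 request over a plain socket; return (status_line, body)."""
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 1: the request
        sock.sendall(request.encode("ascii"))
        # Step 2: the response, read until the server closes the connection
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    response = b"".join(chunks)
    # Headers and body are separated by an empty line
    head, _, body = response.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, body


def demo():
    """Start the simulated device on a free local port and fetch its page."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return raw_http_get("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status line and the body of the response. Even this simplest case requires formatting the request, reading until the connection closes, and splitting headers from the body; redirects, authentication, and encryption add considerably more, which is the work a library like libcurl takes over.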

Introduction to cURL

cURL was originally designed as a way to move files between endpoints using different protocols, such as FTP, HTTP, SCP, and others. It started as a command-line utility but is now also a library with bindings for more than 30 languages. So instead of just using cURL from the shell, you can build applications that incorporate this important functionality. The libcurl library is also portable, supporting Linux®, IBM® AIX® operating system, BSD, Solaris, and many other UNIX® variants.

Getting and installing cURL/libcurl

Getting and installing libcurl is usually simple, though the details depend on which Linux distribution you run. If you run Ubuntu, you can install the packages with apt-get. The following two lines illustrate how to install libcurl and the Python bindings for libcurl (package names vary between releases, so check what your release provides):

$ sudo apt-get install libcurl3
$ sudo apt-get install python-pycurl

The apt-get utility ensures that any dependencies are satisfied in the process.

cURL on the command line

cURL began as a command-line tool for performing data transfer using Uniform Resource Locator (URL) syntax. Given its popularity on the command line, a library to integrate the behavior into applications was created. Today, the command-line cURL is a wrapper over the cURL library. This article starts by exploring cURL on the command line, then digs into its use as a library.

Two of the typical uses of cURL are file transfers using the HTTP and FTP protocols. cURL provides a simple interface to these protocols and others. To get a file from a Web site using HTTP, you simply tell cURL a local file name into which you want the Web page to be written and a URL for the Web site and file to be retrieved. That's a lot of words for the simple command line shown in Listing 1.

Listing 1. Example use of cURL to retrieve a file from a Web site
$ curl -o test.html www.exampledomain.com
  % Total    % Received % Xferd  Average Speed    Time    Time     Time    Current
                                 Dload  Upload    Total   Spent    Left    Speed
100 43320  100 43320    0     0  55831       0 --:--:-- --:--:-- --:--:--  89299
$

Note that because I specify the domain, but not a file, I'll get the root file (index.html). To move this file to an FTP site using cURL, specify the file to upload using the -T option, then provide a URL for the FTP site and path to file.

Listing 2. Example use of cURL to upload a file to an FTP site
$ curl -T test.html ftp://user:password@ftp.exampledomain.com/ftpdir/
  % Total    % Received % Xferd  Average Speed    Time    Time     Time    Current
                                 Dload  Upload    Total   Spent    Left    Speed
100 43320    0     0  100 43320      0  38946   0:00:01 0:00:01  --:--:--    124k
$

Could it be much simpler? After you learn some of the patterns, cURL is fairly easy to use. But the breadth of options available to you is large: asking cURL for help from the command line (using --help) produces 129 lines of options. While that's not huge, the options control everything from verbosity to security, along with a variety of protocol-specific settings.

From a developer's perspective, this isn't the most exciting aspect of cURL. Dig into the cURL library to see how you can add these file transfer protocols to your applications.

cURL as a library

If you've watched scripting languages over the past 10 years, you've noticed a distinct change in their makeup. Scripting languages like Python, Ruby, Perl, and many others include not only a sockets layer, which you can also find in C or C++, but also application-layer protocol APIs. These scripting languages incorporate higher-level functionality that makes creating an HTTP server or client, for example, trivial. The libcurl library adds similar functionality to languages like C and C++, and it does so in a way that's portable across many languages. You'll find roughly equivalent behaviors for libcurl in all the languages it supports, though because these languages can differ greatly (consider C and Scheme), the way they provide the behaviors can also differ.

The libcurl library encapsulates the behavior illustrated in Listings 1 and 2 in an API form so it can be used by high-level languages (more than 30 today). This article provides two examples of libcurl. The first explores a simple HTTP client in C (suitable for building Web spiders), and the second is a simple HTTP client in Python.

C-based HTTP client

libcurl provides two C APIs. The easy interface is a simple API that's synchronous (meaning that when you call libcurl with your request, the call does not return until the request completes or an error occurs). The multi interface provides more control over libcurl, allowing your application to perform multiple simultaneous transfers and to control where and when libcurl moves data.

This example uses the easy interface. This API still gives you some control over the data movement process (using callbacks), but lives up to its name. Listing 3 provides the C language example for HTTP.

Listing 3. C HTTP client using libcurl's easy interface
#include <stdio.h>
#include <string.h>
#include <curl/curl.h>

#define MAX_BUF	65536

char wr_buf[MAX_BUF+1];
int  wr_index;

/*
 * Write data callback function (called within the context of
 * curl_easy_perform).
 */
size_t write_data( void *buffer, size_t size, size_t nmemb, void *userp )
{
  int segsize = size * nmemb;

  /* Check to see if this data exceeds the size of our buffer. If so, 
   * set the user-defined context value and return 0 to indicate a
   * problem to curl.
   */
  if ( wr_index + segsize > MAX_BUF ) {
    *(int *)userp = 1;
    return 0;
  }

  /* Copy the data from the curl buffer into our buffer */
  memcpy( (void *)&wr_buf[wr_index], buffer, (size_t)segsize );

  /* Update the write index */
  wr_index += segsize;

  /* Null terminate the buffer */
  wr_buf[wr_index] = 0;

  /* Return the number of bytes received, indicating to curl that all is okay */
  return segsize;
}


/*
 * Simple curl application to read the index.html file from a Web site.
 */
int main( void )
{
  CURL *curl;
  CURLcode ret;
  int  wr_error;

  wr_error = 0;
  wr_index = 0;

  /* First step, init curl */
  curl = curl_easy_init();
  if (!curl) {
    fprintf( stderr, "couldn't init curl\n" );
    return 1;
  }

  /* Tell curl the URL of the file we're going to retrieve */
  curl_easy_setopt( curl, CURLOPT_URL, "www.exampledomain.com" );

  /* Tell curl that we'll receive data to the function write_data, and
   * also provide it with a context pointer for our error return.
   */
  curl_easy_setopt( curl, CURLOPT_WRITEDATA, (void *)&wr_error );
  curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, write_data );

  /* Allow curl to perform the action */
  ret = curl_easy_perform( curl );

  printf( "ret = %d (write_error = %d)\n", ret, wr_error );

  /* Emit the page if curl indicates that no errors occurred */
  if ( ret == CURLE_OK ) printf( "%s\n", wr_buf );

  curl_easy_cleanup( curl );

  return 0;
}

At the top are the necessary include files, including the libcurl header (curl/curl.h). Next, I define a couple of variables for the transfer. The first, wr_buf, represents the buffer where the incoming data will be written. The second, wr_index, represents the current write index into the buffer.

Jump down to the main function, which performs the setup using the easy API. All libcurl calls operate through a handle that maintains state for the particular request; the handle is declared as a CURL pointer. This example also declares a variable of type CURLcode to hold libcurl's return code. Before using any other easy-interface functions, you need to call curl_easy_init to get the CURL handle. Next, notice a number of curl_easy_setopt calls. These configure the handle for a particular operation. For each call, you provide the handle, an option, and a value for that option. First, this example uses CURLOPT_URL to specify the URL to retrieve. Next, it uses CURLOPT_WRITEDATA to provide a context variable (in this case, the internal write-error variable). Finally, it uses CURLOPT_WRITEFUNCTION to specify the function that libcurl should call when data is available. The API will call this function one or more times with data it has read after you instruct it to start.

To kick off the transfer, call curl_easy_perform. Its job is to perform the transfer given the prior configuration. When you call this function, it will not return until the transfer is complete or an error occurs. The final steps of main are to emit the return statuses, emit the page read, and, finally, clean up using curl_easy_cleanup (when you're done with the handle).

Now look at the write_data function. This function is a callback that libcurl calls as data is received for the particular operation. Note the direction implied by the name: although you're reading data from the Web site, libcurl writes that data to you (hence write_data). The callback is provided with a buffer (containing the data available), the size of each member and the number of members (the product being the total data available in the buffer), and the context pointer. The first task is to ensure that the buffer (wr_buf) has sufficient space for the incoming data. If not, the callback sets the error flag through the context pointer and returns zero, indicating that there was a problem. Otherwise, it copies the data from the libcurl buffer into your buffer and increments the index to point to the next location in which to write. This example also terminates the string so you can use printf on it later. Finally, it returns the number of bytes that it consumed to libcurl. This tells libcurl that the data was ingested and that it can discard that data. And that's it: a relatively simple way to read a file from a Web site into memory.
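The contract between the transfer and the write callback can be tried out without any network at all. The following is a minimal sketch in Python that mirrors the logic of Listing 3: a fixed-capacity buffer whose callback returns the number of bytes it consumed, and a small loop that stops as soon as that number differs from the size of the chunk delivered, which is how libcurl detects a write error. The names BoundedBuffer and deliver are hypothetical, and deliver is only a stand-in for libcurl's transfer loop, not libcurl itself.

```python
MAX_BUF = 65536


class BoundedBuffer:
    """Fixed-capacity receive buffer, mirroring wr_buf and wr_index in Listing 3."""

    def __init__(self, capacity=MAX_BUF):
        self.capacity = capacity
        self.data = bytearray()
        self.overflowed = False  # plays the role of wr_error

    def write_data(self, chunk):
        """Accept one chunk; return the bytes consumed, or 0 if it does not fit."""
        if len(self.data) + len(chunk) > self.capacity:
            self.overflowed = True
            return 0
        self.data += chunk
        return len(chunk)


def deliver(chunks, callback):
    """Hypothetical stand-in for the transfer loop.

    Hands each chunk to the callback and stops at the first chunk for which
    the callback reports a byte count different from the chunk's size.
    Returns True if every chunk was accepted, False if the transfer aborted.
    """
    for chunk in chunks:
        if callback(chunk) != len(chunk):
            return False
    return True
```

For example, with buf = BoundedBuffer(capacity=8), calling deliver([b"abcd", b"efghi"], buf.write_data) accepts the first chunk, rejects the second because it would exceed the capacity, and leaves buf.overflowed set, just as wr_error is set in the C version.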

Python-based HTTP client

This section provides an example similar to the C-based HTTP client but in Python. Python is a useful object-oriented scripting language that's great for prototyping and building production software. The example assumes you have some familiarity with Python, but uses very little of it, so not much is expected.

The simple Python HTTP client using pycurl is shown in Listing 4.

Listing 4. Python HTTP client using libcurl's pycurl interface
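import sys
import pycurl

# Under Python 3, pycurl hands the write callback bytes, so collect bytes
wr_buf = b''

def write_data( buf ):
	global wr_buf
	wr_buf += buf

def main():
	c = pycurl.Curl()
	c.setopt( pycurl.URL, 'http://www.exampledomain.com' )
	c.setopt( pycurl.WRITEFUNCTION, write_data )

	c.perform()

	c.close()

main()
# Decode for display; UTF-8 is an assumption about the page's encoding
sys.stdout.write(wr_buf.decode('utf-8', 'replace'))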

This one is considerably simpler than the C version. It begins by importing the necessary modules (sys, the standard system module, and pycurl). Next, it defines the write buffer (wr_buf). As in the C program, I declare a write_data function. Note that this function takes a single argument: the data buffer read from the HTTP server. I simply take that buffer and concatenate it to the global write buffer. The main function starts by creating a Curl handle, then uses the setopt method to define the URL and WRITEFUNCTION for the transaction. It calls the perform method to start the transfer and then closes the handle. Finally, the script calls the main function and emits the write buffer to stdout. Note that in this case, you don't need the error-context pointer because the buffer grows through Python concatenation rather than living in a statically sized array.
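If the global buffer becomes awkward in a larger program, one possible variation is to keep the buffer inside a closure and hand only the callback to pycurl. The sketch below is an assumption-laden illustration rather than the article's method: make_collector and fetch are hypothetical names, and pycurl is imported inside fetch so that the collector can be tried on its own even where pycurl isn't installed.

```python
import io


def make_collector():
    """Return (buffer, callback); the callback appends each chunk to the buffer."""
    buf = io.BytesIO()

    def write_data(chunk):
        # Returning None tells pycurl that the whole chunk was consumed
        buf.write(chunk)

    return buf, write_data


def fetch(url):
    """Fetch a URL with pycurl; return (HTTP response code, body as bytes)."""
    import pycurl  # imported here so make_collector works without pycurl

    buf, write_data = make_collector()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEFUNCTION, write_data)
        c.perform()  # raises pycurl.error if the transfer fails
        code = c.getinfo(pycurl.RESPONSE_CODE)
    finally:
        c.close()  # release the handle even when perform() raises
    return code, buf.getvalue()
```

With pycurl installed, fetch('http://www.exampledomain.com') performs the same transfer as Listing 4 but returns the page to the caller instead of storing it in a module-level variable.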

Going further

This article hardly scratches the surface of libcurl, given the vast number of protocols and languages it supports. But hopefully, this shows how simple it is to build applications that use application-layer protocols like HTTP. The libcurl Web site (see Related topics) provides a large number of examples and a considerable amount of useful documentation. So next time you're developing a Web browser, spider, or other application that has application-layer protocol requirements, give libcurl a try. It will certainly cut down your development time and bring joy back to coding.


Related topics

  • cURL is a command-line tool and library that implements a variety of client-side protocols. It supports more than 12 protocols including FTP, HTTP, Telnet, and their secure variants. You'll find cURL on a number of platforms, including Linux, AIX, BSD, and Solaris, supporting more than 30 languages.
  • PycURL is a thin layer over the libcurl API. As a thin layer, PycURL is extremely fast. With PycURL, you can develop Python applications using the libcurl library.
  • Speaking of application flexibility, you can learn more about integration of scripting capabilities into your application in "Scripting with Guile."

publish-date=09082009