Simplifying scalable cloud software development with Apache Thrift

Apache Thrift is a framework that enables scalable cross-language development, resulting in unambiguous communication among components in cloud environments. This article introduces the ideas around Thrift (an interface definition for remote procedure call with multilanguage bindings), and then demonstrates Thrift in a multilanguage client and server application.

M. Tim Jones, Independent author, Consultant

M. Tim Jones is an embedded firmware architect and the author of Artificial Intelligence: A Systems Approach, GNU/Linux Application Programming (now in its second edition), AI Application Programming (in its second edition), and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a platform architect with Intel and author in Longmont, Colo.



12 November 2013


Developing applications in cloud environments tends to raise the same issues as distributed systems development: cross-node communication, protocols, and exposed services. This development can be further complicated by the presence of multiple programming languages, each with its own type system, requiring translation of types between clients and servers so that all entities in the group use the same formats. Even when a system is developed in a single programming language, interactions require that objects be communicated in an efficient and unambiguous way. Add one or more languages, and the complexity of the system and the scope of work increase dramatically.

A key problem in distributed systems development is communication among the software elements. One simplification lets software hide distributed interactions beneath ordinary procedure calls. One of the first implementations of such remote procedure calls (RPCs) was introduced by Xerox in 1981 in a protocol called Courier. This protocol provided primitives that allowed applications written in the Mesa programming language to communicate across a network, but these applications required manual intervention to serialize and deserialize the calls. The concept has since been widely applied. Other examples of RPC technologies include the Common Object Request Broker Architecture (CORBA), the Google Protocol Buffers (protobufs) package, Apache Avro, and many others.

In 2007, Facebook released into the open source domain a new RPC implementation with mostly seamless multilanguage support that has now become an Apache project called Thrift. Thrift supports a large number of programming languages (with variable levels of language support), including Python, Ruby, Smalltalk, Haskell, Erlang, Java™, Node.js, PHP, Go, D, and C/C++.

Origin of Apache Thrift

Before becoming an Apache open source project, Thrift was a software library and set of code-generation tools used internally at Facebook. Its goal was to enable efficient communication across programming languages using a transparent high-performance software bridge.

Thrift is a framework for scalable cross-language services development that provides not only code-generation support for a multitude of languages but a software stack that simplifies the development of network-based services. Let's explore some of the underlying ideas in Thrift and work toward a demonstration of its capabilities.


Apache Thrift as a lingua franca

Apache Thrift implements a software stack that helps to simplify the development of communicating multilanguage applications. Figure 1 provides a simple illustration of the software stack. At the bottom of the stack is the physical interface (which could be a network interface or something as simple as a file). That interface influences higher levels of the stack.

Figure 1. The Thrift software stack
Image showing the Thrift software stack

Thrift provides several transport and protocol layers that define how data is moved and what format it takes (using a Processor, which encapsulates input and output streams). For example, TBinaryProtocol defines an efficient binary protocol for communication that can be used over the TSocket transport to communicate between network end points. Alternatively, you can use TJSONProtocol to communicate using the JavaScript Object Notation (JSON) format with the TMemoryTransport for shared-memory communication.
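The split between protocol (what format the data takes) and transport (how the bytes move) can be illustrated without Thrift itself. The following is a minimal sketch in plain Python, not Thrift's API: the function names are hypothetical, and an in-memory buffer stands in for a socket or file. It assumes only that a binary protocol writes a 32-bit integer as 4 big-endian bytes, while a JSON protocol writes it as text.

```python
import io
import json
import struct


def encode_binary_i32(value):
    """Protocol choice 1: pack a signed 32-bit integer as 4 big-endian bytes."""
    return struct.pack('>i', value)


def encode_json_i32(value):
    """Protocol choice 2: encode the same integer as UTF-8 JSON text."""
    return json.dumps(value).encode('utf-8')


def send(transport, payload):
    """The transport only moves bytes; it never inspects their format."""
    transport.write(payload)


# In-memory buffers stand in for a socket, file, or shared memory.
binary_buffer = io.BytesIO()
json_buffer = io.BytesIO()

send(binary_buffer, encode_binary_i32(72))
send(json_buffer, encode_json_i32(72))

print(binary_buffer.getvalue())  # b'\x00\x00\x00H'
print(json_buffer.getvalue())    # b'72'
```

Because the transport never looks inside the payload, either encoding could be sent over either kind of channel, which is the property that lets Thrift mix and match the two layers.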

At the top level are server types that you can implement with the aid of Thrift. This configuration can include a single-threaded server for debugging (TSimpleServer), an HTTP-based server that can provide Representational State Transfer-like URLs (THttpServer), or a multiprocess server that forks a process for each request received (TForkingServer). Thrift is written not only to simplify communication across a variety of approaches (protocols and transports) but also to simplify server development using a variety of server styles.

Not shown in Figure 1 is a type system that permits communication across the languages that Thrift supports. This system supports types such as byte, short, int, double, and string, and more advanced types such as containers (lists, maps) and structures. This generic type system provides a common basis for communication of data.
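One practical consequence of a shared type system is that fixed-width types apply even in languages whose native integers are unbounded, such as Python. The sketch below is a hypothetical helper, not part of Thrift, that checks whether a Python integer fits a given signed width. The names i16, i32, and i64 are the Thrift interface definition spellings of the 16-, 32-, and 64-bit integer types.

```python
# Bit widths of the signed fixed-size integer types.
INT_BITS = {'byte': 8, 'i16': 16, 'i32': 32, 'i64': 64}


def fits(type_name, value):
    """Return True if value is representable in the named signed type."""
    bits = INT_BITS[type_name]
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


print(fits('i32', 9 * 8))          # True
print(fits('i32', 65536 * 65536))  # False: 2**32 does not fit in 32 bits
print(fits('i64', 65536 * 65536))  # True
```

A check like this is worth keeping in mind for any service that multiplies 32-bit operands, because the product of two valid inputs may not be a valid 32-bit result.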


Installing Thrift

You can install Thrift from source, which is a relatively simple process, or through a package manager such as apt or yum. To install from source, simply download the gzipped tarball, configure it, and then build with make. When Thrift is built, you can easily install it using make's install target. Depending on your language choice, more packages may be necessary (for example, C++ clients and servers require the Boost package).

If you're running a recent version of a Linux® distribution, you may be able to install the Thrift compiler through your package manager. For example, in Ubuntu, you could try the following:

$ sudo apt-get install thrift-compiler

See the installation resource for details.


RPC, type systems, and interface definition

From the software stack shown in Figure 1, the solution appears to be quite complicated. Although there is some complication in Thrift (as there would be in any multilanguage RPC scheme), the reality of using Thrift is much simpler.

The first step in using Thrift to build an RPC application is to define an interface definition file (see Figure 2). This file allows you to define the types and services that you'll expose. This file is language independent and uses Thrift types and definitions. Using this file, the Thrift compiler generates source files (for both the client and the server) in the language of your choice.

Figure 2. Using the Thrift compiler for source generation
Image showing how to use the Thrift compiler for source generation

Let's look at a simple example that illustrates this concept, and then move to a working example. The general use of the Thrift compiler is:

$ thrift --gen <language> <.thrift file>

Consider a simple application of a networked arithmetic service. This service provides fundamental math services to remote clients through a simple application programming interface. To define this service, I create a Thrift file as shown in Listing 1. This file provides namespaces for the various language users (py representing Python and rb representing Ruby). I then define the service that I want to expose, using the service keyword and a name to represent the service (in this case, MyMath). The definition is language independent: it uses only Thrift types. This service consists of four exposed functions, each of which takes two input values and returns a result.

Listing 1. Sample .thrift file defining your interface (proj.thrift)
# proj.thrift

namespace py demoserver
namespace rb demoserver

/* All operands are 32-bit integers called a Value */
typedef i32 Value
typedef i32 Result

/* Math service exposes some math functions */
service MyMath
{
  Result add( 1: Value op1, 2: Value op2 ),
  Result mul( 1: Value op1, 2: Value op2 ),
  Result min( 1: Value op1, 2: Value op2 ),
  Result max( 1: Value op1, 2: Value op2 )
}

To compile this interface definition into code, I define my desired programming language (in this case, Python) to Thrift as:

$ thrift --gen py proj.thrift

This process results in a subdirectory called gen-py that contains a number of files defining the types, constants, and code. The full contents of this subdirectory of generated code are shown below. Of interest is the code generated for the types (ttypes.py) and software stack for the service (MyMath.py). Also notice MyMath-remote, which I explore shortly as a way to test the new service.

$ ls gen-py/*
gen-py/__init__.py

gen-py/demoserver:
constants.py  __init__.py  __init__.pyc  MyMath.py  MyMath.pyc  MyMath-remote  
ttypes.py  ttypes.pyc
$

To fully demonstrate Thrift and appreciate what it provides, however, you need more than one language. Consider a client written in Ruby. To support a Ruby client, you must generate a Thrift code set for Ruby using the Thrift compiler:

$ thrift --gen rb proj.thrift

If you peek inside the newly created gen-rb subdirectory, you'll find the files for a Ruby implementation. Each language provides a different implementation of the Thrift software stack, and as you see below, the file results differ while still providing the types, constants, and service implementation.

$ ls gen-rb/
my_math.rb  proj_constants.rb  proj_types.rb
$

With the Thrift job complete on the service interface definition file, I'm now ready to implement a server that uses the service implementation in Python.


Building a Thrift server

Based on the service definition in Listing 1, I implement those services within Python. As shown in Listing 2, my Python server is quite simple. I first expose the Python modules by appending the path to the Thrift-generated code, and then import the necessary modules that I'll use. Note that these modules are the software stack as shown in Figure 1. I define my derived class of the math implementation based on the base class built from Thrift (demoserver.MyMath.Iface). This class defines the implementation for my services, simply organized in Python.

The final part of Listing 2 is the main portion of the script, which assembles the components of the Thrift stack. I create the processor, initializing it with my MathImpl class. I also create the protocol and transport elements of the stack and assemble the stack within the TThreadedServer. A final call to the serve() method of the server instance enables processing of threaded requests.

Listing 2. Python Thrift server (server.py)
#!/usr/bin/python

import sys

sys.path.append('./gen-py')

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

import demoserver.MyMath

class MathImpl( demoserver.MyMath.Iface ):
	def add( self, op1, op2 ):
		return op1 + op2
	def mul( self, op1, op2 ):
		return op1 * op2
	def max( self, op1, op2 ):
		return max([op1, op2])
	def min( self, op1, op2 ):
		return min([op1, op2])

if __name__ == '__main__':

	processor = demoserver.MyMath.Processor( MathImpl() )
	transport = TSocket.TServerSocket( port = 18181 )
	tbfactory = TTransport.TBufferedTransportFactory()
	pbfactory = TBinaryProtocol.TBinaryProtocolFactory()

	server = TServer.TThreadedServer( processor, transport, tbfactory, pbfactory )

	print('Starting the Math Server...')

	server.serve()
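Because the handler class in Listing 2 is ordinary Python, its logic can be checked without starting a server or opening a socket. The following is a minimal sketch of that idea. The Iface class here is a hand-written stand-in (an assumption for illustration) for the generated demoserver.MyMath.Iface, so that the example runs without the generated code on the path.

```python
class Iface(object):
    """Hand-written stand-in for the generated demoserver.MyMath.Iface."""

    def add(self, op1, op2):
        pass

    def mul(self, op1, op2):
        pass

    def min(self, op1, op2):
        pass

    def max(self, op1, op2):
        pass


class MathImpl(Iface):
    """Same handler logic as Listing 2."""

    def add(self, op1, op2):
        return op1 + op2

    def mul(self, op1, op2):
        return op1 * op2

    def max(self, op1, op2):
        # Inside a method body, max refers to the built-in function,
        # not to this method, so there is no accidental recursion.
        return max([op1, op2])

    def min(self, op1, op2):
        return min([op1, op2])


handler = MathImpl()
print(handler.add(1, 5))  # 6
print(handler.mul(9, 8))  # 72
print(handler.max(9, 7))  # 9
print(handler.min(5, 7))  # 5
```

Testing the handler this way separates errors in the service logic from errors in how the Thrift stack is assembled.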

Recall the MyMath-remote utility built for the Python service. Thrift generated this file to permit testing of my Python server. To use this test application, I first need to expose the generated files into the Python path:

$ export PYTHONPATH=$PYTHONPATH:./gen-py

With this defined, I can now test my server using this command-line utility. Listing 3 presents a sample use of this application along with a look at the help information that the utility exposes. The utility expects a definition of the host (where the server resides and its port number), the function to be invoked, and a set of arguments. Because my server runs on the same node as the client, I use the localhost and my defined port number along with the desired test function and arguments.

Listing 3. Using the provided test application
$ ./gen-py/demoserver/MyMath-remote --help

Usage: ./gen-py/demoserver/MyMath-remote [-h host[:port]] 
		[-u url] [-f[ramed]] function [arg1 [arg2...]]

Functions:
  Result add(Value op1, Value op2)
  Result mul(Value op1, Value op2)
  Result min(Value op1, Value op2)
  Result max(Value op1, Value op2)

$ ./gen-py/demoserver/MyMath-remote -h localhost:18181 max 5 7
7
$ ./gen-py/demoserver/MyMath-remote -h localhost:18181 mul 9 8
72
$

The auto-generated test application provides a nice way to test my server before the client is ready. Speaking of client, let's now look at a corresponding Thrift client implemented in Ruby.


Building a Thrift client

To verify the operation of my server, I next build a client using Thrift and the Ruby language. This process will help illustrate the differences in using Thrift with multiple languages (Python and Ruby, in this case) and provide a nice multilanguage test environment.

Listing 4 provides the full source for a simple Thrift client in Ruby. I begin by making the Thrift modules visible by adding my generated Thrift Ruby code to the load path. Similar to the Python server, my Ruby client builds its software stack from the generated code in the Ruby modules. I create a socket (initialized with the server's node and port location) and wrap it in a buffered transport. The transport is passed to my protocol instance, which in turn is used to initialize the service client.

After creating a connection to the server (via the open method to the transport instance), I'm able to make calls to the service functions that are serialized to the server beneath the covers. When that's complete, I close the transport, which releases the thread at the server.

Listing 4. Ruby Thrift client (client.rb)
# Make thrift-generated code visible
$:.push('./gen-rb')

require 'thrift'
require 'my_math'

begin

	# Build up the Thrift stack
	transport = Thrift::BufferedTransport.new(Thrift::Socket.new('localhost', 18181))
	protocol = Thrift::BinaryProtocol.new(transport)
	client = Demoserver::MyMath::Client.new(protocol)

	transport.open()

	# Try an add operation
	result = client.add( 1, 5 )
	puts result.inspect

	# Try a max operation
	result = client.max( 9, 7 )
	puts result.inspect

	transport.close()

end
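To give a feel for what "serialized beneath the covers" means for a call such as client.add( 1, 5 ), here is a simplified Python sketch of how a request might be laid out by a strict binary protocol. It reflects my understanding of the Thrift binary encoding rather than anything stated in this article, handles only 32-bit integer arguments, and is not a substitute for the generated code. The function name encode_call is hypothetical, and the constants are assumptions taken from that encoding.

```python
import struct

# Assumed constants of the strict binary encoding (not from this article).
VERSION_1 = 0x80010000  # version marker in the high 16 bits
CALL = 1                # message type for a request
T_I32 = 8               # wire type identifier for a 32-bit integer
T_STOP = 0              # marks the end of the argument struct


def encode_call(name, args, seqid=0):
    """Encode a call whose arguments are all 32-bit integers."""
    encoded_name = name.encode('utf-8')
    out = struct.pack('>I', VERSION_1 | CALL)               # header
    out += struct.pack('>i', len(encoded_name)) + encoded_name
    out += struct.pack('>i', seqid)                         # sequence id
    for field_id, value in enumerate(args, start=1):
        # Each argument: 1-byte type, 2-byte field id, 4-byte value.
        out += struct.pack('>bhi', T_I32, field_id, value)
    out += struct.pack('>b', T_STOP)
    return out


message = encode_call('add', [1, 5])
print(len(message))  # 30
```

The field identifiers written here correspond to the 1: and 2: prefixes on the parameters in Listing 1, which is why those numbers, rather than the parameter names, identify arguments between languages.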

Executing my sample client against the running Python server results in the output shown in Listing 5. That's a small amount of hand-written code given the language-specific abstractions that sit beneath it. The Thrift-generated code (MyMath.py, my_math.rb) implements the stack services being used, and even for a simple service like this one, that generated code is considerable.

Listing 5. Testing the Ruby client
$ ruby client.rb 
6
9
$

As demonstrated, Thrift makes it easy to build client and server applications that expose services. Thrift makes it even easier to build such applications in multiple languages through a unifying interface definition format.


Securing Thrift communication

For production applications or web-facing applications, security is key. Thrift can support secure communication through the transport layer; the Java library, for example, provides the TSSLTransportFactory class. This transport implementation wraps the TSocket and TServerSocket implementations with Secure Sockets Layer (SSL) capabilities. Coupled with an RSA key pair, it lets a Thrift client and server communicate securely.
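The idea of wrapping a socket with SSL can be sketched using only the Python 3 standard library. This is not Thrift's SSL transport, and the function names and file-path parameters are hypothetical; it only illustrates the kind of configuration such a transport performs on each side of the connection.

```python
import ssl


def make_client_context(ca_file=None):
    """Client side: verify the server certificate and its host name."""
    # With ca_file=None, the system's default trusted certificates are used.
    return ssl.create_default_context(cafile=ca_file)


def make_server_context(cert_file, key_file):
    """Server side: present a certificate backed by a private key."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def secure(sock, context, server_hostname):
    """Wrap an already-created client socket so all traffic is encrypted."""
    return context.wrap_socket(sock, server_hostname=server_hostname)


client_context = make_client_context()
print(client_context.verify_mode == ssl.CERT_REQUIRED)  # True
```

The server context needs real certificate and key files, so it is defined here but not invoked.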


Going further

Although Thrift is a great way to bridge multi-application environments and their communication, it's not for the faint of heart. Scarce and out-of-date documentation and differing support in each language make for a sometimes frustrating experience. The Thrift documentation lacks examples for many of the supported languages, requiring deep inspection of the generated code to understand how to make it work.

But looking beyond some of the immaturity of Thrift, the utility automates much of the detailed code for communication, serialization, and service construction. This behind-the-scenes work allows you to focus on your applications and quickly build multilanguage distributed applications.

Resources

Learn

  • The Apache Thrift website is the primary source for information on Thrift. You can get the latest release of Thrift and learn about how to use Thrift from its documentation and tutorials.
  • Installing Thrift is a relatively painless procedure. If you run into issues or are unsure of the requirements for using Thrift, check out this list of build requirements.
  • Thrift: Scalable Cross-Language Services Implementation by Mark Slee, Aditya Agarwal, and Marc Kwiatkowski of Facebook provides a great introduction to Thrift, its motivation, and some of the interesting design trade-offs and implementation details.
  • Meet the Extensible Messaging and Presence Protocol (XMPP) (M. Tim Jones, developerWorks, September 2009) introduces XMPP and its application as an instant messaging protocol. XMPP uses XML under the covers to manage serialization and communication between clients and servers. Although it doesn't address the identical requirements of Thrift, XMPP is another example of communication between applications with multilanguage support.
  • In the developerWorks cloud developer resources, discover and share knowledge and experience of application and services developers building their projects for cloud deployment.
  • Follow developerWorks on Twitter. You can also follow this author on Twitter at M. Tim Jones.
  • Watch developerWorks demos ranging from product installation and setup demos for beginners to advanced functionality for experienced developers.

Discuss

  • Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
