Introduction to programming in Erlang, Part 2: Use advanced features and functionality

Erlang is a multi-purpose programming language that is primarily used for developing concurrent and distributed systems. Part 1 of this series introduced Erlang and how its functional programming style compares with other programming paradigms such as imperative, procedural, and object-oriented programming. In Part 2 you will use some of the advanced features and functionality, starting with basic functions and moving on to concurrent programming, processes, and messaging. These work together to support distributed programming, a powerful feature of Erlang.


Martin Brown (mc@mcslp.com), Professional Writer, Freelance Developer

Martin Brown has been a professional writer for over eight years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms (Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux®, BeOS, Mac OS/X and more) as well as web programming, systems management and integration. Martin is a regular contributor to ServerWatch.com, LinuxToday.com and IBM developerWorks, and a regular blogger at Computerworld, The Apple Blog and other sites, as well as a Subject Matter Expert (SME) for Microsoft®. He can be contacted through his website at http://www.mcslp.com.



17 May 2011


Concurrent programming

Executing multiple threads of work simultaneously within a single program (as opposed to running multiple programs simultaneously) has always required changes to the way people program in traditional programming languages. The issue with concurrency (or multi-threading) is ensuring that each thread updates only the data it is meant to process. For example, you would not want a routine that updates a single file to run several times simultaneously, because you would risk corrupting the file.

Erlang takes the approach that all programs are concurrent, and that components (functions and modules) never share data. Instead, data is exchanged between components using messages. Using messages helps to eliminate the concurrent data modification issue by limiting how individual components can modify data. Instead of changing data directly, a component sends a message asking for the data to be updated, which keeps the data isolated and makes conflicting concurrent updates much harder to cause. Erlang also works on the principle that operations will fail, and therefore has a system in place to handle errors and, if necessary, recover from them.

Internally, Erlang handles concurrency by creating small, lightweight units of execution called processes. Unlike the concurrency facilities typically used from languages such as C, these processes are not built on top of native operating system processes or threads, but are created and managed internally by the Erlang virtual machine. This allows each process to be significantly more lightweight in terms of memory and CPU requirements than a native OS thread. Furthermore, because Erlang programs are designed around many of these small processes, it is common to run many thousands, or in large systems even millions, of processes.

The concurrency model is built into Erlang, and therefore the management of such large numbers of processes is a key part of how Erlang works. The messaging system, which is used to send data between processes, is also built-in and designed to distribute messages to any processes very efficiently, regardless of the current number of running processes. As a side benefit of the message implementation, messages can be sent not only internally but also across the network, allowing Erlang to support distributed programming across machines by sharing the messages across multiple instances.


Processes

Because Erlang processes are so lightweight, creating one is straightforward: there is very little overhead, and therefore little reason to worry about the cost. This means that you can create a new process whenever it helps your application.

In practice you create a process by calling the built-in function spawn(), which has the following calling structure: PID = spawn(module, function, arguments) where module is the module name, function is the function name, and arguments is a list of the arguments to be supplied to the function. The return value (PID in the above example) is a process identifier.

For example, in the previous part of this series the Fibonacci function that we created could be called as a new process using the following format: PID = spawn(fib,printfibo,[10]).

Note that the last argument to spawn() is a list with only one element, not the single argument that you would use when calling the function directly. The result is a new process that will execute the function as if the function had been called: fib:printfibo(10).
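
As a small, self-contained illustration of the calling structure above, the following sketch spawns one of its own exported functions. The module name spawn_demo and the functions start/1 and print_square/1 are hypothetical and stand in for the fib module from Part 1, which is not reproduced here.

```erlang
-module(spawn_demo).
-export([start/1, print_square/1]).

%% Spawn a new process that runs spawn_demo:print_square(N).
%% The argument is wrapped in a list, exactly as with fib:printfibo above.
%% Returns the process identifier of the new process.
start(N) ->
    spawn(spawn_demo, print_square, [N]).

%% The function executed inside the new process.
%% It must be exported so that spawn/3 can call it by name.
print_square(N) ->
    io:format("~p squared is ~p~n", [N, N * N]).
```

Calling spawn_demo:start(4) from the shell returns a process identifier immediately; the output is printed by the new process, not by the caller.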

The new process created by spawn() will continue to execute until it terminates, either normally (that is, with no error) or abnormally (some fault occurred).

The call to spawn() itself returns a process identifier even if the function that you are calling does not exist; in that case the new process terminates abnormally as soon as it starts. This reduces the need to check the result of spawn() in your code. Errors in processes other than the Erlang shell are handled and recorded by the error logger, a built-in process that handles error reporting.
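
One way to observe this behavior is to watch the new process with a monitor. The sketch below uses spawn_monitor/3, a built-in relative of spawn() that is not covered in this article, and deliberately names a module and function that are assumed not to exist (no_such_module and no_such_function). The module name spawn_missing is also hypothetical.

```erlang
-module(spawn_missing).
-export([run/0]).

%% Spawn a process for a function that does not exist and report
%% how the process ended. The spawn itself succeeds and returns a pid.
run() ->
    {Pid, Ref} = spawn_monitor(no_such_module, no_such_function, []),
    receive
        %% The monitor delivers a 'DOWN' message when the process ends.
        {'DOWN', Ref, process, Pid, Reason} ->
            {spawned, Pid, Reason}
    after 1000 ->
        %% Safety net so that the caller never blocks forever.
        timeout
    end.
```

The Reason in the result is a tuple that starts with the atom undef, which is how Erlang reports a call to an undefined function. The error logger typically prints a report for the failed process as well.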

In typical usage, processes are used to support concurrency. In the Fibonacci sample given above, running the printfibo() function in another process produced no useful return value. However, what if you want to spawn a new process, for example one running the underlying fibo() function, and both send information to it and receive information from it?

The built-in messaging system handles this interaction.


Messaging

The messaging system in Erlang is another built-in part of the Erlang execution environment and works in combination with the process system to allow for the efficient exchange of data and messages.

Each process is provided with a 'mailbox' into which other processes can send messages. Messages from a single sender are stored in the order they were sent, which means that if you send two messages, A and then B, from one process to another, they appear in the mailbox with message A first and message B second. Because of the concurrent nature of the process system, messages sent from multiple processes to a single process are not ordered in any specific way, beyond the individual order of messages from the same process.

To send a message to a process, you need to know the process ID of the process with which you want to communicate. You then use the construct: Pid ! Message where Pid is the process ID, and Message is any Erlang data type.

To receive a message from within a process, you use the receive statement. Within the receive statement you use pattern matches to determine what to do based on the message content. If the match is successful, the message is received from the mailbox, the message arguments are made available by binding them to the variables in the match, and the corresponding clause is executed.

For example, matching against a message supplying an atom of 'store' and a value might look like Listing 1.

Listing 1. Using the receive statement to receive a message from within a process
receive
    {store, Value} -> store(Value);
    {get, Value} -> get(Value)
end

In this example, the code has used pattern matching to match the atom and variable on the left hand side and then perform the operation on the right, in this case storing a value and getting a value based on the content of the message.
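
Putting the send operator and the receive statement together, the following sketch also demonstrates the ordering guarantee described earlier: two messages sent from one process arrive in the order they were sent. The module name mailbox_order and the helper next/1 are hypothetical, and the unique reference created with make_ref() is simply a way of tagging the messages so that nothing else in the mailbox is picked up by mistake.

```erlang
-module(mailbox_order).
-export([run/0]).

%% Spawn a sender that posts two messages, a and then b, to this process,
%% then take them out of the mailbox one at a time.
run() ->
    Receiver = self(),
    Ref = make_ref(),
    spawn(fun() ->
                  Receiver ! {Ref, a},
                  Receiver ! {Ref, b}
          end),
    First = next(Ref),
    Second = next(Ref),
    [First, Second].

%% Block until the next message tagged with Ref arrives and return its payload.
next(Ref) ->
    receive
        {Ref, Msg} -> Msg
    end.
```

Because both messages come from the same sender, mailbox_order:run() returns [a,b].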

The receive statement pauses execution in the process, ensuring that the process is now waiting until a new message comes in before performing an operation. A typical example of this is the basic operation of storing a variable that might be shared among multiple processes.
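
This waiting behavior is what makes a simple request and reply exchange possible, which answers the earlier question of how to get a result back from a spawned process. The sketch below is an illustration only: the module name fib_worker, the message shapes, and the fibo/1 implementation are assumptions made here and are not the code from Part 1.

```erlang
-module(fib_worker).
-export([start/0, request/2]).

%% Start a worker process that waits for a single request.
start() ->
    spawn(fun wait_for_request/0).

%% Send a request to the worker and wait for its reply.
%% The caller includes its own pid so the worker knows where to answer.
request(Pid, N) ->
    Pid ! {fibo, self(), N},
    receive
        {Pid, Result} -> Result
    after 5000 ->
        %% Give up rather than block forever if no reply arrives.
        timeout
    end.

%% The worker pauses here until a matching message is delivered.
wait_for_request() ->
    receive
        {fibo, From, N} when is_integer(N), N >= 0 ->
            From ! {self(), fibo(N)}
    end.

%% A plain recursive Fibonacci function, used as a stand-in.
fibo(0) -> 0;
fibo(1) -> 1;
fibo(N) when N > 1 -> fibo(N - 1) + fibo(N - 2).
```

For example, fib_worker:request(fib_worker:start(), 10) returns 55. The worker handles one request and then ends.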

Typically within an application, you will use the receive statement as part of a loop to progressively read new messages sent to the process and perform individual operations. Erlang doesn't have loops in the traditional sense, as seen in the Fibonacci example given in Part 1 (see Resources); instead, you write a function that calls itself to process the next message (see Listing 2).

Listing 2. Function that calls itself to process the next message
dbrequest() ->
    receive
        {store, Value} -> store(Value);
        {get, Value} -> get(Value)
    end,
    dbrequest().

With a traditional concurrent environment, you might use a solution such as a semaphore that allows processes to determine whether a variable is 'in use' or can be updated. In many environments, the semaphore introduces a wait for each process trying to update the same value, which can delay the execution of the program. Within Erlang, you can wrap the individual operations within a single process and then use messaging to handle the updates. Because messages are received in sequence, each operation can be performed individually by processing each message in order, as shown in Figure 1.

Figure 1. Handling updates on a single value using messaging
Diagram shows the flow of data: Store value, then read value then update value then update value which flows into receive which interacts with data by creating, retrieving, updating or deleting.
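
The pattern in Figure 1 can be sketched as a small server process that owns a single value. Everything here is illustrative: the module name value_server, the function names, and the message shapes are assumptions, and the value is held as the argument of the recursive loop/1 function rather than in any real database.

```erlang
-module(value_server).
-export([start/0, store/2, read/1, stop/1]).

%% Start the server process with no value stored yet.
start() ->
    spawn(fun() -> loop(undefined) end).

%% Ask the server to replace the stored value. Does not wait for a reply.
store(Pid, Value) ->
    Pid ! {store, Value},
    ok.

%% Ask the server for the current value and wait for the answer.
read(Pid) ->
    Ref = make_ref(),
    Pid ! {get, self(), Ref},
    receive
        {Ref, Value} -> Value
    after 1000 ->
        timeout
    end.

%% Ask the server to finish.
stop(Pid) ->
    Pid ! stop,
    ok.

%% The server handles one message at a time, so updates never overlap.
%% The current value is carried from one iteration to the next.
loop(Value) ->
    receive
        {store, NewValue} ->
            loop(NewValue);
        {get, From, Ref} ->
            From ! {Ref, Value},
            loop(Value);
        stop ->
            ok
    end.
```

If one process calls value_server:store(Pid, 1), then value_server:store(Pid, 2), and then value_server:read(Pid), the result is 2, because the three messages are handled in the order that process sent them.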

This basic concurrency and messaging structure is the basis of a number of different applications. For example, Facebook uses the message environment for Facebook messaging. CouchDB, a document based database that provides its interface over the web using MochiWeb as the basis, makes use of the processes and messaging system to ensure that database updates and responses are correctly handled without suffering the normal concurrent update issues that other databases often suffer from.

Messaging, especially when combined with concurrent processing, enables programs to process information sequentially, even when multiple requests are coming from different locations. This gets around one of the main issues with typical concurrent programming in other languages, because data and operations can be shared without worrying about corrupting or destroying the information in the process. It removes one of the main pain points of concurrent programming.

Concurrency, however, only gets you so far when it comes to solving the problem of scaling up your solution and improving performance, particularly in modern network and web applications. Eventually you get to the point of requiring more than one server. Fortunately, Erlang has a solution for this distributed programming problem too.


Distributed programming

Distributed programming in Erlang is built on a combination of a simple network server and the messaging system described earlier in this article. Together they provide the mechanisms for sending and receiving messages between instances and, more importantly, for supporting the kind of remote procedure call offered by environments such as native RPC and web services.

It's worth noting that distributed does not necessarily mean different machines; it could be two different Erlang applications that want to talk to each other and share information or operations. Erlang makes no distinction about the local or remote nature of the systems during general use, except in identifying the systems with which you are communicating.

To start using distributed programming, first you need to start Erlang and give each instance of Erlang a unique name. This will be used during identification so that you can send messages to a named instance of Erlang. To set the name when using Erlang from the command line you can use the sname command line option (see Listing 3).

Listing 3. Using the sname command line option
$ erl -sname one
Erlang R13B04 (erts-5.7.5) [source] [64-bit] [rq:1] [async-threads:0]

Eshell V5.7.5  (abort with ^G)
(one@mammoth)1>

Note how the prompt has changed to show the name and hostname. This is the unique identifier for the node and can be used within Erlang to identify and communicate between nodes.

If you now start up another instance of the Erlang shell, you can set a different name, as shown in Listing 4.

Listing 4. Setting a different name
$ erl -sname two 
Erlang R14B (erts-5.8.1) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe] 
[kernel-poll:false]

Eshell V5.8.1  (abort with ^G)
(two@mammoth)1>

With both nodes running, you can test whether one node can communicate with the other using net_adm:ping() (see Listing 5).

Listing 5. Using net_adm:ping() to test if one node can communicate with the other
(one@mammoth)3> net_adm:ping('two@mammoth').
pong

This shows that instance one can communicate with instance two.

To send messages between processes on the two nodes, you use a form of the messaging operator that includes the node name in addition to the identity of the recipient. For example, if a process on node two had registered with the name basic, you would send it a message by typing the following into the shell for the first instance of Erlang you created: {basic, 'two@mammoth'} ! {self(), "message"}.

In the second instance, if you register the current process (that is, the shell) as 'basic' and then retrieve the message, you can output the message data. You need to register the process before sending the message from instance one (see Listing 6).

Listing 6. Registering the current process
(two@mammoth)1> register(basic,self()).
true

Now create the receive statement to output the message (see Listing 7).

Listing 7. Create the receive statement to output the message
(two@mammoth)2> receive                             
(two@mammoth)2> {From,Message} -> io:format(Message ++ "~n")
(two@mammoth)2> end.
message
ok

You've successfully sent a message between two instances of Erlang. They could have just as easily been on different sides of the world, rather than the same machine. Once you can send messages between the two machines, sending and receiving any kind of data becomes a simple case of knowing the process ID and the node name.
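
The same {Name, Node} form of the send operator can be tried without starting a second instance, because it also works when the node is the local one. The sketch below is a single-node stand-in for the two-shell session above: the module name named_send and the registered name basic_demo are hypothetical, and node() is used where a real deployment would name a remote node such as 'two@mammoth'.

```erlang
-module(named_send).
-export([run/0]).

%% Register a receiver process under a name, send it a message using the
%% {Name, Node} form, and return the text that the receiver reports back.
run() ->
    Parent = self(),
    Receiver = spawn(fun() ->
                             receive
                                 {From, Text} -> From ! {received, Text}
                             end
                     end),
    true = register(basic_demo, Receiver),
    %% node() is the local node; substitute a remote node name to go
    %% across instances.
    {basic_demo, node()} ! {Parent, "message"},
    receive
        {received, Echo} -> Echo
    after 1000 ->
        timeout
    end.
```

Calling named_send:run() returns "message". The receiver ends after handling one message, at which point its registered name is released automatically.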

For more familiar remote procedure calls, Erlang also supports the rpc:call() function: rpc:call(Node, Module, Function, Arguments).

This calls the remote node and executes the specified module and function with the supplied arguments, returning the result to the caller. The rpc module includes extensions that allow for synchronous, asynchronous, and blocking calls. This is a direct function call and therefore may be subject to concurrency issues if many clients run it at the same time. That can make it less suitable than the messaging model, but the right choice depends on your application.
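
A minimal sketch of rpc:call() follows. To keep it runnable on a single instance, it targets node(), the local node, and calls the standard lists:reverse/1 function; in a real setup the first argument would be a remote node name such as 'two@mammoth'. The module name rpc_demo is hypothetical.

```erlang
-module(rpc_demo).
-export([run/0]).

%% Ask a node to evaluate lists:reverse([1, 2, 3]) and return the result.
%% The arguments are passed as a list, just as with spawn/3.
%% If the call cannot be completed, rpc:call/4 returns {badrpc, Reason}.
run() ->
    rpc:call(node(), lists, reverse, [[1, 2, 3]]).
```

Calling rpc_demo:run() returns [3,2,1].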


Using MochiWeb

MochiWeb is an entire HTTP web stack built on top of Erlang. It makes use of many of the features of Erlang that have been covered in this article, including the use of messaging and processes to provide high performance with a high level of concurrency. MochiWeb makes use of the process system to support the concurrency, and uses messages to help process the requests and accumulate the results.

MochiWeb itself is a combination of the code and a suite of scripts that enable you to quickly create a basic framework from which you can build and extend your own application. In this last section, we'll examine how to use MochiWeb, how to set up a new web server application, and how you can extend this to support your own applications.

The easiest way to get MochiWeb is to get the sources from GitHub. You can do this either by using the git command, or by using the downloadable package from the GitHub website (see Resources).

To get the sources using git, use: $ git clone https://github.com/mochi/mochiweb.git.

This will create a directory called mochiweb in your current directory. To use MochiWeb, you need to build the sources. This prepares MochiWeb so that you can create a new MochiWeb application.

To do this, first run make in the mochiweb directory, as shown in Listing 8.

Listing 8. Building the resources
$ cd mochiweb
$ make
==> mochiweb (get-deps)
==> mochiweb (compile)
Compiled src/mochiweb_sup.erl
Compiled src/mochifmt.erl
Compiled src/mochiweb_charref.erl
Compiled src/mochiweb_request_tests.erl
Compiled src/mochifmt_records.erl
Compiled src/mochiweb_socket.erl
Compiled src/mochiweb_app.erl
Compiled src/mochiweb_io.erl
Compiled src/mochifmt_std.erl
Compiled src/mochiglobal.erl
Compiled src/mochiweb_socket_server.erl
Compiled src/mochijson.erl
Compiled src/mochihex.erl
Compiled src/mochiweb_html.erl
Compiled src/mochiweb_multipart.erl
Compiled src/mochilogfile2.erl
Compiled src/mochiweb_cover.erl
Compiled src/mochiweb_util.erl
Compiled src/mochitemp.erl
Compiled src/reloader.erl
Compiled src/mochinum.erl
Compiled src/mochiweb_headers.erl
Compiled src/mochiweb_skel.erl
Compiled src/mochiutf8.erl
Compiled src/mochiweb_echo.erl
Compiled src/mochiweb_acceptor.erl
Compiled src/mochiweb_http.erl
Compiled src/mochijson2.erl
Compiled src/mochiweb_cookies.erl
Compiled src/mochiweb.erl
Compiled src/mochiweb_mime.erl
Compiled src/mochilists.erl
Compiled src/mochiweb_response.erl
Compiled src/mochiweb_request.erl

The mochiweb sources are now compiled. To create a sample application framework that we can use to build our own web server, you use make again to build a new project directory. The PROJECT becomes the project and directory name, and PREFIX becomes the directory name where the new PROJECT directory will be created. For example, to create a project called mywebserver: $ make app PROJECT=mywebserver PREFIX=../.

The above line will create a new MochiWeb application within the parent directory (that is, at the same level as mochiweb).

The newly created directory is a basic web server that will listen on port 8080 (on all interfaces) by default. To build the application, run make again to ensure that the MochiWeb components and individual application are compiled:

$ cd ../mywebserver
$ make

Finally, run the start-dev.sh script to run the basic application. This will generate a lot of output, all of which is the 'progress report', indicating the individual processes being created to handle the web server. If there are no errors reported in the output, then your web server is up and running (see Listing 9).

Listing 9. Web server is up and running
Erlang R13B04 (erts-5.7.5) [source] [64-bit] [rq:1] [async-threads:0]

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.42.0>},
                       {name,alarm_handler},
                       {mfa,{alarm_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.43.0>},
                       {name,overload},
                       {mfa,{overload,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.41.0>},
                       {name,sasl_safe_sup},
                       {mfa,
                           {supervisor,start_link,
                               [{local,sasl_safe_sup},sasl,safe]}},
                       {restart_type,permanent},
                       {shutdown,infinity},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.44.0>},
                       {name,release_handler},
                       {mfa,{release_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
         application: sasl
          started_at: mywebserver_dev@localhost
Eshell V5.7.5  (abort with ^G)
(mywebserver_dev@localhost)1> ** Found 0 name clashes in code paths 

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
          supervisor: {local,crypto_sup}
             started: [{pid,<0.54.0>},
                       {name,crypto_server},
                       {mfa,{crypto_server,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
         application: crypto
          started_at: mywebserver_dev@localhost
** Found 0 name clashes in code paths 

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
          supervisor: {local,mywebserver_sup}
             started: [{pid,<0.59.0>},
                       {name,mywebserver_web},
                       {mfa,
                           {mywebserver_web,start,
                               [[{ip,{0,0,0,0}},
                                 {port,8080},
                                 {docroot,
                                     "/root/mybase/mywebserver/priv/www"}]]}},
                       {restart_type,permanent},
                       {shutdown,5000},
                       {child_type,worker}]

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
         application: mywebserver
          started_at: mywebserver_dev@localhost

=PROGRESS REPORT==== 7-Apr-2011::11:40:36 ===
          supervisor: {local,kernel_safe_sup}
             started: [{pid,<0.77.0>},
                       {name,timer_server},
                       {mfa,{timer,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,1000},
                       {child_type,worker}]

You can try accessing the new web server by opening your web browser. If it is on the same machine, you will open http://localhost:8080/. If everything is working correctly you should get a page with a title of 'It Worked' and the message 'webserver running.'.

If you want to modify the new server, you can edit the src/mywebserver_web.erl file within the application directory. This contains the core code for running and supporting the web service.

The core of the process is the loop() function that is called each time a request is received by the main MochiWeb system. The function is provided with two arguments, the request structure (which includes the request type, path and any body data), and the DocRoot. The latter is required because by default the server will provide the file requested from the filesystem within the specified document root.

The request is processed in two phases. First, the request type (GET, POST, and so on) is extracted using a case statement. Then, a second case statement is used to identify the path of the request.

You can use the pattern matching in Erlang so that a path on one side triggers a specific response. For example, the code can be modified so that accessing the path /hello on the server returns the phrase 'Hello world', as shown in Listing 10.

Listing 10. Pattern matching in Erlang
loop(Req, DocRoot) ->
    "/" ++ Path = Req:get(path),
    try
        case Req:get(method) of
            Method when Method =:= 'GET'; Method =:= 'HEAD' ->
                case Path of
                    "congrat" ->
                        Req:ok({"text/html", [], ["<h1>Congratulation</h1>"]});
                    "hello" ->
                        Req:ok({"text/plain",[],["Hello world"]});
                    _ ->
                        Req:serve_file(Path, DocRoot)
                end;
            'POST' ->
                case Path of
                    _ ->
                        Req:not_found()
                end;
            _ ->
                Req:respond({501, [], []})
        end
    catch
        Type:What ->
            Report = ["web request failed",
                      {path, Path},
                      {type, Type}, {what, What},
                      {trace, erlang:get_stacktrace()}],
            error_logger:error_report(Report),
            %% NOTE: mustache templates need \ because they are not awesome.
            Req:respond({500, [{"Content-Type", "text/plain"}],
                         "request failed, sorry\n"})
    end.

Once you have edited the file, make sure you run make to rebuild your application, and then restart the application using the start-dev.sh script.

Now if you access the URL of your web server using http://localhost:8080/hello you should get the hello world message.
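
To experiment with this kind of dispatch without running a web server, the same decisions can be written as function clauses in a standalone module. This is only a sketch of the pattern matching idea: the module name route_demo, the function route/2, and the return values are hypothetical, and the code does not call MochiWeb at all.

```erlang
-module(route_demo).
-export([route/2]).

%% Decide on a response from the request method and the path (without the
%% leading slash), mirroring the structure of the loop() function above.
route(Method, "hello") when Method =:= 'GET'; Method =:= 'HEAD' ->
    {200, "text/plain", "Hello world"};
route(Method, "congrat") when Method =:= 'GET'; Method =:= 'HEAD' ->
    {200, "text/html", "<h1>Congratulation</h1>"};
route(Method, _Path) when Method =:= 'GET'; Method =:= 'HEAD' ->
    %% Any other path would be served from the document root.
    serve_file;
route('POST', _Path) ->
    %% No POST handlers are defined yet.
    not_found;
route(_Method, _Path) ->
    %% Methods that are not handled at all.
    {501, "text/plain", ""}.
```

For example, route_demo:route('GET', "hello") returns {200,"text/plain","Hello world"}, while route_demo:route('POST', "hello") returns not_found. Clauses are tried from top to bottom, so the specific paths must come before the catch-all clauses.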

Although this is a basic example, you can see that adding new functionality, such as supporting a basic REST service, would be easily achieved by looking for POST or PUT requests, processing the document body and then storing the information. Using a spawned process and messaging you could queue the requests from the server to store, update and retrieve information.


Conclusion

Erlang has its history in the environment of telephone switches, and this has meant that the core functionality of the language has been developed within a completely different environment than most other languages. The problems of running and creating multiple processes to handle a number of concurrent operations (for example, phone calls) are built into the language.

Creating new processes and communicating between them in a way that is neither destructive nor dependent on complex semaphore systems for sharing information has been simplified by a built-in messaging system. Because the messaging is sequential, communication between processes is easy to handle. Furthermore, the message system operates across Erlang instances and over a network, making cross-machine communication simple and straightforward.

MochiWeb combines much of this functionality together to produce a high-performance and very scalable web server solution that can also be easily expanded and extended to support additional functionality with very little work.

Resources

Get products and technologies

  • MochiWeb: GitHub hosts this Erlang library for building lightweight HTTP servers.
  • Erlang: Download the Erlang programming language.
  • The Apache CouchDB Project: CouchDB, which is written in Erlang and makes use of the MochiWeb HTTP server can be obtained from Apache.
  • IBM trial software: Innovate your next open source development project using trial software, available for download or on DVD.
