Transparent network acceleration for Java-based workloads in the cloud

Introducing the Java Sockets over RDMA library

Java™ Sockets over RDMA (JSOR) is a new communication library in the IBM Java 7 SDK for Linux® platforms. JSOR can improve throughput and reduce latency for client-server applications in cloud environments by exploiting RDMA-capable high-speed network adapters. Learn about the technology underlying JSOR, find out how to use the library, and compare JSOR performance with solutions based on other communication protocols.

Sivasakthi Thirugnanapandi, Java Developer, IBM

Sivasakthi Thirugnanapandi, a developer at the IBM Java Technology Center in India since 2006, has worked on Java components, plugins, Web Start, automation/tooling/scripting, and tools and infrastructure frameworks for the IBM Java class library team. Interests include Java and C coding, listening to podcasts, and reading articles and blogs.



Sreedhar Kodali (srkodali@in.ibm.com), Software Engineering Researcher, IBM

For the past 13 years of his 15-year IT career, Sreedhar Kodali has been involved in developing products on various platforms (Linux, UNIX, Windows, z/OS) and architectures (Intel, Power, System z) in a multitude of domains. He has been with IBM since 2007 and contributed to the implementation of the X10 language, the IBM J9 virtual machine, and the IBM Java SDK in the areas of networking, performance, I/O, runtime, virtualization, tools, and frameworks.



Neil Richards, Advisory Software Engineer, OpenJDK Developer, IBM

Neil Richards is currently a contributor and committer to the OpenJDK Java 8 and Java 7 update projects. He initially joined the IBM Java Technology Center to work on Java 1.1.4 and has been a senior developer in many areas for the IBM Java JVM and class libraries, including networking, serialization, RMI, and CORBA.



Tim Ellison, Senior Technical Staff Member, IBM

Over a 20-year span, Tim Ellison contributed to the commercial implementation of Smalltalk, IBM VisualAge Micro Edition, Eclipse, and the Java SDK. He has broad knowledge of high-performance runtimes, open source methodologies, and development environments, and he presents regularly on these topics at technical conferences.



Xiaoqiao Meng, Research Staff Member, IBM

Xiaoqiao Meng is a research staff member at the IBM T.J. Watson Research Center. He received a bachelor's degree from the University of Science and Technology of China and a doctorate in computer science from UCLA. He is currently working on middleware software, cloud computing, and performance evaluation.



Indrajit Poddar, Senior Software Engineer, IBM

Indrajit Poddar is a software architect for IBM Software Group Technical Strategy, where he leads incubation projects in strategic technology areas such as distributed cloud computing, DevOps, and software-defined infrastructure. He is an IBM Master Inventor and a recipient of three IBM Outstanding Technical Achievement awards.



28 January 2014


Currently, communication logic among distributed Java application components deployed in the cloud is implemented through TCP/IP socket programming techniques. With the adoption of higher-speed networks (such as 10/40/100Gbps Ethernet) in cloud data centers, it's now possible to use faster network-communication techniques such as Remote Direct Memory Access (RDMA). RDMA programs are typically written in C/C++ using low-level APIs such as OpenFabrics Alliance (OFA) verbs or high-performance computing tools such as the Message Passing Interface (MPI). Accessing such low-level APIs in Java-based applications through the Java Native Interface (JNI) adds to programming complexity and performance overheads. Sockets Direct Protocol (SDP), a comparable approach available in Java 7, hasn't shown performance advantages for many workloads. RDMA Sockets (R-Sockets), another comparable approach, is only available for C/C++ programs.

In this article, we introduce a new Java-specific Linux socket-compatible RDMA communication library called Java Sockets over RDMA (JSOR) — part of the IBM Java SDK 7SR6 on Linux/AMD64 and Linux/Intel platforms. We demonstrate JSOR's use and benefits with a simple Java client-server program (see Download), written using the Java sockets interface, that can be executed without any code changes on RDMA-capable cloud infrastructure.

Background

In a typical Java client-server scenario deployed in a cloud environment, the response time for a service request is often constrained by the response time of the network connection between the requester and the service-offering host. The network-communication logic that enables interaction between remote endpoints commonly uses the Java sockets interface for connection establishment and data transfer. The Java sockets interface, by default, is implemented based on the POSIX sockets API. Each network operation must pass through the underlying operating system before it reaches the network interface. This requirement results in costly OS context switches and multiple buffer copies between software layers.

Dedicated TCP/IP protocol offloading engines that are part of special-purpose network interface cards (NICs) can be used to reduce these network-processing overheads. However, such offloading techniques still require some buffer-copying steps. As cloud adoption becomes widespread, many enterprise data centers are starting to migrate to 40Gbps Ethernet from 10Gbps Ethernet in their network links to address cloud computing's increasing bandwidth needs.

RDMA is a hardware-based protocol-offloading technology — originally proposed for high-performance network fabrics such as InfiniBand and high-speed Ethernet — that directly transfers data between two remote application memories without any involvement of either host processor. RDMA potentially eliminates costly OS context switches, saving significant numbers of CPU cycles. Because this message-based protocol is purposely defined for high-performance networks, the applications can take advantage of increased network speeds to achieve latencies below 10 microseconds.

With the advent of the RDMA over Converged Ethernet (RoCE) standard, the RDMA protocol can now be used directly on top of existing high-speed 10/40Gbps Ethernet infrastructure. So, by moving from the traditional TCP/IP stack to RDMA-based network processing, some cloud-based applications can see latency and throughput benefits while using fewer CPU resources.

SDP is a standard wire-based protocol defined for RDMA-supported network fabrics such as InfiniBand and RoCE to accelerate stream socket-based applications transparently. Starting with Java 7, the JDK ships with support for SDP on Linux and Solaris platforms. However, SDP is a kernel-based implementation that negatively impacts performance because of buffer-copying and context-switching overheads.

In the following sections, we introduce and describe JSOR, a completely user-space solution that can bypass the kernel to achieve performance comparable to that of similar native RDMA-based solutions.


About JSOR

JSOR is a cloud network acceleration feature that transparently enables RDMA communication for Java stream sockets when the underlying infrastructure supports RDMA. JSOR incorporates a high-performance RDMA wire protocol within the standard Java socket libraries. Currently, support is provided for the java.net.Socket and java.net.ServerSocket APIs along with the associated input and output streams — so most existing Java client-server applications can benefit out of the box from the improved performance. (See Choosing JSOR, later in this article.)

JSOR design in brief

In a traditional cloud networking scenario, any interaction between access and service nodes ends up as packets flowing on the wire through one or more Ethernet switches, as illustrated in Figure 1.

Figure 1. Traditional cloud networking

Each network operation — whether it is connection- or data-transfer related — results in the invocation of one or more Java socket calls. Any socket operation performed at the Java level invokes a corresponding native (C or C++) library operation through the JNI layer. A certain amount of pre- and post-processing happens at the Java level before and after the call is executed by the JNI layer. Because the TCP/IP protocol is processed by an OS kernel stack, ultimately all JNI socket-specific methods result in context switches. Transferring or receiving also requires multiple buffer copies at the Java, OS, and NIC levels. Network-processing overheads such as multiple buffer copies and CPU context switches result in higher network latency and poorer throughput.

The JSOR library is compatible with the R-Sockets protocol, which is provided by the R-Sockets library included in the OpenFabrics Enterprise Distribution (OFED). The JSOR library includes modifications to make it suitable for common Java application needs. It provides significant scalability, reliability, and serviceability improvements.

Compared to SDP and TCP/IP over InfiniBand (IPoIB), JSOR generally yields higher performance. In our experiments with microbenchmarks, JSOR could give as much as 50-percent higher throughput than SDP and more than 100-percent higher throughput than IPoIB. The better performance is primarily attributable to the fact that JSOR, as part of the standard Java class library, can optimize the Java socket implementation. For example, JSOR avoids data copies across the JNI boundary, better supports various socket semantics, and automatically tunes RDMA parameters based on socket usage patterns. And, whereas IPoIB and SDP are kernel-based transport solutions, JSOR is entirely in user space, so it can shorten the data path and reduce overhead in the kernel.

As Figure 2 illustrates, JSOR intercepts Java socket calls at the Java level and routes them through the underlying RDMA infrastructure. A JSOR enablement property must be specified during Java execution that points to an appropriate configuration file. When the switchover from TCP/IP to RDMA happens, all of the application's interactions with its remote counterpart flow through the underlying RDMA hardware.

Figure 2. Accelerated cloud networking

Using JSOR

Before you use JSOR, a few prerequisites in the cloud execution environment must be met:

  • The underlying host should have an appropriate host channel adapter (HCA) or RDMA NIC and be interconnected to the remote host by a high-performance InfiniBand or Ethernet switched fabric.
  • Each participating host should have the OFED 1.5.1 or higher base runtime libraries installed. Specifically, JSOR looks for the libibverbs.so and librdmacm.so libraries at execution time for dynamically loading function pointers.
  • Your user account should be entitled to adequate (preferably unlimited) lockable memory based on your application needs. JSOR socket buffers are memory-pinned by default, so the OS can't swap them out during the critical phases of data transfer. On Linux, use the ulimit -l shell command to display the maximum locked memory setting.
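The locked-memory prerequisite can also be checked from inside the JVM process itself. The following is a minimal sketch, not part of JSOR: it assumes a Linux host where /proc/self/limits is readable, and the class and method names are hypothetical. Note that /proc reports this limit in bytes, whereas ulimit -l typically reports kilobytes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of JSOR): reports the soft locked-memory limit of this process.
class LockedMemoryCheck {
    private static final String KEY = "Max locked memory";

    /**
     * Parses the lines of /proc/self/limits.
     * Returns the soft limit in bytes, Long.MAX_VALUE for "unlimited", or -1 if the entry is missing.
     */
    static long softLimitBytes(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith(KEY)) {
                // Remaining columns: soft limit, hard limit, units
                String[] cols = line.substring(KEY.length()).trim().split("\\s+");
                return "unlimited".equals(cols[0]) ? Long.MAX_VALUE : Long.parseLong(cols[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path limits = Paths.get("/proc/self/limits");
        if (!Files.isReadable(limits)) {
            System.out.println("/proc/self/limits is not available on this platform");
            return;
        }
        long soft = softLimitBytes(Files.readAllLines(limits, StandardCharsets.UTF_8));
        System.out.println("Soft locked-memory limit: "
                + (soft == Long.MAX_VALUE ? "unlimited" : soft + " bytes"));
    }
}
```

Because the check runs in the same process that would open the sockets, it reflects the limit that the Java application actually inherits, which can differ from the limit seen in an interactive shell.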

When these base requirements are met, a configuration file in plain text format is needed at both the client and server endpoints. Each record or line in the configuration file specifies an accept, bind, or connect rule and should contain a minimum of four fields separated by white space:

  • The first field indicates the type of network provider. Currently, only the rdma provider is available.
  • The second field specifies an accept, bind, or connect keyword, depending on which rule you are specifying.
  • The third field specifies a local IP address if the rule specified is accept or bind, or a remote IP address if the rule specified is connect.
  • The fourth field specifies a port or set of ports on which the RDMA traffic is allowed. Basically, the third and fourth fields together define a set of socket endpoints for RDMA-specific connection establishment and data transfer.
  • The fifth and subsequent fields apply only to an accept rule. They list the client IP addresses from which incoming RDMA connection requests are accepted.

The configuration for the service (passive) side should have accept or bind entries, whereas the client (active) side configuration should have connect or bind entries.

For instance, to accept RDMA connections from the clients 192.168.1.3 and 192.168.1.4 on the service host 192.168.1.1 through port 65444, the following rule is needed in the Java application server's configuration file (which we'll call rdma_server.conf):

rdma    accept    192.168.1.1    65444    192.168.1.3    192.168.1.4

Similarly, to request an RDMA connection from either of the clients to the service host 192.168.1.1 listening on port 65444, the following rule is needed in the Java client application's configuration file (which we'll call rdma_client.conf):

rdma    connect    192.168.1.1    65444

Unless you explicitly bind to a specific local address, an ephemeral port will be used on the client side to establish connection with the service end. In the following example, a bind rule is added to the rdma_client.conf file to establish its end of an RDMA connection on port 65333:

rdma    connect    192.168.1.1    65444
rdma    bind       0.0.0.0        65333

The third field (0.0.0.0) in the bind rule refers to the null address, and it defaults to the first available InfiniBand address on the local host.
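To make the field layout described above concrete, here is a small sketch of a checker for individual configuration lines. It is purely illustrative and is not how JSOR itself parses the file: the class name and messages are hypothetical, and it deliberately does not validate the address or port fields, because the exact syntax for a set of ports is not spelled out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of JSOR): sanity-checks one line of a JSOR-style configuration file.
class JsorRuleCheck {
    private static final List<String> RULES = Arrays.asList("accept", "bind", "connect");

    /** Returns the problems found in one configuration line; an empty list means it looks well formed. */
    static List<String> check(String line) {
        List<String> problems = new ArrayList<String>();
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4) {
            problems.add("expected at least 4 fields, found " + fields.length);
            return problems;
        }
        if (!fields[0].equals("rdma")) {
            problems.add("unknown provider: " + fields[0]);
        }
        if (!RULES.contains(fields[1])) {
            problems.add("unknown rule: " + fields[1]);
        }
        // Fields five and onward are client addresses, which only make sense on accept rules.
        if (fields.length > 4 && !fields[1].equals("accept")) {
            problems.add("client addresses are only valid on accept rules");
        }
        return problems;
    }

    public static void main(String[] args) {
        String[] samples = {
            "rdma    accept    192.168.1.1    65444    192.168.1.3    192.168.1.4",
            "rdma    connect   192.168.1.1    65444",
            "rdma    bind      0.0.0.0        65333",
            "rdma    connect   192.168.1.1"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + check(sample));
        }
    }
}
```

Running a check like this before launching the application can catch a mistyped keyword or a missing field early, since a malformed rule might otherwise go unnoticed while the application still communicates over TCP/IP.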

When you have the configuration file ready, specify it as the value of the com.ibm.net.rdma.conf property during Java command execution. For example, on the passive (service) side:

java -Dcom.ibm.net.rdma.conf=rdma_server.conf SampleServer args

On the active (client) side:

java -Dcom.ibm.net.rdma.conf=rdma_client.conf SampleClient args

Listing 1 shows the portion of a SampleServer class that creates a server socket and waits for a connection from the remote end. When the connection is established, the server receives the specified number of bytes from the client and sends back the same number of bytes to the client in a single iteration. This receive/send step is repeated the specified number of times.

Listing 1. SampleServer.java
// Create server socket to listen on x.x.x.x address and x port
ServerSocket server = new ServerSocket(Integer.parseInt(args[1]), 0, InetAddress.getByName(args[0]));
...
Socket client = server.accept();
...
// Receive and send message specified number of times
for (int i = 0; i < xferCount; i++) {
    in.read(msgBuf, 0, msgSize);
    out.write(msgBuf, 0, msgSize);
}

Listing 2 shows the portion of a SampleClient class that requests a connection with the remote service host. When the connection is established, the client sends the specified number of bytes to the server and receives the same number of bytes back from the server in a single iteration. This send/receive step is repeated the specified number of times.

Listing 2. SampleClient.java
// Create client socket to connect x.x.x.x address and x port
Socket client = new Socket(InetAddress.getByName(args[0]), Integer.parseInt(args[1]));
...
long startTime = System.nanoTime();
for (int i = 0; i < xferCount; i++) {
    out.write(msgBuf, 0, msgSize);
    in.read(msgBuf, 0, msgSize);
}
long endTime = System.nanoTime();

In SampleClient.java, the whole send/receive sequence is timed so we can compute the round-trip time (RTT) for the total number of bytes.
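For readers who want to try the send/receive pattern without the download, the following self-contained sketch runs both roles in one JVM over the loopback interface. It is not the article's sample code: the class and method names are hypothetical, and it uses DataInputStream.readFully rather than a plain read call, because a single read on a stream socket may return fewer bytes than requested. Over loopback it exercises only the ordinary TCP/IP path, so the timing it prints says nothing about RDMA performance.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical loopback variant of the echo pattern shown in Listings 1 and 2.
class EchoRoundTrip {
    /** Echoes xferCount messages of msgSize bytes back to the first client that connects. */
    static void serve(ServerSocket server, int xferCount, int msgSize) throws IOException {
        try (Socket client = server.accept()) {
            DataInputStream in = new DataInputStream(client.getInputStream());
            OutputStream out = client.getOutputStream();
            byte[] msgBuf = new byte[msgSize];
            for (int i = 0; i < xferCount; i++) {
                in.readFully(msgBuf, 0, msgSize); // blocks until the whole message has arrived
                out.write(msgBuf, 0, msgSize);
            }
        }
    }

    /** Sends and receives xferCount messages; returns the elapsed time in microseconds. */
    static long roundTripMicros(InetAddress host, int port, int xferCount, int msgSize) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            DataInputStream in = new DataInputStream(socket.getInputStream());
            OutputStream out = socket.getOutputStream();
            byte[] msgBuf = new byte[msgSize];
            long startTime = System.nanoTime();
            for (int i = 0; i < xferCount; i++) {
                out.write(msgBuf, 0, msgSize);
                in.readFully(msgBuf, 0, msgSize);
            }
            long endTime = System.nanoTime();
            return (endTime - startTime) / 1000;
        }
    }

    /** Starts an echo server on an ephemeral loopback port and times a client against it. */
    static long runLoopback(final int xferCount, final int msgSize) throws Exception {
        try (final ServerSocket server = new ServerSocket(0, 0, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(new Runnable() {
                public void run() {
                    try {
                        serve(server, xferCount, msgSize);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
            serverThread.start();
            long usec = roundTripMicros(server.getInetAddress(), server.getLocalPort(), xferCount, msgSize);
            serverThread.join();
            return usec;
        }
    }

    public static void main(String[] args) throws Exception {
        int xferCount = 1000;
        int msgSize = 4096;
        long usec = runLoopback(xferCount, msgSize);
        System.out.println("Round trip time of " + ((long) xferCount * msgSize) + " bytes: " + usec + " usec");
    }
}
```

The code uses only java.net.Socket, java.net.ServerSocket, and their streams, which are the APIs that JSOR is described as supporting, so the same pattern should in principle run unchanged when split across two RDMA-connected hosts with the appropriate configuration files.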

Sample runs

We performed the following sample runs to compare the RTT for various protocols with a message size of 4KB and repetition count of 1,000. These sample runs were made on a test bed consisting of two IBM HS22 blade servers interconnected by a Voltaire 40Gbps InfiniBand switch. Each server runs Red Hat Enterprise Linux (RHEL) v6.1 and is powered by an 8-core Intel Xeon CPU L5609 @ 1.87GHz with 148GB memory plus a Mellanox MT26428 ConnectX VPI PCIe card.

JSOR: SampleClient log
$ cat rdma_client.conf
rdma connect 7.7.12.10 65444
$ java -Dcom.ibm.net.rdma.conf=rdma_client.conf SampleClient 7.7.12.10 65444 1000 4096
Client Ready>
Local: /7.7.12.9:40563 Remote: /7.7.12.10:65444
SBuf: 32768 bytes RBuf: 45056 bytes
Round trip time of 4096000 bytes: 27313 usec

JSOR: SampleServer log
$ cat rdma_server.conf
rdma accept 7.7.12.10 65444 7.7.12.9
$ java -Dcom.ibm.net.rdma.conf=rdma_server.conf SampleServer 7.7.12.10 65444 1000 4096
Server Ready>
Local: /7.7.12.10:65444 Remote: /7.7.12.9:40563
SBuf: 32768 bytes RBuf: 45056 bytes
Received/Sent 4096000 bytes

SDP: SampleClient log
$ cat sdp_client.conf
bind * *
connect 7.7.12.10 65444
$ java -Dcom.sun.sdp.conf=sdp_client.conf \
    -Djava.net.preferIPv4Stack=true SampleClient 7.7.12.10 65444 1000 4096
Client Ready>
Local: /7.7.12.9:39156 Remote: /7.7.12.10:65444
SBuf: 8388608 bytes RBuf: 8388608 bytes
Round trip time of 4096000 bytes: 33836 usec

SDP: SampleServer log
$ cat sdp_server.conf
bind * *
connect 7.7.12.10 65444
$ java -Dcom.sun.sdp.conf=sdp_server.conf \
    -Djava.net.preferIPv4Stack=true SampleServer 7.7.12.10 65444 1000 4096
Server Ready>
Local: /7.7.12.10:65444 Remote: /7.7.12.9:39156
SBuf: 8388608 bytes RBuf: 8388608 bytes
Received/Sent 4096000 bytes

IPoIB: SampleClient log
$ java SampleClient 7.7.12.10 65444 1000 4096
Client Ready>
Local: /7.7.12.9:40666 Remote: /7.7.12.10:65444
SBuf: 99000 bytes RBuf: 174752 bytes
Round trip time of 4096000 bytes: 98848 usec

IPoIB: SampleServer log
$ java SampleServer 7.7.12.10 65444 1000 4096
Server Ready>
Local: /7.7.12.10:65444 Remote: /7.7.12.9:40666
SBuf: 99000 bytes RBuf: 174752 bytes
Received/Sent 4096000 bytes

TCP/IP over Ethernet: SampleClient log
$ java SampleClient 9.42.84.20 65444 1000 4096
Client Ready>
Local: /9.42.84.26:48729 Remote: /9.42.84.20:65444
SBuf: 32768 bytes RBuf: 43690 bytes
Round trip time of 4096000 bytes: 194224 usec

TCP/IP over Ethernet: SampleServer log
$ java SampleServer 9.42.84.20 65444 1000 4096
Server Ready>
Local: /9.42.84.20:65444 Remote: /9.42.84.26:48729
SBuf: 32768 bytes RBuf: 43690 bytes
Received/Sent 4096000 bytes

Table 1 shows the RTTs for each of the protocols that we tested.

Table 1. Round trip times for sample runs
Protocol    Total bytes sent/received    RTT (usec)
JSOR        4,096,000                    27,313
SDP         4,096,000                    33,836
IPoIB       4,096,000                    98,848
TCP/IP      4,096,000                    194,224

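As Table 1 shows, JSOR achieved the lowest RTT in these runs: about 19 percent lower than SDP, roughly 3.6 times lower than IPoIB, and roughly 7.1 times lower than TCP/IP over Ethernet.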
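The totals in Table 1 cover all 1,000 iterations, so it can help to reduce them to a per-iteration figure and to a ratio against JSOR. The short sketch below does that arithmetic with the measured values from the table; the class and method names are hypothetical, and the numbers come from single sample runs rather than a rigorous benchmark.

```java
import java.util.Locale;

// Hypothetical helper: derives per-iteration latency and relative slowdown from the Table 1 totals.
class RttSummary {
    /** Average time for one send/receive iteration, in microseconds. */
    static double perIterationMicros(long totalMicros, int iterations) {
        return (double) totalMicros / iterations;
    }

    /** How many times longer the other protocol took than the baseline. */
    static double slowdown(long otherMicros, long baselineMicros) {
        return (double) otherMicros / baselineMicros;
    }

    public static void main(String[] args) {
        String[] protocols = {"JSOR", "SDP", "IPoIB", "TCP/IP"};
        long[] totalMicros = {27313L, 33836L, 98848L, 194224L}; // RTT column of Table 1
        int iterations = 1000;                                  // repetition count used in the sample runs

        for (int i = 0; i < protocols.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "%-7s %8.3f usec per iteration, %.2fx the JSOR time",
                    protocols[i],
                    perIterationMicros(totalMicros[i], iterations),
                    slowdown(totalMicros[i], totalMicros[0])));
        }
    }
}
```

Each iteration moves one 4,096-byte message in each direction, so the per-iteration value is the round-trip time for a single 4KB exchange as seen by the client.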


Tracing JSOR

When you run your cloud-based Java applications in JSOR mode, it's always important to verify that your application has chosen the RDMA path for connection establishment and data transfer. Because the JSOR enablement is intended to be transparent for the application, there's no straightforward way to check in the normal mode. However, you can enable a service-level view by turning on the IBM JDK's trace option. Preferably, you turn on both Java method tracing and JSOR/NET native tracing to get the complete picture. The typical way to invoke the trace option on your JSOR-enabled application is with the following invocation:

java -Dcom.ibm.net.rdma.conf=config_file \
   -Xtrace:methods={java/net/RDMA*.*},iprint=mt,iprint=NET,iprint=JSOR main_class args

For example, we can rerun the SampleClient and SampleServer applications in JSOR mode with tracing enabled.

The SampleClient trace invocation is:

java -Dcom.ibm.net.rdma.conf=rdma_client.conf \
   -Xtrace:methods={java/net/RDMA*.*},iprint=mt,iprint=NET,iprint=JSOR \
   SampleClient 7.7.12.10 65444 1000 4096

The SampleServer trace invocation is:

java -Dcom.ibm.net.rdma.conf=rdma_server.conf \
   -Xtrace:methods={java/net/RDMA*.*},iprint=mt,iprint=NET,iprint=JSOR \
   SampleServer 7.7.12.10 65444 1000 4096

Listing 3 shows a portion of the generated trace logs for the two invocations.

Listing 3. JSOR sample trace log
04:26:27.500 0x21e3e100  mt.0 >java/net/RDMANetworkProvider.initialize()V Bytecode method, This=21e02468
04:26:27.500 0x21e3e100  mt.2  >java/net/RDMANetworkProvider.initialize0()I Native method, This=21e02468
04:26:27.501 0x21e3e100   NET.440  >initialize0(env=0000000021E3E100, obj=0000000021E73B40)
04:26:27.502 0x21e3e100  JSOR.0     >RDMA_Init()
04:26:27.502 0x21e3e100  JSOR.39    >initverbs()
04:26:27.502 0x21e3e100  JSOR.43     <initverbs(rc=0)
04:26:27.502 0x21e3e100  JSOR.46    >initjsor()
04:26:27.502 0x21e3e100  JSOR.47    <initjsor(rc=0)
04:26:27.502 0x21e3e100  JSOR.3    <RDMA_Init(rc=0)
04:26:27.502 0x21e3e100  NET.441 <initialize0(rc=0)
04:26:27.502 0x21e3e100  mt.8  <java/net/RDMANetworkProvider.initialize0()I Native method
04:26:27.502 0x21e3e100  mt.6  <java/net/RDMANetworkProvider.initialize()V Bytecode method

JSOR has more than 200 trace hooks at the native level, so you could easily end up with large trace files even for small applications. For instance, when SampleServer and SampleClient are run in tracing mode, the trace logs are approximately 7.5MB.
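With trace output of that size, a small filter can be more practical than reading the log by eye. The sketch below scans trace lines for the exit record of RDMA_Init and checks its return code. It is a hypothetical helper, not an IBM tool, and it rests on an assumption drawn only from the excerpt in Listing 3: that a line containing <RDMA_Init(rc=0) indicates that the RDMA provider initialized successfully. It does not prove that any particular connection used the RDMA path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: looks for a successful RDMA_Init exit record in captured trace lines.
class JsorTraceCheck {
    // Matches an exit record such as "JSOR.3    <RDMA_Init(rc=0)" and captures the return code.
    private static final Pattern RDMA_INIT_EXIT =
            Pattern.compile("JSOR\\.\\d+\\s+<RDMA_Init\\(rc=(-?\\d+)\\)");

    /** Returns true if some line records RDMA_Init returning 0. */
    static boolean rdmaInitSucceeded(List<String> traceLines) {
        for (String line : traceLines) {
            Matcher m = RDMA_INIT_EXIT.matcher(line);
            if (m.find() && "0".equals(m.group(1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample lines copied from Listing 3; in practice, pass in the lines of a captured trace.
        List<String> sample = Arrays.asList(
                "04:26:27.502 0x21e3e100  JSOR.0     >RDMA_Init()",
                "04:26:27.502 0x21e3e100  JSOR.39    >initverbs()",
                "04:26:27.502 0x21e3e100  JSOR.43     <initverbs(rc=0)",
                "04:26:27.502 0x21e3e100  JSOR.3    <RDMA_Init(rc=0)");
        System.out.println("RDMA_Init succeeded: " + rdmaInitSucceeded(sample));
    }
}
```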


Choosing JSOR

A caveat is that only network I/O-intensive and latency-critical workloads can benefit from RDMA. We encourage you to estimate the end-to-end latencies for your workloads before deciding to use JSOR. Two types of applications are likely to see more benefits:

  • Applications that transfer large amounts of data over long-running connections between distributed components. Connection lifetime matters because JSOR takes somewhat longer than traditional TCP/IP sockets to establish a connection and requires significantly more off-heap lockable memory, so these costs pay off only when they are spread over many transfers.
  • Applications that do not allocate data buffers dynamically for each network communication. JSOR needs explicit buffer management — unlike TCP/IP, whereby buffers can be allocated dynamically as needed. If variations in application message sizes are minimal and the maximum size of a message is known in advance, then JSOR can allocate buffers statically.

Conclusion

This article introduced an IBM JDK feature called Java Sockets over RDMA (JSOR), available in the IBM Java SDK 7SR6 for Linux/AMD64 and Linux/Intel platforms. We discussed the technology behind JSOR and how it compares with existing solutions based on TCP/IP and SDP protocols. We outlined the procedure for using JSOR in a cloud-based environment with sample client-server programs and associated configuration files. By running the sample programs on our local test bed, we demonstrated that JSOR can offer better round-trip time compared to the SDP, IPoIB, and TCP/IP protocols. We also described service-level tracing options that can be used to verify whether the application endpoints are using the RDMA path for communication. Finally, we discussed guidelines for choosing applications that can benefit from JSOR.


Download

Description    Name               Size
Sample code    Sample_Code.zip    3KB
