Transparent network acceleration for Java-based workloads in the cloud

Introducing the Java Sockets over RDMA library


Currently, communication logic among distributed Java application components deployed in the cloud is implemented through TCP/IP socket programming techniques. With the adoption of higher-speed networks (such as 10/40/100Gbps Ethernet) in cloud data centers, it's now possible to use faster network-communication techniques such as Remote Direct Memory Access (RDMA). RDMA programs are typically written in C/C++ using low-level APIs such as OpenFabrics Alliance (OFA) verbs or high-performance computing tools such as the Message Passing Interface (MPI). Accessing such low-level APIs in Java-based applications through the Java Native Interface (JNI) adds to programming complexity and performance overheads. Sockets Direct Protocol (SDP), a comparable approach available in Java 7, hasn't shown performance advantages for many workloads. RDMA Sockets (R-Sockets), another comparable approach, is only available for C/C++ programs.

In this article, we introduce a new Java-specific Linux socket-compatible RDMA communication library called Java Sockets over RDMA (JSOR) — part of the IBM Java SDK 7SR6 on Linux/AMD64 and Linux/Intel platforms. We demonstrate JSOR's use and benefits with a simple Java client-server program (see Download), written using the Java sockets interface, that can be executed without any code changes on RDMA-capable cloud infrastructure.


In a typical Java client-server scenario deployed in a cloud environment, the response time for a service request is often constrained by the response time of the network connection between the requester and the service-offering host. The network-communication logic that enables interaction between remote endpoints commonly uses the Java sockets interface for connection establishment and data transfer. The Java sockets interface, by default, is implemented based on the POSIX sockets API. Each network operation must pass through the underlying operating system before it reaches the network interface. This requirement results in costly OS context switches and multiple buffer copies between software layers.

Dedicated TCP/IP protocol offloading engines that are part of special-purpose network interface cards (NICs) can be used to reduce these network-processing overheads. However, such offloading techniques still require some buffer-copying steps. As cloud adoption becomes widespread, many enterprise data centers are starting to migrate to 40Gbps Ethernet from 10Gbps Ethernet in their network links to address cloud computing's increasing bandwidth needs.

RDMA is a hardware-based protocol-offloading technology — originally proposed for high-performance network fabrics such as InfiniBand and high-speed Ethernet — that directly transfers data between two remote application memories without any involvement of either host processor. RDMA potentially eliminates costly OS context switches, saving significant numbers of CPU cycles. Because this message-based protocol is purposely defined for high-performance networks, the applications can take advantage of increased network speeds to achieve latencies below 10 microseconds.

With the advent of the RDMA over Converged Ethernet (RoCE) standard, the RDMA protocol can now be used directly on top of existing high-speed 10/40Gbps Ethernet infrastructure. So, by moving from the traditional TCP/IP stack to RDMA-based network processing, some cloud-based applications can see latency and throughput benefits while using fewer CPU resources.

SDP is a standard wire-based protocol defined for RDMA-supported network fabrics such as InfiniBand and RoCE to accelerate stream socket-based applications transparently. Starting with Java 7, the JDK ships with support for SDP on Linux and Solaris platforms. However, SDP is a kernel-based implementation that negatively impacts performance because of buffer-copying and context-switching overheads.

In the following sections, we introduce and describe JSOR, a completely user-space solution that can bypass the kernel to achieve performance comparable to that of similar native RDMA-based solutions.

About JSOR

JSOR is a cloud network acceleration feature that transparently enables RDMA communication for Java stream sockets when the underlying infrastructure supports RDMA. JSOR incorporates a high-performance RDMA wire protocol within the standard Java socket libraries. Currently, support is provided for the and APIs along with the associated input and output streams — so most existing Java client-server applications can benefit out of the box from the improved performance. (See Choosing JSOR, later in this article.)

JSOR design in brief

In a traditional cloud networking scenario, any interaction between access and service nodes ends up as packets flowing on the wire through one or more Ethernet switches, as illustrated in Figure 1.

Figure 1. Traditional cloud networking
Image shows traditional cloud networking

Each network operation, whether it is connection related or data-transfer related, results in the invocation of one or more Java socket calls. Any socket operation performed at the Java level invokes a corresponding native (C or C++) library operation through the JNI layer. A certain amount of pre- and post-processing happens at the Java level before and after the call is executed by the JNI layer. Because the TCP/IP protocol is processed by an OS kernel stack, ultimately all JNI socket-specific methods result in context switches. Sending or receiving data also requires multiple buffer copies at the Java, OS, and NIC levels. Network-processing overheads such as multiple buffer copies and CPU context switches result in higher network latency and poorer throughput.

The JSOR library is compatible with the R-Sockets protocol, which is provided by the R-Sockets library included in the OpenFabrics Enterprise Distribution (OFED). The JSOR library includes modifications to make it suitable for common Java application needs. It provides significant scalability, reliability, and serviceability improvements.

Compared to SDP and TCP/IP over InfiniBand (IPoIB), JSOR generally yields higher performance. In our experiments with microbenchmarks, JSOR could give as much as 50-percent higher throughput than SDP and more than 100-percent higher throughput than IPoIB. The better performance is primarily attributable to the fact that JSOR, as part of the standard Java class library, can optimize the Java socket implementation. For example, JSOR avoids data copies across the JNI boundary, better supports various socket semantics, and automatically tunes RDMA parameters based on socket usage patterns. And, whereas IPoIB and SDP are kernel-based transport solutions, JSOR is entirely in user space, so it can shorten the data path and reduce overhead in the kernel.

As Figure 2 illustrates, JSOR intercepts Java socket calls at the Java level and routes them through the underlying RDMA infrastructure. A JSOR enablement property must be specified during Java execution that points to an appropriate configuration file. When the switchover from TCP/IP to RDMA happens, all of the application's interactions with its remote counterpart flow through the underlying RDMA hardware.

Figure 2. Accelerated cloud networking
Image shows accelerated cloud networking

Using JSOR

Before you use JSOR, a few prerequisites in the cloud execution environment must be met:

  • The underlying host should have an appropriate host channel adapter (HCA) or RDMA NIC and be interconnected to the remote host by a high-performance InfiniBand or Ethernet switched fabric.
  • Each participating host should have the OFED 1.5.1 or higher base runtime libraries installed. Specifically, JSOR looks for the and libraries at execution time for dynamically loading function pointers.
  • Your user account should be entitled to adequate (preferably unlimited) lockable memory based on your application needs. JSOR socket buffers are memory-pinned by default, so the OS can't swap them out during the critical phases of data transfer. On Linux, use the ulimit -l shell command to display the maximum locked memory setting.
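The locked-memory limit can also be inspected from inside a Java process. The following is a minimal sketch, not part of JSOR: it assumes a Linux host where /proc/self/limits is readable and uses the usual "Max locked memory" row layout. The class and method names are hypothetical. Note that this file reports the limit in bytes, whereas the ulimit -l shell builtin typically reports kilobytes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: reads the soft locked-memory limit on Linux. */
final class LockedMemoryLimit {
    private static final String ROW = "Max locked memory";

    /**
     * Returns the soft-limit column of the "Max locked memory" row
     * (a byte count or "unlimited"), or null if the row is absent.
     */
    static String softLimit(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith(ROW)) {
                // Remaining columns: soft limit, hard limit, units
                String[] cols = line.substring(ROW.length()).trim().split("\\s+");
                return cols[0];
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: running on Linux with procfs mounted
        List<String> lines = Files.readAllLines(
                Paths.get("/proc/self/limits"), StandardCharsets.UTF_8);
        System.out.println("Soft limit on locked memory: " + softLimit(lines));
    }
}
```

A check like this could run at application startup to warn early when the limit looks too small for pinned socket buffers.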

When these base requirements are met, a configuration file in plain text format is needed at both the client and server endpoints. Each record or line in the configuration file specifies an accept, bind, or connect rule and should contain a minimum of four fields separated by white space:

  • The first field indicates the type of network provider. Currently, only the rdma provider is available.
  • The second field specifies an accept, bind, or connect keyword, depending on which rule you are specifying.
  • The third field specifies a local IP address if the rule specified is accept or bind, or a remote IP address if the rule specified is connect.
  • The fourth field specifies a port or set of ports on which the RDMA traffic is allowed. Basically, the third and fourth fields together define a set of socket endpoints for RDMA-specific connection establishment and data transfer.
  • The fifth and subsequent fields apply only to an accept rule that specifies a list of client IP addresses while accepting incoming RDMA connection requests.

The configuration for the service (passive) side should have accept or bind entries, whereas the client (active) side configuration should have connect or bind entries.
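To make the field layout concrete, here is a small illustrative parser for one rule line. It is a sketch of the format as described above, not JSOR's own parser. The RdmaRule class is hypothetical, the port field is kept as raw text because the exact syntax for port sets is not covered here, and the addresses used in the usage note are documentation placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one configuration rule; not part of the IBM SDK. */
final class RdmaRule {
    final String provider;       // first field, for example "rdma"
    final String keyword;        // "accept", "bind", or "connect"
    final String address;        // local address (accept/bind) or remote address (connect)
    final String ports;          // port or set of ports, kept as raw text
    final List<String> clients;  // fifth and later fields; used only by accept rules

    private RdmaRule(String provider, String keyword, String address,
                     String ports, List<String> clients) {
        this.provider = provider;
        this.keyword = keyword;
        this.address = address;
        this.ports = ports;
        this.clients = clients;
    }

    /** Splits a rule line on white space and checks the minimum structure. */
    static RdmaRule parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4) {
            throw new IllegalArgumentException("expected at least four fields: " + line);
        }
        String keyword = f[1];
        if (!keyword.equals("accept") && !keyword.equals("bind") && !keyword.equals("connect")) {
            throw new IllegalArgumentException("unknown rule keyword: " + keyword);
        }
        List<String> clients = new ArrayList<>(Arrays.asList(f).subList(4, f.length));
        // Design choice of this sketch: reject client lists on non-accept rules
        if (!clients.isEmpty() && !keyword.equals("accept")) {
            throw new IllegalArgumentException("client list is valid only for accept: " + line);
        }
        return new RdmaRule(f[0], keyword, f[2], f[3], Collections.unmodifiableList(clients));
    }
}
```

For example, RdmaRule.parse("rdma accept 192.0.2.10 65444 192.0.2.20 192.0.2.21") yields an accept rule with two client addresses, while a line with fewer than four fields is rejected.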

For instance, to accept RDMA connections from the clients and on the service host through port 65444, the following rule is needed in the Java application server's configuration file (which we'll call rdma_server.conf):

rdma    accept    65444

Similarly, to request an RDMA connection from either of the clients to the service host listening on port 65444, the following rule is needed in the Java client application's configuration file (which we'll call rdma_client.conf):

rdma    connect    65444

Unless you explicitly bind to a specific local address, an ephemeral port will be used on the client side to establish connection with the service end. In the following example, a bind rule is added to the rdma_client.conf file to establish its end of an RDMA connection on port 65333:

rdma    connect    65444
rdma    bind        65333

The third field ( in the bind rule refers to the null address, and it defaults to the first available InfiniBand address on the local host.

When you have the configuration file ready, specify it as the value of the property during Java command execution. For example, on the passive (service) side:

java SampleServer args

On the active (client) side:

java SampleClient args

Listing 1 shows the portion of a SampleServer class that creates a server socket and waits for a connection from the remote end. When the connection is established, the server receives the specified number of bytes from the client and sends back the same number of bytes to the client in a single iteration. This receive/send step is repeated the specified number of times.

Listing 1.
 // Create server socket to listen on x.x.x.x address and x port
 ServerSocket server = new ServerSocket(Integer.parseInt(args[1]), 0, InetAddress.getByName(args[0]));
 Socket client = server.accept();
 // Receive and send message specified number of times
 for (int i = 0; i < xferCount; i++) {
     in.read(msgBuf, 0, msgSize);   // receive one message from the client
     out.write(msgBuf, 0, msgSize); // send the same number of bytes back
 }

Listing 2 shows the portion of a SampleClient class that requests a connection with the remote service host. When the connection is established, the client sends the specified number of bytes to the server and receives the same number of bytes back from the server in a single iteration. This send/receive step is repeated the specified number of times.

Listing 2.
 // Create client socket to connect x.x.x.x address and x port
 Socket client = new Socket(InetAddress.getByName(args[0]), Integer.parseInt(args[1]));
 // Time the whole send/receive sequence
 long startTime = System.nanoTime();
 for (int i = 0; i < xferCount; i++) {
     out.write(msgBuf, 0, msgSize); // send one message to the server
     in.read(msgBuf, 0, msgSize);   // receive the same number of bytes back
 }
 long endTime = System.nanoTime();

In Listing 2, the whole send/receive sequence is timed so we can compute the round-trip time (RTT) for the total number of bytes.
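For readers who want to try the same send/receive pattern without the downloadable sample, the following self-contained sketch runs both ends in one JVM over the loopback interface using plain TCP/IP. It is not the article's SampleServer or SampleClient: the LoopbackEcho class is hypothetical, it assumes Java 8 or later (for the lambda), and it uses DataInputStream.readFully so that each iteration transfers exactly msgSize bytes even if a single read would return fewer.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical loopback version of the echo exchange in Listings 1 and 2. */
final class LoopbackEcho {

    /** Echoes xferCount messages of msgSize bytes; returns elapsed nanoseconds. */
    static long roundTripNanos(int xferCount, int msgSize) throws Exception {
        // Port 0 lets the OS pick a free ephemeral port on the loopback address
        try (ServerSocket server = new ServerSocket(0, 0, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    DataInputStream in = new DataInputStream(peer.getInputStream());
                    OutputStream out = peer.getOutputStream();
                    byte[] buf = new byte[msgSize];
                    for (int i = 0; i < xferCount; i++) {
                        in.readFully(buf, 0, msgSize); // receive one full message
                        out.write(buf, 0, msgSize);    // send it back
                    }
                    out.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();

            long elapsed;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                DataInputStream in = new DataInputStream(client.getInputStream());
                OutputStream out = client.getOutputStream();
                byte[] buf = new byte[msgSize];
                long start = System.nanoTime();
                for (int i = 0; i < xferCount; i++) {
                    out.write(buf, 0, msgSize);
                    in.readFully(buf, 0, msgSize);
                }
                elapsed = System.nanoTime() - start;
            }
            echo.join();
            return elapsed;
        }
    }

    public static void main(String[] args) throws Exception {
        int xferCount = 1000;
        int msgSize = 4096;
        long nanos = roundTripNanos(xferCount, msgSize);
        System.out.println("Round trip time of " + ((long) xferCount * msgSize)
                + " bytes: " + (nanos / 1000) + " usec");
    }
}
```

Loopback timings say nothing about RDMA performance. The point is only that the code uses the standard java.net socket classes and streams, which is the kind of program JSOR is designed to accelerate without source changes.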

Sample runs

We performed the following sample runs to compare the RTT for various protocols with a message size of 4KB and repetition count of 1,000. These sample runs were made on a test bed consisting of two IBM HS22 blade servers interconnected by a Voltaire 40Gbps InfiniBand switch. Each server runs Red Hat Enterprise Linux (RHEL) v61 and is powered by an 8-core Intel Xeon CPU L5609 @ 1.87GHz with 148GB memory plus a Mellanox MT26428 ConnectX VPI PCIe card.

JSOR: SampleClient log
$ cat rdma_client.conf
rdma connect 65444
$ java - SampleClient 65444 1000 4096
Client Ready>
Local: / Remote: /
SBuf: 32768 bytes RBuf: 45056 bytes
Round trip time of 4096000 bytes: 27313 usec

JSOR: SampleServer log
$ cat rdma_server.conf
rdma accept 65444
$ java SampleServer 65444 1000 4096
Server Ready>
Local: / Remote: /
SBuf: 32768 bytes RBuf: 45056 bytes
Received/Sent 4096000 bytes

SDP: SampleClient log
$ cat sdp_client.conf
bind * *
connect 65444
$ java -Dcom.sun.sdp.conf=sdp_client.conf SampleClient 65444 1000 4096
Client Ready>
Local: / Remote: /
SBuf: 8388608 bytes RBuf: 8388608 bytes
Round trip time of 4096000 bytes: 33836 usec

SDP: SampleServer log
$ cat sdp_server.conf
bind * *
connect 65444
$ java -Dcom.sun.sdp.conf=sdp_server.conf SampleServer 65444 1000 4096
Server Ready>
Local: / Remote: /
SBuf: 8388608 bytes RBuf: 8388608 bytes
Received/Sent 4096000 bytes

IPoIB: SampleClient log
$ java SampleClient 65444 1000 4096
Client Ready>
Local: / Remote: /
SBuf: 99000 bytes RBuf: 174752 bytes
Round trip time of 4096000 bytes: 98848 usec

IPoIB: SampleServer log
$ java SampleServer 65444 1000 4096
Server Ready>
Local: / Remote: /
SBuf: 99000 bytes RBuf: 174752 bytes
Received/Sent 4096000 bytes

TCP/IP over Ethernet: SampleClient log
$ java SampleClient 65444 1000 4096
Client Ready>
Local: / Remote: /
SBuf: 32768 bytes RBuf: 43690 bytes
Round trip time of 4096000 bytes: 194224 usec

TCP/IP over Ethernet: SampleServer log
$ java SampleServer 65444 1000 4096
Server Ready>
Local: / Remote: /
SBuf: 32768 bytes RBuf: 43690 bytes
Received/Sent 4096000 bytes

Table 1 shows the RTTs for each of the protocols that we tested.

Table 1. Round trip times for sample runs
Protocol                Total bytes sent/received    RTT (usec)
JSOR                    4096000                      27313
SDP                     4096000                      33836
IPoIB                   4096000                      98848
TCP/IP over Ethernet    4096000                      194224

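As Table 1 shows, JSOR recorded the lowest RTT in these sample runs. For the same 4,096,000 bytes, SDP took about 1.2 times as long as JSOR, IPoIB about 3.6 times as long, and TCP/IP over Ethernet about 7.1 times as long.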
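Dividing each total by the repetition count gives an average RTT per 4KB message, which is often easier to compare against an application's latency budget. The sketch below performs that arithmetic on the RTT values reported in the client logs. The RttSummary class and its method names are illustrative only.

```java
/** Illustrative arithmetic on the RTT values reported in the sample runs. */
final class RttSummary {

    /** Average round-trip time for one message, in microseconds. */
    static double perMessageMicros(long totalMicros, int xferCount) {
        return (double) totalMicros / xferCount;
    }

    /** How many times longer one total RTT is than a baseline RTT. */
    static double ratio(long otherMicros, long baselineMicros) {
        return (double) otherMicros / baselineMicros;
    }

    public static void main(String[] args) {
        String[] protocols = {"JSOR", "SDP", "IPoIB", "TCP/IP over Ethernet"};
        long[] rttMicros = {27313, 33836, 98848, 194224}; // from the client logs
        int xferCount = 1000;                              // repetition count used in the runs

        for (int i = 0; i < protocols.length; i++) {
            System.out.printf("%-22s %8.3f usec per message, %.2fx the JSOR RTT%n",
                    protocols[i],
                    perMessageMicros(rttMicros[i], xferCount),
                    ratio(rttMicros[i], rttMicros[0]));
        }
    }
}
```

With these inputs, the JSOR run averages about 27.3 microseconds per 4KB round trip, compared with about 194.2 microseconds for TCP/IP over Ethernet.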

Tracing JSOR

When you run your cloud-based Java applications in JSOR mode, it's always important to verify that your application has chosen the RDMA path for connection establishment and data transfer. Because the JSOR enablement is intended to be transparent for the application, there's no straightforward way to check in the normal mode. However, you can enable a service-level view by turning on the IBM JDK's trace option. Preferably, you turn on both Java method tracing and JSOR/NET native tracing to get the complete picture. The typical way to invoke the trace option on your JSOR-enabled application is with the following invocation:

   -Xtrace:methods={java/net/RDMA*.*},iprint=mt,iprint=NET,iprint=JSOR main_class args

For example, we can rerun the SampleClient and SampleServer applications in JSOR mode with tracing enabled.

The SampleClient trace invocation is:

   SampleClient 65444 1000 4096

The SampleServer trace invocation is:

   SampleServer 65444 1000 4096

Listing 3 shows a portion of the generated trace logs for the two invocations.

Listing 3. JSOR sample trace log
04:26:27.500 0x21e3e100  mt.0 >java/net/RDMANetworkProvider.initialize()V Bytecode method, This=21e02468
04:26:27.500 0x21e3e100  mt.2  >java/net/RDMANetworkProvider.initialize0()I Native method, This=21e02468
04:26:27.501 0x21e3e100   NET.440  >initialize0(env=0000000021E3E100, obj=0000000021E73B40)
04:26:27.502 0x21e3e100  JSOR.0     >RDMA_Init()
04:26:27.502 0x21e3e100  JSOR.39    >initverbs()
04:26:27.502 0x21e3e100  JSOR.43     <initverbs(rc=0)
04:26:27.502 0x21e3e100  JSOR.46    >initjsor()
04:26:27.502 0x21e3e100  JSOR.47    <initjsor(rc=0)
04:26:27.502 0x21e3e100  JSOR.3    <RDMA_Init(rc=0)
04:26:27.502 0x21e3e100  NET.441 <initialize0(rc=0)
04:26:27.502 0x21e3e100  mt.8  <java/net/RDMANetworkProvider.initialize0()I Native method
04:26:27.502 0x21e3e100  mt.6  <java/net/RDMANetworkProvider.initialize()V Bytecode

JSOR has more than 200 trace hooks at the native level, so you could easily end up with large trace files even for small applications. For instance, when SampleServer and SampleClient are run in tracing mode, the trace logs are approximately 7.5MB.
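Because the trace files grow quickly, a small summary can be more practical than reading them line by line. The following sketch counts trace lines per component (mt, NET, JSOR) so you can see at a glance whether any JSOR native trace points fired. It is a hypothetical helper, not an IBM tool, and it assumes every line follows the layout shown in Listing 3: time stamp, thread ID, component.tracepoint, then the entry or exit text. It also assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical summary of a trace log in the Listing 3 layout. */
final class TraceComponents {

    /** Counts lines per trace component, for example "mt", "NET", or "JSOR". */
    static Map<String, Integer> countByComponent(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 4) {
                continue; // not a trace record in the assumed layout
            }
            int dot = cols[2].indexOf('.');
            if (dot <= 0) {
                continue; // third column is not component.tracepoint
            }
            counts.merge(cols[2].substring(0, dot), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a file holding the captured trace output
        Map<String, Integer> counts = countByComponent(
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        System.out.println(counts);
    }
}
```

A nonzero JSOR count suggests that the native RDMA initialization path was entered. A log with only mt or NET entries would be a cue to recheck the configuration file and enablement property.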

Choosing JSOR

A caveat is that only network I/O-intensive and latency-critical workloads can benefit from RDMA. We encourage you to estimate the end-to-end latencies for your workloads before deciding to use JSOR. Two types of applications are likely to see more benefits:

  • Applications that transfer large amounts of data over long-running connections between distributed components. The time taken to establish a connection is somewhat longer, and the amount of off-heap lockable memory required is significantly greater in JSOR compared to traditional TCP/IP sockets, so these setup costs pay off only when they are spread over a long-lived, data-heavy connection.
  • Applications that do not allocate data buffers dynamically for each network communication. JSOR needs explicit buffer management — unlike TCP/IP, whereby buffers can be allocated dynamically as needed. If variations in application message sizes are minimal and the maximum size of a message is known in advance, then JSOR can allocate buffers statically.


This article introduced an IBM JDK feature called Java Sockets over RDMA (JSOR), available in the IBM Java SDK 7SR6 for Linux/AMD64 and Linux/Intel platforms. We discussed the technology behind JSOR and how it compares with existing solutions based on the TCP/IP and SDP protocols. We outlined the procedure for using JSOR in a cloud-based environment with sample client-server programs and associated configuration files. By running the sample programs on our local test bed, we demonstrated that JSOR can offer better round-trip times than the SDP, IPoIB, and TCP/IP protocols. We also described service-level tracing options that can be used to verify whether the application endpoints are using the RDMA path for communication. Finally, we discussed guidelines for choosing applications that can benefit from JSOR.
