Common JSOR problems (Linux only)
Use Java™ Sockets over Remote Direct Memory Access (JSOR) to take advantage of high-performance networking infrastructures such as InfiniBand. To use JSOR, you must set up, configure, and tune various resources. If this is not done correctly, issues can occur.
The RDMA implementation, which was previously deprecated, is removed from IBM® SDK, Java Technology Edition, Version 8.
RDMA socket or thread creation failed
A JSOR problem can occur where a thread or Remote Direct Memory Access (RDMA) socket cannot be created. This problem can be caused by running concurrent connections over RDMA transport.
Possible causes
- RDMA socket buffers are by default pinned, or memory locked. A restricted memlock setting in your environment can result in a failure to create or register new RDMA sockets.
- When you are running concurrent connections, each RDMA socket implicitly uses a file descriptor for event tracking. If the maximum user open files limit is too low, socket creation can fail.
- When you are running concurrent connections, thread creation failure can be caused by a maximum user process limit that is too low. For more information, see the following technote: java.lang.OutOfMemoryError while creating new threads.
Mitigation
- To avoid socket creation failures, check your ulimit -l setting and change the memlock setting to an appropriate value based on the usage of the socket buffers.
- To avoid socket creation failures when you are running concurrent connections, check your ulimit -n setting and change the nofile setting to an appropriate value based on the scalability requirements of the application.
- To avoid thread creation failures when you are running concurrent connections, check your ulimit -u setting and change the nproc setting to an appropriate value based on the scalability requirements of the application. The example after this list shows how to check and raise all three limits.
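As a sketch, the following commands inspect the current limits and raise them persistently; the user name and the values are illustrative and should be sized for your workload:
  ulimit -l   # max locked memory (memlock); RDMA socket buffers are pinned by default
  ulimit -n   # max open files (nofile); each RDMA socket uses a descriptor for event tracking
  ulimit -u   # max user processes (nproc); bounds thread creation

  # Persistent limits for the user that runs the JVM, set in
  # /etc/security/limits.conf:
  #   jvmuser  soft  memlock  unlimited
  #   jvmuser  hard  memlock  unlimited
  #   jvmuser  soft  nofile   65536
  #   jvmuser  hard  nofile   65536
  #   jvmuser  soft  nproc    32768
  #   jvmuser  hard  nproc    32768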
RDMA Network provider initialization failure
A JSOR problem can occur where the Remote Direct Memory Access (RDMA) network provider initialization fails on a 64-bit Linux operating system when you are running a 32-bit JVM.
During the RDMA network initialization stage, the JSOR runtime environment checks for the availability of compatible OFED runtime libraries. If the runtime environment cannot locate and load the librdmacm.so and libibverbs.so 32-bit libraries, you might see this problem. To avoid the problem, install the 32-bit OFED runtime libraries alongside the usual 64-bit libraries on a 64-bit Linux® machine.
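As a quick check, confirm that 32-bit copies of both libraries are present and loadable. The paths below are examples; library locations vary by distribution:
  # List the OFED libraries that the dynamic linker can resolve.
  ldconfig -p | grep -E 'librdmacm|libibverbs'

  # Confirm the ELF class; 32-bit copies report "ELF 32-bit".
  file /usr/lib/librdmacm.so.1 /usr/lib/libibverbs.so.1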
RDMA connection failed
A JSOR problem can occur where a Remote Direct Memory Access (RDMA) client fails to connect to an RDMA server.
Client and server on the same host
If the client and server are on the same host, this behavior is expected because there is currently no support for RDMA loopback. For a successful connection, the client and server must be on different hosts that are connected by an InfiniBand switch through their RDMA network interface adapters.
Client and server on different subnets
The RDMA client and server should be on the same network, connected by a common InfiniBand switch and managed by a single subnet manager. If your RDMA client and server must be on different subnets, ensure that inter-network switching and packet forwarding is enabled at the hardware and software levels.
Client and server on the same subnet
If the client and server are on the same subnet, a connection failure could be caused by incorrect client or server configuration files, or an incorrect InfiniBand setup on one or both hosts.
- Ensure that the rule entries in your configuration files are defined correctly, as described in -Dcom.ibm.net.rdma.conf (Linux only).
- Ensure that each host that is involved in the communication has an appropriate InfiniBand host channel adapter or RDMA network interface card with valid InfiniBand addresses (interfaces that begin with the prefix ib).
- Ensure that each InfiniBand port is active and that the maximum transfer unit is properly set. To check the maximum transfer unit, run one of the following OFED runtime commands: ibstat or ibv_devinfo.
- Ensure that the ifconfig command lists all the InfiniBand interfaces, and that each interface has a valid IP address.
- Choose two valid InfiniBand addresses that are registered with the subnet manager for framing JSOR configuration rules, then verify that basic RDMA communication is possible between the host and client machines by running the rping command with your chosen InfiniBand addresses.
- Similarly, run the ibv_rc_pingpong command.
- Similarly, run the ib_read_bw and ib_write_bw commands. A typical verification sequence is shown after this list.
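The following sequence is one way to work through these checks; 192.0.2.1 is a placeholder for the server's InfiniBand address:
  # 1. Check adapter and port state (look for an Active port state) and the
  #    maximum transfer unit.
  ibstat
  ibv_devinfo

  # 2. Confirm that every InfiniBand interface (ib0, ib1, ...) has a valid IP address.
  ifconfig

  # 3. Basic RDMA connectivity test; start the server side first.
  rping -s -a 192.0.2.1 -v -C 10    # on the server host
  rping -c -a 192.0.2.1 -v -C 10    # on the client host

  # 4. Reliable-connection and bandwidth checks between the same pair of hosts.
  ibv_rc_pingpong                   # server host
  ibv_rc_pingpong 192.0.2.1         # client host
  ib_read_bw                        # server host
  ib_read_bw 192.0.2.1              # client host
  ib_write_bw                       # server host
  ib_write_bw 192.0.2.1             # client host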
RDMA connection reset exceptions
Concurrent Remote Direct Memory Access (RDMA) clients that try to send small chunks of data millions of times to a single RDMA server can throw connection reset exceptions.
Java Sockets over Remote Direct Memory Access (JSOR) employs the R-Sockets protocol as the basis for implementing socket-level APIs on top of RDMA. The R-Sockets protocol uses the send and receive queue sizes as a basis for implementing data flow and event control between sender and receiver. When several parallel clients try to send small amounts of data millions of times, they might experience connection reset exceptions due to insufficient queue sizes. This behavior occurs because queue sizes dictate the amount of work that can be queued up on either side.
Because the default queue sizes are large (see JSOR environment settings (Linux only)), the tuning of queue sizes is necessary only in rare cases. You should determine the queue sizes based on the workload characteristics of your application. The maximum number and frequency of send and receive operations is particularly important. There is no general formula for determining optimal queue sizes.
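The JSOR-specific settings are described in JSOR environment settings (Linux only). As an illustration of the kind of tuning involved, the rsocket(7) manual page for the underlying R-Sockets implementation documents the RS_SQ_SIZE and RS_RQ_SIZE environment variables for sizing the send and receive queues; treat the names and values below as an unverified sketch and confirm the settings that apply to your JSOR level:
  # Illustrative queue sizing, exported before the JVM starts.
  export RS_SQ_SIZE=512    # send queue depth
  export RS_RQ_SIZE=512    # receive queue depth
  java -Dcom.ibm.net.rdma.conf=/path/to/jsor.conf MyServer    # MyServer is a placeholder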
RDMA communication appears to hang
The Remote Direct Memory Access (RDMA) communication between client and server appears to hang when you are running RPC-based workloads with unpredictable message sizes.
Java Sockets over Remote Direct Memory Access (JSOR) employs the R-Sockets protocol as the basis for implementing socket-level APIs on top of RDMA. To transfer data properly, the R-Sockets protocol requires both the sender and receiver to be coordinated. The receiver must be ready with a receive buffer available for the sender to put data in. This behavior differs from TCP/IP where buffers are allocated dynamically as required. RDMA receive operations fail if sufficient receive buffers are not available in advance. For more information, see the flow control section of the IETF draft of Remote Direct Memory Access Transport for Remote Procedure Call.
The JSOR implementation by default provides small send and receive buffers, which are less than 50 KB in size. When an RDMA client or server tries to send a large payload, for example 2 MB or 4 MB, in chunks of, say, 1 KB, in one direction without synchronized data flow between end points, the receive buffers can be exhausted, resulting in a hang. The R-Sockets protocol tries to recycle the receive buffers, but if the rate of replenishment is less than the data send rate, progress is impossible. These effects are more pronounced when hundreds of parallel clients try to do the same operations on the same RDMA transport, because the clients compete for the same set of physical network resources. The R-Sockets protocol takes a long time to recover from this situation because it relies on retries and receiver-not-ready negative acknowledgements to make progress. In the worst case scenario, this behavior can result in a deadlock between end points.
Similarly, the size of the send buffer should be sufficient to transfer the data to the corresponding receive buffer.
Mitigation
For a Java RPC application, tune the buffer sizes before you deploy the application in a production environment. Set the buffer sizes according to the workload characteristics and maximum payload size of the application. There is no general formula for determining optimal buffer sizes. For more information, see JSOR environment settings (Linux only).
Enable application or runtime data transfer timeouts that allow the client to cancel and try the data transfer again, with increased buffer sizes if necessary. See the following APAR for an example: PM52124: OutOfMemoryError errors on eXtreme Scale clients can cause the grid to fail. In this example, a lack of memory caused the server thread to get stuck in a socketWrite() method. The suggested resolution is to set the com.ibm.CORBA.SocketWriteTimeout property.
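For example, for an ORB-based application the property can be set on the java command line. The timeout value and the MyRpcClient class name are placeholders; check the property documentation for units and a suitable value:
  # Bound socket writes so that a stalled transfer can be cancelled and retried.
  java -Dcom.ibm.CORBA.SocketWriteTimeout=30 \
       -Dcom.ibm.net.rdma.conf=/path/to/jsor.conf MyRpcClient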
Problems encountered with the zero copy function
Java applications hang when the zero copy function is enabled
Due to the internal synchronization that is required between the data source and the data sink when you use the zero copy function, a client or server application might hang if you enable the zero copy function for only one endpoint.
- Avoid using the same socket descriptor for parallel read and write operations when you use the zero copy function.
- Ensure that the zero copy function is enabled on both endpoints. For more information, see -Dcom.ibm.net.rdma.zeroCopy (Linux only).
- Ensure that the same zero copy threshold values are set on both endpoints, as shown in the sketch after this list. For more information, see -Dcom.ibm.net.rdma.zeroCopyThreshold (Linux only).
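A minimal sketch of matching endpoint settings follows. The threshold value, configuration file names, and application class names are placeholders; see the linked property pages for units and defaults:
  # Server and client must both enable zero copy with the same threshold.
  java -Dcom.ibm.net.rdma.conf=jsor-server.conf \
       -Dcom.ibm.net.rdma.zeroCopy=true \
       -Dcom.ibm.net.rdma.zeroCopyThreshold=16384 MyServer    # server host

  java -Dcom.ibm.net.rdma.conf=jsor-client.conf \
       -Dcom.ibm.net.rdma.zeroCopy=true \
       -Dcom.ibm.net.rdma.zeroCopyThreshold=16384 MyClient    # client host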
Java applications are not using the zero copy function
Java applications might not use the zero copy function even after you specify the -Dcom.ibm.net.rdma.zeroCopy=true parameter.
The zero copy function is used only when the following conditions are true:
- You specified the -Dcom.ibm.net.rdma.zeroCopy=true parameter.
- The buffer sizes that are passed inside Java read and write calls exceed the value that is specified by the -Dcom.ibm.net.rdma.zeroCopyThreshold parameter. For more information about this parameter, see -Dcom.ibm.net.rdma.zeroCopyThreshold (Linux only).
In zero copy mode, data transfers are handled by the following direct read and write methods (a trace example follows this list):
- socketRead0Direct
- socketWrite0Direct
- RDMA_ReadDirect
- RDMA_SendDirect
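One way to confirm that these methods are being driven is IBM J9 method trace. The pattern below is a sketch: it assumes that the direct methods are implemented under java/net classes, and MyClient is a placeholder application class:
  # Print a method-trace line each time a *Direct read or write method runs.
  java '-Xtrace:methods={java/net/*.*Direct},print=mt' \
       -Dcom.ibm.net.rdma.zeroCopy=true \
       -Dcom.ibm.net.rdma.zeroCopyThreshold=16384 MyClient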
Java applications do not scale
The zero copy function is designed for large data transfers, a few at a time. Because of internal synchronization and the overheads of resource allocation and management, scalability is restricted by the size of the data that is transferred.
Ensure that the usage scenario resembles file transfer (FTP) style bulk data transfers when you use zero copy mode.
Problems encountered with fork compatibility mode
Several problems can be associated with operating in fork compatibility mode between Java clients and native forked servers.
Native server error message: librdmacm: Fatal: unable to open RDMA device
This error has been observed in the following environment:
- POWER® PC systems with a Mellanox RDMA over Converged Ethernet (RoCE) MT26448 adapter
- Red Hat Enterprise Linux (RHEL) v6.4
- MLNX_OFED_LINUX-2.0-3.0.0
If you encounter this issue, upgrade to the latest version of the operating system and OFED software. If the problem persists, consider upgrading to the latest version of the Mellanox RoCE adapter.
Java clients hang
In fork compatibility mode, Java clients can hang when they connect to native forked servers. This problem is associated with the RSockets preloading library. Internally, the library creates a named semaphore, /rsocket_fork, as part of its fork() support. However, when processing is complete, the RSockets library does not remove the semaphore, which persists until the system is rebooted. Any stale link or value for this named semaphore from a previous invocation blocks the native server from accepting remote client connections.
To work around this problem, use the rm command to unlink the /rsocket_fork named semaphore before fork() preloading begins. On Red Hat Enterprise Linux (RHEL), you can find the named semaphores in the /dev/shm directory; their file names begin with the prefix sem. (for example, /dev/shm/sem.rsocket_fork).
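For example, on RHEL the stale semaphore can be removed as follows:
  # List named semaphores, then remove the stale one before fork() preloading begins.
  ls /dev/shm/sem.*
  rm /dev/shm/sem.rsocket_fork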
Java clients do not scale
The RSockets protocol currently offers fork preloading support only for simple applications that run under ideal conditions. While preloading a forked process, the RSockets library uses blocking semantics to migrate a connection to RDMA.
The current support for the fork() method in the RSockets library is therefore inherently non-scalable. Java multithreaded clients that try to connect to native forked servers by using the native interoperability function might experience a large number of failed connections. To mitigate this problem, increase the client connection retry count to more than one.