Java Incremental Deserialization/reserialization (JID) provides a near-ideal solution for updating serialized data structures. On an Intel i7, an entry can be inserted in the middle of the byte array of a serialized list with 100,000 entries in 2.4 * 10^-4 seconds (240 microseconds, or about a quarter of a millisecond). An entry can be updated in the byte array of a serialized map with 100,000 entries in 4.8 * 10^-4 seconds (480 microseconds, or about half a millisecond).
To achieve this level of performance, JID is not reflection based, nor does it support cyclic data structures. JID instead requires that serializable data structures be built with objects which are instances of the Jid class. Each type of object used in a serializable data structure must also be registered.
All code is licensed under the LGPL, which imposes few constraints on usage.
The JASocket package makes it easy to create distributed, scalable software. JAConfig, in turn, makes it easy to manage in production.
The Config DB
JAConfig provides a fully replicated non-transactional, eventually consistent, key/value pair database for maintaining both configuration data and operator passwords. The database also provides change notifications, so servers can react to configuration changes. Every node in the cluster has a copy of this database both on disk and in memory, ensuring that the database is fully robust and supports fast queries. And there is neither a separate log file nor any need for a recovery mechanism--on startup, if the database is not valid its contents are discarded.
The underlying assumption of the database is that changes are infrequent, and that the system clocks of all the nodes in the cluster all have roughly the same time. Key/value pairs in the database always carry the timestamp of when the last change was made. Changes are propagated across all nodes in the cluster and shared when a node connects to another node. For each key, the change with the latest timestamp is retained.
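The "latest timestamp wins" rule can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not JAConfig's actual code: the class name LastWriteWinsStore, the Stamped holder and the method names are all hypothetical, and ties between equal timestamps are resolved in favor of the value already held, which the text above does not specify.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical last-write-wins store; these names are not part of JAConfig. */
final class LastWriteWinsStore {
    /** A value together with the time of its last change. */
    static final class Stamped {
        final String value;
        final long timestamp;

        Stamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Stamped> entries = new HashMap<>();

    /**
     * Applies a change and reports whether it was accepted.
     * A change is kept only if it is newer than what is already stored
     * (assumption: on equal timestamps the existing value is kept).
     */
    synchronized boolean apply(String key, String value, long timestamp) {
        Stamped current = entries.get(key);
        if (current != null && current.timestamp >= timestamp) {
            return false; // stale or duplicate change, ignore it
        }
        entries.put(key, new Stamped(value, timestamp));
        return true;
    }

    /** Merges a copy of another node's entries, e.g. when two nodes connect. */
    void mergeFrom(LastWriteWinsStore other) {
        for (Map.Entry<String, Stamped> e : other.snapshot().entrySet()) {
            apply(e.getKey(), e.getValue().value, e.getValue().timestamp);
        }
    }

    synchronized Map<String, Stamped> snapshot() {
        return new HashMap<>(entries);
    }

    synchronized String get(String key) {
        Stamped s = entries.get(key);
        return s == null ? null : s.value;
    }
}
```

Because every node applies the same rule, two nodes that exchange their entries end up with the same value for each key, provided their clocks roughly agree.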
The database is peer-based, so there is no single point of failure. And because there is no master copy, there are no warm or hot backups and failover time is effectively 0. On the flip side, status information is completely out of scope, as frequent updates would break the underlying assumptions.
A cluster can be split into 2 or more smaller clusters by something as simple as a loose cable. If these smaller clusters act independently, inconsistent results can occur. This is managed by knowing the total number of host computers in the cluster and only allowing some activities to occur when the number of hosts currently connected to a given cluster is equal to or greater than (totalNumberOfHosts / 2) + 1. Clusters connected to this number of hosts have what is called a quorum, and as there cannot be two sub-clusters with a quorum of hosts, at most one sub-cluster will be active.
Note that we are talking about a quorum of hosts rather than a quorum of nodes. Multiple nodes can run on each host and indeed there may be a large host which runs many of the nodes in a cluster. So if the quorum was based on nodes, it is possible that all the nodes of the quorum are running on the same host, which creates a single point of failure.
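Here is a minimal sketch of the quorum test. It assumes the division in (totalNumberOfHosts / 2) + 1 is integer division, as it would be in Java, and that a node address has the form host:port. The HostQuorum class and its methods are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper illustrating a quorum of hosts (not of nodes). */
final class HostQuorum {
    /** Smallest majority of hosts: (total / 2) + 1, using integer division. */
    static int quorumSize(int totalNumberOfHosts) {
        return (totalNumberOfHosts / 2) + 1;
    }

    /** Reduces node addresses to distinct hosts (assumes "host:port" addresses). */
    static Set<String> hostsOf(Iterable<String> nodeAddresses) {
        Set<String> hosts = new HashSet<>();
        for (String address : nodeAddresses) {
            int i = address.lastIndexOf(':');
            hosts.add(i < 0 ? address : address.substring(0, i));
        }
        return hosts;
    }

    /** True when the connected hosts form a quorum of the whole cluster. */
    static boolean hasQuorum(Set<String> connectedHosts, int totalNumberOfHosts) {
        return connectedHosts.size() >= quorumSize(totalNumberOfHosts);
    }
}
```

Counting hosts rather than nodes means that four nodes spread over only two hosts of a five-host cluster still fall short of the required three hosts.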
Cluster and Host Server Managers
At this time there are two types of server managers, the Cluster Manager and the Host Manager. These managers are started (and monitored) by a kingmaker server, which runs in every node. The kingmaker servers are responsible for having one cluster manager running in the cluster and one host manager running on every host.
The cluster manager uses the data in the config database to start and monitor a number of cluster servers, where each type of cluster server has only a single instance running somewhere in the cluster. The cluster manager and all the cluster servers stop running when the node is not a part of the active cluster (a cluster with a quorum of hosts).
Similarly, the host managers use the data in the config database to start and monitor a number of host servers, where each type of host server has only a single instance running on each host. Unlike the cluster manager and servers, the host manager and servers are unaffected by quorum considerations.
The cluster and host managers use a server named ranker to determine which node to use when starting a server. A simple ranker is provided which provides a list of nodes ordered by the number of servers running on each node. Alternative ranker implementations can be used as they are developed.
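A simple ranker of this kind might look like the sketch below. The text does not say whether the list is ordered ascending or descending, so this sketch assumes the least loaded node comes first, with ties broken by node name. SimpleRanker and rank are hypothetical names, not the ones used in JAConfig.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical ranker: orders nodes by how many servers each is running. */
final class SimpleRanker {
    /** Returns node names from fewest to most running servers (assumed order). */
    static List<String> rank(final Map<String, Integer> serverCountByNode) {
        List<String> nodes = new ArrayList<>(serverCountByNode.keySet());
        nodes.sort(Comparator
                .comparingInt((String node) -> serverCountByNode.get(node))
                .thenComparing(Comparator.<String>naturalOrder()));
        return nodes;
    }
}
```

A manager would then try to start the new server on the first node in the list.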
laforge49 270005CXQG Tags:  api jactor actor basic continuation multi-threading java 4,559 Views
Here we cover the basic API of JActor, which is very easy to use as you will see from the examples provided. --Bill
Actors extend the JLPCActor class. The methods of an actor need not be thread-safe, as they are always called from the appropriate thread.
Actors interact by sending requests. But before sending or receiving any requests an actor must be assigned a mailbox, which manages its input and output queues.
Actor methods are of two types: synchronous and asynchronous. Asynchronous methods can use an exception handler and can send messages to other actors, while synchronous methods cannot.
Asynchronous methods are distinguished by the presence of an RP parameter, a callback which is used to return the result of the method. The return value of an asynchronous method is always void.
Actors which use the same mailbox can call each other's synchronous methods. This is because they always share a thread.
Use the initialize(mailbox) method to assign a mailbox to an actor. Or use the initialize(mailbox, actor) method to also inject a dependency. Actors with an injected dependency "inherit" the request processing abilities of the injected actor, recursively. So when an inappropriate type of request is sent to an actor, it will actually be sent to the appropriate injected dependency.
The getMailbox() method returns the mailbox assigned to an actor and the getMailboxFactory() method returns the factory used to create mailboxes.
The getParent() method returns the injected dependency, or null. The getAncestor(actorClass) method returns the dependency which implements the class provided, or null.
A mailbox has an inbox of incoming messages (requests and responses) that have been sent to the various actors which have been assigned to it, as well as an outbox of messages for each target actor to which those actors are sending messages. (Outgoing messages are organized into message blocks prior to sending them as a means of increasing message throughput.)
Mailboxes are tasks which are assigned to a thread from a threadpool when a message is sent to a mailbox with an empty inbox.
Most mailboxes also support commandeering, which allows the thread of an actor sending a message to process the message directly. But commandeering can only be done when the mailbox of the target actor has not already been assigned to a thread.
Messages are normally only sent when the last message in the inbox has been processed, but calling the sendPendingMessages() method forces all the messages in the outboxes to be sent.
The setInitialBufferCapacity(int) method is used to set the initial capacity of new outboxes, which otherwise defaults to 10. (Outboxes are implemented as array lists.)
The isEmpty() method returns true when the inbox holds no additional messages that need processing.
The getMailboxFactory() method returns the factory used to create mailboxes.
The JAMailboxFactory is responsible for managing a thread pool and for creating mailboxes. It has a convenience constructor with an int parameter that specifies the number of threads in the thread pool.
The close() method closes the thread pool.
The createMailbox() method returns a new mailbox that can be commandeered.
The createAsyncMailbox() method returns a new mailbox that cannot be commandeered. This forces the actors assigned to that mailbox to process each message on a thread other than the one used by the actor which sent the message, provided the source and target actors have not been assigned the same mailbox.
The addClosable(Closable) and removeClosable(Closable) methods add and remove Closable objects from a list. When the close() method is called, the Closable.close method is called on all of the objects in this list.
The first time the timer() method is called, a java.util.Timer object is created. All calls to this method then return the same Timer object. And when the close() method is called, it in turn calls the Timer.cancel() method.
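The closable list and the lazily created timer can be pictured with the following sketch. It is not the JAMailboxFactory source: ResourceOwner is a hypothetical class, java.io.Closeable stands in for JActor's own Closable interface, and making the timer thread a daemon is an assumption of the sketch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Timer;

/** Hypothetical owner of closables and of one lazily created Timer. */
final class ResourceOwner implements Closeable {
    private final List<Closeable> closables = new ArrayList<>();
    private Timer timer; // created on the first call to timer()

    synchronized boolean addClosable(Closeable c) {
        return closables.add(c);
    }

    synchronized boolean removeClosable(Closeable c) {
        return closables.remove(c);
    }

    /** Always returns the same Timer, creating it on first use. */
    synchronized Timer timer() {
        if (timer == null) {
            timer = new Timer(true); // daemon thread: an assumption of this sketch
        }
        return timer;
    }

    /** Cancels the timer, if any, and closes everything still on the list. */
    @Override
    public void close() {
        List<Closeable> toClose;
        Timer t;
        synchronized (this) {
            toClose = new ArrayList<>(closables);
            closables.clear();
            t = timer;
        }
        if (t != null) {
            t.cancel();
        }
        for (Closeable c : toClose) {
            try {
                c.close();
            } catch (IOException e) {
                // keep closing the remaining objects
            }
        }
    }
}
```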
All requests passed between actors must extend the Request class, a generic class that takes 2 parameters: RESPONSE_TYPE, which is the class of the result returned when the request has been processed successfully, and TARGET_TYPE, which is a class that the target actor is an instance of.
The isTargetType(targetActor) method is an abstract method that must return true when the targetActor is an instance of TARGET_TYPE. (Subclasses of Request must implement this method.)
The processRequest(targetActor, rp) method is an abstract method that typically calls a method on the target actor. (Subclasses of Request must implement this method.) For asynchronous actor methods, the rp parameter is simply passed to the actor's method. But for synchronous actor methods, the rp.processResponse(result) method must be called with the result returned by the actor's method. (If a synchronous method on the actor has a return type of void, then a result of null must be returned.)
(Note that the processRequest method is always called from an appropriate thread so as to maintain the thread safety of the target actor. Typically this is the same thread being used by the actor which is sending the message, but not always.)
The send(sourceActor, targetActor, rp) method is used for sending a 2-way request message from one actor to another, with rp being a callback for handling the result. (The rp.processResponse method is always called from an appropriate thread so as to maintain the thread safety of the source actor.)
The send(jaFuture, targetActor) method is used for sending a 2-way request message from non-actor code to an actor. This method waits until it receives a result, which is then returned.
The sendEvent(sourceActor, targetActor) method is used for sending a 1-way request message from one actor to another.
The sendEvent(targetActor) method is used for sending a 1-way message from non-actor code to an actor.
Download Jactor-4.5.0.zip from here. (The JActor version will change—current version is 4.5.0.) Then extract the jactor-4.5.0.jar file and copy it to a directory, GettingStarted. You will also need some slf4j jar files.
Create a j.bat file in the GettingStarted directory:
The j.bat file can be used to compile and run a test.
Sending a Message to an Actor
Here is the main method which creates a Test actor and sends a Start request to it.
The Start request is a singleton, as it has no parameters and is immutable. The result returned is always null, so the RESPONSE_TYPE is Object. And the TARGET_TYPE is Test. The Start request calls the synchronous method Test.start().
The Test actor sports a single synchronous method, start.
Calling a Synchronous Method on Another Actor
There are times when one actor can directly call the synchronous methods of another actor. For example, when initializing the other actor or when both actors use the same mailbox.
Below we have modified the Test actor to create a Greeter actor which shares the same mailbox and then print the greeting returned by the Greeter.greet() method.
The Greeter actor has only a single synchronous method, greet.
Sending a Request to Another Actor
Instead of the Test and Greeter actors using the same mailbox, we will have them use different mailboxes, and have the Test actor send a Greet request to the Greeter actor. But as only asynchronous methods can send messages, we will need to change the parameters of Test.start, which means changing the Start request as well.
The change to the Start request is minor--we only need to change the processRequest method.
Changes to the Test actor are a bit more interesting. In addition to creating another mailbox for initializing the Greeter actor, we must use a callback to receive the value returned by the Greet request.
We also need to define the Greet request, a singleton which calls the synchronous method Greeter.greet() and then sends back the returned value.
Composing Actors with Dependency Injection
Dependency injection can be used to compose cactus stacks of actors, with actors able to send messages that will be processed by the appropriate ancestor. (In a cactus stack the links point towards the root, which is the reverse of a tree.)
We will define a simple Printer actor which has a Print request, and then inject this actor into a modified Greeter actor which prints its own greeting.
Test now creates and initializes a Printer actor, and then creates a Greeter actor initialized with the Printer actor as its parent. A Print request is sent to the Greeter actor and, on completion, a Greet request is sent.
The Greeter actor now sends a Print request to itself, passing the rp parameter on so that on completion of the Print request the Greet request also completes.
The Printer actor is quite simple, having a synchronous method which prints the value that is passed.
Print is the first request that we have looked at which is not a singleton. This is because the Print request must hold the value to be printed.
Forcing Parallel Operations
Whenever possible, messages are passed and processed on the same thread. This achieves the highest performance most of the time. But if you need to have an actor which operates on a separate thread, you need only assign it an asynchronous mailbox.
The modified Test actor above creates two Timer actors and initializes each of them with its own asynchronous mailbox. It then sends a Delay request to each of them, but with the same callback, prp.
The prp callback receives the responses from the two Timer actors and, on receipt of the second response, prints the elapsed time and sends back a null response to the Start request.
The Delay request calls the synchronous Timer.delay(ms) method.
The Timer actor simply blocks the thread it is assigned to for a number of milliseconds.
The total elapsed time to run two timers in parallel for 1000 milliseconds each was 1003 milliseconds. The additional 3 milliseconds are due to the inherent latency of passing messages between threads.
When a request is received, it comes with an RP continuation. Calling the processResponse(rsp) method returns the response to the actor that sent the request in a thread safe way.
Continuations can be saved and a response sent at a later time. The only restriction here is that when sending a response it must be done within the context of the actor that received the request. (Or another actor with the same mailbox.)
The Test actor creates and initializes the Greeter actor, sends a Greet request, prints "trigger...", sends a Trigger event, and then, on receipt of the response to the Greet request, it prints the greeting.
The Trigger request simply calls the synchronous method Greeter.trigger(). There is nothing special about this request to make it an event request. It is simply used as an event by the Test actor, which does a Trigger.sendEvent rather than a Trigger.send.
The Greeter class saves the continuation on receipt of a Greet request. And on receipt of a Trigger request it uses that saved continuation to send back the requested greeting.
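The idea of saving a continuation and answering later can be shown without JActor at all. The sketch below is plain Java and is not the original listing: java.util.function.Consumer stands in for the RP callback, there are no mailboxes or threads, and the class name DeferredGreeter and the greeting text are assumptions.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for the Greeter: answers a greet request only when triggered. */
final class DeferredGreeter {
    private Consumer<String> pending; // the saved continuation, or null

    /** Handles a greet request by saving the continuation instead of answering. */
    void greet(Consumer<String> rp) {
        pending = rp;
    }

    /** Handles a trigger event by answering the saved request, if there is one. */
    void trigger() {
        Consumer<String> rp = pending;
        pending = null; // a continuation is used at most once
        if (rp != null) {
            rp.accept("Hello world!");
        }
    }
}
```

In JActor the same restriction applies as described above: the saved continuation must be invoked in the context of the actor that received the request, which this single-threaded sketch does not model.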
The JAFactory actor binds actor type names to actor factories, allowing type names to be used in place of class names for serialization/deserialization. There are 3 reasons for this:
The JLPCActor.initialize(Mailbox mailbox, Actor parent, ActorFactory factory) method is used by JAFactory to initialize the actors it creates. And the JLPCActor.getActorType() method returns either the name of the actor type or null if the actor has not been assigned an actor factory.
All actor factory classes, which are used to create, configure and initialize one type of actor, must extend the ActorFactory class:
When binding an actor type to an actor class, JAFactory uses a default actor factory:
All of JAFactory's methods are thread-safe. This is because JAFactory uses a ConcurrentSkipListMap<String, ActorFactory> to bind type names to actor factories. And there are no Request classes defined to call methods of JAFactory, so you must call the methods directly or via one of the static methods that have been provided.
To define an actor type without an actor factory class, use the method defineActorType(String actorType, Class clazz). Alternatively, the method registerActorFactory(ActorFactory actorFactory) can be used when there is an actor factory class.
The static method JAFactory.getActorFactory(Actor actor, String actorType) returns an ActorFactory, where actor is either a JAFactory or has a JAFactory ancestor. But if no actor factory is found for the given actor type, then an IllegalArgumentException is thrown.
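The binding of type names to factories can be sketched as below. This is an illustration rather than the JAFactory source: TypeRegistry is a hypothetical class, java.util.function.Supplier stands in for ActorFactory, and rejecting a second registration of the same name is an assumption. What it does take from the text is the use of a ConcurrentSkipListMap for thread safety and the IllegalArgumentException for an unknown type.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Supplier;

/** Hypothetical registry binding type names to factories. */
final class TypeRegistry<T> {
    private final ConcurrentSkipListMap<String, Supplier<? extends T>> factories =
            new ConcurrentSkipListMap<>();

    /** Binds a type name to a factory (assumption: a name can be bound only once). */
    void register(String typeName, Supplier<? extends T> factory) {
        if (factories.putIfAbsent(typeName, factory) != null) {
            throw new IllegalArgumentException("type already defined: " + typeName);
        }
    }

    /** Creates a new instance for the type name, or throws if the name is unknown. */
    T newInstance(String typeName) {
        Supplier<? extends T> factory = factories.get(typeName);
        if (factory == null) {
            throw new IllegalArgumentException("unknown type: " + typeName);
        }
        return factory.get();
    }
}
```

Since the map is concurrent, both methods can be called from any thread without further locking.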
A number of static convenience methods are provided which return a new Actor:
When no parent is provided for dependency injection, the JAFactory which bound the actor type is used. And when no mailbox is provided, the mailbox of the JAFactory which bound the actor type is used.
Finally, here is a simple unit test for JAFactory:
JActor2 is a multi-threaded OO programming model, inspired by Alan Kay's early thoughts on Objects. JActor2 is based on asynchronous 2-way messaging with assured responses. The net result is code that is both simpler and more robust, and hence easier to maintain.
JActor2 has been under development for a year. It has a simpler API and better documentation than JActor.
For more information, see here.
"An Object Oriented Model for Robust Multi-threaded Programming", which introduces JActor2, has been published by Java Magazine: http://javamag.org/developing-java-applications-issue-on-java-is-out/
JASocket is a lock-free, scalable and robust server framework with no single point of failure. Servers are run on a cluster of nodes and interact with other servers using mobile agents, which reduces the number of messages and thus reduces the overall system latency. Administration is handled via ssh.
Latency for agents sent between 2 JVMs on the same machine (round trip): 84 microseconds. Throughput for agents sent between 2 JVMs on the same machine: 137,000 per second.
JASocket is licensed under LGPL. The project page can be found here. You can also download both the source and the JAR files here. Maven users will also find this project in the Central Repository.
JASocket supports both requests/responses (2-way messaging) and notification events (1-way messaging), where requests and notification events are mobile agents and responses are actors. (The difference is that actors do not receive a Start request when deserialized on arrival.) The idea is that an agent can interact with any number of other actors when it arrives at its destination and the actor sent as a response can hold the requested information or serve as a smart proxy for subsequent requests. This use of agents and actors in place of simple messages then holds significant potential both for reducing the volume of traffic and for reducing the number of message exchanges, thereby increasing the throughput and reducing the latency of the cluster as a whole.
Will JASocket run in the Cloud?
Administration is via ssh, which is compatible with use in a cloud. The JAConfig project builds on JASocket and provides basic management capabilities, but has no dependencies on any Cloud API.
How does a Node learn about other Nodes in the Cluster?
Each node periodically sends a multicast UDP packet containing its cluster port number. When a node receives one of these packets, it checks to see if it already has a connection. If not, it opens a connection and sends a SetClientPortAgent that will associate its server port number with the connection in that remote node. Connections then are bi-directional and every node is connected with all other nodes, so long as all nodes are under a common NAT (if any)--this technique is not compatible with port mapping.
Does JASocket detect Network Failures?
Using TCP, a connection to another node is closed (with an EOF) when the other node terminates normally. But a network failure, a cable disconnection for example, is not detected even when the TCP keep-alive option is enabled. To detect network failure, a KeepAliveAgent is sent on all connections which have not sent any messages/actors in the last N1 milliseconds, and connections which have not received any actors/messages in the last N2 milliseconds are closed. (N2 is generally a multiple of N1 and must be large enough to accommodate both Java garbage collection and any periodic resource starvation imposed by the [MS Windows] operating system.)
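The two timeouts can be expressed as a small bookkeeping class. This is a sketch of the rule only, not JASocket code: ChannelLiveness and its methods are hypothetical names, and the current time is passed in explicitly so the logic is easy to test.

```java
/** Hypothetical per-connection bookkeeping for the N1/N2 keep-alive rule. */
final class ChannelLiveness {
    private final long sendIdleMillis;    // N1: send a keep-alive after this much send silence
    private final long receiveIdleMillis; // N2: close after this much receive silence
    private long lastSent;
    private long lastReceived;

    ChannelLiveness(long n1Millis, long n2Millis, long nowMillis) {
        this.sendIdleMillis = n1Millis;
        this.receiveIdleMillis = n2Millis;
        this.lastSent = nowMillis;
        this.lastReceived = nowMillis;
    }

    /** Call whenever anything is sent on the connection, keep-alives included. */
    void onSent(long nowMillis) {
        lastSent = nowMillis;
    }

    /** Call whenever anything is received on the connection. */
    void onReceived(long nowMillis) {
        lastReceived = nowMillis;
    }

    /** True when nothing has been sent for at least N1 milliseconds. */
    boolean shouldSendKeepAlive(long nowMillis) {
        return nowMillis - lastSent >= sendIdleMillis;
    }

    /** True when nothing has been received for at least N2 milliseconds. */
    boolean shouldClose(long nowMillis) {
        return nowMillis - lastReceived >= receiveIdleMillis;
    }
}
```

A periodic task would check both conditions for every open connection, sending a KeepAliveAgent or closing the connection as needed.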
Does JASocket use Nagle's Algorithm?
Nagle's algorithm is used to improve TCP socket throughput, but it also increases latency. It is enabled by default and is turned off through the use of the TCP_NODELAY option, which is what JASocket does. Instead, JASocket does dynamic buffering, aggregating outgoing traffic until either the output buffer is full or the actor has no pending input. JASocket latency between nodes running on the same machine is 42 microseconds, including serialization and deserialization.
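In Java, Nagle's algorithm is disabled on a socket with the standard java.net.Socket.setTcpNoDelay(true) call. The buffering policy itself can be sketched as follows. This is not JASocket's implementation: AggregatingWriter is a hypothetical class, and the morePending flag is an assumed way of telling the writer whether the sender still has input waiting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical writer that aggregates messages until full or until the sender is idle. */
final class AggregatingWriter {
    private final OutputStream sink;
    private final int capacity;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    AggregatingWriter(OutputStream sink, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.sink = sink;
        this.capacity = capacity;
    }

    /** Queues one message; flushes when the buffer is full or nothing more is pending. */
    void write(byte[] message, boolean morePending) {
        buffer.write(message, 0, message.length);
        if (buffer.size() >= capacity || !morePending) {
            flush();
        }
    }

    /** Writes everything buffered so far to the sink in a single call. */
    void flush() {
        if (buffer.size() == 0) {
            return;
        }
        try {
            buffer.writeTo(sink);
            sink.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.reset();
    }
}
```

Under load the writes are batched, which helps throughput, while an idle sender flushes at once, which keeps latency low.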
How are Servers located?
Servers running on a node are registered on that node with a unique name. The names are shared across nodes when a server is registered (startup), when a server is unregistered (shutdown), and when a new connection is made between nodes. When a connection is closed, the names of all the servers residing on the (now unreachable) remote node are dropped.
Server names can be used in place of node addresses when shipping an agent to another node. If the same name is used to register resources on multiple nodes, an arbitrary choice is made.
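The bookkeeping described here amounts to a map from server names to the nodes that registered them. The sketch below is only an illustration: ServerNameDirectory and its methods are hypothetical names, and node addresses are treated as opaque strings.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical directory of server names known to one node. */
final class ServerNameDirectory {
    // server name -> addresses of the nodes where that name is registered
    private final Map<String, Set<String>> nodesByName = new HashMap<>();

    /** Records that a server with this name was registered on the given node. */
    synchronized void add(String serverName, String nodeAddress) {
        nodesByName.computeIfAbsent(serverName, k -> new TreeSet<String>()).add(nodeAddress);
    }

    /** Records that a server with this name was unregistered on the given node. */
    synchronized void remove(String serverName, String nodeAddress) {
        Set<String> nodes = nodesByName.get(serverName);
        if (nodes != null && nodes.remove(nodeAddress) && nodes.isEmpty()) {
            nodesByName.remove(serverName);
        }
    }

    /** Drops every name registered on a node whose connection has closed. */
    synchronized void dropNode(String nodeAddress) {
        Iterator<Set<String>> it = nodesByName.values().iterator();
        while (it.hasNext()) {
            Set<String> nodes = it.next();
            nodes.remove(nodeAddress);
            if (nodes.isEmpty()) {
                it.remove();
            }
        }
    }

    /** Returns some node hosting the named server, or null; the choice is arbitrary. */
    synchronized String locate(String serverName) {
        Set<String> nodes = nodesByName.get(serverName);
        return nodes == null ? null : nodes.iterator().next();
    }
}
```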
All Jid objects have the same superclass, Jid, which in turn is a subclass of JLPCActor, which means that all Jid objects are actors.
So far, we have not given any examples of a Jid object initialized with a Mailbox, which means that none of the Jid objects shown are able to send or process messages. But initializing a Jid object with a Mailbox is easy to do and most of the methods in the JID API have corresponding Request classes. Also, the Jid objects in a Jid tree structure will always share the same mailbox, so an application Jid never needs to send Requests to the Jid objects in its tuple--it can just call their methods directly.
In the code below we create a RootJid with a JidString set to "Hello world!", serialize it and then deserialize it. Many of the method calls shown earlier have been replaced with request messages to illustrate their use. However, the serialization and deserialization logic still uses method calls, which means that thread safety is the responsibility of the application developer for these operations. (Thread safety can always be achieved by performing these operations within an actor which uses the same mailbox as the Jid Actor.)
JASocket's mobile agents are subclasses of the AgentJid class:
Agents then are actors which build on JID for fast serialization/deserialization. All agents support the StartAgent request, which is sent to an agent when it arrives on its destination host:
Agents are initialized with their own mailbox. And unless the agent overrides the async method, their mailbox will be an async mailbox.
Agents which have been shipped to another node are initialized with the AgentChannel that delivered them as their parent. Otherwise the parent is the AgentChannelManager, in which case the isLocal method returns true.
The isLocalAddress method can be used to determine if a given node address is for the node where the agent is currently executing.
Shipping an Agent to another Node
A ShipAgent request is used to ship an agent to another node:
A number of agents are used internally to implement the JASocket cluster: SetClientPortAgent, KeepAliveAgent, AddResourceNameAgent and RemoveResourceNameAgent.
SetClientPortAgent is sent when a channel is opened to identify the server port of the node which opened the channel.
KeepAliveAgent is sent on otherwise idle channels to keep them from being closed.
When a server is registered in a node, AddRemoteServerNameAgent is sent to all other nodes.
Similarly, when a server is unregistered on a node, RemoveServerNameAgent is sent to all other nodes.
Recently posted on CodePlex: Microsoft's Orleans, now available as a preview. http://lnkd.in/dh4G-zq. Orleans is a validation of the actor model used by JActor2. For example, Orleans actors, like JActor actors, lack the failure modes of traditional actors and consequently do not have monitors. And really there is no good reason for actors to fail except that the traditional actor model makes it difficult to maintain systems of actors that do not experience deadlocks. More on this later.
The HelloWorld class implements a very simple Server:
The serverName method returns the name of the server, which is published to all the nodes in the cluster.
The startServer method is called to start the server. In the case of the HelloWorld server, this method defines a server command, hi, and then performs the default server initialization.
Finally, the main method has been included to show how to run a node and start an initial server by calling the Node.startup method.
ServerCommand Base Class
Server commands, like HelloWorld's hi command, extend the ServerCommand class:
Every server command has a name, a description, and an eval method.
Command arguments are passed to the eval method as a String, which may be empty.
Command output is created using the out.print method, and the out object is returned as a response by calling the rp.processResponse method.
Some commands need to process user interrupts (^C) so that they can deliver partial results when one of the nodes is slow to respond. These commands subclass InterruptableServerCommand:
Server Base Class
All servers extend the Server class:
The getOperatorName method returns the name of the operator [or server] which started the server.
The runTime method returns the length of time the server has been running.
The startupArgs method returns the args string passed to the startup method.
The serverName method provides a default name for the server--the full class name. This method can be overridden to provide a more user-friendly name.
The node method returns the Node object.
The agentChannelManager method returns the AgentChannelManager, which provides an API for accessing other servers both on the same node and on other nodes.
The registerServerCommand method is used to register server commands.
The startup method is used to initialize and run the server. The MailboxFactory.addClosable method is called to ensure that the server's close method is called when the node is halted. A RegisterServer request is also sent to register the server with the local node and to publish the server's name with all the nodes in the cluster.
The startup method then calls the startServer method, which registers the help and shutdown commands. This method is overridden when the server supports additional commands or needs to perform additional initialization to start running the server.
The close method is called when the node is halting gracefully and when the server is being shut down. In many cases this method must be overridden to close files or sockets and halt any ongoing processes.
Server command requests are passed to the server by an EvalServerCommand request, which in turn calls the evalServerCommand method.
The serverUserInterrupt method is called when a user interrupt is passed to a server. This method then forwards the interrupt to the server command.
The registerShutdownCommand and registerHelpCommand methods define and register the shutdown and help commands respectively.
Finally, methods are provided for interacting with the operator which invoked a server command.
JASocket contains a number of commands; some are there as an aid in managing the cluster and others are there to illustrate how commands work. Here we look at the implementation of some of those commands to aid you in the implementation of your own.
The toAgent command is of particular interest as it is used to send commands to other nodes. The arg string consists of the node address (or resource name), the name of another command and [optionally] the arg string of that other command. This command removes the address from its arg string and creates an EvalAgent initialized with the remainder of the arg string. The EvalAgent is then shipped to the designated node.
This command shuts down the node. This is especially interesting when used with the to command, as it causes the channel to the remote node to be halted while a result is pending.
The exception command just raises an exception to show how an exception is handled.
The channels command lists the accessible remote nodes.
The servers command lists the names of the accessible servers for all the nodes in the cluster.
The localServers command provides information about all the servers running on a node.
The latencyTest command measures the time it takes to send a KeepAliveAgent to another node and get a response. This command has an optional argument--the number of times the request/response is to be performed.
The throughputTest command measures how quickly a number of messages can be sent to another node and their responses received.
The help command lists all the commands with a brief description of each command.
The startup command is used to start a server, given the full server class name and any arguments needed by that server.
The server command is used to send a command string to the named server.
The pause command simply completes after a number of seconds. It is implemented using a TimerTask and a Continuation.
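The combination of a TimerTask and a continuation can be sketched in plain Java as below. This is not the pause command's source: Pause.after is a hypothetical helper and a Runnable stands in for the continuation. Note that here the continuation runs on the timer's thread, whereas in JActor a response must be sent in the context of the actor that received the request.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical helper: completes a continuation after a delay. */
final class Pause {
    /** Runs the continuation once, about delayMillis from now, on the timer's thread. */
    static void after(Timer timer, long delayMillis, final Runnable continuation) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                continuation.run();
            }
        }, delayMillis);
    }
}
```

The calling thread is free as soon as the task is scheduled, so nothing blocks while the pause is in progress.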
The write command is used to send a message to another operator that is logged into the node where this command is run. The operator may be logged in via ConsoleApp or via SSH.
BroadcastAgent and BroadcasterAgent
The broadcast command sends a message to all operators logged in on any node in the cluster.
WhoAgent and WhoerAgent
The who command lists every operator logged in on any node in the cluster. The display includes the operator name, node where the operator is logged in, how long the operator has been logged in, the number of commands entered and how long it has been since the last command.
Actor-based programming is quite appealing and holds a lot of potential.
Developers new to actor-based programming quickly become enthralled. It is event-based programming, but with thread safety and modularity. The code is clean and easy to write--as long as the actors are kept small and focused on a single concern. But trouble starts when you start dealing with performance issues.
You try writing a simple benchmark and, as soon as you crank up the numbers, you get an out of memory exception. Oh! Actors use one-way messages and there is no flow control. So you add ACK messages to your actors. Suddenly everything is just a bit less clear and application code is a bit more difficult to write.
Now you have your benchmarks working and look at the numbers. Disaster! Actors pass all messages between threads, and that introduces a huge amount of latency. You ask some developers who have more experience with actor-based programming and you are told that you are using actors the wrong way.
A well designed actor-based application minimizes the number of messages that are passed between actors, at least where performance is important. So you stop using small actors which address only a single concern. You lose modularity and clarity. And your actors become big bowls of spaghetti. Actor-based programming becomes ever so much less interesting.
On the other hand, we can change the way actors work, and realize the real potential of actor-based programming. We can introduce 2-way messaging through the use of continuations. And we can minimize the passing of messages between threads. Then we can use small, tightly focused actors with clean/readable code without having to give up performance.
There are a number of problems with Java serialization and numerous alternatives have been developed. But my focus here is on a particular use case, databases, and a single issue, performance.
Databases generally work with very large byte arrays. This is because seek time is very slow compared to the data transfer rate, so working with larger byte arrays often results in a performance gain. This is true for both hard disks and Solid State Disks (SSD). On the other hand, deserializing very large byte arrays is very CPU intensive. So Java databases optimize the size of the byte arrays they use to balance between the performance of the disk and the performance of the CPU. Fortunately the price of RAM has dropped significantly over the years, so large memory caches can be used to hold the deserialized data, which reduces the need to repeatedly read and deserialize the same data. Unfortunately there is still a need to frequently reserialize the updated data, which is also CPU intensive, and write it back to disk because of the transactional nature of many databases.
Significant performance gains are achieved by not using Java serialization, but working more closely with the data, reading and writing the binary form of integers, floats, strings and the like. This is not difficult to do, especially as most databases work only with well-defined tables where all the data in a given column is of the same type. But still, when reading or writing the data to disk, the entire byte array must be deserialized and subsequently reserialized. Working directly with the binary data is much faster than using Java serialization. But this is still a CPU-intensive process. And the irony is that many database transactions only access or update a minuscule amount of data.
Now in an ideal world, we would only deserialize data as needed, and then only reserialize the data that has changed. Doing this will mean that we can work with much larger byte arrays, resulting in an overall improvement in Java database performance. The data structures for doing this efficiently may be somewhat complex, but that should not be an issue so long as the API is reasonable. The term I use for this technology is JID, or Java Incremental Deserialization/reserialization.
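To make the idea concrete, here is a minimal sketch of a list of strings that decodes entries only when they are read and re-encodes only the entries that were changed. It is not JID's data structure or wire format: the class LazyStringList, the length-prefixed UTF-8 layout and the decodeCount counter are all assumptions made for the illustration, and a real implementation would also need to avoid the linear scan of length prefixes on load.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical list of strings kept in serialized form until an entry is needed. */
final class LazyStringList {
    private final byte[] source;   // the serialized form this list was loaded from
    private final int[] offset;    // where each entry's bytes start in source
    private final int[] length;    // how many bytes each entry occupies
    private final String[] value;  // decoded or updated entries, filled in on demand
    private final boolean[] dirty; // entries changed since loading
    int decodeCount;               // number of entries decoded so far (illustration only)

    /** Indexes the entries by reading length prefixes only; no string is decoded. */
    LazyStringList(byte[] serialized) {
        source = serialized;
        ByteBuffer in = ByteBuffer.wrap(serialized);
        int count = in.getInt();
        offset = new int[count];
        length = new int[count];
        value = new String[count];
        dirty = new boolean[count];
        for (int i = 0; i < count; i++) {
            length[i] = in.getInt();
            offset[i] = in.position();
            in.position(offset[i] + length[i]); // skip over the payload
        }
    }

    /** Decodes an entry the first time it is read. */
    String get(int i) {
        if (value[i] == null) {
            value[i] = new String(source, offset[i], length[i], StandardCharsets.UTF_8);
            decodeCount++;
        }
        return value[i];
    }

    /** Replaces an entry (s must not be null) without decoding the old one. */
    void set(int i, String s) {
        value[i] = s;
        dirty[i] = true;
    }

    /** Reserializes: unchanged entries are copied as raw bytes, changed ones re-encoded. */
    byte[] toBytes() {
        byte[][] encoded = new byte[value.length][];
        int size = 4;
        for (int i = 0; i < value.length; i++) {
            if (dirty[i]) {
                encoded[i] = value[i].getBytes(StandardCharsets.UTF_8);
            }
            size += 4 + (dirty[i] ? encoded[i].length : length[i]);
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(value.length);
        for (int i = 0; i < value.length; i++) {
            if (dirty[i]) {
                out.putInt(encoded[i].length);
                out.put(encoded[i]);
            } else {
                out.putInt(length[i]);
                out.put(source, offset[i], length[i]);
            }
        }
        return out.array();
    }

    /** Builds the assumed format: a count, then a length and UTF-8 bytes per entry. */
    static byte[] serialize(String... items) {
        byte[][] encoded = new byte[items.length][];
        int size = 4;
        for (int i = 0; i < items.length; i++) {
            encoded[i] = items[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + encoded[i].length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(items.length);
        for (byte[] e : encoded) {
            out.putInt(e.length);
            out.put(e);
        }
        return out.array();
    }
}
```

Updating one entry and writing the list back decodes nothing at all in this sketch; the untouched entries travel from the old byte array to the new one as raw bytes.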