laforge49 270005CXQG 691 Visits
I see two kinds of actor models, the traditional actor model used by virtually all actor frameworks and the robust actor model used by JActor2 and Microsoft's Orleans project. But before we get into that, we need to talk just a little bit about systems like Python Twisted and Node.js. Both of these systems are event-based application platforms, where the events are processed by a single thread but with many services operating on separate threads. Both Twisted and Node.js deliver high-performance and, being largely single-threaded, provide implicit thread safety without the need for locks.
The key differentiator between actors and twisted/node.js is the extensive use of multi-threading in actor frameworks. All these frameworks work by message/event passing. But the use of events in a single threaded environment is proven technology. With actors, we do have the potential of making better use of today's multi-core processors, but single-threaded processes typically run faster than all but the most well-designed multi-threaded processes. And there is a significant down-side in the use of actors in terms of coupling arising from multiple event queues (one per actor) and out-of-order event processing.
The issue arises in part from the obligation of an actor not to block its thread, as actor platforms often run on a minimum number of threads and such blocking could easily lead to thread starvation of other actors and even deadlocks. Actors do not block when passing events to other actors, except for 2-way messages (request/response). The common actor model only supports replies, if it does so at all, by either blocking the actor's thread or by event filters.
What happens is that when one actor receives a request that requires consultation with other actors, it will temporarily select only events from the other actors it is consulting with until it receives the responses it needs to complete that initial request. Event processing then is not always in the order the events are received, the events being selected being based on the actor's state. This means that actors will often be coupled to other actors, with expectations as to how those other actors will behave. And coupling is fine, if it is well documented, isolated to small sub-systems, and the project is reasonably short-lived and/or reasonably stable.
The actor model used by both JActor and Orleans is a bit different. One of the key differences is that non-blocking 2-way messaging (request/response) is fully supported. Events are still supported, but are used mostly for things like notifications. And for 2-way messages, the actor's expectation is only that a response will be received within a reasonable time. And if an error occurs, then that error serves as the response.
With JActor/Orleans, responses are dispatched quite differently than events and requests. In JActor, a callback is provided when sending a request and that callback will be executed on the actor's thread. (Keeping in mind that the thread of an actor may change over time, but it only ever has one at most.) In contrast, any distinction in the common actor model between a request and a response must be done by the application itself. And to keep the application complexity reasonable, this generally means event filtering.
This is a really wonderful talk by someone who as worked with actors for a very long time. He makes a strong case for why message filtering can not reasonably be avoided when using the common actor model.
Here I explain the common actor model by way of an analog implementation that uses locks. My reason for doing this is to get people in a strong enough position that they can reason about actors. Actors are too often explained axiomatically, which leaves people with only a shallow understanding of actors.
laforge49 270005CXQG 619 Visits
Recently posted on Code Plex: Microsoft's Orleans, now available as a preview. http://lnkd.in/dh4G-zq. Orleans is a validation of the actor model used by JActor2. For example, Orleans actors, like JActor actors, lack the failure modes of traditional actors and consequently do not have monitors. And really there is no good reason for actors to fail except that the traditional actor model makes it difficult to maintain systems of actors that do not experience deadlocks. More on this later.
laforge49 270005CXQG 985 Visits
"An Object Oriented Model for Robust Multi-threaded Programming", which introduces JActo2, has been published by Java Magazine: http://javamag.org/developing-java-applications-issue-on-java-is-out/
laforge49 270005CXQG 432 Visits
JActor2 is a multi-threaded OO programming model, inspired by Alan Kay's early thoughts on Objects. JActor2 is based on asynchronous 2-way messaging with assured responses. The net result being code that is both simpler and more robust, and hence easier to maintain.
JActor2 has been under development for a year. It has a simpler API and better documentation than JActor.
For more information, see here.
laforge49 270005CXQG 630 Visits
To run JASocket you will need the JAR files for compatible versions of JActor, JID, JASocket and JFile. Download these projects from here and extract the jar files to a common directory, in our case c:\jaconfig. You will also need jar files from slf4j, sshd, joda-time and jline.
Next you need a few shell scripts. Here are some which work with windows:
Your directory should now look something like this:
The basic configuration of JAConfig is handled by JACNode. If you are familiar with JASocket, this code should be largely self explanatory:
Open a command window, go to the directory containing the jar files (you will need to be able to write to this directory) and enter the node command:
Now log in with the name admin and password admin.
Enter help to see the list of commands. There is nothing new here, as the commands are all implemented in JASocket:
Now we can use the localServers to see what is running on the local node:
We see that the config, kingmaker, quorum and ranker servers were started by node (class JACNode), while hostManager was started by kingmaker.
Authentication of operator passwords is handled by the config server, so a good starting point is to see what commands this server supports:
The very first thing we should do is to change the password for admin, and then test it.
The config database now holds the new admin password.
Accounts for operators are created just by assigning a password. Any operator can do this, but you will need the admin password. Note however that there is no difference between creating an operator account and changing another operator's password.
A quorum requires the participation of N/2+1 hosts. So with one host, you need one host to achieve a quorum. With two hosts you need both hosts running to achieve. But with three hosts, only two hosts are needed. Fault tolerance then is only achieved when running with 3 or more hosts.
The quorum server tracks the number of hosts that are available and gets the total host count from the config database--which must be manually assigned.
As soon as the total host count is set to 1 we achieve a quorum. The kingmaker, which was listening to the quorum server for a quorum notification, immediately starts up the cluster manager server.
Setting the totalHostCount back to 2 means we loose quorum and clusterManager stops running:
JAConfig has an alternate implementation of SSH server that subclasses HostServer. This allows us to use the host manager to run one copy of the SSH server on every host.
The port number used by the SSH server, 8889, is specified in the assignment to hostManager.ssh.
Multiple Nodes per Host
Starting a second node on the same host, we need an unused port. But when specifying the port, a console is not opened.
When more than one node runs on the same host, only one instance of the host manager is run, as well as only one instance of each host server.
Running a node on another host establishes quorum. In this case we will explicitly specify a port of 8880 to prevent the opening of another console:
The new host runs an instance of host manager, and an instance of each of the host servers. A single instance of the cluster manager is now running in the cluster.
As we define new cluster servers, one instance of each is run on the cluster. The nodes chosen to run the new servers are those least loaded.
Taking down the second node on the first host, the load has been rebalanced between the two remaining nodes.
The JASocket package makes it easy to create distributed, scalable software. JAConfig, in turn, makes it easy to manage in production.
The Config DB
JAConfig provides a fully replicated non-transactional, eventually consistent, key/value pair database for maintaining both configuration data and operator passwords. The database also provides change notifications, so servers can react to configuration changes. Every node in the cluster has a copy of this database both on disk and in memory, ensuring that the database is fully robust and supports fast queries. And there is neither a separate log file nor any need for a recovery mechanism--on startup, if the database is not valid its contents are discarded.
The underlying assumption of the database is that changes are infrequent, and that the system clocks of all the nodes in the cluster all have roughly the same time. Key/value pairs in the database always carry the timestamp of when the last change was made. Changes are propagated across all nodes in the cluster and shared when a node connects to another node. For each key, the change with the latest timestamp is retained.
The database is peer-based. So there is no single point of failure. And because there is no master copy, there are no warm or hot backups and fallover time is effectively 0. On the flip side, status information is completely out of scope, as frequent updates will break the underlying assumptions.
A cluster can be split into 2 or more smaller clusters by something as simple as a loose cable. If these smaller clusters act independently, inconsistent results can occur. This is managed by knowing the total number of host computers in the cluster and only allowing some activities to occur the the number of hosts currently connected to a given cluster is equal to or grater than (totalNumberOfHosts / 2) + 1. Clusters connected to this number of hosts have what is called a quorum and as there can not be two sub-clusters with a quorum of hosts, at most only one sub-cluster will be active.
Note that we are talking about a quorum of hosts rather than a quorum of nodes. Multiple nodes can run on each host and indeed there may be a large host which runs many of the nodes in a cluster. So if the quorum was based on nodes, it is possible that all the nodes of the quorum are running on the same host, which creates a single point of failure.
Cluster and Host Server Managers
At this time there are two types of server managers, the Cluster Manager and the Host Manager. These managers are started (and monitored) by a kingmaker server, which runs in every node. The kingmaker servers are responsible for having one cluster manager running in the cluster and one host manager running on every host.
The cluster manager uses the data in the config database to start and monitor a number of cluster servers, where each type of cluster server has only a single instance running somewhere in the cluster. The cluster manager and all the cluster servers stop running when the node is not a part of the active cluster (a cluster with a quorum of hosts).
Similarly, the host managers use the data in the config database to start and monitor a number of host servers, where each type of host server has only a single instance running on each host. Unlike the cluster manager and servers, the host manager and servers are unaffected by quorum considerations.
The cluster and host managers use a server named ranker to determine which node to use when starting a server. A simple ranker is provided which provides a list of nodes ordered by the number of servers running on each node. Alternative ranker implementations can be used as they are developed.
laforge49 270005CXQG 650 Visits
When deserialization and reserialization are reasonably fast, their use to make deep copies of data structures becomes a reasonable approach. And Jid provides the CopyJid request, which is supported by all Jid actors. The GetSerializedBytes request is similar, in that it returns a byte array holding the serialized data of a Jid actor, and it too is supported by all Jid actors.
Making copies of data structures is important in a multithreaded application when it can be used to reduce the number of messages sent between threads. Conversely, being able to add a copy of a Jid actor to a collection may be even more useful. This is done by first getting the byte array of a Jid's serialized data and passing it in one of several requests which then create a copy of that Jid and add it to their collection. These requests include SetActorBytes for RootJid, ActorJid and UnionJid, IAddBytes for BListJid and KMakeBytes for BMapJid.
laforge49 270005CXQG 786 Visits
All Jid objects have the same superclass, Jid, which in turn is a subclass of JLPCActor, which means that all Jid objects are actors.
So far, we have not given any examples of a Jid object initialized with a Mailbox, which means that none of the Jid objects shown are able to send or process messages. But initializing a Jid object with a Mailbox is easy to do and most of the methods in the JID API have corresponding Request classes. Also, the Jid objects in a Jid tree structure will always share the same mailbox, so an application Jid never needs to send Requests to the Jid objects in its tuple--it can just call their methods directly.
In the code below we create a RootJid with a JidString set to "Hello world!", serialize it and then deserialize it. Many of the method calls shown earlier have been replaced with request messages to illustrate their use. However, the serialization and deserialization logic still uses method calls, which means that thread safety is the responsibility of the application developer for these operations. (Thread safety can always be achieved by performing these operations within an actor which uses the same mailbox as the Jid Actor.)
laforge49 270005CXQG 593 Visits
BMapJid is the base class for balanced tree maps which, like bListJid, provide for super-fast incremental deserialization and reserialization. BMapJid has 3 subclasses, IntegerBMapJid, LongBMapJid and StringBMapJid, which support Integer, Long and String keys respectively.
BMapJid<KEY_TYPE, VALUE_TYPE> is a collection of MapEntry objects, where MapEntry holds a key/value pair. BMapJid is effectively a sorted list of MapEntry objects, with fast indexing supporting the same methods as BListJid exception only the iAdd and iAddBytes methods are not supported. But access by key is also supported. These additional methods include
BMapJid objects are created using a registered factory object. As a convenience, JidFactories registers 24 such factory objects, though it is easy enough to define register additional factory objects using the IntegerBMapJidFactory, LongBMapJidFactory and StringBMapJidFactory classes.