Scaling OpenJPA applications with Slice

Slice is a module for distributed persistence in OpenJPA. Slice enables an application developed for a single database to adapt to a distributed, horizontally partitioned, possibly heterogeneous, database environment. This all occurs without any change in the original application code or the database schema. See how to leverage this flexibility for your own applications, especially those destined for the cloud or Software as a Service.


Pinaki Poddar, Senior Software Engineer, IBM  

Pinaki Poddar works in middleware technology with an emphasis on object persistence. He is a member of the Expert Group for the Java Persistence API (JSR 317) specification and a committer for the Apache OpenJPA project. In a past life, he contributed to the building of component-oriented integration middleware for a global investment bank and a medical image-processing platform for the healthcare industry. For his doctoral thesis, he developed an indigenous, neural-network-based automatic speech-recognition system.



24 August 2010


Introduction

Slice extends OpenJPA for a distributed, horizontally partitioned database environment. An OpenJPA-based application currently using a single database can be reconfigured with Slice for a storage environment where data is partitioned across many databases. This upgrade does not require any change in application code or the database schema.

The immediate advantage of horizontal data partitioning is improved performance against massive data volumes, especially for applications where transactional units of work or queries are often limited to a subset of the entire dataset (e.g., multi-tenant Software-as-a-Service platforms or a customer database partitioned by geographic region). In such scenarios, a partitioning-based solution like Slice is useful because it not only executes all database operations in parallel across the partitions, exploiting multi-core hardware and the concurrency of I/O-bound operations, but also lets database queries be targeted to a subset of partitions.

This article describes:

  • How to configure an application for Slice
  • How Slice distributes data across partitions
  • How it aggregates or sorts query results from many partitions
  • The conditions that must be satisfied for partitioning to operate effectively in parallel
  • The core design and architectural challenges Slice addresses in extending the OpenJPA runtime to partitioned databases

Elevator pitch on JPA

The Java™ Persistence API (JPA) is a specification for managed-object persistence to relational databases. The core conceptual constructs in JPA, Persistence Unit and Persistence Context, are realized as two interfaces in the javax.persistence package: EntityManagerFactory and EntityManager. A persistence unit represents:

  • The set of persistent Java types
  • Their mapping specifications
  • Database connection properties
  • Often a set of provider-specific custom properties

A persistence context represents a set of managed persistent instances. The persistence context is also the basic interface for persistence operations such as the following (a brief usage sketch appears after this list):

  • Creating new instances
  • Finding instances by their primary identities
  • Selecting instances by string-based or dynamically constructed query
  • Demarcating transaction boundaries
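
For orientation, here is a minimal sketch of these operations through the standard javax.persistence API. The Customer entity, its fields, and the persistence-unit name "myUnit" are illustrative assumptions, not part of Slice:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("myUnit");
EntityManager em = emf.createEntityManager();

em.getTransaction().begin();                  // demarcate a transaction
Customer c = new Customer();                  // create a new instance
c.setName("Alice");
em.persist(c);                                // the instance becomes managed
em.getTransaction().commit();                 // provider issues the INSERT

Customer found = em.find(Customer.class, c.getId());      // find by primary identity
List<Customer> adults = em.createQuery(
        "select c from Customer c where c.age > 20", Customer.class)
        .getResultList();                     // string-based query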

A persistence context manages the instances in the sense that any mutation of their persistent state by the application is monitored by the JPA provider. The appropriate database records are updated automatically when the transaction is committed or the context is flushed.

As a general theme, JPA promotes a programming model in which persistence operations and queries refer to the Java object model, while the provider is responsible for mapping the object model to a database schema (a Java class turns into one or more database tables, persistent attributes of the Java types into database columns, relations into foreign keys, and so on) and the persistence operations to SQL statements (Java's new operator maps to one or more INSERT statements, a find() maps to a SELECT, a setter method on a persistent instance translates to an UPDATE).
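
As a concrete illustration of that mapping theme, the following entity sketch is purely hypothetical; the Customer class and the related PurchaseOrder entity are assumptions for this example, not part of Slice:

import java.util.Collection;
import javax.persistence.*;

@Entity                              // maps to a CUSTOMER table
public class Customer {
    @Id @GeneratedValue              // primary key column; value generated by the database
    private long id;

    private String name;             // maps to a NAME column
    private int age;                 // maps to an AGE column

    @OneToMany(mappedBy = "customer", cascade = CascadeType.PERSIST)
    private Collection<PurchaseOrder> orders;   // relation realized as a foreign key on the order side

    // getters and setters omitted
}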

slice vs. Slice

In this article, slice in lowercase refers to a database partition holding a subset of the data, while Slice as a proper noun denotes the OpenJPA runtime module.

The core JPA specification is based on the implicit but hard assumption that the persistent objects and operations are mapped to a single relational database. Each persistence unit is essentially connected to a single database, and the managed persistence object states are stored to the same database. Slice changes this integral assumption about a single database. Slice allows the same JPA application to operate while the underlying data storage environment has been transformed from a monolithically large database to a set of horizontally partitioned databases. These physical databases that store a subset of an entire dataset are referred to as partitions, shards, or slices.

Formally, horizontal partitioning or sharding P(D) of a dataset D is a decomposition of D into N mutually disjoint sets Di such that:

D = D1 ∪ D2 ∪ ⋯ ∪ DN and 
Di ∩ Dj = ∅ for any i ≠ j

Slice introduces the abstraction of a virtual database to encompass the underlying physical database partitions or shards. The persistence unit in a Slice-enabled application connects to a single virtual database which, in turn, multiplexes all persistence operations to the actual physical databases via their corresponding JDBC drivers. For example, if a Slice-enabled application is configured to use four slices, or shards, then a JPA query such as select c from Customer c where c.age > 20 order by c.age is executed in parallel on each of the four slices. The respective sorted results from each slice are merged and sorted again in memory by the virtual database before being presented to the application. Because the virtual database offers exactly the same API as a single database from the user application's perspective, neither the application code nor the database schema requires modification when adapting from a single database to a distributed, partitioned environment; the virtual database abstraction in Slice follows a composite design pattern. This seamless nature is the strongest usability feature of Slice.
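
The following sketch emphasizes that point: the code is ordinary JPA. It assumes a persistence unit named slice (as configured later in this article) and a hypothetical Customer entity:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("slice");
EntityManager em = emf.createEntityManager();

// Executed in parallel on every configured slice; the per-slice results are
// merged and re-sorted in memory by the virtual database before being returned.
List<Customer> customers = em.createQuery(
        "select c from Customer c where c.age > 20 order by c.age", Customer.class)
        .getResultList();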

It is worth noting that alternative approaches to coping with partitions, such as configuring a separate persistence unit per shard or configuring each EntityManager of a single persistence unit to connect to a single partition, are not tenable.

Why not? Because the JPA specification imposes a group-like behavior on the managed instances of a persistence context.


Configuring Slice

An application can upgrade to a partitioned database environment using Slice through reconfiguration only. This section introduces Slice in terms of its user-configurable properties and the capabilities they represent. Slice is configured with the same mechanics as any standard JPA runtime (i.e., a META-INF/persistence.xml resource is made visible on the classpath). Besides JPA-specification-defined elements, such as the persistent class names or mapping descriptors, META-INF/persistence.xml often contains provider-specific properties as name-value pairs. Slice-specific properties appear in this provider-specific properties section and are prefixed with openjpa.slice.*.

The configuration properties can be categorized into three broad groups:

  • Properties that configure the storage environment as a whole
  • Properties that configure individual slices
  • Properties that configure the runtime behavior

Let's look at these properties and how they affect behavior.

Configurations for partitioned storage environment

First, consider an environment where the data is partitioned into three Apache Derby databases. The databases are identified by their logical slice identifiers: One, Two, and Three. The logical slice identifier is a simple human-readable moniker that uniquely (within the scope of a specific persistence unit) represents a physical database URL and its other details. To configure Slice for a partitioned database environment, the persistence.xml is shown in Listing 1.

Listing 1. Example of Slice configuration

<persistence-unitname="slice"><properties><propertyname="openjpa.BrokerFactory"value="slice"/><propertyname="openjpa.slice.Names"value="One,Two, Three"/><propertyname="openjpa.slice.Master"value="One"/><propertyname="openjpa.slice.Lenient"value="true"/><propertyname="openjpa.ConnectionDriverName"value="org.apache.derby.jdbc.EmbeddedDriver"/><propertyname="openjpa.slice.One.ConnectionURL"value="jdbc:derby:target/database/slice1"/><propertyname="openjpa.slice.Two.ConnectionURL"value="jdbc:derby:target/database/slice2"/><propertyname="openjpa.slice.Three.ConnectionURL"value="jdbc:some-bad-url"/><propertyname="openjpa.slice.DistributionPolicy"value="acme.UserDistributionPolicy"/></properties>
</persistence-unit>

Activating Slice

The first and foremost property to activate Slice is:

<propertyname="openjpa.BrokerFactory"value="slice"/>

This property instructs the OpenJPA runtime to create a specialized persistence unit that connects to a virtual database encompassing a set of physical databases. This property is mandatory.

Each slice has a logical name

The next important property is a list of the logical slice identifiers.

<propertyname="openjpa.slice.Names"value="One,Two,Three"/>

This property value enumerates all the available logical identifiers in a comma-separated list. A logical identifier is not the physical database's name; it is simply the unique identifier for a particular slice within a persistence unit. For example, each configuration property name specific to a slice is prefixed with its logical identifier, such as:

<propertyname="openjpa.slice.One.ConnectionURL"value="…"/>

It is not, however, mandatory to list the logical identifiers via the openjpa.slice.Names property. If the property is unspecified, the entire persistence.xml is scanned to identify all unique logical slice identifiers. It is recommended, though, to enumerate the logical identifiers explicitly; I'll explain why later.

Designate one slice as master

The master slice is used to generate primary identifiers for managed instances whenever required. According to the JPA specification, each persistent instance must have a persistence identity. The value of this identity can be specified by the application or generated by a database sequence. In the latter case, to maintain uniqueness of the database-generated primary keys in a multi-database environment, one of the slices is designated to generate these keys. That designated slice is called the master slice.

A slice is designated as master by the following property:

<propertyname="openjpa.slice.Master"value="One"/>

The explicit designation of a master slice is not mandatory. If the property is unspecified, the first slice is designated as master. The operative word, of course, is first, which assumes that the slices have an ordering. That ordering is imposed by the list when openjpa.slice.Names is explicitly specified; otherwise, the slices are ordered lexicographically by their identifiers (a heuristic, but definite, ordering). To avoid such implicit heuristics, it is recommended to specify both openjpa.slice.Names and openjpa.slice.Master explicitly.

Availability of each slice

In a multiple database scenario, one or more databases may not be available. The following property dictates the behavior when Slice cannot connect to one or more of the partitions:

<propertyname="openjpa.slice.Lenient"value="true"/>

Setting this property to true permits Slice to continue even when one or more slices are not reachable. By default, the value is false, and Slice will fail to start up if any of the configured slices cannot be connected to. In the example configuration, the third slice points to an invalid database URL; setting this property to true allows Slice to start up with the two valid slices and ignore the unreachable one.

Configuring individual physical databases

Each slice identified by its logical identifier must specify its physical database URL and other properties. The example below shows a slice-specific configuration for a single slice, identified logically as One:

<propertyname="openjpa.slice.One.ConnectionURL"value="jdbc:derby:target/database/slice1"/>

This property assigns the partition logically identified as One to a physical instance of a Derby database with a URL of jdbc:derby:target/database/slice1.

As mentioned, each slice-specific configuration property name is prefixed with openjpa.slice.<logical slice identifier> followed by the original OpenJPA property key suffix, such as ConnectionURL. This naming convention allows the user to configure each slice independently with any OpenJPA property. On the other hand, configuration properties that are common across slices can simply be specified as the original OpenJPA property. Thus, in the example configuration, the JDBC database driver is specified as a common property and applies to all slices, as shown below:

     <propertyname="openjpa.ConnectionDriverName"value="org.apache.derby.jdbc.EmbeddedDriver"/>

Note that it is eminently possible with Slice to specify a fourth slice with the logical identifier Four that represents a MySQL database within the same configuration as follows:

<propertyname="openjpa.slice.Four.ConnectionURL"value="jdbc:mysql://localhost/slice4"/><propertyname="openjpa.slice.Four.ConnectionDriverName"value="com.mysql.jdbc.Driver"/>

In such a case, the slice-specific properties will override the common properties for the specific fourth slice.

Configuring for runtime behavior

The major design goal of Slice is to encapsulate the storage environment so that the application code remains exactly the same as in a typical single-database usage. On the other hand, the user application may require information about the underlying slices as well as some degree of control, for example, to target certain queries to a specific subset of active slices. To reconcile these somewhat contradictory goals of activating Slice without affecting the application code while still allowing some control, Slice employs a plug-in, policy-based approach. A policy interface is implemented by the user application and specified in the configuration. At runtime, Slice calls back to this user implementation and uses the returned values to control the flow. The available policies are:

  • Data Distribution Policy — Controls which slice stores a newly persisted instance
  • Replication Policy — Controls which slices store a replicated instance
  • Query Target Policy — Targets a query to be executed on a subset of slices
  • Finder Target Policy — Targets a find by primary key operation to be executed on a subset of slices

This section elaborates on the runtime behavior of Slice in a distributed database environment in terms of these configurable policies.

Data Distribution Policy in Slice

Slice does not require any database schema change. Similar partition-based persistence solutions often require that a special column be added to the database schema to hold the partition identifier. Slice does not require such additional schema-level information because it maintains the association between a persistent instance and its original database partition via the partition's logical name. This association is established when a persistent instance is read from a particular slice. When a new instance is being persisted, however, Slice cannot determine which database partition should be associated with it. Hence, the application must specify the slice that will be associated with the new instance. An application specifies the slice for a new instance via a Data Distribution Policy, which can be configured in persistence.xml as follows:

<propertyname="openjpa.slice.DistributionPolicy"value="acme.UserDistributionPolicy"/>

The property value designates a fully qualified class name of a user implementation of the org.apache.openjpa.slice.DistributionPolicy interface. The interface contract allows the user application to determine the logical slice for a newly persisted entity.

Listing 2. Data Distribution Policy interface contract
package org.apache.openjpa.slice;
public interface DistributionPolicy { 
 String distribute(Object pc, List<String> slices, Object context);  
}

The input arguments:

  • pc is the instance to be persisted. This is the same instance passed as the input argument to EntityManager.persist(pc).
  • slices is the immutable list of logical slice identifiers. This list does not contain the slices that are currently unreachable.
  • context is an opaque object reserved for future use. In the current implementation, it happens to be the active persistence context, but this implicit behavior is not guaranteed in future releases.

The implementation must return one of the given logical slice identifiers.
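
A minimal sketch of a user-supplied implementation is shown below. The Customer entity and its getRegion() accessor are hypothetical assumptions; the slice identifiers match the earlier configuration:

package acme;

import java.util.List;
import org.apache.openjpa.slice.DistributionPolicy;

public class UserDistributionPolicy implements DistributionPolicy {
    public String distribute(Object pc, List<String> slices, Object context) {
        // Illustrative rule: place customers by region; everything else goes to the first slice.
        if (pc instanceof Customer) {
            Customer c = (Customer) pc;
            return "EUROPE".equals(c.getRegion()) ? "Two" : "One";
        }
        return slices.get(0);   // must be one of the supplied logical identifiers
    }
}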

Slice calls this user implementation for every root-object instance being persisted. The root-object instance is the explicit input argument to the EntityManager.persist(Object r) call made by the application. It is important to note that an explicit persist(r) operation on a single entity r can indirectly persist other related entities. A JPA annotation (or mapping descriptor) can specify how a persistence operation such as persist(), refresh(), merge(), or remove() cascades along a relationship path. Therefore, if an instance r is related to another instance q, and the relationship between r and q is annotated to cascade PERSIST, then q will be persisted as well as a side effect of persisting r. This behavior is referred to as transitive persistence.

The critical decision Slice makes while persisting a new root instance r is that all related entities reachable from r are stored in the same slice. Hence, the user implementation of the distribution policy is invoked only for the root entity r. Slice automatically computes the transitive closure C(r) of the root instance and assigns each member of C(r) to the same slice as r, as determined by the current distribution policy. This collocation of the transitive closure is necessary because the virtual database cannot execute a join across physical databases and hence cannot eagerly fetch a relation if the logically associated records reside in different databases. This limitation is referred to as the collocation constraint. Later, we will discuss how to work around the collocation constraint (e.g., a lazy relation can span partitions, or the same entity instance can be replicated in multiple slices).

For most applications, the Data Distribution Policy will be supplied by the user application, but Slice does provide a few out-of-the-box implementations for beginners or experimental prototypes. The default policy assigns a random slice to every new instance.

Storing the same entity to multiple slices

The distribution policy is useful for entities stored in a single slice, but the collocation constraint dictates that all related instances be stored in the same slice. This constraint becomes too restrictive for certain common data-usage patterns (e.g., master data such as stock ticker symbols, country codes, or customer types that are referred to by many other types). In such cases, a type can be designated to be replicated across multiple slices. The persistence.xml configuration must enumerate the replicated type names in a comma-separated list:

<propertyname="openjpa.slice.ReplicatedTypes"value="acme.domain.Foo,acme.domain.Bar"/>

Persisting an instance whose type is replicated invokes a Replication Policy instead of a Data Distribution Policy. Because a replicated entity can be stored in more than one slice, the policy interface is similar to the Data Distribution Policy but differs in its return type.

Listing 3. Replication Policy interface contract
package org.apache.openjpa.slice;
public interface ReplicationPolicy { 
 String[] replicate(Object pc, List<String> slices, Object context);  
}

While the semantics of the input arguments remain the same as in the Data Distribution Policy, the return value now contains an array of slice identifiers instead of a single one. A null return value implies all active slices, while an empty array raises an exception. Slice, again, tracks all the slice identifiers in which a replicated entity is stored, and when a replicated instance is modified, the same update is committed to all those slices. Hence, a replicated entity instance can be considered a single logical entity with multiple identical copies in several databases. The default Replication Policy replicates the entity to all active slices.
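
A minimal sketch of an implementation that mimics the default behavior by replicating every instance to all currently active slices (the class name is illustrative):

package acme;

import java.util.List;
import org.apache.openjpa.slice.ReplicationPolicy;

public class UserReplicationPolicy implements ReplicationPolicy {
    public String[] replicate(Object pc, List<String> slices, Object context) {
        // Store every replicated instance in all active slices.
        return slices.toArray(new String[slices.size()]);
    }
}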

When a query involves a replicated type, Slice filters out the duplicate results returned by the individual slices so that an aggregate query such as 'select count(o) from CountryCode o' does not return an erroneous result by counting duplicate CountryCode instances from multiple slices.

Target query to a subset of slices

By default, Slice executes a query across all active slices and consolidates the results, if necessary, in memory. The user, however, can target each query to a subset of slices. The user application controls such query targeting via a Query Target Policy interface.

Listing 4. Query Target Policy contract
package org.apache.openjpa.slice;
public interface QueryTargetPolicy { 
 String[] getTargets(String query, Map<Object,Object> params, List<String> slices, Object context);  
}

The input arguments are query, which is the JPQL string, and params, which holds the parameter values bound to the query, indexed by parameter key. The rest of the parameters have the same semantics as in the Data Distribution or Replication Policy.

The return value designates the slices on which the given query will be executed. An empty or null array is not a valid return value. This interface is invoked before every query execution. There is no default query target policy.
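
A minimal sketch of an implementation that routes queries bound with a hypothetical region parameter to a single slice and lets every other query fan out to all active slices:

package acme;

import java.util.List;
import java.util.Map;
import org.apache.openjpa.slice.QueryTargetPolicy;

public class UserQueryTargetPolicy implements QueryTargetPolicy {
    public String[] getTargets(String query, Map<Object, Object> params,
            List<String> slices, Object context) {
        // Illustrative rule: queries bound with region=EUROPE are executed only on slice Two.
        if ("EUROPE".equals(params.get("region"))) {
            return new String[] { "Two" };
        }
        return slices.toArray(new String[slices.size()]);   // otherwise, all active slices
    }
}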

Target finder to a subset of slices

A Finder Target Policy is quite similar to Query Target Policy except that there are no bound parameters for a find() call. The interface is shown in Listing 5.

Listing 5. Finder Target Policy interface contract
package org.apache.openjpa.slice;
public interface FinderTargetPolicy { 
 String[] getTargets(Class<?> cls, Object oid, List<String> slices, Object context);  
}

The input arguments are cls, which is the entity class being searched for; and oid, which is the persistence identifier being searched for. The rest of the parameters have the same semantics as in the other policies.

The contract of the return value carries similar semantics to Query Target Policy.

There is no default finder target policy, thus find() looks for an instance in all slices by default.
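
A minimal sketch of an implementation that looks up a hypothetical replicated CountryCode type on a single slice (its copies are identical everywhere) while every other find() fans out to all active slices:

package acme;

import java.util.List;
import org.apache.openjpa.slice.FinderTargetPolicy;

public class UserFinderTargetPolicy implements FinderTargetPolicy {
    public String[] getTargets(Class<?> cls, Object oid,
            List<String> slices, Object context) {
        if ("CountryCode".equals(cls.getSimpleName())) {
            return new String[] { slices.get(0) };        // any one slice is sufficient
        }
        return slices.toArray(new String[slices.size()]); // search everywhere
    }
}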

Distributed Query Execution

Slice executes database operations in concurrent threads, one per slice. The threads are maintained in a cached pool per persistence unit. The pool grows with the application's demand for concurrency, and threads that have finished execution are returned to the pool.

The virtual database coordinates the execution of the queries on the physical databases and post-processes the results of the individual queries in memory to prepare a consolidated result. Some typical queries are presented below for elaboration.

Queries without in-memory post-processing

Figure 1 shows a simple query where the results from the individual slices need not be processed further in memory.

Figure 1. Query without in-memory post-processing
Diagram shows a basic select query and results returned from two of three database slices
    select e from Employee e where e.age < 30

The query predicate is evaluated on each slice. The final result list is a concatenation of the result lists from the individual slices, so the ordering of the logical slice identifiers effectively determines the ordering of the selected elements. That ordering is given by the slice identifiers returned by the query target policy or, otherwise, by the configured ordering (the explicit openjpa.slice.Names list or the implicit lexicographic ordering discussed earlier). Assuming an ordering of {slice1, slice2, slice3}, the elements in the resultant list appear in the same order. Notice that the third slice does not return any selection.

Our next example, shown in Figure 2, is a query with an ORDER BY clause using the code below:

    select e from Employee e where e.age < 30 order by e.name
Figure 2. Query with ORDER BY clause requires in-memory merge
Diagram shows a simple SQL query with results returned from two of three database slices which are then sorted by name

The individual query results are merged and then sorted in-memory. If Li is the ordered list from the i-th slice, the resultant consolidated list L is:

L = sort(ΣLi)

The in-memory sort can be made efficient, in both storage and computation, by exploiting the fact that each list Li is already sorted.

In this example, the resultant list is a merged version of the individually sorted lists from each slice. Hence, though the natural ordering of slices had "Mary" from the first slice, it is "Bill" who appears first in the resulting list due to lexicographic ordering on their names.
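
The merge step can be pictured as a standard k-way merge over the already-sorted per-slice lists. The following is a simplified sketch of the idea, not Slice's actual implementation:

// Uses java.util.ArrayList, Comparator, List, and PriorityQueue.
// Merge k individually sorted lists into one sorted list without re-sorting everything.
static <T> List<T> merge(List<List<T>> sortedPerSlice, Comparator<T> order) {
    // Each heap entry holds {slice index, position within that slice's list}.
    PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
            order.compare(sortedPerSlice.get(a[0]).get(a[1]),
                          sortedPerSlice.get(b[0]).get(b[1])));
    for (int i = 0; i < sortedPerSlice.size(); i++) {
        if (!sortedPerSlice.get(i).isEmpty()) {
            heap.add(new int[] { i, 0 });
        }
    }
    List<T> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
        int[] top = heap.poll();                        // smallest remaining element
        List<T> list = sortedPerSlice.get(top[0]);
        merged.add(list.get(top[1]));
        if (top[1] + 1 < list.size()) {
            heap.add(new int[] { top[0], top[1] + 1 }); // advance within the same slice
        }
    }
    return merged;
}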

Top-N distributed query

Our third example, shown in Figure 3, limits the query result to the top N elements.

Figure 3. Query with LIMIT BY
Diagram shows an SQL query returning results from all three database slices which are then filtered to two results by the setMaxResult function

The top-N query is realized in JPA via a combination of the ORDER BY clause in the query and setting limits on the result. The following query will find the five youngest employees:

    em.createQuery("select e from Employee e order by e.age")
        .setMaxResults(5)
        .getResultList();

In a distributed query environment, the query is executed on each slice individually, and the top N elements of the merged lists are computed at the virtual database layer in memory. Again, the in-memory top-N computation exploits the fact that if an element x appears among the top N of the final consolidated list L, then x must appear among the top N of the individual list Li from its own slice.

Aggregate query in a distributed environment

Let A(D) be an aggregate operation (e.g., SUM() or MAX()) evaluated on a data set D.

An aggregate operator A is defined as commutative to partition if A(D) = A(R), where R is the set of individual evaluations of A on each partition: R = {A(Di) : i = 1, …, N}.

Figure 4 illustrates an aggregate operation that is commutative to partition.

Figure 4. Aggregate query
Diagram shows how an SQL query generates a numerical result from each slice which are then aggregated together for the result

The example query is select SUM(e.age) from Employee e where e.age > 30. If S designates the sum of ages of employees older than 30 across all slices and Si is the corresponding sum within the i-th slice, it is easy to see that S is the sum of the Si. Hence, SUM() is commutative to partition. Slice can compute all aggregate operations that are commutative to partition, such as MAX(), MIN(), SUM(), or COUNT().

Not all common aggregate operations are commutative to partition; AVG() is a notable example. Currently, Slice cannot correctly evaluate an aggregate query that is not commutative to partition.
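
A small numeric sketch (with made-up ages) makes the distinction concrete: per-slice sums can be re-aggregated into the global sum, but the average of per-slice averages is generally not the global average because it ignores slice sizes.

// Uses java.util.Arrays and java.util.List. Hypothetical ages (> 30) found on three slices.
List<int[]> perSlice = Arrays.asList(
        new int[] { 31, 45 },          // slice One
        new int[] { 40 },              // slice Two
        new int[] { 33, 52, 60 });     // slice Three

int globalSum = 0, globalCount = 0;
double sumOfAverages = 0;
for (int[] slice : perSlice) {
    int sliceSum = 0;
    for (int age : slice) sliceSum += age;
    globalSum += sliceSum;                        // re-aggregating per-slice sums is correct
    globalCount += slice.length;
    sumOfAverages += (double) sliceSum / slice.length;
}
double avgOfAvgs = sumOfAverages / perSlice.size();   // (38 + 40 + 48.33) / 3 ≈ 42.1, wrong
double trueAvg = (double) globalSum / globalCount;    // 261 / 6 = 43.5, correct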

Transaction and Slice

Transactions involving an EntityManager can be controlled either through the Java Transaction API (JTA) or by the application through the EntityTransaction API, which maps to a resource transaction on the underlying database resource. For a JTA EntityManager, the JTA transaction is propagated to the underlying resource manager (i.e., the virtual database, which relays the transaction to the physical databases). Listing 6 shows a typical configuration for a container environment using JTA transactions with three JNDI-registered data sources.

Listing 6. Example of configuring with JNDI-registered slices
<persistence-unit name="slice" transaction-type="JTA">
       <property name="openjpa.slice.Names" value="One,Two,Three"/>
       <property name="openjpa.slice.Master" value="One"/>
       <property name="openjpa.slice.One.ConnectionFactoryName" value="jdbc/slice-ds1"/>
       <property name="openjpa.slice.Two.ConnectionFactoryName" value="jdbc/slice-ds2"/>
       <property name="openjpa.slice.Three.ConnectionFactoryName" value="jdbc/slice-ds3"/>
</persistence-unit>

For a resource-local EntityManager, the underlying resource manager acts as a transaction manager over the physical databases, albeit with a weaker transactional guarantee than a proper two-phase commit protocol. In a resource-local transaction, the unit of work is first analyzed to partition the managed instances into subsets for each underlying slice. Each subset is then flushed to the corresponding database; if the subset for a slice is empty, that slice is ignored. If the flush fails for any database, the entire transaction is rolled back.

Collocation constraint

Earlier, in the discussion of the Data Distribution Policy, we saw how Slice automatically computes the transitive closure C(r) of a root instance r by following cascaded PERSIST relations and stores the entire closure in a single slice. It is important to note that the closure is computed at the time of the persist() call. Therefore, relations that are added after persist() are not part of the computed closure.

While care should be taken to satisfy the collocation constraint by persisting a root entity after its relationships are assigned, the same fact can be exploited to deliberately store related instances in different slices, violating the collocation constraint. A concrete example illustrates the point.

Let us consider a simple 1:1 bi-directional relation between Person and Address. Person.address is cascaded with PERSIST. In mapping terms, Person is the owner of the relation (i.e., the foreign key to the ADDRESS table resides in the PERSON table).

Let us assume that we have two slices, One and Two, and that our distribution policy stores a Person in slice One if the name starts with a letter from A through M and in slice Two otherwise. Similarly, an Address is stored in slice One if its ZIP code ends with an even digit and in slice Two otherwise.
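
A minimal sketch of a distribution policy implementing exactly these rules follows. The class name and the getName()/getZipCode() accessors are illustrative assumptions:

package acme;

import java.util.List;
import org.apache.openjpa.slice.DistributionPolicy;

public class NameAndZipDistributionPolicy implements DistributionPolicy {
    public String distribute(Object pc, List<String> slices, Object context) {
        if (pc instanceof Person) {
            char first = Character.toUpperCase(((Person) pc).getName().charAt(0));
            return (first >= 'A' && first <= 'M') ? "One" : "Two";
        }
        if (pc instanceof Address) {
            // Only consulted when an Address is persisted as a root instance.
            return ((Address) pc).getZipCode() % 2 == 0 ? "One" : "Two";
        }
        return slices.get(0);
    }
}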

Under these simple rules, let us create a Person p and an Address a instance and store them.

Listing 7. Code example on the effect of correlation constraints
Person p = new Person();
p.setName("Alan"); // slice One, as the name starts with the letter A
Address a = new Address();
a.setZipCode(12345); // the policy would choose slice Two, as the ZIP code ends in an odd digit

em.getTransaction().begin();
p.setAddress(a);
em.persist(p); // relation to address is set before persist
               // Address persisted by transitive persistence
em.getTransaction().commit();

In Listing 7, Person p is already associated with Address a when em.persist(p) is called; hence, Address a will be stored in the same slice One as Person p. The distribution policy would have chosen slice Two for the Address instance a because its ZIP code ends in an odd digit. The policy, however, is not invoked at all for Address a, only for the root instance Person p. Slice detects that a lies in the transitive closure of p, and since p is assigned to slice One by the distribution policy, the same slice One is automatically assigned to Address a as well.

The code in Listing 8 demonstrates what happens if the persist call is ordered differently.

Listing 8. Code example on how to bypass collocation constraint
em.getTransaction().begin();
em.persist(p); // relation to address is not set before persist
p.setAddress(a);
em.persist(a); // a has to be persisted explicitly
em.getTransaction().commit();

The distribution policy will be invoked independently for Person p and Address a, and they will reside in two different slices.

Thus, while it is possible to deliberately violate the collocation constraint and store related instances in separate slices, the usefulness of such a storage strategy is limited. Only a few operations remain possible when a Person and its related Address reside in separate databases. For example, it is possible to lazily load the relation from the owning side (i.e., Person.getAddress() will fetch the correct address even from the other database if the relation is lazy). However, if the relation is eager, is navigated from the non-owning side, or is involved in a query that requires a join, such as 'select p from Person p where p.address.zipcode = 12345', the result will be erroneous.


Conclusion

Data partitioning is an effective strategy for scaling against massive data volume, especially where natural partitions exist (e.g., customer accounts by name, home listings by region) or data separation is preferred by the application semantics (e.g., multi-tenant hosted platforms). Standard JPA has no effective means of dealing with sharding or partitioning, as the specification implicitly assumes a single database as the repository. Slice extends the OpenJPA implementation to support data partitioning, or sharding, in a seamless manner. Unlike other sharding solutions, Slice does not require adding any extra column to the existing schema to enable partitioning. Distribution and query targeting in Slice are provided via policy-based plug-in interfaces, ensuring that existing JPA applications require no code modification (other than the addition of new policy implementations and reconfiguration of persistence.xml).
