In WebSphere Portal (WP) 6.0, we broke up our configuration and content repository databases into "domains", or database instances/schemas that are organized by function. The intention of these domains is to allow them to be located separately from each other and, in some instances, shared between multiple identical portal clusters. The domain-based organization also enabled some functional features, such as loose coupling between customization data and static release data, but that isn't the subject of this blog entry.
Geographic deployment is core to WebSphere Portal's ability to support global portals and meet 24x7 uptime goals. To facilitate this, certain database domains must be shared across deployments to ensure that data is consistent for all end users.
The database domains are:
- release: contains static portal configuration, such as pages, portlets, and entitlements
- community: contains community-oriented portal configuration, that is, configuration shared across a small group of users rather than tied to a single user. Composite application instance configuration goes here.
- customization: contains user-specific configuration, such as customized portlets and pages
- likeminds: used to hold information about e-commerce selection trends across multiple users, and provides a recommendations engine for providing selections to end users based on similar purchase trends. Likeminds is a feature of the Personalization service.
- feedback: used to hold information about which Personalization rules are being used, what parameters feed into those rules, and what their results are. Feedback is a feature of the Personalization service.
- wmm: used to hold user and group information that is not otherwise maintained by the underlying user repository, such as through the use of the "lookaside" feature. It can also serve as the user repository underneath a custom user registry based on the WMM APIs.
- jcr: the Java content repository holds several different types of data, including WCM content, Personalization rules, and policy definitions.
The release, feedback, and likeminds domains should be unique per cluster. Since feedback and likeminds are rarely used, you can include their schemas along with the release domain in the same database instance if you like. Even if those features are in use, to keep the deployment simple, I still recommend maintaining them with the release domain. Keeping the release domain specific per cluster is essential for allowing one cluster to be serviced while allowing other identical clusters to remain in production.
The community, customization, and wmm domains should be shared across all identical clusters, as they contain end-user information that must be common across these clusters. That means either all clusters refer to the same DB instance that holds these three domains, or you employ 2-way replication to keep separate instances synchronized. If you use 2-way replication, replication must run more often than the event that binds a user to a particular cluster can occur. You do not want an end user rerouted to a different cluster BEFORE their data is replicated over. The trigger depends on your global load balancing logic, but I prefer domain-based routing, since it pretty much guarantees that a particular user will go to a particular cluster/datacenter unless a failure causes the user to be rerouted to another cluster. DNS-based routing is problematic because hostnames could be re-resolved at any time, even during a user's active session with a particular cluster.
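To make the timing rule concrete, here is a minimal Python sketch of the check. The class and field names are hypothetical, and the numbers are illustrative rather than measured; the point is only that worst-case replication lag has to stay below the shortest time in which your routing could move a user to another cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPlan:
    """Hypothetical model of a 2-way replication setup (names are assumptions)."""

    # Worst-case seconds before a change on one cluster is visible on the other.
    max_replication_lag_s: float
    # Shortest time, in seconds, in which routing could rebind a user elsewhere.
    min_rebind_delay_s: float

    def is_safe(self) -> bool:
        """True when user data always arrives before the user can be rerouted."""
        return self.max_replication_lag_s < self.min_rebind_delay_s


if __name__ == "__main__":
    # Routing that only moves users on failure: assume failover takes 120 s.
    sticky = ReplicationPlan(max_replication_lag_s=30, min_rebind_delay_s=120)
    # Routing that may re-resolve at any moment: the rebind delay can be zero.
    unstable = ReplicationPlan(max_replication_lag_s=30, min_rebind_delay_s=0)

    print(sticky.is_safe())    # True
    print(unstable.is_safe())  # False
```

The second case is why routing that can rebind a user mid-session is hard to pair with replication: no replication interval is short enough when the rebind delay can be zero.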
The last domain, jcr, is special, because it can contain both release-oriented and user-oriented data. In general, though, I recommend that the JCR be treated as a release domain, since the vast majority of the data it contains comes from staging or authoring environments. Personalization rules and policy definitions are typically developed internally and staged out. WCM content is authored and can be syndicated out to multiple clusters simultaneously. Database-based 2-way replication of the JCR, as a means of reducing the amount of syndication required, is neither recommended nor supported, because it can cause problems with content visibility across all clusters. WCM uses a caching mechanism that relies on syndication as a cue to invalidate cache entries. Without that cue, users may not see updated content until the servers are restarted.
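The recommendations in this entry can be summarized as a small lookup table. This is only a sketch in Python for planning purposes: the enum and function names are hypothetical, and it is not a Portal configuration format.

```python
from enum import Enum


class Placement(Enum):
    PER_CLUSTER = "one copy per cluster"
    SHARED = "shared (or replicated) across identical clusters"


# Placement of each database domain, as recommended in this entry.
DOMAIN_PLACEMENT = {
    "release": Placement.PER_CLUSTER,
    "feedback": Placement.PER_CLUSTER,   # can live alongside release
    "likeminds": Placement.PER_CLUSTER,  # can live alongside release
    "jcr": Placement.PER_CLUSTER,        # treated as a release domain
    "community": Placement.SHARED,
    "customization": Placement.SHARED,
    "wmm": Placement.SHARED,
}


def domains_with(placement: Placement) -> list[str]:
    """Return the domain names that use the given placement, sorted by name."""
    return sorted(
        name for name, value in DOMAIN_PLACEMENT.items() if value is placement
    )


if __name__ == "__main__":
    for placement in Placement:
        print(f"{placement.value}: {', '.join(domains_with(placement))}")
```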