Technical Blog Post
Avoiding data modeling issues, Part 1
When implementing an IBM FileNet P8 Content Manager-based solution, one of the more complex issues to resolve is the data model. A properly designed data model can ease storing, finding and retrieving documents and content. However, a poorly designed data model can lead to confusion, lost content and missed business opportunities.
This entry is to be a primer for someone building a new data model or looking to revise an existing one. The principles and recommendations in this series are a result of written best practices and recommendations, as well as the collective experience of the IBM ECM Software Services group has collected in fifteen-plus years of implementing various FileNet products from Image Services to IBM FileNet P8.
The review will be broken into four basic sections:
- The purpose of metadata
- Design approaches
- The basic design elements available within Content Manager
- Additional features of Content Manager for information management
Purpose of Metadata
Any description of lessons learned and recommendations for handling metadata manipulation and design stars with an understanding of the purpose of metadata. At its most basic level, metadata is data about data. In the case of IBM FileNet P8 Content Manager, metadata is data that is collected about a document so that it can be stored and retrieved at a future point. This metadata in Content Manager is stored as properties on a document (or object) and is available for searching via Stored Searches, Search Templates, out of the box applications (such as Workplace and Workplace XT) and via the API.
Any metadata collected should generally help a user find a document. Good examples would be account number, customer ID, case number, etc. What metadata should not be is a wholesale copy of the data in the document. Any structure that dictates keying in a good majority of the document content into the metadata can be reexamined. There may very well be a reason for doing so, but if the key purpose of this type of data is to just help find the documents, then this may be requiring too much time when committing the document. In addition, if the metadata itself is useful (for future data mining or for the application itself), there may be a reason to not keep the original document, but keep the metadata only. Part three of this series will cover custom objects and ways they can be used to store metadata-only.
When approaching the metadata design, there are two basic starting points: a top down, enterprise approach or a more departmental, bottom up approach. This section is a summary of what is described in the IBM FileNet Content Manager Implementation Best Practices and Recommendations Redbook, Section 5.2.1. While this summary is a good start, the Redbook itself has more information and should be read both for this information as well as a large number of other recommendations and best practices.
A top down approach starts with an enterprise wide view of what is to be accomplished with the data model and how this integrates with the wider goals of the enterprise. Practically this means that an enterprise wide system would review the basic metadata that every document should contain to comply with these goals and mandate this. Then at each lower level, for example department or organizational unit, the business analysts would work to add the metadata elements appropriate to that line of business or problem to be solved.
One key advantage to this approach is a unified structure allowing for better organization and cross line of business searching. Having the enterprise mandate, for example, that any customer document have the appropriate customer identifier as a metadata property would make it easier to identify all documents related to a specific customer.
However, if not properly balanced, an enterprise approach can lead to creation of shared properties that are not necessarily appropriate for all business units. In the example above, the enterprise architect may determine that a customer ID is appropriate for the enterprise structure, but it may not be appropriate for human resources or any non-customer facing class. As such a mandate to include it may be counter productive.
The alternate approach turns is to start at the department level and work up, generalizing where possible. If a system was born as a departmental system, this is likely how this happened. Each department went about solving their own problem and then tried to incorporate themselves into the enterprise at large.
This approach does have its advantages. Most important is that each group can start with what is important to them. In the above example, customer facing and noncustomer facing lines of business or departments may have totally different needs and requirements. Starting at the bottom would allow for a more customized structure that meets their needs.
The downside is the potential for lack of consistency can make cross line of business searching and data mining more difficult. A lack of standards for naming convention and data integrity requirements may exacerbate this problem. Which approach, or how an organization balances these two approaches depends on its needs. Neither answer approach is “best”, so it may be an issue of just picking the best of both worlds and applying them. Also, regardless of the approach, it would make sense to have one person or group who is responsible for the data model, or at least to oversee the creation of the model to help ensure consistency and interoperability.
Basic Design Elements
After selecting the approach to designing the structure, it helps to know what tools Content Manager offers the designer and architect for building that taxonomy. This part of the entry goes into them it a bit more detail and how they can be both used and misused. Where possible, I've also tried to give concrete examples of issues I've seen at customer sites related to each.
Ignoring the Content Engine domain as the lowest level container, the primary data container or organizational unit within the Content Engine is an object store. An object store contains a distinct set of documents, properties, metadata and security. Since the data is completely separate from any other object store's data, it provides a convenient way to subdivide a data set or provide separation between lines of business.
When working with customers, I'm often asked how many object stores should they create and use. The answer I always give is the same, it depends. There is no hard and fast rule, no mathematical equation that I can offer to help decide. Instead, what I generally offer are four basic guidelines:
As few as necessary
Object stores are generally easy to create, so an administrator might be inclined to create a large number. In fact, it may be possible for each line of business, no matter how small or large, to have its own object store. And that may work, but remember that with the addition of each object store, adds another set of databases or schemas (depending on database), storage areas, classes, properties and security that have to be managed.
For this reason, we (ECM Software Services) tend to recommend a smaller number of object stores, if for no other reason than to limit the amount of maintenance that needs to be done on a regular basis. In addition, with larger numbers of object stores (and just about any type of item in Content Manager), the more confusing the choice of where to store data can become for your users and even your developers and administrators.
Enough to segregate unrelated data
Of course, the flip side of having a large number of object stores is that there is an increased risk of improper security on documents allowing the wrong user access to information. For example, if Human Resources and Marketing information are mixed, it is possible to use access control to ensure that marketing people cannot access HR documents and vice-versa. But if someone were to either by accident or intentionally modify security on one of the HR documents, someone from marketing could potentially find and view this sensitive information.
If these documents are in separate object stores, the object store security itself can ensure that only HR users can access HR documents. Think of it as both locking the HR file cabinet, but also locking it in a separate room. If the cabinets are in a separate room, even if someone were to accidentally leave a cabinet open (or their key in the cabinet lock), the lock on the office door itself will provide an additional measure of security.
Enough to meet performance requirements
Another facet of the object store count discussion revolves around performance and activity levels. While there is no limit to the number of items in an object store (or any limits would be database dependent), there is a more practical set of limits. In addition, even if a dataset is not likely to hit that upper limit, there may be two datasets that are both accessed very heavily requiring different performance tuning settings. Having them in separate object stores can allow for each schema or tablespace to be treated separately, indexed properly and maybe even located on separate spindles for better performance.
Minimum 2 (settings, content)
One other recommendation is to create at minimum two object stores: one for content and one for Workplace/Workplace XT preferences. I've worked with a number of customers who created only one object store, stored their preferences there with their data. Then, when it comes time to bring a second line of business into the system, they discover that they need to lock down the original object store. However, since their preferences are in there and every user of the system needs access, they cannot lock that object store down. The result is they have to reconfigure Workplace, and while not a monumental undertaking, it is still a maintenance task that can easily be avoided by just separating it during initial configuration.
Control Object Store
This recommendation came from some of my coworkers in field delivery. If a third object store is created and used solely as a control object store, it can used to validate out of the box functionality when upgrading. If an upgrade causes issues with creating or retrieving documents in the primary object store(s), the same operations can be performed in the control object store. If it fails there, it is more likely a system issue and not a code or data issue.
Part two will cover additional components in a Content Manager object model, including properties, classes and folders.