Information architecture essentials, Part 2: Managing enterprise information

Identifying, capturing, controlling, presenting, and archiving content

developerWorks

Level: Introductory

Benjamin Lieberman, Ph.D., Principal Software Architect, BioLogic Software Consulting, LLC

05 Feb 2008

Information content management involves identifying useful information, organizing that information into an intuitive structure, and governing changes made to that information. Content comes in many forms, including text, graphics, tables, charts, illustrations, recordings, maps, video, audio, and many others. Learn how to arrange that information into a maintainable and usable structure by categorizing it to suit your audience.

Information has value

It's surprising how much value can be placed on an object as ephemeral as a piece of data. But information doesn't exist in a vacuum; it's used by every living thing in the context of its surroundings. From a bacterium sensing a chemical gradient to locate food, to a space shuttle astronaut relying on instruments to safely reenter Earth's atmosphere, information is of value only in a particular context to a particular user. Without the correct identification, capture, management, and presentation of information, you'd be hard pressed to make any decisions -- business, personal, or governmental -- let alone good ones. To properly manage information, it's essential to understand the way in which information will be used, by whom, and for what purpose.

The first step in managing content is (paradoxically) identifying content worth managing. Not all information has the same value, because value depends on who is interested in the data. The ability to manage how a refinancing advertisement is periodically displayed on a popular Web site is of much more interest to the advertising executive than it is to the average Web user. Information must be evaluated based on the subjective needs of the primary audience. Is the purpose of the content to generate revenue or provide education? Does the audience comprise children or adults? Is the information for entertainment? Business development? Formulation of policy?

As shown in Figure 1, you can start to determine the value of any collection of data from any one of three concerns: the audience (users), the purpose of the information (context), or the information itself (content).


Figure 1: Intersection of concerns for information management

Not only does the context of how the information will be used matter in identifying data, but it also guides the capture and presentation of that information. For example, a business executive may have a great deal of interest in the financial health of a potential investment but can easily be overwhelmed by a massively detailed financial statement, leading to guesswork rather than reasoned decision-making. In this instance, the context determines how the raw data must be summarized or modeled to remove extraneous information and focus on the core question -- is this investment sound or risky? Context is all about asking the right questions to understand the eventual use of managed information.

Content isn't always readily accessible, and it may not be in a manageable form. Take, for example, the U.S. government 1040 tax form. Although many people take advantage of direct electronic submission of tax forms, a substantial number of people still mail their paper forms to the IRS office. If the mandate is to manage all tax submissions electronically (and it soon will be), then the IRS information manager is faced with the problem of converting the paper forms to electronic form. The form of the information will directly influence the management mechanisms, perhaps limiting the possible solutions to non-computer-based management.

As another example, consider your local library: The old card catalog has long since been converted to electronic form, but the content (books, films, maps, and so on) still consists of physical objects that must be stored on shelves or in boxes.

Finally, information users are a mixed bag. Some are information savvy and able to find exactly the right set of keywords, whereas others struggle to find useful content. And not all users have the same needs; some prefer a complex, detailed display, and others prefer a simplified presentation that lets them browse freely. This is similar to the determined shopper who knows the exact item he wants as compared to the shopper who simply wishes to look around and see what might be of interest. An effective information-management policy must support both kinds of users.



Skills and competency

Information can only be understood in context...

A key element of information management is the ability to identify valuable information and to organize that information to best benefit a particular audience. This task requires that you be able to think critically about what is important and what isn't. Critical thinking represents the combination of education, experience, and research.

Education provides a common starting point for both the information manager and the expected audience; the language, navigation metaphors, categories, and presentation form are all based on a common understanding between you and your information users. Experience is gained from trying many approaches and discovering which one works -- the classic "trial and error" method. Research lets you benefit from the mistakes of others rather than learn from your own tedious efforts. An information manager must apply critical thinking to reason about the collection of information to be managed. Judgment calls are required at every stage of information management, from identification through categorization to control, utilization, and archiving.

Information obeys the law of entropy...

Information repositories tend toward disorder unless acted on by an outside force. Data must be continually managed, or it will fall into disarray. Consider again the example of your local library. If patrons were permitted to not only remove books from shelves, but also to put them back or add new ones, then mistakes of omission or commission would lead to an unusable mess. Unfortunately, many content-management approaches are little better than trusting the users to regulate themselves (for example, "shared drives"), usually with disastrous results.

Information repositories must be managed by a trained set of individuals who act as librarians and keep the organizational policy working as planned. Primarily, this role requires periodic review of the materials in the repository to ensure that the established management policy is being followed. As information moves through the management life cycle (discussed later in this article), it must be constantly monitored to ensure that categories are used properly and consistently. The information manager is responsible for knowing the established scheme and choosing the best category for long-term information capture.
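As a minimal sketch of that periodic review, the Python function below assumes each repository item carries a single category label and that the established scheme is a plain set of approved labels; both the record layout and the function name `audit_categories` are hypothetical. It reports labels that fall outside the scheme and scheme categories that nobody uses, two simple signs that a repository is drifting toward disorder.

```python
from collections import Counter

def audit_categories(items, vocabulary):
    """Compare the labels actually in use against the approved scheme.

    items: mapping of item id -> category label (hypothetical layout)
    vocabulary: set of approved category labels
    Returns (items with unapproved labels, unused approved labels, usage counts).
    """
    usage = Counter(items.values())
    unknown = {item: cat for item, cat in items.items() if cat not in vocabulary}
    unused = sorted(vocabulary - set(usage))
    return unknown, unused, usage

if __name__ == "__main__":
    vocab = {"Requirements/Business", "Requirements/System", "Requirements/Testing"}
    repo = {
        "doc-1": "Requirements/Business",
        "doc-2": "Requirements/System",
        "doc-3": "requirements/system",  # inconsistent casing slipped in
    }
    unknown, unused, usage = audit_categories(repo, vocab)
    print(unknown)  # {'doc-3': 'requirements/system'}
    print(unused)   # ['Requirements/Testing']
```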

Information must be accessed to be useful...

Even if information is properly captured, skillfully organized, and masterfully managed, it still must be accessed by a user to have any purpose. Access to complex collections of information requires a simple but sophisticated search-and-filter mechanism to avoid non-information (where the user is shown results that aren't what was needed) or mis-information (where the user is misled into believing the information is applicable).

A common example of non-information is the 50 million search results when I type my name into Google: The first dozen or so hits have nothing to do with me, but rather are about someone who shares my name (and is apparently more popular). The second issue, mis-information, is more insidious and is a result of trusting the search algorithm too much. This can happen, for example, if you search online for an inexpensive version of a product, such as a $25 Rolex from a "well-known" provider. The repository manager should be well versed in techniques of information searching (keyword, topic, authority source, category, and so on) and information filtering (such as by statistical relevance of search word occurrence, or narrowing words or phrases) to ensure accuracy of searches and search results.
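The keyword-and-filter techniques mentioned above can be illustrated with a deliberately crude Python sketch: relevance is simply the number of times the search words occur in a document, and every narrowing word must appear for a document to stay in the results. The function names and the in-memory dictionary of documents are hypothetical; production search engines use far more elaborate relevance models, and this version handles single words only, not phrases.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(documents, keywords, narrowing=()):
    """Rank documents by keyword occurrence count (a crude statistical
    relevance score), dropping any document that lacks a narrowing word."""
    keywords = [k.lower() for k in keywords]
    narrowing = [n.lower() for n in narrowing]
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        if any(counts[n] == 0 for n in narrowing):
            continue  # filter: every narrowing word must appear
        score = sum(counts[k] for k in keywords)
        if score > 0:
            results.append((score, doc_id))
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]
```

Searching a small collection for "Lieberman" with the narrowing word "software" would, for instance, drop a page about an actor who shares the name.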

Information is most useful when it has an aesthetic quality...

Presentation is often the most overlooked aspect of information management. Substance must always take precedence over style, but poorly presented content runs the risk of confusing the users it was meant (often at great expense) to serve. A repository manager should have at least a passing familiarity with human-usability principles: appropriate use of color, font, layout, and navigation. The best information model in the world will sit unused if the repository's interface is confusing, complex, and unappealing.



Tools and techniques

Information management follows a life cycle...

Some information is always valuable, such as investment account balances; other information is valuable only for a defined period of time, such as plane departure and arrival information; and still other data has value only periodically, such as business intelligence. Nevertheless, all information has a life cycle during which it's identified, captured, organized, controlled, utilized, and eventually archived. Figure 2 illustrates these six principal steps in the information life cycle.


Figure 2: Information life cycle
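One way to make the life cycle explicit in a repository is to record each item's current stage. The Python sketch below assumes the six stages run strictly in the order Figure 2 lists them; in practice control and utilization overlap, so treat the linear `advance` function (a hypothetical helper) as a simplification.

```python
from enum import Enum

class Stage(Enum):
    """The six life-cycle stages, in the order Figure 2 lists them."""
    IDENTIFIED = 1
    CAPTURED = 2
    ORGANIZED = 3
    CONTROLLED = 4
    UTILIZED = 5
    ARCHIVED = 6

def advance(stage: Stage) -> Stage:
    """Move an item to the next stage; archived items have finished."""
    if stage is Stage.ARCHIVED:
        raise ValueError("archived items have completed the life cycle")
    return Stage(stage.value + 1)

if __name__ == "__main__":
    stage = Stage.IDENTIFIED
    while stage is not Stage.ARCHIVED:
        print(stage.name)
        stage = advance(stage)
    print(stage.name)
```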

Identification

As mentioned earlier, the first step in information management is identifying content to be managed. For example, if you're creating a repository of requirements for a development team, the items of value can be initially identified as business requirements, system requirements, and testing requirements. Most if not all information to be managed falls into one of those categories. This step also gives you an understanding of each data source, which may be easy or difficult to manage depending on its form and how often it is updated. A frequently updated information source in an inaccessible format requires a much more sophisticated scheme than one that is periodically updated in a readily accessible form. Identification also sets the scope of the effort, so you don't end up trying to manage everything having to do with developing a new system or modifying an existing one.
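Here is a sketch of the identification step for the requirements example, under stated assumptions: each candidate source is described by a hypothetical `Source` record holding its kind, update frequency, and whether its format is readily accessible. Sources outside the three requirement kinds are dropped (the scoping benefit), and the rest are ordered so that inaccessible and frequently updated sources, the ones needing the most sophisticated handling, come first. The ordering rule is one illustrative choice, not a prescribed method.

```python
from dataclasses import dataclass

# Requirement kinds named in the text; anything else is out of scope.
IN_SCOPE = {"business", "system", "testing"}

@dataclass
class Source:
    name: str
    kind: str                # e.g. "business", "system", "testing", "marketing"
    updates_per_month: int
    accessible_format: bool  # False for, say, scanned paper or proprietary files

def identify(sources):
    """Drop out-of-scope sources, then put the hardest to manage first:
    inaccessible formats before accessible ones and, within each group,
    the most frequently updated first."""
    in_scope = [s for s in sources if s.kind in IN_SCOPE]
    return sorted(in_scope,
                  key=lambda s: (not s.accessible_format, s.updates_per_month),
                  reverse=True)
```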

Capture

With the information identified, the next step is to capture that information into a manageable repository, where the content format dramatically affects the storage needs. Assuming all the information of interest is binary (which isn't always the case, even considering online content-management systems), the primary questions of storage are size and bandwidth. The size of the files determines the principal storage needs (including backup) and the level of bandwidth required for capture and eventual display. Large files, like video or music, require much more storage space and delivery capacity. You can use the following formula to estimate your needs:

File Storage Requirements = (average file size * number of files + index size * number of indexes) * 2 (for backup needs)

If you use compression, you can often divide the result by a factor of up to 2, depending on whether the files are already compressed (like JPEGs and MPEGs). Also note that file metadata usually accounts for only a small fraction of overall storage needs.
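The storage formula translates directly into code. In this Python sketch the function name is hypothetical, sizes may be in any consistent unit, and `compression_factor` is the divisor of 1 to 2 described above (use 1 for content that is already compressed).

```python
def storage_requirement(avg_file_size, n_files, index_size, n_indexes,
                        compression_factor=1.0):
    """(average file size * number of files + index size * number of indexes)
    * 2 for backup, optionally divided by a compression factor of 1 to 2."""
    if not 1.0 <= compression_factor <= 2.0:
        raise ValueError("compression factor should be between 1 and 2")
    raw = avg_file_size * n_files + index_size * n_indexes
    return raw * 2 / compression_factor
```

For example, 10,000 files averaging 2 MB plus three 50 MB indexes need about 40,300 MB with backup, or roughly 20,150 MB if the content compresses by a factor of 2.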

You can scale this simple calculation with a weighting factor against the average if you need to accommodate some very large files. Storage needs are similar regardless of the mechanism (database, network device [tape], or file system). Remember that you must provide sufficient scaling for future needs and sufficient bandwidth to accommodate user downloads of the content. As for processor power, if the metadata associated with a file is properly indexed to the searches (hitting only the indexes), then processor needs tend to scale linearly with the user load.
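The weighting and future-scaling adjustments can be sketched the same way. This Python example assumes one plausible reading of "weighting factor": a count-weighted average over size classes, so a few very large files raise the average in proportion to their number. The compound annual growth rate in `projected_storage` is likewise an assumption; substitute whatever growth model fits your organization. Both function names are hypothetical.

```python
def weighted_average_size(size_counts):
    """size_counts: list of (file_size, count) pairs, one per size class.
    Each size contributes in proportion to how many files have it."""
    total_files = sum(count for _, count in size_counts)
    total_size = sum(size * count for size, count in size_counts)
    return total_size / total_files

def projected_storage(current, annual_growth, years):
    """Scale today's estimate for future needs, assuming compound growth
    (annual_growth=0.5 means 50% more storage each year)."""
    return current * (1 + annual_growth) ** years
```

With 9,900 one-megabyte documents and 100 videos of 500 MB each, the weighted average comes to 5.99 MB rather than the 1 MB a document-only estimate would suggest; that figure can then be fed into the storage formula.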

Organization

Organization of content means that all information must be tagged in some fashion so that users can readily locate it later. This tagging may be as simple as document title or as sophisticated as the Library of Congress metacategory method (see Resources). In either case, it's a good idea to develop a controlled vocabulary in a formal metadata definition document to guide both the initial repository development and the acquisition of new materials. A controlled vocabulary is a hierarchy of categorization labels that are applied to all the information in the repository. For most purposes, a single hierarchy is sufficient, such as for simple document retrieval; but you may need to organize materials in a cross-referenced secondary hierarchy if multiple content forms are stored (for example, the first dimension may describe the content, and the second may denote content form -- Comedy/Video or Documentary/Audio Books).
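A controlled vocabulary with a secondary dimension can be enforced at tagging time, so that no label outside the agreed scheme ever enters the repository. The Python sketch below uses a hypothetical two-level subject hierarchy crossed with a content-form dimension, echoing the Comedy/Video example; the category names and the `tag` function are illustrative, not a standard scheme.

```python
# Hypothetical controlled vocabulary: subject hierarchy x content form.
SUBJECTS = {
    "Comedy": {"Stand-up", "Sitcom"},
    "Documentary": {"Nature", "History"},
}
FORMS = {"Video", "Audio Books", "Print"}

def tag(subject, subcategory, form):
    """Return a validated tag such as 'Comedy/Sitcom/Video', rejecting
    any label that is not part of the controlled vocabulary."""
    if subject not in SUBJECTS:
        raise ValueError(f"unknown subject: {subject!r}")
    if subcategory not in SUBJECTS[subject]:
        raise ValueError(f"{subcategory!r} is not under {subject!r}")
    if form not in FORMS:
        raise ValueError(f"unknown content form: {form!r}")
    return f"{subject}/{subcategory}/{form}"
```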

With any controlled vocabulary, choosing the granularity for each level of the tagging hierarchy is a critical decision for both maintenance and information navigation. This is the hardest part of organizing information and the one most likely to cause long-term difficulties in adding new materials. The next article in this series will address issues of abstraction and leveling that are important to the development of a controlled vocabulary.

The ability to navigate and filter the result set from a repository search is directly influenced by the selection of terms familiar to the end users. It serves no purpose to establish a controlled organization of materials if that organization doesn't make sense to your users. Be sure to spend time understanding the nature of the information context when you're developing metadata tagging for content.

Manage

Managing the repository involves updating materials periodically as older materials are archived and newer ones are added. Depending on the technical storage of the information (database, content management system, or file system), the configuration change-control mechanism either is directly provided by the storage software (such as for content-management systems) or must be layered over the information storage (such as for a file system).

Configuration management serves multiple purposes for information management:

  • Information is automatically versioned, letting you return to a previous edition if the current one becomes corrupted.
  • Configuration control lets you roll out sets of information as a group to the production system. Consider a content-management system for Web advertising: Ads must appear for a defined period of time, often as a group. The configuration-management system can track these collections regardless of the number of controlled files and allow tagging for promotion to production.
  • Configuration management lets you create multiple versions of the repository that track organizational activities. For example, system-development materials need to be version-controlled along with the releases of the code base.
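For a plain file system, where change control has to be layered over the storage, the purposes above can be sketched in a few lines. The `VersionedStore` class below is a hypothetical in-memory toy, not a real configuration-management tool: each commit keeps earlier versions (so a corrupted item can be restored), and a tag records the current version of a group of items so the group can be promoted to production together.

```python
class VersionedStore:
    """Toy change-control layer over a plain name -> content store."""

    def __init__(self):
        self._history = {}  # item name -> list of versions, oldest first
        self._tags = {}     # tag name -> {item name: version index}

    def commit(self, name, content):
        """Store a new version and return its version number."""
        self._history.setdefault(name, []).append(content)
        return len(self._history[name]) - 1

    def current(self, name):
        return self._history[name][-1]

    def revert(self, name, version):
        """Restore an earlier edition by committing it again, so the
        bad version stays in the history for later inspection."""
        return self.commit(name, self._history[name][version])

    def tag(self, tag_name, names):
        """Freeze the current version of a group of items under one tag."""
        self._tags[tag_name] = {n: len(self._history[n]) - 1 for n in names}

    def release(self, tag_name):
        """Return the tagged group exactly as it was when tagged."""
        return {n: self._history[n][v] for n, v in self._tags[tag_name].items()}
```

In practice you would rely on an existing version-control or content-management system rather than write your own; the sketch only shows what such a layer has to track.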

Utilization

As noted earlier, if end users can't effectively find the information they're looking for, the repository won't be effective and will likely fall into disuse. Proper utilization involves two interrelated functions: search and navigation. Searching is based on the metadata associated with the repository materials; index design based on the expected search categories dramatically speeds discovery of properly labeled materials. Navigation is the ability to rapidly move around the information space to locate related information. Users aren't always sure what they want, so remember to let them browse directly from the search result (such as by including links to related categories or refinements on the search). Many commercial Web sites, and those of other organizations, provide this kind of support for shoppers who may or may not know exactly what they want.
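Index-backed search combined with browsing can be sketched with an inverted index over metadata tags. In this Python example the item-to-tags mapping and both function names are hypothetical; after a lookup, the tags that co-occur on the hits are offered as related categories so the user can keep navigating from the result.

```python
from collections import defaultdict

def build_index(items):
    """items: mapping of item id -> set of category tags (hypothetical).
    Returns an inverted index from each tag to the ids carrying it."""
    index = defaultdict(set)
    for item_id, tags in items.items():
        for t in tags:
            index[t].add(item_id)
    return index

def search_with_related(index, items, tag):
    """Look up one tag through the index, then suggest the other tags
    found on the hits, most frequent first, as browsing refinements."""
    hits = index.get(tag, set())
    related = defaultdict(int)
    for item_id in hits:
        for other in items[item_id] - {tag}:
            related[other] += 1
    suggestions = sorted(related, key=lambda t: (-related[t], t))
    return sorted(hits), suggestions
```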

Information presentation is also a key factor in utilization; this topic will be addressed in the upcoming article on usability design. For information-management purposes, presentation helps ensure the accuracy of data. Accuracy means ensuring that tagged information belongs in its assigned category, much like putting a book on the correct shelf. Presentation tools that let the maintainer see and browse the content for a particular category are valuable, especially where content is automatically captured from the information source.
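A maintainer's review view can be as simple as grouping items by category, like walking along each shelf. The Python sketch below assumes a hypothetical record layout with `id`, `category`, and `auto_captured` fields and lists automatically captured items first within each group, since no person has yet checked where they were filed.

```python
from collections import defaultdict

def review_by_category(items):
    """Group item records by category for browsing. Within each group,
    automatically captured items come first, then the rest, each by id."""
    shelves = defaultdict(list)
    for item in items:
        shelves[item["category"]].append(item)
    for category in shelves:
        shelves[category].sort(key=lambda i: (not i["auto_captured"], i["id"]))
    return dict(shelves)
```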

Archive

The goal of archiving is preservation rather than ready access. Information reaches the end of its life cycle when it begins to lose direct value to the user community. At this point, it's no longer cost-effective to have the data take up space in the primary information store; you should move the data to an archival location where the long-term maintenance cost is reduced. Currently, that means moving the content to either tape or disk-archive arrays.

Moving content means repeating the identification step, only in reverse; now you're looking for information that isn't frequently accessed by the user community and migrating that information to the archive, freeing up space for new acquisitions. For one interesting long-term archival strategy, see the Resources.
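Identification in reverse can be sketched as a query for items that nobody has accessed recently. The Python function below assumes the repository records each item's last access time; the one-year idle threshold is a hypothetical default, not a recommendation from this article.

```python
from datetime import datetime, timedelta

def select_for_archive(last_access, now, idle_days=365):
    """Return, sorted, the ids of items not accessed within idle_days.
    last_access: mapping of item id -> datetime of the last access."""
    cutoff = now - timedelta(days=idle_days)
    return sorted(item for item, seen in last_access.items() if seen < cutoff)

if __name__ == "__main__":
    log = {
        "old-report": datetime(2006, 6, 1),
        "q4-results": datetime(2008, 1, 15),
    }
    print(select_for_archive(log, now=datetime(2008, 2, 5)))  # ['old-report']
```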



Milestones

The discussion so far leads to a set of key milestones for information management:

  • Identify valuable content -- Locate and evaluate the value of particular information content with regard to long-term storage.
  • Code and label content -- Label content according to your defined organizational scheme, including hierarchical categories and controlled vocabularies.
  • Review and approve the organizational scheme -- Be sure the organizational scheme meets the end users' needs.
  • Storage -- Define and establish storage technologies sufficient for the identified needs.
  • Publish -- Let end users search, navigate, and view information content.
  • Archive -- Provide the ability to move inactive information into long-term storage.


Conclusion

Information management is a huge topic that can involve discussions of content-management strategies, distributed access, federated security, and much more. This article has just skimmed the surface of this interesting field but has provided a starting point for you to create an effective information-management policy for your specific needs. Future articles will introduce a variety of organizational techniques: data modeling, distributed data collection, business intelligence, and packaging information for sale to interested customers. It all starts with the ability to recognize information of value and then organize that information for storage, access, and ultimately presentation to a specific audience.



Resources

Learn

Get products and technologies

Discuss


About the author


Benjamin A. Lieberman serves as the principal architect for BioLogic Software Consulting, a firm providing services on a wide variety of software development topics, including requirements analysis, software analysis and design, configuration management, and development process improvement. Dr. Lieberman is also an accomplished professional writer and author of The Art of Software Modeling and numerous software-related articles. Dr. Lieberman holds a doctoral degree in biophysics and genetics from the University of Colorado.



