Level: Introductory
Benjamin Lieberman, Ph.D., Principal Software Architect, BioLogic Software Consulting, LLC
05 Feb 2008
Information content management involves identifying useful information,
organizing that information into an intuitive structure, and governing changes made
to that information. Content comes in many forms, including text, graphics, tables,
charts, illustrations, recordings, maps, video, audio, and many others. Learn how to
organize that information into a maintainable and usable structure by categorizing
and organizing the content to suit your audience.
Information has value
It's surprising how much value can be placed on an object as ephemeral as a piece
of data. But information doesn't exist in a vacuum; it's used by every living
thing in the context of its surroundings. From a bacterium sensing a chemical
gradient to locate food, to a space shuttle astronaut relying on instruments to safely
reenter Earth's atmosphere, information is of value only in a particular context
to a particular user. Without the correct identification, capture, management, and
presentation of information, you'd be hard pressed to make any decisions --
business, personal, or governmental -- let alone good ones. To properly manage
information, it's essential to understand the way in which information will be
used, by whom, and for what purpose.
The first step in managing content is (paradoxically) identifying content worth
managing. Not all information has the same value, particularly because value is
determined by who is interested in the data. The ability to manage how a
refinancing advertisement is periodically displayed on a popular Web site is of
much more interest to the advertising executive than it is to the average Web
user. Information must be evaluated based on the subjective needs of the primary
audience. Is the purpose of the content to generate revenue or provide education?
Does the audience comprise children or adults? Is the information for
entertainment? Business development? Formulation of policy?
As shown in Figure 1, you can start to determine the value of
any collection of data with any one of the three concerns: the audience (users),
the purpose of the information (context), or the information itself (content).
Figure 1: Intersection of concerns for information management
Not only does the context of how the information will be used matter in
identifying data, but it also guides the capture and presentation of that
information. For example, a business executive may have a great deal of interest
in the financial health of a potential investment but can easily be overwhelmed by
a massively detailed financial statement, leading to guesswork rather than
reasoned decision-making. In this instance, the context is involved with how the
raw data must be summarized or modeled to remove extraneous information and focus
on the core question -- is this investment sound or risky? Context is all about
asking the right questions to understand the eventual use of managed information.
Content isn't always readily accessible, and it may not be in a manageable form.
Take, for example, the U.S. government 1040 tax form. Although many people take
advantage of direct electronic submission of tax forms, a substantial number of
people still mail their paper forms to the IRS. If the mandate is to manage all tax
submissions electronically (and it soon will be), then the IRS information manager
is faced with the problem of converting the paper forms to electronic form. The form of
the information will directly influence the management mechanisms, perhaps
limiting the possible solutions to non-computer-based management.
As another example, consider your local library: The old card catalog has long since been
converted to electronic form, but the content (books, films, maps, and so on)
still consists of physical objects that must be stored on shelves or in boxes.
Finally, information users are a mixed bag. Some are information savvy and able
to find the exact right set of keywords, whereas others struggle to find useful
content. And not all users have the same needs; some prefer a complex, detailed
display, and others prefer a simplified presentation that allows a free-form
ability to browse. This is similar to the determined shopper who knows the exact
item he wants as compared to the shopper who simply wishes to look around and see
what might be of interest. An effective information-management policy must support
both kinds of users.
Skills and competency
Information can only be understood in context...
A key element of information management is the ability to identify valuable
information and to organize that information to best benefit a particular
audience. This task requires that you be able to think critically about what is
important and what isn't. Critical thinking represents the combination of
education, experience, and research.
Education provides a common starting point for both the information manager and
the expected audience; the language, navigation metaphors, categories, and
presentation form are all based on a common understanding between you and your
information users. Experience is gained from trying many approaches and
discovering which one works -- the classic "trial and error" method. Research lets
you benefit from the mistakes of others rather than learn from your own tedious
efforts. An information manager must be familiar with critical thinking to reason
about the collection of information to be managed. Judgment calls are required at
every stage of information management, from identification through categorization
to control, utilization, and archiving.
Information obeys the law of entropy...
Information repositories tend toward disorder unless acted on by an outside
force. Data must be continually managed, or it will fall into disarray. Consider
again the example of your local library. If patrons were permitted to not only
remove books from shelves, but also to put them back or add new ones, then
mistakes of omission or commission would lead to an unusable mess. Unfortunately,
many content-management approaches are little better than trusting the users to
regulate themselves (for example, "shared drives"), usually with disastrous
results.
Information repositories must be managed by a trained set of individuals who act
as librarians and keep the organizational policy working as planned. Primarily,
this role requires periodic review of the materials in the repository to ensure
that the established management policy is being followed. As information moves
through the management life cycle (discussed later in this article), it must be
constantly monitored to ensure that categories are used properly and consistently.
The information manager is responsible for knowing the established scheme and
choosing the best category for long-term information capture.
Information must be accessed to be useful...
Even if information is properly captured, skillfully organized, and masterfully
managed, it still must be accessed by a user to have any purpose. Access to
complex collections of information requires a simple but sophisticated
search-and-filter mechanism to avoid non-information (where the user is
shown results that aren't what was needed) or mis-information (where the
user is misled into believing the information is applicable).
A common example of non-information is the 50 million search results returned when I
type my name into Google: The first dozen or so hits have nothing to do with me,
but rather are about someone who shares my name (and is apparently more popular).
The second issue, mis-information, is more insidious and is a result of trusting
the search algorithm too much. This can happen, for example, if you search online
for an inexpensive version of a product, such as a $25 Rolex from a "well-known"
provider. The repository manager should be well versed in techniques of
information searching (keyword, topic, authority source, category, and so on) and
information filtering (such as by statistical relevance of search word occurrence,
or narrowing words or phrases) to ensure accuracy of searches and search results.
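As a rough illustration of filtering by the statistical relevance of search word occurrence, the sketch below ranks documents by how often the query words appear relative to document length and drops documents that never mention them. The function names (relevance, search), the min_score threshold, and the sample documents are assumptions made for this example; a production search engine would use a far more refined scoring scheme.

```python
import re
from collections import Counter


def relevance(query, text):
    """Score a document as (occurrences of query words) / (total words)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    terms = re.findall(r"[a-z0-9]+", query.lower())
    return sum(counts[t] for t in terms) / len(words)


def search(query, docs, min_score=0.0):
    """Return document names ordered by descending relevance.

    Documents scoring at or below min_score (a hypothetical cutoff) are
    filtered out, which removes the "non-information" results.
    """
    scored = [(relevance(query, text), name) for name, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > min_score]


# Hypothetical repository contents, for illustration only.
docs = {
    "tax-guide": "tax form instructions for the tax year",
    "library-notes": "card catalog converted to electronic form",
    "policy-memo": "a long memo that mentions tax only once among many other words",
}
print(search("tax", docs))
```

Raising min_score narrows the result set further, at the cost of hiding marginal matches that a browsing user might still want to see.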
Information is most useful when it has an aesthetic quality...
Presentation is often the most overlooked aspect of information management.
Substance must always take precedence over style, but poorly presented content
runs the risk of confusing the users it was meant (often at great expense) to
serve. A repository manager should have at least a passing familiarity with
human-usability principles: appropriate use of color, font, layout, and
navigation. The best information model in the world will sit unused if the
repository's interface is confusing, complex, and unappealing.
Tools and techniques
Information management follows a life cycle...
Some information is always valuable, such as investment account balances; other
information has a defined period of time when it's valuable, such as plane
departure and arrival information; and still other data has value only
periodically, such as business intelligence. Nevertheless, all information has a
life cycle during which it's identified, captured, organized, controlled,
utilized, and eventually archived. Figure 2 illustrates these
six principal steps in the information life cycle.
Figure 2: Information life cycle
Identification
As mentioned earlier, the first step in information management is identifying
content to be managed. For example, if you're creating a repository of
requirements for a development team, the items of value can be initially
identified as business requirements, system requirements, and testing
requirements. Most if not all information to be managed falls into one of those
categories. This also provides you with an understanding of the data source, which
may be easy or difficult to manage, depending on the form and period of time
between updates. A frequently updated information source that has an inaccessible
format requires a much more sophisticated scheme than one that is periodically
updated in a readily accessible form. This approach also provides scope to the
effort, which prevents trying to manage everything having to do with developing a
new system or modifying an existing system.
Capture
With the information identified, the next step is to capture that information
into a manageable repository, where the content format dramatically affects the
storage needs. Assuming all the information of interest is binary (which isn't
always the case, even considering online content-management systems), then the
primary questions of storage are size and bandwidth. The size of the files
determines the principal storage needs (including backup) and the level of
bandwidth required for capture and eventual display. Large files, like video or
music, require a much larger storage space and delivery capacity. You can use the
following formula to estimate your needs:
File Storage Requirements = (average file size * number of files + index size *
number of indexes) * 2 (for backup needs)
If you use compression, then you can often divide the result by a factor of up to 2,
depending on whether the files are already compressed (like JPEGs and MPEGs). Also
note that file metadata needs are usually a small fraction of overall storage
needs.
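The storage formula can be turned into a small calculation. The following is a minimal sketch: the function name, the parameter names, and the sample figures are assumptions chosen for illustration, and compression_ratio is a hypothetical parameter standing in for the "factor of up to 2" mentioned above.

```python
def estimate_storage_bytes(avg_file_size, num_files, index_size, num_indexes,
                           backup_factor=2, compression_ratio=1.0):
    """Estimate storage needs with the formula from the text.

    compression_ratio is an assumed knob: 1.0 means no compression,
    2.0 means the data shrinks to half its original size.
    """
    if compression_ratio < 1.0:
        raise ValueError("compression_ratio must be >= 1.0")
    raw = (avg_file_size * num_files + index_size * num_indexes) * backup_factor
    return raw / compression_ratio


# Made-up example: 10,000 documents averaging 2 MiB, plus 3 indexes of 50 MiB each.
MIB = 1024 * 1024
total = estimate_storage_bytes(2 * MIB, 10_000, 50 * MIB, 3)
print(f"{total / 1024 ** 3:.1f} GiB")  # prints "39.4 GiB"
```

For a collection of already-compressed media such as JPEGs, leaving compression_ratio at 1.0 gives the more realistic estimate.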
You can scale this simple calculation with a weighting factor against the average
if you need to accommodate some very large files. Storage needs are similar
regardless of the mechanism (database, network device [tape], or file system).
Remember that you must provide sufficient scaling for future needs and sufficient
bandwidth to accommodate user downloads of the content. As for processor power, if
the metadata associated with a file is properly indexed to the searches (hitting
only the indexes), then processor needs tend to scale linearly with the user load.
Organization
Organization of content means that all information must be tagged in some fashion
so that users can readily locate it later. This tagging may be as simple as
document title or as sophisticated as the Library of Congress metacategory method
(see Resources). In either case, it's a good idea to
develop a controlled vocabulary in a formal metadata definition document to guide
both the initial repository development and the acquisition of new materials. A
controlled vocabulary is a hierarchy of categorization labels that are
applied to all the information in the repository. For most purposes, a single
hierarchy is sufficient, such as for simple document retrieval; but you may need
to organize materials in a cross-referenced secondary hierarchy if multiple
content forms are stored (for example, the first dimension may describe the
content, and the second may denote content form -- Comedy/Video or
Documentary/Audio Books).
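One way to picture a controlled vocabulary with a cross-referenced secondary hierarchy is as two small trees of labels, with every item tagged by one path from each. The sketch below assumes this representation; the Term class, the tag function, and the "Nature" sub-level are hypothetical names introduced for the example, not part of any standard scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A node in a hierarchy of categorization labels."""
    label: str
    children: dict = field(default_factory=dict)

    def add(self, *path):
        """Add a path of labels below this node, creating levels as needed."""
        node = self
        for label in path:
            node = node.children.setdefault(label, Term(label))
        return node

    def contains(self, path):
        """Return True if the non-empty path of labels exists below this node."""
        if not path:
            return False
        node = self
        for label in path:
            if label not in node.children:
                return False
            node = node.children[label]
        return True


# First dimension describes the content, the second denotes the content form.
content = Term("content")
content.add("Comedy")
content.add("Documentary", "Nature")  # "Nature" is a made-up second level
form = Term("form")
form.add("Video")
form.add("Audio Books")


def tag(title, content_path, form_path):
    """Tag an item, rejecting labels that are not in the controlled vocabulary."""
    if not content.contains(content_path):
        raise ValueError(f"unknown content category: {content_path}")
    if not form.contains(form_path):
        raise ValueError(f"unknown content form: {form_path}")
    return {"title": title,
            "content": "/".join(content_path),
            "form": "/".join(form_path)}


print(tag("Example title", ["Documentary", "Nature"], ["Audio Books"]))
```

Rejecting unknown labels at tagging time is the point of the exercise: new categories enter the repository only through a deliberate change to the vocabulary.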
With any controlled vocabulary, choosing the granularity for each level of the
tagging hierarchy is a critical decision for both maintenance and information
navigation. This is the hardest part of organizing information and the one most
likely to cause long-term difficulties in adding new materials. The next article
in this series will address issues of abstraction and leveling that are important
to the development of a controlled vocabulary.
The ability to navigate and filter the return set from a repository search is
directly influenced by the selection of terms familiar to the end users. It serves
no purpose to establish a controlled organization of materials if that
organization doesn't make sense to your users. Be sure to spend time understanding
the nature of the information context when you're developing metadata tagging for
content.
Manage
Managing the repository involves updating materials periodically as older
materials are archived and newer ones are added. Depending on the technical
storage of the information (database, content management system, or file system),
the configuration change-control mechanism either is directly provided by the
storage software (such as for content-management systems) or must be layered over
the information storage (such as for a file system).
Configuration management serves multiple purposes for information management:
- Information is automatically provided with versions, letting you return to a
previous edition in the case of corruption.
- Configuration control lets you roll out sets of information as a group to the
production system. Consider a content-management system for Web advertising:
Ads are required to appear for a defined period of time, often as a group.
The configuration-management system can track these collections regardless of
the number of controlled files and allow tagging for promotion to production.
- Configuration management lets you create multiple versions of the repository,
to better track against organization activities. For example, system-development
materials need to be version-controlled along with the releases of the code
base.
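The three purposes listed above (versioning, grouped rollout, and tracking against releases) can be sketched with a toy in-memory store. This is an illustration only: the VersionedStore class and its method names are hypothetical, and a real repository would rely on a content-management or version-control system rather than code like this.

```python
class VersionedStore:
    """Toy change-control layer: keeps every version and named label sets."""

    def __init__(self):
        self._history = {}  # item name -> list of contents, oldest first
        self._labels = {}   # label -> {item name: version number}

    def save(self, name, content):
        """Store a new version and return its 1-based version number."""
        versions = self._history.setdefault(name, [])
        versions.append(content)
        return len(versions)

    def get(self, name, version=None):
        """Return the latest version, or a specific earlier one."""
        versions = self._history[name]
        return versions[-1] if version is None else versions[version - 1]

    def label(self, label, names):
        """Pin the current versions of several items under one label."""
        self._labels[label] = {n: len(self._history[n]) for n in names}

    def checkout(self, label):
        """Return the labeled group exactly as it was when labeled."""
        return {n: self.get(n, v) for n, v in self._labels[label].items()}


store = VersionedStore()
store.save("banner.txt", "Refinance today")
store.save("sidebar.txt", "Low rates")
store.label("campaign-1", ["banner.txt", "sidebar.txt"])  # group for promotion
store.save("banner.txt", "Refinance tomorrow")            # later edit
print(store.checkout("campaign-1"))
print(store.get("banner.txt"))
```

The label plays the role of a release tag: later edits to an item do not change what the labeled group contains.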
Utilization
As noted earlier, if end users can't effectively find the information they're
looking for, the repository won't be effective and will likely fall into disuse.
Proper utilization involves two interrelated functions: search and navigation.
Searching is based on the metadata associated with the repository
materials; index design based on the expected search categories dramatically
speeds discovery of properly labeled materials. Navigation is the ability
to rapidly move around the information space to locate related information. Users
aren't always sure what they want, so remember to let them browse directly from
the search result (such as by including links to related categories or refinements
on the search). Many commercial and organizational Web sites provide this kind
of support for shoppers who may or may not know exactly what they want.
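Search over metadata and navigation to related categories can both be served by a simple index from tags to items. The sketch below assumes items carry a flat set of tags; the MetadataIndex class, its methods, and the sample items are hypothetical and meant only to show the idea.

```python
from collections import defaultdict


class MetadataIndex:
    """Index from metadata tags to item IDs, built once and queried often."""

    def __init__(self):
        self._by_tag = defaultdict(set)  # tag -> item IDs carrying that tag
        self._tags_of = {}               # item ID -> its tags

    def add(self, item_id, tags):
        self._tags_of[item_id] = set(tags)
        for t in tags:
            self._by_tag[t].add(item_id)

    def search(self, *tags):
        """Return the items that carry every requested tag."""
        sets = [self._by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def related_tags(self, *tags):
        """Suggest other tags found on the hits, as links for browsing."""
        related = set()
        for item in self.search(*tags):
            related |= self._tags_of[item]
        return related - set(tags)


index = MetadataIndex()
index.add("doc-1", ["requirements", "business"])
index.add("doc-2", ["requirements", "system"])
index.add("doc-3", ["testing", "system"])
print(sorted(index.search("requirements")))
print(sorted(index.related_tags("requirements")))
```

Because search touches only the index, the cost of a query depends on the number of matching items rather than on the size of the stored content.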
Information presentation is also a key factor in utilization; this topic will be
addressed in the upcoming article on usability design. For information-management
purposes, presentation is involved in ensuring the accuracy of data. Accuracy
means ensuring that the tagged information belongs with the assigned category,
much like putting a book on the correct shelf. Presentation tools that let the
maintainer see and browse the content for a particular category are valuable,
especially where content is automatically captured from the information source.
Archive
The goal of archiving is preservation rather than ready access. Information
reaches the end of its life cycle when it begins to lose direct value to the user
community. At this point, it's no longer cost-effective to have the data take up
space in the primary information store; you should move the data to an archival
location where the long-term maintenance cost is reduced. Currently, that means
moving the content to either tape or disk-archive arrays.
Moving content means repeating the identification step, only in reverse; now
you're looking for information that isn't frequently accessed by the user
community and migrating that information to the archive, freeing up space for new
acquisitions. For one interesting long-term archival strategy, see
Resources.
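Finding archive candidates amounts to a filter on how recently each item was accessed. The following sketch assumes the repository records a last-access timestamp per item; the function name, the max_idle_days threshold of one year, and the sample data are assumptions, and the right threshold depends on the value of the content to its audience.

```python
from datetime import datetime, timedelta


def select_for_archive(last_access, now, max_idle_days=365):
    """Return names of items not accessed within max_idle_days, sorted.

    last_access maps item name -> datetime of the most recent access.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, ts in last_access.items() if ts < cutoff)


# Made-up access log, for illustration only.
last_access = {
    "2005-annual-report.pdf": datetime(2006, 1, 15),
    "current-requirements.doc": datetime(2008, 1, 30),
}
print(select_for_archive(last_access, now=datetime(2008, 2, 5)))
```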
Milestones
The discussion so far leads to a set of key milestones for information
management:
- Identify valuable content -- Locate and evaluate the value of
particular information content with regard to long-term storage.
- Code and label content -- Label content according to your defined
organizational scheme, including hierarchical categories and controlled
vocabularies.
- Review and approve the organizational scheme -- Be sure the
organizational scheme meets the end users' needs.
- Storage -- Define and establish adequate storage technologies that are
sufficient for identified needs.
- Publish -- Let end users search, navigate, and view information
content.
- Archive -- Provide the ability to move inactive information into
long-term storage.
Conclusion
Information management is a huge topic that can involve discussions of
content-management strategies, distributed access, federated security, and much
more. This article has just skimmed the surface of this interesting field but has
provided a starting point for you to create an effective information-management policy
for your specific needs. Future articles will introduce a variety of
organizational techniques: data modeling, distributed data collection, business
intelligence, and packaging information for sale to interested customers. It all
starts with the ability to recognize information of value and then organize that
information for storage, access, and ultimately presentation to a specific
audience.
Resources
Learn
- Read Part 1 of this series, The data and content dilemma.
- For a nice introduction to the topic of information architecture, see
"Information Architecture 101: A crash course for the enterprise architect"
by Tom Meyr.
- To see an approach to creating topic-based controlled vocabularies for
technical information, read the developerWorks article Introduction to the
Darwin Information Typing Architecture, updated September 2005.
- For an excellent description of information architecture focused on Web site
development, see Information Architecture for the World Wide Web by Louis
Rosenfeld and Peter Morville (O'Reilly Publishing, 2006).
- To see one of the most extensive and comprehensive information-tagging schemes
in the world, visit the Library of Congress Classification Web site.
- For more information on categorizing information for libraries, see The
Dynamics of Classification Systems as Boundary Objects for Cooperation in the
Electronic Library by Hanne Albrechtsen and Elin K. Jacob (Library Trends 47(2),
293-312).
- To see how information organization schemes affect end users, see The Kindness
of Strangers: Kinds and Politics in Classification Systems by Geoffrey C. Bowker
(Library Trends 47(2), 255-292). See also Sorting Things Out: Classification and
Its Consequences by Geoffrey C. Bowker and Susan Leigh Star (MIT Press, 2000).
- For more information on extremely long-term archival strategies, spend some
time investigating the Long Now Foundation Web site.
- In the Architecture area on developerWorks, get the resources you need to
advance your skills in the architecture arena.
- In the Information Management area on developerWorks, get the resources you
need to advance your skills in the information management arena.
About the author
Benjamin A. Lieberman serves as the principal architect for BioLogic Software Consulting, a firm providing services on a wide variety of software development topics, including requirements analysis, software analysis and design, configuration management, and development process improvement. Dr. Lieberman is also an accomplished professional writer and author of The Art of Software Modeling and numerous software-related articles. Dr. Lieberman holds a doctorate degree in biophysics and genetics from the University of Colorado.