Level: Introductory
Benjamin Lieberman, Ph.D., Principal Software Architect, BioLogic Software Consulting
13 May 2008
Valuable business information should never be left sitting around.
It should be organized and saved into a permanent data store. In the past, these
data stores were represented by large filing rooms filled with cabinets. Today, the
same goal is achieved by using relational and other types of electronic databases.
Just like those dusty file systems of yore, a legacy database tends to become the
final resting place for useful business information—and this information is
essentially lost, because it can't be accessed in a meaningful way. Data-store
design can help you establish an efficient mechanism to store and retrieve valuable
business information.
Consider the design of data stores from the perspective of how the information
will be used. Organizations use information for many different reasons, with three
general purposes being the most common: monitoring, operations, and control.
Monitoring involves the collection of measurement information on the performance
of the business, adherence to internal and external regulations, satisfaction of
customer needs, return on investment to stakeholders, and other relevant metrics.
This information may be time-constrained in that it has value only during a
specific window of risk or opportunity, such as a system outage. Or it may have
long-range value, such as the effect of cost-reduction strategies.
Operational data is involved with the day-to-day running of the organization and
supplying value to customers. This information covers the vast bulk of data
collected by a business, including orders, shipping records, inventory, accounts
receivable and payable, and a variety of other transactional data. Operational
data is the lifeblood of a business, recording the actions of the business in
pursuit of the business's goals.
Control data is used for decision making and is usually compiled from the other
two sources. For example, an executive may require information about the rate of
order delivery compared to the cost of supporting delivery operations to decide
whether to invest in a new order-tracking system. Control data is also important
for reporting on the overall capabilities of the organization to the stakeholders
and investors. Information from monitoring and operations is combined to
illustrate cost centers, revenue-generation potential, and other key
indicators for senior management.
Skills and competencies
Each different type of data use calls for a different storage approach and
associated data-store design. Design considerations such as storage sizing,
performance, level of data detail, indexing, and access are all affected by the
different ways data is used.
These data uses are summarized with the focus of data capture and key business
drivers:
- Monitoring
  - Focus: Performance tuning
  - Driver: Timeliness
- Operations
  - Focus: Transaction management
  - Driver: Data integrity
- Control
  - Focus: Data manipulation
  - Driver: Accuracy
Because of the many design considerations affected by how the data will be used,
various skills are necessary to support a cohesive data-storage
strategy. Monitoring data-store design is mostly concerned with data collection.
For example, a real-time flight monitor must be capable of capturing the aircraft
flight status to the second on multiple flight parameters (airspeed, ground speed,
altitude, attitude, flight control surface position, and so on). Storage must be
rapid, especially if the system of interest is monitoring for abnormal conditions,
such as a sudden drop in altitude or airspeed. Individuals tasked with design of
these data-storage systems must be thoroughly familiar with data-capture
characteristics of the available devices (which may be embedded systems), as well
as the display of monitoring information to the operational support staff. They
may also need to consider the amount of collected data, because storage may be at
a premium.
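The capture constraints described here can be sketched in a few lines of Python. This is only an illustration: the names `FlightSample` and `MonitorBuffer`, the chosen fields, and the 500-foot threshold are assumptions made for the example, not part of any real flight system. The buffer keeps only the newest samples, so storage stays bounded, and each write is a constant-time append.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightSample:
    """One reading from a hypothetical flight-data source."""
    t_seconds: int
    altitude_ft: float
    airspeed_kt: float


class MonitorBuffer:
    """Keeps only the newest `capacity` samples so storage stays bounded."""

    def __init__(self, capacity: int, max_drop_ft: float) -> None:
        self._samples = deque(maxlen=capacity)  # oldest samples are discarded
        self._max_drop_ft = max_drop_ft

    def record(self, sample: FlightSample) -> bool:
        """Store the sample; return True if altitude fell abnormally fast."""
        previous = self._samples[-1] if self._samples else None
        self._samples.append(sample)  # constant-time append
        if previous is None:
            return False
        return previous.altitude_ft - sample.altitude_ft > self._max_drop_ft

    def __len__(self) -> int:
        return len(self._samples)


buffer = MonitorBuffer(capacity=3, max_drop_ft=500.0)
readings = [
    FlightSample(0, 30000.0, 450.0),
    FlightSample(1, 29990.0, 449.0),
    FlightSample(2, 29000.0, 430.0),  # 990 ft lost in one second
    FlightSample(3, 28990.0, 431.0),
]
alerts = [s.t_seconds for s in readings if buffer.record(s)]
print(alerts)       # [2]
print(len(buffer))  # 3
```

The trade-off is deliberate: older readings are lost once the buffer is full, which suits a device where storage is scarce but would not suit a system that must retain a full flight history.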
Operational support data design is the most common task in a business system,
requiring a balanced set of skills among performance, storage, and ease of access.
The skills required include an understanding of basic relational data mechanics,
such as normalization, metadata attributes, and indexing. Operational information
is used by all aspects of the business, so it's vital to understand the system
demands placed on the data. Index strategies, proper table joins, prefetching,
caching, and other techniques permit rapid access to stored information.
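To make the effect of an index strategy concrete, the sketch below uses Python's built-in `sqlite3` module to compare the query plan for the same lookup before and after an index is created. The `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql):
    """Return SQLite's textual query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the description


query = "SELECT total FROM orders WHERE customer_id = 42"
plan_before = plan(query)  # no usable index: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the planner can now search the index

print(plan_before)
print(plan_after)
```

Checking the plan rather than timing the query keeps the comparison independent of machine load, which is useful when deciding which columns deserve an index.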
Decision-support systems rely on data collected from other data stores as the
basis for processing and analysis. Skills in the design of data-warehouse
extraction (including data-field mapping), transformation, consolidation, and
efficient data loading are all critical to the design of systems providing
business performance reports.
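The extraction, mapping, transformation, consolidation, and loading steps can be sketched as a small pipeline. Everything specific here is assumed for illustration: the source field names, the `FIELD_MAP` mapping, the cents-to-dollars conversion, and the use of a plain list as the load target.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from an operational store.
source_rows = [
    {"ord_id": 1, "cust_region": "west", "amt_cents": 1250},
    {"ord_id": 2, "cust_region": "east", "amt_cents": 4000},
    {"ord_id": 3, "cust_region": "west", "amt_cents": 750},
]

# Data-field mapping: source column -> warehouse column (assumed names).
FIELD_MAP = {"ord_id": "order_id", "cust_region": "region", "amt_cents": "amount"}


def extract(rows):
    """Extraction: rename fields according to FIELD_MAP."""
    return [{FIELD_MAP[k]: v for k, v in row.items()} for row in rows]


def transform(rows):
    """Transformation: normalize region case and convert cents to dollars."""
    return [
        {**row, "region": row["region"].upper(), "amount": row["amount"] / 100}
        for row in rows
    ]


def consolidate(rows):
    """Consolidation: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def load(totals, target):
    """Loading: write one summary row per region into the target list."""
    for region in sorted(totals):
        target.append({"region": region, "total_amount": totals[region]})


warehouse = []
load(consolidate(transform(extract(source_rows))), warehouse)
print(warehouse)
# [{'region': 'EAST', 'total_amount': 40.0}, {'region': 'WEST', 'total_amount': 20.0}]
```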
Along with the necessary technical skills and experience, it's also important for
data designers to have strong familiarity with the business domain. Many critical
design decisions are directly influenced by the utilization of the data. Questions
about indexing strategies, performance-tuning approaches, referential integrity,
and critical design issues require a deep appreciation for the quality of the data
source and the intended target audience. For example, in developing a data store
for an over-the-air cell phone programming system, the designers considered how
much transactional data was required to track the success rate of programming
attempts. The primary goal of the system was to transmit cell phone configuration
data for newly reprogrammed mobile units; only limited requirements were provided
for the tracking of each programming attempt. By investigating the problem domain
more deeply, the designers found that the programming could fail for a variety of
reasons (the cell user could enter a tunnel or hang up the unit; the unit could
impact a solid surface at a high rate of speed; and so on), not all of which were
under the system's control. Only by storing a detailed step-wise history of each
transaction was it possible to tune the system for a maximum success rate.
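A step-wise history of this kind might look like the following sketch. The step names in `STEPS` and the record layout are invented for the example; the article does not describe the actual schema. The point is that storing one row per step reached makes it possible to see where attempts fail, not just how often.

```python
from collections import Counter

# Hypothetical step names for one over-the-air programming attempt.
STEPS = ["connect", "authenticate", "transmit", "verify"]


def record_attempt(history, attempt_id, failed_at=None):
    """Append one row per step reached; mark the step where the attempt failed."""
    for step in STEPS:
        ok = step != failed_at
        history.append({"attempt": attempt_id, "step": step, "ok": ok})
        if not ok:
            break  # later steps were never reached


def failures_by_step(history):
    """Count failed attempts per step."""
    return Counter(row["step"] for row in history if not row["ok"])


def success_rate(history):
    """Fraction of attempts that completed every step."""
    attempts = {row["attempt"] for row in history}
    failed = {row["attempt"] for row in history if not row["ok"]}
    return (len(attempts) - len(failed)) / len(attempts)


history = []
record_attempt(history, 1)
record_attempt(history, 2, failed_at="transmit")
record_attempt(history, 3, failed_at="transmit")
record_attempt(history, 4, failed_at="connect")

print(failures_by_step(history))  # Counter({'transmit': 2, 'connect': 1})
print(success_rate(history))      # 0.25
```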
Tools and techniques
When you consider the technical aspects of data-store architecture, the focus is
on the business architectural drivers and how the data-store implementation is
influenced by intended data use. This section focuses on the data architecture
drivers and constraints, data modeling, data design, and performance.
As noted earlier, the primary concern for monitoring data systems is the speed of
capture. In terms of the drivers listed earlier, the most critical items are data
capture and database sizing. When you're considering a data-store design to
capture monitoring information, some solutions may not involve a formal relational
database. For example, consider the case of monitoring airplane dynamics; one
solution would be to implement a B-tree that stores to a flash drive on the
monitoring device. This is an efficient way to rapidly store and retrieve
information while providing semipermanent storage in a small space. A B-tree
performs insertions and retrievals in logarithmic time (on the order of log_M N
node accesses for N stored keys and a branching factor of M), which is why the
structure is often used in indexing strategies.
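A short calculation shows why logarithmic growth matters on a small device. The function below uses a deliberately simplified model, assumed for illustration: every node is full and has exactly `fanout` children. Real B-trees are only partly filled, so actual heights can be somewhat larger.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels L such that fanout**L >= num_keys.

    Simplified model: every node is full and has `fanout` children.
    """
    if num_keys < 1 or fanout < 2:
        raise ValueError("num_keys must be >= 1 and fanout >= 2")
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout  # each extra level multiplies reachable entries
        levels += 1
    return levels


for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, levels_needed(n, fanout=100))
# 1000 2
# 1000000 3
# 1000000000 5
```

With a fanout of 100, growing the data a million-fold adds only three levels, which is why each insertion or retrieval touches so few nodes.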
For operational data stores, the focus is on both speed of access and accuracy of
transactions. For these purposes, a relational database is almost always the
technique of choice. Modern relational databases handle concurrency, transaction
management, efficient data storage, security, and backup-and-restore operations.
The main choice is between cost and capability: an open source database may be
acceptable to an entrepreneurial venture due to low cost of ownership, whereas a
Fortune 500 company may demand the highest performance and vendor support of a
top-of-the-line commercial product. Operational systems are focused on maintaining
accuracy via highly normalized data structures, with associated indexing support
for efficient retrievals and network capabilities (such as storage area
networks) for fast transfer of large amounts of data.
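Transaction management is the part of this picture that is easiest to show in code. The sketch below uses Python's built-in `sqlite3` module as a stand-in for any relational database; the `inventory` and `shipments` tables and the `ship` function are hypothetical. A shipment and the matching stock decrement either both succeed or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()


def ship(sku, qty):
    """Record a shipment and decrement stock as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO shipments (sku, qty) VALUES (?, ?)", (sku, qty))
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE sku = ?", (qty, sku)
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected negative stock
        return False


first = ship("widget", 3)   # succeeds, leaving 2 in stock
second = ship("widget", 4)  # would go negative, so both statements are undone

stock = conn.execute("SELECT qty FROM inventory WHERE sku = 'widget'").fetchone()[0]
shipped = conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]
print(first, second, stock, shipped)  # True False 2 1
```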
Finally, for decision making, the data store must support accurate, up-to-date
information and allow for complex manipulations. The typical choice is a managed
data warehouse, hosted on separate hardware from the operational or monitoring
data stores and updated using some form of automated extraction, transformation,
and loading. Many data warehouse choices exist, including IBM®
InfoSphere™. These solutions focus on the transfer of information from
operational or monitoring data stores, the transformation of that information
based on business needs for reporting, and the display of that information as a
set of reports to support business decisions.
Data-store logical and physical modeling is focused on the capture of the
business domain in such a way as to facilitate the development of accurate and
efficient data structures. As noted, this doesn't always result in an entity
relationship model (the most common form of database modeling) but may instead
focus on packaging information for storage. In the monitoring example, the data
model should represent the information stored with a minimal transformation of
that data, even when there is duplication. This supports the driving concern over
rapid storage rather than efficient retrievals or transaction management. In
contrast, an operational data model frequently results in a standard normalized
entity-relationship model, with minimal duplication and strong referential
integrity for the data. Modeling for data warehouses should focus on mapping
information from input data stores and transforming that information based on the
questions to be answered (which may result in data redundancy to support faster
processing).
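For the operational case, a normalized model with enforced referential integrity can be sketched as follows, again using Python's built-in `sqlite3` module. The `customers` and `orders` tables are assumed for the example. Customer details are stored once, and the database itself rejects an order that points at a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 99.5)")

try:
    # Customer 2 does not exist, so this row would be an orphan.
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (2, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The customer name lives in one place and is joined in when needed.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()
print(orphan_rejected, rows)  # True [('Acme', 99.5)]
```

A warehouse model would often make the opposite choice and copy the customer name into each reporting row, accepting redundancy in exchange for simpler and faster queries.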
Data-store implementation is driven by the following practical considerations:
- Physical storage (relational, hierarchical, tagged, object)
- Network layout (stand-alone, clustered, grid)
- Operational support (user and application security, deployments, housekeeping)
- Data conversion (mapping source to target, data cleansing, configuration data)
- Data warehouse (extraction, transformation, loading)
Each of these considerations is influenced by the intended use of the data store.
The physical layout for an operational database may be highly distributed to
support business functions, whereas the decision-making system may be located in a
single data-management center to minimize maintenance costs.
When you're considering a data-store architecture, it's important that the system
performance be adequate for the task at hand. You should do extensive performance
testing to ensure that the system will operate effectively even in unexpected
conditions, such as high client concurrency or the loss of one or more data
clusters.
The word performance can mean different things to different people, but in
general it's measured by the data store's ability to provide the level of service
expected for the intended use. Many factors influence a data store's ability to
perform effectively, each of which must be considered during design and
implementation:
- Performance testing
- Access or response time per request
- Deadlocks
- Table scans (thrashing)
- I/O utilization
- Transaction throughput
- Referential integrity
- Indexing, hints, and query optimization
Performance testing should be an integral part of the data-store design. Many
open source and commercial products are available to assist you in evaluating
potential data-store solutions.
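Before reaching for a dedicated product, a few of the factors above (response time per request and transaction throughput) can be measured with a very small harness. This is a rough sketch under stated assumptions: the `measure` function, the `readings` table, and the request mix are all invented for the example, and a single-threaded loop against an in-memory database says nothing about concurrency, deadlocks, or I/O.

```python
import sqlite3
import time


def measure(run_request, num_requests):
    """Time each request; return latency figures and overall throughput."""
    latencies = []
    start = time.perf_counter()
    for i in range(num_requests):
        t0 = time.perf_counter()
        run_request(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "requests": num_requests,
        "max_latency_s": max(latencies),
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_per_s": num_requests / elapsed if elapsed > 0 else float("inf"),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (id, value) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1000)],
)


def lookup(i):
    """One simulated request: fetch a single row by primary key."""
    conn.execute("SELECT value FROM readings WHERE id = ?", (i % 1000,)).fetchone()


stats = measure(lookup, num_requests=500)
print(stats)
```

The numbers vary from run to run and machine to machine, so they are most useful for comparing two designs under the same conditions rather than as absolute figures.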
Milestones
The development of an organization's set of data stores is based on the same best
practices as any other system development, starting with the investigations of
data needs and followed by identifying candidate data-store architectures,
iteratively developing the data-store hardware and schema, testing, and release
into production. The primary difference in the development of a data store is that
it is often the target of other software activity, rather than directly
interacting with end users. Consequently, the data-store development team must be
in close contact with the software development group to ensure that the correct
data-store elements (such as tables and columns) are available when needed.
Data stores in production have a very different management and maintenance
schedule from that of the software or data-warehouse reporting teams. The most frequent
request is to correct data issues that may have occurred due to a system failure
or input of invalid data. For database administrators, the second most frequent
task is to tune the database for better performance, either for direct report
queries or to better support client applications.
About the author
Benjamin A. Lieberman serves as the Principal Architect for BioLogic Software
Consulting. Dr. Lieberman provides consulting and training services on a wide
variety of software development topics, including requirements analysis, software
analysis and design, configuration management, and development process
improvement. Dr. Lieberman is also an accomplished professional writer with a book
(The Art of Software Modeling, Auerbach Publishing, 2006) and numerous
software-related articles to his credit. Dr. Lieberman holds a doctorate degree in
Biophysics and Genetics from the University of Colorado, Health Sciences Center,
Denver, Colorado.