Select the business process for which the dimensional model
will be designed. Based on the selection, the requirements for the
business process are gathered. A business process require more than
one dimensional model.
In dimensional modeling, the best unit of analysis is the business
process in which the organization has the most interest. A business
process is a set of related activities. Business processes
are classified by the topics of interest to the business. When you
create a candidate list of high potential business processes, you
must prioritize the requirements. Examples of business processes are
customers, profit, sales, organizations, and products.
Business processes are not always business departments. For example,
consider a scenario where the sales and marketing department accesses
the order data. In this case, you would build a single dimensional
model to handle order data instead of building separate dimensional
models for the sales and marketing departments. If you create dimensional
models based on departments, you would store duplicate data. Duplication,
or data redundancy, can result in many data quality and data consistency
issues.
When you select a single business process (out of all of the possible
processes that exist in a company), you must prioritize the business
processes according to certain criteria. Criteria might include business
process significance, quality of data in the source systems, and the
feasibility and complexity of the business processes.
When you identify the business processes of a dimensional model,
you collect the following metadata:
- Business requirements for the selected business for which you
will design the dimensional model
- Business processes
- Owners
- Source systems that will be used
- Data quality issues
- Common terms used across business processes
- Other business-related metadata
Dimensional models are typically used in two environments: the
data warehouse and OLTP systems.
- The data warehouse and the dimensional model
- When you partition the data in a data warehouse, you divide the
data based on subject area. A data warehouse is subject-oriented.
The data warehouse contains specific, selected subject areas in the
organization, such as customer and product. For a practical implementation
of a data warehouse, the most important data is contained in a specific
business process. This requirement is quite different from OLTP requirements.
Queries in the data warehouse environment are more strategic in nature
and ask questions that are concerned with a larger scope. An example
of a query is, "What products are selling well?" or "Where are my
weakest sales offices?" To answer those queries, the data warehouse
is structured and oriented to subject areas such as product or organization.
These subject areas are the most common unit of logical partitioning
in the data warehouse.
- OLTP systems and the dimensional model
- In the operational environment, you partition data by application
or function because the operational environment is built around transaction-oriented
applications that perform a specific set of functions. The objective
of the operational environment is to perform those functions as quickly
as possible. If there are queries performed in the operational environment,
they are more tactical in nature and should answer questions that
are concerned with that instant in time. An example query could consist
of the question, "Has the check from Mr. Smith been processed?"
Create and study the enterprise
business process list
Create a complete enterprise-wide business
process list. Consider the following assessment factors when you create
your list:
- Complexity of the source systems of each business process
- Data availability of these systems
- Data quality of these systems
- Strategic business significance of each business process
- Other factors that are important to your business processes
Tip: It might be helpful to assign values to each
assessment factor and business process. When you assign values, you
can determine the priority of each business process.
Identify the business process that
you want to model
Prioritize the business processes. You should
identify the most and least feasible processes when you build a dimensional
model. This step summarizes the assessment factors that you determined
above. The processes that are the most significant to the business
should be modeled first.
Identify high-level entities and
measures that are common across several processes
Determine
the high-level business entities that are involved in each process.
Determine which entities are common across several business processes.
After the common entities are identified, you can tie business processes
together through these common (shared) dimensions.
To create shared
dimensions that are used across the enterprise, you must ensure that
the various parts of the business agree on the definitions for these
common entities. This process can take some time, because the definitions
for common entities can vary between various parts of the business.
You should define the common entities early, because if you need to
change this definition in the future, existing applications can be
impacted.
A data warehouse must provide consistent information
for queries that request similar information. One method to maintain
consistency is to create dimension tables that are shared and used
by all applications and data marts (dimensional models) in the data
warehouse. Candidates for shared dimensions include customers, time,
products, and geographical dimensions, such as the store dimension.
Developing a set of shared dimensions is a significant challenge.
Any dimensions that are common across the business processes must
represent the dimension information in the same way. That is, the
information and underlying data must be shared. Each business process
will typically have its own schema that contains a fact table, several
shared dimension tables, and dimension tables that are unique to the
specific business function.
Identify data sources
Identify the data sources that are involved with the business processes.
A dimensional model is created from one of the following sources:
- An enterprise-wide data warehouse
- OLTP source systems (in the case of independent or dependent data
mart architectures)
- Independent data marts (in this situation, you might be interested
in consolidating the independent data marts into another data mart
or data warehouse)
Select the requirements gathering
method
Requirements are typically difficult to define. Typically,
only after seeing a result can you decide that the result does (or
does not) satisfy a requirement. The requirements of an organization
also change over time. What is valid one day might no longer be valid
the next day. Regardless, you use the requirements identified at this
point in the development cycle to build the dimensional model.
The questions are:
- How can you build something that cannot be precisely defined?
- How do you know when you have successfully identified the requirements?
Although there is no definitive test, you can typically begin
the modeling process if your requirements address the following questions:
To gather the full set of requirements, you should consider the
following questions:
- Who are the people, groups, and organizations of interest?
- What functions need to be analyzed?
- Why is the data required?
- When does the data need to be recorded?
- Where, geographically and organizationally, do relevant processes
occur?
- How is performance of the functions measured?
- How is performance of the business process measured? What factors
determine the success or failure?
- What is the method of information distribution? Is it a data report,
paper, email, or another method?
- What types of information are lacking for analysis and decision
making?
- What steps are currently taken to fulfill the information gap?
- What level of detail would enable data analysis?
In general, most of the methods for deriving business
requirements conform to one of two approaches: source-driven and user-driven.
- Source-driven
- Source-driven requirements gathering is based on
defining the requirements by using the source data in production operational
systems. You can define the requirements by analyzing a source data
model if one is available or the actual physical record layouts and
selecting data of interest.
The major advantage of this method is
that you know from the beginning that you can supply all the data,
because you are already limiting yourself to what is available. A
second benefit is that you can minimize the time required by the users
in the early stages of the project. However, there is no substitute
for the importance and value you get when you involve the users.
Of course there are also disadvantages to this method:
- By minimizing user involvement, you increase the risk of producing
an incorrect set of requirements.
- Depending on the volume of source data that you have, and the
availability of source models for the data, this can also be a very
time-consuming method.
- Some users might need to access data that is currently unavailable.
Without the opportunity to identify all of the requirements,
there is no chance to investigate what is needed to obtain external
data.
External data is data that exists outside the enterprise.
Even so, external data can often be of significant value to the business
users.
The result of the source-driven method is to provide
what you have, which is appropriate in at least two cases:
- The method can be used to develop a fairly comprehensive list
of the major dimensions of interest to the enterprise. If you plan
to create an enterprise-wide data warehouse, this could minimize the
proliferation of duplicate dimensions across separately developed
data marts.
- Analyzing relationships in the source data can identify areas
on which to focus your data warehouse development efforts.
- User-driven
- User-driven requirements gathering is a method based
on defining the requirements by investigating the functions the users
perform. This is usually done through a series of meetings or interviews
with users.
The major advantage to this method is that the focus
is on providing what is really needed, rather than what is available.
In general, this method has a smaller scope than the source-driven
method. Therefore, the user-driven method generally produces a useful
data warehouse or a data mart in a shorter time span.
However,
expectations must be closely managed. The users must clearly understand
that some of the data they need can simply not be made available for
a variety of reasons. However, you should not to limit the things
for which the user asks. Alternative ideas should be promoted when
you define the requirements for a data warehouse. These requirements
prevent you from eliminating requirements simply because you think
they might not be possible. If a user is too tightly focused, you
can miss useful data that is available in the production systems.
User-driven requirements gathering is typically the method of
choice, especially when developing dependent data marts or populating
data marts from a business-wide enterprise warehouse.
Gather the requirements
When you gather the requirements, the needs of business users are
collected and documented. When you gather the requirements, you study
the business processes and information analysis activities with which
users are involved. A user typically needs to evaluate or analyze
some aspect of the business. Focus your efforts on gathering the two
key elements of analysis that business users are involved with on
a day-to-day basis:
- What is being analyzed?
- What are the evaluation criteria?
When you gather the requirements, you should work to understand
the problem domain for which the modeling is done. Typically, the
requirements at this stage are documented informally, and schemas
are not fully detailed. When you gather these requirements, identify
the following areas of interest:
- Identify the most important questions that the business needs
to address. You can assign importance values to each question to determine
the most important questions to address.
- Determine how the business wants to record the data as it changes.
For example, you will want to know how to manage historical data for
discontinued products or employee records.
Analyze the requirements
Analyze the business requirements. Determine informal requirements,
and set up high-level measures and high-level entities. These objects
might become dimensions when you model the data. Use this draft as
a starting point to filter down the entities and measures. You can
start to draft the structure of the requirements, drafting dimensions,
hierarchies, and the measures for each part of the data model.
Summarize the business process
analysis
Create a report from the analysis. The report should
contain the following information:
- Business process listing
- Business process prioritization
- High level entities and measures, which are common between various
business processes
- Business process identified for which the dimensional model will
be built
- Data sources listing
- Requirement gathering which contains all business process requirements
- Requirement gathering analysis
- High level entities and measures identified from the requirement
analysis