Fundamental Information Aggregation concepts

Overview

The Information Aggregation pattern, also known as User to Data, facilitates user access to and manipulation of data. Information Aggregation functionality is often also called one of the following names:

Designing applications that automate the Information Aggregation business pattern can be challenging for many reasons. User requirements in this area of e-business tend to be vague and constantly changing. Prioritization of needs across a company is challenging. Substantial infrastructure is often necessary. And often, several business intelligence applications must be built simultaneously. Some of these may have common data needs while others may have conflicting needs.

To overcome challenges like these, best practice suggests that two separate steps be used to aggregate and distill structured and unstructured data. To implement the Information Aggregation business pattern you may need to pre-populate either derived data stores or indices before you can execute the User Information Access pattern. This separation allows for greater flexibility in changing either the population function or the information access function without impacting the other. The separation further promotes component reuse as well, one of the main goals of the Patterns for e-business model.

The Population and User Information Access functionalities of the Information Aggregation business pattern are first described in general terms here.

Population

Population involves designing and creating applications to extract, cleanse, restructure, and move data into or between appropriate data stores. The population step is needed if the required data does not already exist in the appropriate data store, or if the data is not in an optimal form to satisfy the user's needs.

Consider a large grocery store chain. Location managers of these stores would like to receive a daily report summarizing perishable items that must be placed on sale in order to clear the inventory before these items expire. Such a Decision Support System (DSS) would need to distill information from a vast inventory data store. Inventory data is most likely not optimally structured to run such reports. A Population application must be developed that extracts the relevant data from the Inventory Management System and structures it in a way that facilitates optimal access. In this scenario the Population application pattern primarily deals with structured data.
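
The grocery scenario can be sketched as a small Population step. This is only an illustration under stated assumptions: the row layout (`InventoryRow`), the function name `populate_clearance_report`, and the three-day clearance window are all hypothetical, and a real Inventory Management System would supply the rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class InventoryRow:
    """One row as it might be extracted from the inventory system (hypothetical layout)."""
    store_id: str
    item: str
    perishable: bool
    expires_on: date
    quantity: int


def populate_clearance_report(rows, today, window_days=3):
    """Keep perishable stock expiring within the window and
    restructure it per store, soonest expiry first."""
    cutoff = today + timedelta(days=window_days)
    report = defaultdict(list)
    for row in rows:
        if row.perishable and row.quantity > 0 and today <= row.expires_on <= cutoff:
            report[row.store_id].append(row)
    return {
        store: sorted(items, key=lambda r: r.expires_on)
        for store, items in report.items()
    }


# Made-up sample data standing in for an extract from the operational system.
rows = [
    InventoryRow("store-1", "milk", True, date(2024, 5, 3), 40),
    InventoryRow("store-1", "yogurt", True, date(2024, 5, 2), 12),
    InventoryRow("store-1", "canned beans", False, date(2026, 1, 1), 90),
    InventoryRow("store-2", "lettuce", True, date(2024, 5, 20), 15),
]
daily_report = populate_clearance_report(rows, today=date(2024, 5, 1))
for store, items in daily_report.items():
    print(store, [(r.item, r.expires_on.isoformat()) for r in items])
```

The report is keyed by store so that each location manager reads only the relevant slice, which is the kind of restructuring for access that the Population step is meant to provide.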

As another example, consider a financial services portal that aggregates securities analysis from multiple sources and categorizes such information into different folders. In this scenario, the population step involves crawling selected Web sites for specified information, creating an index of selected articles and categorizing them. This example identifies the need for a population step in aggregating and distilling meaningful information from unstructured data.
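
For unstructured data, the indexing and categorizing part of the population step might look roughly like the sketch below. It assumes the articles have already been fetched (no crawling is shown), and the folder names and keyword rules in `CATEGORY_KEYWORDS` are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical category rules: folder name -> keywords that place an article in it.
CATEGORY_KEYWORDS = {
    "equities": {"stock", "shares", "dividend"},
    "bonds": {"bond", "yield", "treasury"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles):
    """Build an inverted index: word -> set of article ids containing it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in tokenize(text):
            index[word].add(article_id)
    return index


def categorize(index):
    """Assign article ids to folders using the keyword rules above."""
    folders = {}
    for folder, keywords in CATEGORY_KEYWORDS.items():
        matched = set()
        for keyword in keywords:
            matched |= index.get(keyword, set())
        folders[folder] = sorted(matched)
    return folders


# Stand-ins for articles that a crawler would already have fetched.
articles = {
    "a1": "Analysts expect the stock to raise its dividend.",
    "a2": "Treasury yield falls as bond prices rally.",
    "a3": "Bond investors rotate into dividend shares.",
}
index = build_index(articles)
folders = categorize(index)
print(folders)
```

An article can land in more than one folder here; whether that is desirable depends on how the portal presents its categories.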

The patterns for population can be found under Application Integration::Data Integration on this web site.

User Information Access

User Information Access involves designing and creating the user interface and processes for unraveling relevant information from raw data to meet the business needs of the user. User Information Access applications cover a wide range of functions, from simple queries to complex data mining.

The specific business functionality supported by applications that automate the Information Aggregation business pattern varies from one industry to another. Yet a closer survey of such applications in multiple industries reveals certain common approaches that have been successful. The following Application patterns document these repeatable, successful solutions.

Information Aggregation User identification

Anyone involved in decision-making processes can use applications built according to the Information Aggregation pattern. Often users access data in a read-only format to inform themselves for decision-making tasks. Sometimes they create new data to explore alternative scenarios.

Users of data might be within an organization or external to it. Internal users include executives, managers, and business analysts. These users access data to analyze the long-term performance of a business. Managers of operational departments, marketing and sales personnel, call-center employees, and others all use informational systems to make judgments on short to medium-term business actions. Increasingly, internal data users include employees who work off site and connect to company data using the Internet, an extranet, or through a dial-in mechanism.

External users include customers, partners, agents, and others who are given access to portions of company data to help them in their interactions with the company. Personal customers connect to this data using the Internet. Partners and agents might use an extranet connection to this data for added facilities and security.

The Information Aggregation Usage page contains more information on the business uses of information as facilitated through the Information Aggregation Pattern.

Identification and storage of structured data

In theory, the data needed by users of an Information Aggregation solution includes all of a company's data. Best practice, however, dictates that the data required for informational purposes be copied into an environment separate from the enterprise's operational data and structured according to the needs of the information access environment. The entire set of such data is usually distributed widely within the information technology systems of the company and on workstations connected to the company's IT systems directly or through intranets, extranets, or the Internet.

The contents, structure, placement, and relationships of data stores (often called the data architecture of the environment) are the key design points for Information Aggregation applications. Best practice data architecture identifies the following key types of data stores and relates them as shown.

Storage of structured data

In the data architecture depicted above, business data, sourced from operational systems and external data sources, is stored in three distinct types of data store:

Additionally, metadata (descriptive data about the business data and applications) is stored in the business warehouse catalog.

Functionality and Application Needs

In addition to defining the data architecture, we need to define the functionality required both to create and manage these data stores and to allow the users to access and use the data contained in them.

The specific operations users perform within a business determine the functionality required to access and use data. The function a user requires is often called an application, and will be termed the Business Intelligence (BI) application hereafter. The black arrows in the figure above represent the BI application. BI application functionality covers a wide range of functions, from simple queries to complex and comprehensive applications.

The Information Aggregation Application Classes page contains additional information on the types of applications that can be built using the Information Aggregation pattern.

Recommended reading

Data Warehouse: From Architecture to Implementation (Addison-Wesley, 1997), by IBM Distinguished Engineer Dr. Barry Devlin, provides a detailed and comprehensive description of a data warehouse architecture and recommendations for implementing it.

What's Next

If you've determined that the Information Aggregation business pattern can provide an appropriate solution design for your business need, the next step is to select a User Information Access Application pattern. The Information Aggregation business pattern can be implemented using the base Application pattern or its two variations, which gives you the flexibility to address the specific needs of the business process being automated. If your choice indicates the need for a derived data store (for example, a data warehouse or data mart) or an index, you will probably also want to review the Application Integration::Data Integration application patterns.

Management information system (MIS)

Management information system (MIS) is one of a number of older synonyms for applications and data used to support decision-making and business management processes, now broadly called business intelligence systems.

Business Intelligence (BI)

Business Intelligence (BI) is the gathering, management, and analysis of vast amounts of data in order to gain insights that drive strategic business decisions and to support operational processes with new functions.

BI is about the development of information that is conclusive, fact based, and actionable. It includes technology practices like data warehouses, data marts, data mining, text mining, and on-line analytical processing (OLAP). The objective of a BI solution is to transform data into useful information, such as customer profiles, buying habits, product profitability and competitive analysis. It may involve analyzing volumes of data for unsuspected, but valuable, associations and insight. It includes streamlining data into useful reports and sharing that information with people inside and outside the organization who need that information.

However, implementing a successful BI initiative is not as simple as installing the required technology. It is imperative that the business objectives for the project be clearly defined at the outset and that the project have upper management's complete support. Only then can the technological solution be developed and the expected benefits of the project quantified. Predicting the return on investment expected from a project gives management a means by which to measure its success. Equally important is the communication that must take place between a company's IT staff and the end users on the business side of the company. A data warehouse will not be a success if the end users are not fully aware of the many ways they can use it.

Assuming that the overall strategic business approach is sound, the next important factor is the use of reliable, consistent data. Too many business decisions are being made using data of dubious quality and consistency. The process of extracting data from disparate sources, transforming it into a consistent record -- also known as cleansing -- and then loading it into a data warehouse is a critical process. Another important aspect is the data model, the map of how the database is organized and how its fields relate to each other. Using tried and tested models specific to the needs of a given industry will greatly enhance the effectiveness of the solution.

Decision support system (DSS)

Decision support system (DSS) is one of a number of older synonyms for applications and data used to support decision-making and business management processes, now broadly called business intelligence systems.

Data warehouse

"Data warehouse" is a generic term usually used to cover all of the components involved in the provision of business intelligence, from data acquisition to the data marts, as well as the metadata and services elements of business intelligence. It can also be used as a synonym for business data warehouse, or to represent a simple unlayered data warehouse structure.

Data acquisition

Data acquisition includes the processes, tools, and services responsible for the acquisition and reconciliation of data from the operational systems and other sources into the business data warehouse. It includes the capture or extraction of data from these sources, its transformation according to predefined rules, and its load or application to the BDW.
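
The extract, transform, and load sequence described here can be sketched as follows. This is a minimal illustration, not a reference implementation: the two source systems, their field names, the transformation rules, and the `bdw_sales` table are all assumptions, and an in-memory SQLite database stands in for the BDW.

```python
import sqlite3

# Hypothetical extracts from two operational systems with inconsistent conventions.
ORDERS_SYSTEM = [{"cust": " ACME corp ", "amount": "1,200.50", "currency": "usd"}]
BILLING_SYSTEM = [{"customer_name": "Acme Corp", "total_cents": 99900, "cur": "USD"}]


def transform_order(rec):
    """Predefined rules for the orders source: trim names, parse formatted amounts."""
    return (
        rec["cust"].strip().title(),
        float(rec["amount"].replace(",", "")),
        rec["currency"].upper(),
    )


def transform_billing(rec):
    """Predefined rules for the billing source: convert cents to currency units."""
    return (
        rec["customer_name"].strip().title(),
        rec["total_cents"] / 100,
        rec["cur"].upper(),
    )


def acquire(conn):
    """Extract from each source, transform, and load into one reconciled table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bdw_sales (customer TEXT, amount REAL, currency TEXT)"
    )
    reconciled = [transform_order(r) for r in ORDERS_SYSTEM]
    reconciled += [transform_billing(r) for r in BILLING_SYSTEM]
    conn.executemany("INSERT INTO bdw_sales VALUES (?, ?, ?)", reconciled)
    conn.commit()
    return len(reconciled)


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the BDW
loaded = acquire(conn)
rows = conn.execute(
    "SELECT customer, amount, currency FROM bdw_sales ORDER BY amount"
).fetchall()
print(loaded, rows)
```

Keeping one transform function per source makes the reconciliation rules explicit, so records that arrive in different shapes end up in the same consistent form.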

Data mart

A data mart is a data store defined and designed to meet the information needs of a department or group of users. It contains needed data, detailed or summarized, and preferably sourced from the business data warehouse. Data marts are the primary sources of information for users, are optimized to satisfy their query or reporting needs, and are usually used in read-only mode.
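
As a rough sketch of this idea, the snippet below derives a summarized, read-only data mart for one group of users from detailed rows. The `BDW_SALES` rows, the region-based split, and the function name `build_region_mart` are hypothetical; a real data mart would normally be a database structure rather than an in-memory mapping.

```python
from collections import defaultdict
from types import MappingProxyType

# Hypothetical detailed rows as they might be held in the business data warehouse.
BDW_SALES = [
    {"region": "north", "product": "milk", "units": 120},
    {"region": "north", "product": "milk", "units": 80},
    {"region": "north", "product": "bread", "units": 50},
    {"region": "south", "product": "milk", "units": 70},
]


def build_region_mart(detail_rows, region):
    """Summarize one region's units per product and return a read-only view."""
    totals = defaultdict(int)
    for row in detail_rows:
        if row["region"] == region:
            totals[row["product"]] += row["units"]
    # MappingProxyType rejects item assignment, mirroring read-only usage.
    return MappingProxyType(dict(totals))


north_mart = build_region_mart(BDW_SALES, "north")
print(dict(north_mart))

try:
    north_mart["milk"] = 0
except TypeError:
    print("data mart is read-only")
```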

Metadata

Metadata is information that describes the meaning and structure of business data, as well as how it is created, accessed, and used.
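
One way to picture a metadata entry is as a small record that captures each aspect named in this definition. The sketch below is purely illustrative: the `ElementMetadata` fields, the `CATALOG` mapping, and the sample `net_sales` entry are assumptions and do not describe any particular catalog product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMetadata:
    """Describes one business data element (all field names are illustrative)."""
    name: str
    meaning: str      # business definition of the element
    structure: str    # type and granularity
    created_by: str   # how the data is created
    used_by: tuple = ()  # who accesses or uses it


# A toy catalog keyed by element name, with one made-up entry.
CATALOG = {
    entry.name: entry
    for entry in [
        ElementMetadata(
            name="net_sales",
            meaning="Sales revenue after returns and discounts",
            structure="DECIMAL(12,2), one value per store per day",
            created_by="nightly load from the point-of-sale system",
            used_by=("store managers", "finance analysts"),
        ),
    ]
}


def describe(element_name):
    """Return a one-line description of an element for display to a user."""
    entry = CATALOG[element_name]
    return f"{entry.name}: {entry.meaning} [{entry.structure}]"


print(describe("net_sales"))
```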

Data warehouse services

Data warehouse services include the processes, tools, and services needed to manage and run the data warehouse environment, including archive/restore, security, database management, process management, and so on, but excluding data acquisition and population.

Business data warehouse (BDW)

A business data warehouse is a data store containing detailed, reconciled, and historical basic business data, structured according to an enterprise data model and designed to be the single, consistent source for all information required for business intelligence purposes. It is guaranteed to be integrated and consistent across the breadth of the business and to cover the span of required history of the business. The business data warehouse is seldom accessed directly by end users and, when it is, solely in read-only mode.

User-to-Data

The User-to-Data pattern encompasses the provision of Business Intelligence (BI) capabilities to an organization.

The user is someone connected to the data through one of four paths:

  • Internet: the user is external to the company, or an agent of the company. Note the possibility of a ‘chaining’ effect: the external user can trigger the user-to-data scenario while the user actually connected to the data is an internal staff member or agent. For example, a customer phones a query into an organization and a staff member connects to a data store to respond to the query. In this scenario, the user is defined as the staff member, not the customer.
  • Intranet: a staff (internal) user.
  • Extranet or privileged Internet: an associate of the client’s organization who acts as a business agent on the company’s behalf.
  • Fat client, connected as in a client-server system: applies to internal users only.

The data can be held in a:

  • Web-content store: holding Web pages and cached information. The data may include copies of operational detailed records, such as consolidated account information for a customer reference. The data is read-only; if an update of the data is required, the pattern is User-to-Business, not User-to-Data.
  • Data mart: the data may be read-write with local scope-of-effect only.
  • Data warehouse: the data is read-only for applications.
  • Tool-specific store (proprietary): some tools, such as Essbase, require specialized data stores for efficiency.

This pattern has several distinguishing characteristics:

  • The user is not connecting to a traditional transactional system performing an operational business process.
  • The user perceives the interaction as being directly with the data rather than with a system.
  • Normally, the user has significant freedom and flexibility in accessing the available data. The data sets are often specially prepared in advance to suit the user or the tool in use.
  • The data sets being accessed:
    • are not the company’s prime operational data.
    • include a copy of relevant operational data and other data as necessary.
    • include a historical set of data.
