 | Level: Introductory Jeff J. Li (liji@us.ibm.com), Advisory Software Engineer,
Department13
21 Feb 2008 The SAP® NetWeaver® Business Intelligence (SAP BI) is an integral part
of the SAP NetWeaver 2004s application server. Designing extract, transform, and load (ETL) jobs to load data into
and extract data from SAP BI is often a time-consuming task. This article
provides a step-by-step guide on how to use the IBM® Information Server to load
data into the SAP BI. It also explains in detail the procedure used to leverage the IBM Information Server to extract data from the SAP BI.
Introduction
SAP BI provides an Enterprise Data Warehouse solution for SAP customers to consolidate
and analyze their business data. Enterprises can use the SAP BI to read into their
business, react to the market changes, and gain competitive advantages. In SAP NetWeaver
2004, SAP BI was called SAP Business Information Warehouse (SAP BW).
Building an SAP BI data warehouse is a complex project. It includes various activities
such as planning, data modeling, data sizing, ETL job design, and performance tuning. Designing the ETL jobs to load data into and extract data from SAP BI is often the most time consuming task.
IBM Information Server is a unified and comprehensive information integration platform.
Businesses can use IBM Information Server to connect to various data sources, retrieve
and process data contents, and deliver cleansed and high quality information. In the SAP BI projects, IBM Information Server can be leveraged as an efficient ETL tool to process a large volume of data and build the enterprise data warehouse.
Appendix A explains the terminology used in this article. Appendix B lists the tools for creating the examples shown in this article.
Product prerequisites and installation
IBM Information Server includes many software products for data integration and
analysis tasks. Those products include WebSphere® DataStage®, WebSphere
QualityStage®, WebSphere Information Analyzer, WebSphere Federation Server, and other companion products. Depending on the specific project requirements, you can choose to install a subset of the products in the IBM Information Server.
Figure 1. Software products needed for
designing ETL jobs for SAP BI and SAP BW
Figure 1 shows the minimum set of IBM Information Server products needed to design ETL
jobs for the SAP BI or BW data warehouse.
-
WebSphere DataStage, which includes:
-
DataStage Client
-
DataStage Server
-
DataStage Metadata Repository
-
DataStage Domain Server
WebSphere DataStage products can be installed separately on different hosts or installed
on the same host.
-
WebSphere DataStage Pack for SAP BW (DataStage BW Pack)
The DataStage BW Pack is a companion product of the IBM Information Server. The pack was
originally developed to support SAP BW and currently supports both SAP BW and SAP BI. The
GUIs of the DataStage BW Pack are installed on the DataStage Client. The runtime part of
the Pack is installed on the DataStage Server.
-
SAP Remote Function Call (RFC) Library
The SAP RFC library is an external component to the IBM Information Server. The DataStage
BW Pack uses the SAP RFC interface to call SAP BI and SAP BW functions. The SAP RFC
library is a prerequisite for using the DataStage BW Pack and must be installed on both the DataStage Client and Server.
Architecture overview
The software components in Figure 1 play different roles in designing and executing the ETL jobs for SAP BI.
-
The DataStage Client and DataStage BW Pack GUI components provide a friendly user
interface to design ETL jobs and to set up the data operations to be performed on SAP BI systems.
-
The DataStage Server and DataStage BW Pack Server components enable users to schedule and run the ETL jobs.
-
The DataStage Domain Server manages user accounts and authorizes users to use different
features of the IBM Information Server.
-
The DataStage Metadata Repository is a database for storing and sharing tables, fields, or object definitions.
The DataStage BW Pack includes four major components:
-
BW Load Stage: Loads data from non-SAP data sources to an SAP BI
system. BW Load Stage is an SAP-certified, data-loading integration solution implemented using the SAP Staging BAPI interface.
-
BW Extract Stage: Extracts data from an SAP BI system. It is an SAP-certified,
data-extraction integration solution based on the SAP Open Hub Service interface.
-
BW RFC Server: Implements various functions that are invoked by an SAP BI system.
It accepts the SAP BI initiated data-loading or data-extraction requests and triggers the DataStage jobs to execute the corresponding data operations.
-
BW RFC Manager: Manages the BW RFC Server processes. It creates one BW RFC Server
process per source system. It also provides the functions to start or stop BW RFC Server processes. A source system represents a logical or physical system that is external to an SAP BI system. A source system provides source data to an SAP BI system or accepts extracted data from an SAP BI system.
Load data into SAP BI
DataStage jobs can be designed to retrieve, cleanse, and consolidate the data from non-SAP sources and load the data into SAP BI systems. For example, you can extract the customer data from your CRM applications and then look up the purchase orders for your customers in your purchase order applications. The consolidated purchase orders can be loaded into your SAP BI system for analysis.
This section uses a simple ETL job to illustrate the steps necessary for loading data
into an SAP BI system. Figure 2 shows the sample job. The job extracts customer data
from an Oracle® database table using an ODBC Stage. It passes the extracted data to
the BW Load Stage LoadDataToSAPBI, which loads the processed data into the CUSTOMER
Characteristic in an SAP BI system. Table 1 shows the sample data in the Oracle database
table. The CUSTOMER Characteristic is created in the SAP BI using the SAP Data Warehousing Workbench, which is shown in Figure 3 and Figure 4.
Figure 2. DataStage job for loading data into SAP BI
Table 1. Sample data in the Oracle database table
|
ID
|
NAME
|
STREET
|
CITY
|
REGION
| |
0000001
|
IBM
|
New Orchard Road
|
Armonk
|
NY
| |
0000002
|
Microsoft
|
One Microsoft Way
|
Redmond
|
WA
| |
0000003
|
SAP America
|
3999 West Chester Pike
|
Newtown Square
|
PA
|
|
POSTALID
|
COUNTRY
|
PHONE
|
FAX
|
TAXNUM
| |
10504
|
USA
|
800-426-4968
|
866-722-9226
|
000000000
| |
98052
|
USA
|
800-642-7676
|
425-936-7329
|
000000000
| |
19073
|
USA
|
610-661-1000
|
404-943-2950
|
000000000
|
Figure 3. CUSTOMER Characteristic in SAP BI
Figure 4. CUSTOMER Characteristic in SAP BI
— Attribute tab
Figure 5. Data flow diagram of BI data load operation
Figure 5 illustrates the data flow diagram of a BI data load operation.
-
A source system is defined to represent one or more DataStage jobs loading data into the
SAP BI. A data transfer structure describes the data available in the source system. A
DataStage job loads the data from external data sources into an SAP BI Persistent Staging
Area (PSA) staging table.
-
Transfer rules are defined to transfer data from the staging table into an InfoSource. An InfoSource is a collection of data fields treated as a single unit. The communication structure defines the data fields of the InfoSource.
-
Update rules are created to transform the data from the InfoSource to one or more BI data targets (InfoObjects, DataStore objects, or BI InfoCubes).
The BW Load Stage provides a Stage Editor, as shown in Figure 6. The Stage Editor contains several tabs for setting various parameters of the BI load operation.
-
The General tab creates or selects an RFC connection to the SAP BI.
-
The Transfer Structure tab creates or selects a source system and an InfoSource.
-
The Columns tab displays the column definitions of the data being sent to the SAP BI.
-
The InfoPackage tab defines how and when the DataStage job loads data into the SAP BI.
-
The Process Chain tab enables you to run the data load operation as a process within a BI
process chain. A process chain provides the workflow function. It is used to design and schedule a series of dependent data-processing processes.
Figure 6. BW Load Stage
Editor
The BW Load Stage LoadDataToSAPBI in Figure 2 loads the data in Table 1 into the SAP BI. Multiple steps are involved in setting up the BW Load Stage. Those steps are illustrated in Figure 7. They are described in detail in the following sections.
Figure 7. Setting up the BW Load Stage
LoadDataToSAPBI
BW connection setup
BI connection properties are set in the General tab, shown in Figure 6. Menu items are provided to create, select, or modify BI connections.
Figure 8 shows the new connection dialog window. The new BI connection DEMOCONN is saved on the DataStage server, and can be reused to design other DataStage jobs to perform BI data operations.
Figure 8. Connection property dialog window
Source system setup
The source system property is set in the Transfer Structure tab, shown in Figure 9. Menu items are provided to create, select, or view source systems.
In SAP BI, a source system can be a flat file, an SAP system, a database system, a multidimensional data source, a Web service, or a staging BAPI interface-based external application. BW Load Stage uses the staging BAPI APIs to exchange metadata with SAP BI and load data into SAP BI.
Figure 9. Transfer Structure tab
The New… menu item in Figure 9 creates a new source system. When a new source system is
created, BW RFC Manager starts a new BW RFC Server process. The RFC Server process waits
for data-load requests from SAP BI and requests the corresponding DataStage job to start
the data-load operation. The new source system is saved on the DataStage server for future reuse.
Figure 10 shows that a new source system DEMODSSRC is created. The BW RFC Server process for the new source system is automatically created on the DataStage server and registered with the SAP BI as the program <hostname>.DEMODSSRC. As shown in Figure 11, the SAP transaction sm59 can be used to see and test the RFC destination for the new source system.
Figure 10. Create a source system
Figure 11. RFC destination
InfoSource setup
An InfoSource is set on the Transfer Structure tab, shown in Figure 12. Menu items are
provided on the tab to create, update, view, and search SAP BI characteristics, key figures, and InfoSources.
Figure 12. InfoSource menu
items
The menu item Create Master InfoSource from Existing Characteristic… creates an
InfoSource based on an existing characteristic. Two subsequent user actions are needed when this menu item is selected:
-
Select an existing Characteristic. BW Load Stage shows the BI characteristics matching the
search condition and allows the selection of an existing characteristic. Figure 13 selects
the CUSTOMER characteristic.
Figure 13. Select an existing characteristic
-
Specify the properties of the new BI InfoSource object, shown in Figure 14.
Figure 14. Specify the properties of new InfoSource
As shown in Figure 15, the BW Load Stage creates the specified InfoSource in the SAP
BI. The Stage also selects the InfoSource on the Transfer Structure tab in Figure 16.
Figure 15. New InfoSource
Figure 16. Select new InfoSource
DataStage table definition setup
When an InfoSource is selected on the Transfer Structure tab, a DataStage table definition is created based on the transfer structure of the InfoSource. Figure 17 shows the table definition. Table 2 shows how the SAP data types are mapped to the DataStage data types. The table definition can be validated and synchronized with the InfoSource fields using the Validate Columns and Synchronize Columns buttons.
Figure 17. Columns tab
Table 2. Data type mapping table
|
SAP data type
|
DataStage data type
| |
DATS
|
SQL DATE
| |
CURR
|
SQL CHAR
| |
TIMS
|
SQL TIME
| |
FLTP
|
SQL FLOAT
| |
CHAR (no more than 256 characters)
|
SQL CHAR
| |
CHAR (more than 256 characters)
|
SQL VARCHAR
|
InfoPackage setup
The InfoPackage is set on the InfoPackage tab, shown in Figure 18. The InfoPackage is
an entry point for SAP BI to request data from a source system. An InfoPackage defines
when and how a DataStage job loads data into an SAP BI system. The InfoPackage tab
creates and selects an InfoPackage. As shown in Figure 19, the tab also allows you to set the InfoPackage properties.
Figure 18. InfoPackage tab
Figure 19. InfoPackage property dialog window
BW Load Stage supports three data load mechanisms:
-
Push mode: A DataStage job is started first. The DataStage job schedules the InfoPackage for the job to start the data loading operation.
-
Pull mode: An InfoPackage is scheduled first using SAP Data Warehousing Workbench.
When the SAP BI is ready to receive data, it notifies the RFC Server process. The RFC
server process launches the DataStage job to send data to the SAP BI.
-
File mode: A DataStage job runs first. The DataStage job saves the data for SAP BI to a temporary file. An InfoPackage is then scheduled to load the data in the file into SAP BI.
The InfoPackage third-party parameters are defined to support third-party integration
tools like the DataStage BW Pack. As shown in Figure 20, the DataStage job BILoadJob is
automatically set as a third-party parameter for the new pull InfoPackage DEMO Pull InfoPacakge.
Figure 20. InfoPackage third party parameters
The use of third-party parameters in the BI data loading process is described as follows:
-
SAP BI schedules and runs the InfoPackage DEMO PULL InfoPackage.
-
When it is ready to receive data, the InfoPackage sends the loading request to the source
system DEMODSSRC. It also passes the third-party parameter DSJob and its value BILoadJob to the source system.
-
The RFC Server process for the source system receives the request and starts the DataStage job BILoadJob to send data packages to the SAP BI.
Process Chain Setup
The process chain is set on the Process Chain tab. This step is optional. The BW Load Stage can run with or without a process chain.
A process chain defines a sequence of dependent processes linked together. The execution of an InfoPackage is one of the process types that SAP BI defines. As shown in Figure 21, the execution of the InfoPackage DEMO Pull InfoPacakge is added as a process in the process chain Demo Load Chain. Figure 22 selects the process chain Demo Load Chain.
Figure 21. Run the data loading job as part of
a process chain
Figure 22. Select process chain
Run data load operation
The process chain Demo Load Chain must be scheduled to run your BI data load operation. The process chain can be started in two ways:
In this example, the DataStage job is first started to invoke the process chain. Figure
25 shows that the DataStage job runs successfully. Figure 26 shows that the source data
is loaded into the SAP BI.
Figure 25. Run DataStage job
Figure 26. Data loaded into SAP
BI
Extract data from SAP BI
The BW Extract Stage extracts data from an SAP BI system. The extracted data can be fed into non-SAP applications. The BW Extract Stage is based on the SAP Open Hub Service, which defines a controlled and monitored data exporting process.
This section uses a simple ETL job to illustrate the steps necessary to extract data
from an SAP BI system. Figure 27 shows the sample job. The job extracts the data from
the SAP BI characteristic CUSTOMER, as shown in Figure 4. The job processes the extracted data using a DataStage Transformer Stage and then saves the processed results into a flat file.
Figure 27. DataStage job for extracting data
from the SAP BI
The BW Extract Stage has a Stage Editor, shown in Figure 28. The Stage Editor contains four tabs for setting up various properties for the BI data extraction operation.
Figure 28. BW Extract Stage
GUI
-
The General tab creates or selects an RFC connection to an SAP BI.
-
The Process Chain tab selects a source system and a process chain. The BI data extraction operation is run as a process within a process chain.
-
The Open Hub Destination tab selects an InfoSpoke or an Open Hub Destination.
-
The Columns tab displays the column definitions for the extracted data from the SAP BI.
The BW Extract Stage ExtractDataFromSAPBI, in Figure 27, extracts data from an SAP BI system. Multiple steps are involved in setting up the BW Extract Stage. Those steps are illustrated in Figure 29 and described in detail in the following sections.
Figure 29. Setting up the BW Extract Stage ExtractDataFromSAPBI
BI Open Hub Destination and data transfer
process setup
The BW Extract Stage supports both BW InfoSpoke and BI Open Hub Destination for data extraction.
InfoSpoke is a central piece of the Open Hub Service Pack in SAP BW. An InfoSpoke specifies three properties:
-
An InfoProvider that provides the original data. An InfoProvider can be an InfoCube, a
DataStore object, or an InfoObject.
-
An Open Hub Destination that defines the targets to receive the extracted data. An Open Hub Destination can be a flat file, a database table, or a RFC destination.
-
Transformation that converts the data from its original form to the destination form.
An InfoSpoke must be created first before it can be selected on the Open Hub Destination tab. BW Extract Stage supports creating an InfoSpoke and using the InfoSpoke for data extraction.
In the SAP BI, Open Hub Destination has been integrated into the new BI data transfer process and is no longer tightly coupled with InfoSpoke. As illustrated in Figure 30, a data transfer process transforms the data from an InfoProvider to an Open Hub Destination. When the data is ready in the Open Hub Destination, SAP BI notifies the DataStage RFC Server process, which starts a DataStage job to extract the data from the Open Hub Destination. A process chain is created to control the whole data extraction process.
Figure 30. BI data extraction diagram
The Extract Stage ExtractDataFromSAPBI, in Figure 27, is set up based on the BI Open
Hub Destination. The use of a traditional BW InfoSpoke is not discussed in this article. The following steps summarize how to create the BI artifacts for the stage:
-
Create and activate a new Open Hub Destination using the SAP transaction RSBO or the Data
Warehousing Workbench GUI. Figure 31 shows the dialog window for creating a new Open Hub
Destination DEMODEST. The attributes of the CUSTOMER characteristic are selected to create
the field definitions of the new Open Hub Destination. As shown in Figure 32, the
DEMODSSRC RFC destination is selected as the data receiver for the Open Hub Destination.
Figure 31. Create new Open Hub Destination
Figure 32. Select RFC destination
-
Create and activate a new data transfer process to transform the data from an InfoProvider to the Open Hub Destination. Figure 33 shows the dialog window for creating a new data transfer process. The data transfer process transfers the data from the Customer attributes to the Open Hub Destination DEMODEST.
Figure 33. Create a new data transfer process
-
Create and activate a new process chain to invoke the new data transfer process. Figure 34
shows that the data transfer process CUSTOMER > DEMODEST is added as a process to the process chain CUSTCHAIN.
Figure 34. Create and activate a process chain
Source system and process chain setup
The source system and process chain are set on the Process Chain tab. A source system is the destination for the extracted data. A process chain controls the data extraction process.
In Figure 35, DEMODSSRC is selected as the source system. The process chain CUSTCHAIN is selected for the stage ExtractDataFromSAPBI.
Figure 35. Process Chain tab
Open Hub Destination setup
An Open Hub Destination is set on the Open Hub Destination tab, as shown in Figure 37. Two user actions are required:
-
Select an Open Hub Destination.
In Figure 36, the Open Hub Destination DEMODEST is selected for the stage
ExtractDataFromSAPBI. The stage retrieves the DEMODEST definitions from BI and
automatically populates various GUI controls shown in Figure 37.
The table fields shown in Figure 37 are a part of the DEMODEST definitions. Those
fields are converted to a DataStage tab definition that is displayed on the Column tab. The Column tab for the BW Extract Stage is the same as the Column tab for the BW Load Stage shown in Figure 17.
Figure 36. Select an Open Hub
Destination
Figure 37. Open Hub Destination
tab
-
Update third-party parameters for the Open Hub Destination.
An Open Hub Destination supports third-party parameters in the same way as InfoPackage.
The button Update BW, in Figure 37, sets the third-party parameters of the selected Open
Hub Destination. Figure 38 shows the result of when the button is clicked. Figure 39
shows that the job name BIExtractJob, the process chain CUSTCHAIN and the source system
DEMODSSRC are set as the third-party parameters of the Open Hub Destination DEMODEST.
The usage of the third-party parameters in the BW Extract Stage is similar to the usage of the parameters in the BW Load Stage.
Figure 38. Update the third-party parameters
Figure 39. DEMODEST third-party parameters
Run data extraction operation
Similar to the data load operation, the process chain CUSTCHAIN must be scheduled to
run the data extraction operation. It can be started either by the DataStage job
BIExtractJob or by the SAP Data Warehousing Workbench.
In this example, the DataStage job is started to invoke the process chain. Figure 40
shows that the DataStage job runs successfully and Figure 41 shows the data extracted
from the SAP BI.
Figure 40. Run the DataStage
job
Figure 41. Data extracted from SAP
BI
Conclusion
This article demonstrated how to integrate SAP BI data with non-SAP BI data using IBM
Information Server and the WebSphere DataStage SAP BW pack. It explained the SAP BI
data loading and extraction processes and the new SAP BI feature: the data transfer
process. Two examples illustrated the step-by-step design processes.
IBM Information Server provides leading technology and integration solutions to two
other critical issues in the SAP BI Data Warehouse environment:
-
Data Quality: The data that builds a data warehouse often comes from various data sources. The structure of the legacy data is often not documented and the data quality is poor. The WebSphere Information Analyzer product analyzes your data and determines the data structure and quality. It helps you understand your data. The WebSphere QualityStage product standardizes and matches any type of information to create high quality data.
-
Data Volume: There is often a huge amount of data that needs to be processed
regularly for a data warehouse environment. Sometime the data volume grows beyond
expectations. The issue needs to be addressed with a scalable ETL architecture. IBM
Information Server leverages the pipeline and partition technologies to support high
data throughput. IBM Information Server can be deployed on symmetric multiprocessing
(SMP) and massively parallel processing (MPP) computer systems to achieve the maximum scalability.
Currently work is underway on other SAP BI integration solutions, including direct data
accesses and metadata management. The work focuses on developing a fast and efficient
ETL solution to build corporate transactional data stores with massive document-level data items.
Acknowledgements
I would like to thank Wendi Nusbickel, Nicolas Cominetti, and Susan Shay for their
feedback and reviews of this article.
Appendix A: Terminology
|
Terminology
|
Description
| |
ETL
|
Extract, Transform, and Load
| |
SAP BI
|
Business Intelligence
| |
SAP BW
|
SAP Business Information Warehouse
| |
GUI
|
Graphical User Interface
| |
CRM
|
Customer Relationship Management
| |
ODBC
|
Open Database Connectivity
| |
DataStage job
|
A sequence of data operations performed by IBM Information Server.
| |
RFC
|
SAP term, Remote Function Call
| |
PSA
|
SAP BI term, Persistent Staging Area
| |
Staging BAPI
|
SAP BI term, an open interface for third party ETL tools
| |
Open Hub Service
|
SAP BI term, an SAP BW/BI data exporting mechanism
| |
Source System
|
SAP BI term, a logical or physical system external to an SAP BI system.
| |
InfoObject
|
SAP BI term, a lowest level information provider
| |
DataStore Object
|
SAP BI term, a storage location for consolidated transaction and master data at document level.
| |
InfoCube
|
SAP BI term, several relational tables arranged in a star schema
| |
InfoSource
|
SAP BI term, a quantity of information that logically belongs together
| |
InfoPackage
|
SAP BI term, an entry point for requesting data from a source system
| |
InfoSpoke
|
SAP BI term, an extraction object that exports data within the Open Hub Service
| |
Process Chain
|
SAP BI term, a sequence of processes linked together
| |
Transfer Structure
|
SAP BI term, a selection of data fields from a source system
|
 |
Appendix B: Tools
|
Figure
|
Tool
| |
Figure 2, 25, 27, 40
|
DataStage Designer
|
Figure 3, 4, 11, 15, 20, 21, 26, 31, 32, 33, 34, 39
|
SAP Data Warehousing Workbench
| |
Figure 6, 8, 9, 10, 12, 13, 14, 16, 17, 18, 19, 22
|
BW Load Stage Editor
| |
Figure 28, 35, 36, 37, 38
|
BW Extract Stage Editor
| |
Figure 41
|
DataStage File Stage Editor
|
Resources Learn
Get products and technologies
-
Build your next development project with
IBM
trial software, available for download directly from developerWorks.
Discuss
About the author  | 
|  | Jeff Li is an Advisory Software Engineer in Software Group in Boca Raton, Fla. He joined IBM in 2005 as part of the Ascential acquisition. He has been working in the area of enterprise applications and telecommunication systems for 15 years. Currently, he is developing the ERP and BI adapters for IBM Information Server. He is also a certified project management professional. |
Rate this page
|  |