Accelerating SAP CO-PA with DB2 for Linux, UNIX, and Windows

Use DB2 intrapartition parallelism for reporting queries

Traditional online transaction processing (OLTP) workloads are characterized by large numbers of queries that run in parallel. Because these queries execute independently of one another, the database server can evaluate them concurrently. As a result, large, multicore systems are used to capacity even though each individual query is processed by a single core.

Online analytical processing (OLAP) workloads, on the other hand, are typically characterized by small numbers of concurrently running queries. These queries read and aggregate large amounts of data and model parts of the application logic. As a result, OLAP queries are more complex and take a long time to run. Because the processing capability of a single CPU core is limited, speeding up OLAP queries requires processing each query in parallel across all the cores of a large, multicore system. This parallel processing is either data-oriented, by using the database partitioning feature (DPF), or query-oriented, by using intrapartition parallelism. Because it is straightforward and easy to maintain, intrapartition parallelism is the best choice for enabling parallel query processing with IBM® DB2®.

SAP® Profitability Analysis (CO-PA) reports, which are primarily used for drill-down analysis, fetch a significant number of records and aggregate them to a small result set. Traditionally, those results are calculated in advance and stored on the SAP application server to give users acceptable response times. Intrapartition parallelism can speed up the query processing significantly. This speedup enables the application to skip the pre-calculation step and to read data for each navigation step directly from the database, which eliminates the functional restrictions that are typically in place because of resource limitations on the application server and reduces the number of required summarization levels. In an example report, a speedup of eight times or more was achieved.

Data access methods in SAP CO-PA

In SAP CO-PA reports, all data is initially presented at a coarse level of detail. You can refine your query to get a more detailed view of a certain aspect. SAP CO-PA can be configured to load all the data at once during the program start or to load it at each step as you walk through the drill-down.

If all the data is requested during the program start, the data that is read from the database has a fine granularity. If data is read on demand, it has exactly the granularity that is needed for the corresponding step. Both methods have advantages, and both can benefit from intrapartition parallelism.

Requesting data only one time, when the report is loaded, has several drawbacks:

  • The initial report might take a long time to load.
  • The memory consumption on the application server is high during the entire runtime of the report.
  • The amount of data that can be evaluated by a report is limited by the memory available to a single process on the application server.

Loading the data on demand does not require the data to be kept on the application server, resulting in lower memory consumption. Additionally, the report starts faster because fewer rows have to be transferred from the database to the application server. However, loading data this way places higher performance requirements on the database: when dealing with huge amounts of data, an acceptable response time can be achieved only with intrapartition parallelism.

Summarization levels

To reduce memory consumption and speed up query processing, SAP CO-PA uses pre-calculated summarization levels. Summarization levels are physically persistent aggregates at different levels of detail. To fetch as little data as possible, SAP CO-PA uses the coarsest summarization level that matches the requirements of a report. Summarization levels can be defined and used independently of the SAP CO-PA reports to access all SAP CO-PA data. Alternatively, you can use report-specific summarization data, which contains only the data for a certain SAP CO-PA report.

Reducing the number of summarization levels eliminates redundant information, saves storage space, and reduces the time and resources that are required to build the summarization levels.

Reading data on demand eliminates memory limitations on the application server. With faster query processing, fewer summarization levels are required to meet the runtime requirements and creating them is faster as well.

Methods

The following example demonstrates how to use intrapartition parallelism to improve the performance of SAP CO-PA queries. To show the workload characteristics of SAP CO-PA, I describe the report configuration and explain the data flow and structure of SAP CO-PA. I also discuss the properties of the data used in this scenario. In addition, I review how to determine the best summarization levels and how to enable intrapartition parallelism.

Report configuration

Reports are configured by using the KE30 transaction. To see the performance settings, select Edit Report > Options > Performance > Presummarized data. From there, choose one of three data access methods:

  • Use summarization data: Use report-specific, dynamically created summarization data (frozen report data).
  • Use a summarization level: Read system-wide, defined summarization levels.
  • Read upon each navigation step: Read data as required. This option uses summarization levels as well.

The preferred option is "Read upon each navigation step". This data access method enables on-demand reporting, eliminates the need for long-running calculations to prepare aggregates, and removes the restrictions on memory consumption on the application server.

Environment

All data that is collected by SAP CO-PA is stored separately by operating concern, which corresponds to an organizational unit of a company. Each operating concern holds its data in four tables that are named according to the schema CEnxxxx, where n is a number from 1 to 4 and xxxx is the name of the particular operating concern. For the sake of clarity, they are referred to as CEn in the following example.

Line items are written to the tables CE1 and CE2, separated into actual and plan data. This data is merged and pre-aggregated, and the result is split and written into the segment level table CE3 and the segment table CE4. Key figures and the segment-specific characteristics record type, plan/actual indicator, version, and period are written to CE3. The remaining characteristics are written to CE4. Those two tables are the main data source for reports and for the creation of summarization levels. Usually, several line items share the same combination of characteristics, so table CE4 can be condensed again. In the example scenario, this results in a cardinality of 62,509,896 rows for CE3 and 24,603,426 rows for CE4.

A report provides a view of the data that is defined by certain predicates and drill-down possibilities. Predicates reduce the scope of the report and can be defined on any characteristic. A drill-down is defined by a sequence of characteristics. Initially, data is aggregated by the first drill-down characteristic c0, and this coarse overview is presented to the user. When an entry is chosen, its value v0 is added to the list of predicates, and the next characteristic c1 is added to the GROUP BY clause of the corresponding SQL statement.

As all drill-down characteristics are stored in CE4, the number of rows fetched from CE4 is reduced with each step, while the number of rows fetched from CE3 remains constant. Joining CE3 and CE4 eliminates outdated combinations of characteristics or items that are filtered by restrictions on CE3. Due to the n:1 mapping between CE3 and CE4, the size of the join result corresponds to the number of qualifying rows of CE3.
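
To make this query structure concrete, the following sketch shows what the generated statements for the first two navigation steps might look like. It is illustrative only: the join column PAOBJNR, the record type field VRGAR, the key figure KF1, and the characteristic columns C0 and C1 are assumptions, and the actual statements are generated by SAP CO-PA.

-- Step 0: aggregate by the first drill-down characteristic c0
SELECT CE4.C0, SUM(CE3.KF1) AS KF1
  FROM CE4xxxx CE4
  JOIN CE3xxxx CE3 ON CE3.PAOBJNR = CE4.PAOBJNR
 WHERE CE3.VRGAR = 'F'              -- report-specific predicate on CE3
 GROUP BY CE4.C0

-- Step 1: the chosen value v0 becomes a predicate on CE4,
-- and the next characteristic c1 is added to the GROUP BY clause
SELECT CE4.C0, CE4.C1, SUM(CE3.KF1) AS KF1
  FROM CE4xxxx CE4
  JOIN CE3xxxx CE3 ON CE3.PAOBJNR = CE4.PAOBJNR
 WHERE CE3.VRGAR = 'F'
   AND CE4.C0 = :v0
 GROUP BY CE4.C0, CE4.C1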

You should be aware of four relevant metrics related to the number of processed rows:

  • The number of rows read from CE4
  • The number of rows that are meeting the join predicates
  • The cardinality of the join result
  • The size of the grouped result set

Those cardinalities are shown in Figure 1.

Figure 1. This simplified access plan shows the important cardinalities of each processing step.

The example report has a predicate on table CE3. On table CE4, only drill-down predicates are applied. Table 1 shows these four cardinalities for each navigation step.

Table 1. Rows that are read and returned by each step and operator
step   CE4 fetched   CE4 joined   join result   aggr. result
0       24,603,426   15,627,348    46,882,044            186
1        4,092,840    2,599,524     7,798,572            817
2          113,796       72,864       218,592            618
3            2,880        2,160         6,480            101
4              144          126           378              7

The report-specific predicates reduce the number of rows fetched from CE3 from 62 million rows to 47 million rows. So, the report analyzes 75 percent of the available data. The number of rows fetched from CE4 in step 0 confirms that, initially, there are no restrictive predicates on this table. Only 15 million rows of 24 million rows fetched from CE4 meet the join criteria, which indicates that 37 percent of the data is either outdated or filtered indirectly by predicates on CE3.

Summarization levels

Summarization levels provide data from CE3 and CE4 as persisted aggregates. The KEDV transaction is used to define the level of detail through the chosen characteristics. The aggregate is materialized in two tables, K81n and K81(n+1), where n and n+1 are consecutive four-digit numbers padded with leading zeros.

Table K81(n+1) is similar to CE3 and, for the sake of clarity, is named CE3S in this article. It contains all key figures but no characteristics besides period and year. All characteristics, including the remaining four that were originally stored in CE3, are written to K81n, which is named CE4S. To reduce the amount of data that must be processed, reports read data from the summarization level with the lowest level of detail that still matches their requirements.

Because summarization levels take a long time to create, their number should be kept as low as possible. However, a summarization level needs a sufficiently high level of detail so that its data can be shared and reused by other reports. At the same time, a summarization level is supposed to reduce the number of rows. Moving the segment-specific characteristics from CE3 to CE4S increases the number of characteristic combinations while keeping the cardinality of CE3S constant. Thus, choosing a level of detail that is too high can increase the amount of data instead of reducing it.
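
Before you define a summarization level, you can estimate the resulting cardinalities directly on the database. The following query is a minimal sketch that assumes hypothetical characteristic columns C0, C1, and CX in CE4xxxx. It counts the distinct combinations of a candidate characteristic set; because it ignores the segment-specific characteristics that move from CE3 into CE4S, the result is only a lower bound for the CE4S cardinality.

-- Lower bound for the CE4S cardinality of a candidate summarization level
SELECT COUNT(*) AS CE4S_ROWS
  FROM (SELECT DISTINCT C0, C1, CX
          FROM CE4xxxx) AS CANDIDATE_LEVEL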

Table 2 shows the cardinalities of the summarization level tables for each drill-down level of the example scenario. In addition to the corresponding drill-down characteristics, one additional characteristic was added to each level to avoid tables that are too small.

Table 2. Comparison of original table cardinalities and the ones of summarization level tables for each drill-down step
scope            CE3S         CE4S
original   62,509,896   24,603,426
step 0         13,392          576
step 1        429,336       21,888
step 2     10,863,000      950,616
step 3     54,721,800   30,890,952
step 4     62,304,984   61,293,816
at once    62,487,792   62,314,560

The summarization levels for steps 3 and 4 are too detailed. Step 2 qualifies for a generic summarization level because it is the most detailed one where the cardinality of CE4S is significantly lower than that of CE4. Depending on your performance requirements, you might consider step 1 as well.

Enable intrapartition parallelism

By default, SAP environments disable intrapartition parallelism. SAP Note 2047006 explains how to enable it system-wide or for certain statements or applications. SAP Note 2052896 describes SAP CO-PA-specific aspects.

A system-wide enablement sets the default degree of intrapartition parallelism to ANY, which means that the degree is computed automatically based on the number of available CPU cores. If possible, setting the degree to ANY is the best choice because it requires the least setup and maintenance effort.
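
On the DB2 side, a system-wide enablement along these lines involves the INTRA_PARALLEL database manager parameter and the DFT_DEGREE database configuration parameter. The following commands are a minimal sketch, with SID as a placeholder for the database name; for SAP systems, follow the exact procedure in SAP Note 2047006.

db2 update dbm cfg using INTRA_PARALLEL YES     -- takes effect after an instance restart
db2 update db cfg for SID using DFT_DEGREE ANY  -- DB2 derives the degree from the available cores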

If intrapartition parallelism is enabled only for certain applications, it is enabled on the database side, but the default degree is set to 1. This minimizes the impact on other applications and workloads. The default degree can then be overridden by an appropriate guideline that the database shared library (DBSL) adds to a query. SAP optimizer profiles give you a way to add optimization guidelines to queries based on the corresponding database object and the query structure. Their general usage is described in SAP Note 1818503.

Intrapartition parallelism for SAP CO-PA queries is enabled by adding an optimizer guideline that sets a parallel degree for queries that read from CE3 and CE4. For this purpose, an entry is added to table DB6_OPTPROFILE by using transaction SE16.

The column TABNAME corresponds to the object name that is displayed in performance traces; in this case, it is the name of the CE4 table that belongs to the operating concern. The pattern must match all reading queries against the CE4 table and the corresponding CE3 table. The actual guideline is written to the column of the same name. Setting the degree value to ANY allows DB2 to choose the value according to the available number of CPU cores. The result is a DB6_OPTPROFILE entry as shown in Table 3.

Table 3. DB6_OPTPROFILE example entry
Field       Value
Tabname     CE4xxxx
Guideline   <DEGREE VALUE="ANY"/>
Pattern     +[SELECT%FROM%CE4xxxx%JOIN%CE3xxxx]

SAP CO-PA uses separate tables for each operating concern. You must create a separate profile entry for each operating concern and for each summarization level, as sketched in the following example.
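
For instance, a corresponding entry for a summarization level might look like the following. The table numbers are hypothetical: K810001 stands for the CE4S table and K810002 for the matching CE3S table.

Field       Value
Tabname     K810001
Guideline   <DEGREE VALUE="ANY"/>
Pattern     +[SELECT%FROM%K810001%JOIN%K810002]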

Because the predicates of SAP CO-PA queries are defined on fields of CE4, it is usually best to evaluate them first and then join CE3 by using an index. Nevertheless, the optimizer might choose other access plans, depending on the kind of predicates and the distribution of data. This behavior can negatively affect performance and cause varying response times. It might be necessary to extend the guideline to enforce a fixed access order for CE4 and CE3, as shown in the following example:

<DEGREE VALUE="ANY" /><NLJOIN><ACCESS TABLE='CE4xxxx' /><IXSCAN TABLE='CE3xxxx' /></NLJOIN>

Results

The results describe the database runtime of a query that fetches most of the data from CE3, CE4, or the corresponding summarization level tables. All aggregates are updated, and no additional data is loaded from CE1 and CE2.

A measurement with intrapartition parallelism enabled and the default degree set to 1 is used as the reference. Based on the available hardware resources, a query degree of 16 is used for the measurements with intrapartition parallelism.
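
As an aside, a fixed degree such as 16 can also be requested per session through the CURRENT DEGREE special register, provided that intrapartition parallelism is enabled on the instance. This is a sketch for experimentation, not the measurement setup itself:

SET CURRENT DEGREE = '16'   -- subsequent dynamic SQL in this session may run with a degree of up to 16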

Without summarization levels

The runtimes in Table 4 are measured without any summarization level and indicate how reports perform on a large amount of data. However, building summarization levels not only reduces the amount of data that must be processed, it also changes the ratio of the table cardinalities, as shown in Table 2. Therefore, the comparability of a measurement on a plain operating concern with one on a summarization level of similar size is limited.

Table 4. Drill-down runtimes reading on demand without summarization levels
step   degree 1   degree 16   speedup
0       482.63s     61.49s     7.85x
1       106.45s     14.24s     7.47x
2        12.46s      1.63s     7.66x
3         9.79s      1.30s     7.51x
4         9.48s      1.32s     7.16x
SUM     620.81s     79.98s     7.76x

Reading all data at once takes 3,014.93 seconds. Using intrapartition parallelism at a degree of 16 reduces the query runtime to 1,117.82 seconds, which is 2.70 times faster.

With a single summarization level

The most appropriate level of detail for a summarization level depends on the scenario and its requirements. How to determine the most detailed but still reasonable level is described in the section Summarization levels. To tune a single report, a lower level of detail might have to be chosen.

The most detailed summarization level for this example scenario is at step 2. Table 5 shows the runtimes and performance improvements that are gained when you use intrapartition parallelism with a summarization level at step 2. Using intrapartition parallelism improves the average performance by 8.05 times.

Table 5. Drill-down runtimes read on-demand with a summarization level at step 2
step   degree 1   degree 16   speedup
0        63.32s      6.87s     9.21x
1        13.30s      2.40s     5.55x
2         0.69s      0.10s     7.18x
3         9.79s      1.30s     7.51x
4         9.48s      1.32s     7.16x
SUM      96.59s     11.99s     8.05x

To get the best performance out of a single report, the summarization level's level of detail should be as low as possible. To find it, examine the runtimes of each navigation step without summarization levels in Table 4. If database runtimes below two seconds are required for this scenario, the summarization level must be defined at step 1, because it is the most detailed step that does not meet the requirement. For larger scenarios, this procedure might have to be repeated to determine further summarization levels. The results of a drill-down with a summarization level at step 1 are shown in Table 6.

Table 6. Drill-down runtimes read on demand with a summarization level at step 1
step   degree 1   degree 16   speedup
0         2.77s      0.37s     7.43x
1         0.37s      0.19s     1.99x
2        12.46s      1.63s     7.66x
3         9.79s      1.30s     7.51x
4         9.48s      1.32s     7.16x
SUM      34.88s      4.81s     7.24x

The measurements show that using intrapartition parallelism is 7.24 times faster than running without it. Although this speedup is slightly lower than with the more detailed summarization level, the total runtime of the drill-down process is 2.49 times faster than with the summarization level at step 2, and each query runtime is below 2 seconds.

The higher speedup that intrapartition parallelism achieves with the more detailed summarization level indicates that the benefit grows with the amount of data that the database processes. Depending on the amount of data, you can achieve even better performance improvements with a higher degree of parallelization and more CPU cores.

Summary

The use of intrapartition parallelism increases query performance significantly. With intrapartition parallelism, fewer summarization levels are required to meet response-time requirements. Because the data model is based on the operating concern and is shared among all reports, defining some summarization levels might still be necessary in cases where the application server reads all the data at once or where a certain report usually reads data at a coarse level of detail.

In general, it makes sense to read data on demand to reduce the resource consumption on the application server. In this way, summarization levels can be used wherever they are required for performance reasons, but they are no longer needed to cope with functional restrictions caused by memory limitations on the application server. Using intrapartition parallelism reduces the query processing time and provides acceptable response times to the user.

The measurements confirm that the benefit of intrapartition parallelism increases with the amount of analyzed data. The example scenarios demonstrate an improvement up to a factor of eight. We tried to use a realistic test setup, but you may see different results in your production environments.


Related topics

  • Access SAP Notes for more information about the SAP products. Notes related to this article are:
    • SAP Note 1818503: DB6: SAP Optimizer Profiles
    • SAP Note 2047006: DB6: Use of DB2 SMP Parallelism (INTRA_PARALLEL=YES)
    • SAP Note 2052896: DB6: Faster CO-PA reporting on database "IBM DB2 for Linux, UNIX, and Windows"
  • DB2 partitioning features on developerWorks
  • IBM Knowledge Center: Enabling intrapartition parallelism for queries
  • Performance in the Information System: Profitability Analysis (COPA) - SAP Library

