This scenario illustrates how to interpret API client instrumentation data and server monitoring data to improve database application restore operations.
Kate, a Tivoli® Storage Manager administrator, notices a performance degradation in restore operations for an SAP application database
Kate expects to improve the performance of database restore operations.
During the restore operation, data is read from the tape drives, sent over the network to the Tivoli Storage Manager client, and is written to the client disk. The data is transferred from the tape system through a host bus adapter (HBA) that connects the tape library to the server. The server transfers the data to the client system through a local area network (LAN) connection. A Network Interface Card (NIC) connects the server to the LAN and a separate NIC connects the LAN to the client. The data is restored to the disk from the client through a separate HBA device.

Following the flow of data, Kate starts to review the server instrumentation data. First, Kate notes the time when the restore was started by reviewing the Tivoli Storage Manager activity log. Then, she locates the files that were collected by the server monitoring script, and finds the files that are time-stamped around the time when the restore operation was slow. The instrumentation file names have the following structure: YYYYMMDD-HHMM-YYYYMMDD-HHMM-instr.txt.
Kate examines the instrumentation output to find a thread that was reading from tape. She finds Thread 131787. Data for the thread shows that 9.100 seconds was spent in the Tape Read category and has a fast instantaneous throughput (InstTput) of 269584.5 KB per sec. She reviews the server instrumentation categories in Table 1 and finds that there is no problem reading from tape. The instantaneous throughput and high amount of thread wait time indicate that tape read is not the bottleneck. Kate sees that 99% of the time was spent in Thread Wait (1199.192 seconds):
Thread 131787 AgentThread parent=131782 13:16:25.938-->13:36:34.274
Operation Count Tottime Avgtime Mintime Maxtime InstTput Total KB
----------------------------------------------------------------------------
Tape Read 9583 9.100 0.001 0.000 0.354 269584.5 2453248
Thread Wait 9585 1199.192 0.125 0.000 267.561
Unknown 0.042
----------------------------------------------------------------------------
Total 1208.335 2030.3 2453248
Following the data flow, she looks for a thread with a Network Send category and a thread that processed approximately the same amount of data (2453248 KB). She finds Thread 131781, which processed a total of 2452368 KB. Thread 131781 is a psSessionThread thread, which shows the timing of the data as it was sent from the server, through the network to the client. Kate sees that the InstTput for the thread was 2052.8 KB per second, which was slower than expected. The Thread Wait phase took 0.786 seconds, which was less than 1% of the total time. The data indicates that most of the time was spent in the Network Send phase. The findings suggest performance degradation of the network on the client side.
Thread 131781 psSessionThread parent=299 13:16:25.938-->13:36:34.274
Operation Count Tottime Avgtime Mintime Maxtime InstTput Total KB
----------------------------------------------------------------------------
Data Copy 1 0.000 0.000 0.000 0.000
Network Recv 5 12.778 2.556 0.001 12.719 0.0 0
Network Send 19170 1194.666 0.062 0.000 267.561 2052.8 2452368
DB2 Fetch Prep 1 0.003 0.003 0.003 0.003
DB2 Fetch Exec 8 0.003 0.000 0.000 0.002
DB2 MFtch Exec 3 0.008 0.003 0.000 0.004
DB2 Fetch 8 0.000 0.000 0.000 0.000
DB2 MFetch 5 0.000 0.000 0.000 0.000
DB2 Commit 2 0.000 0.000 0.000 0.000
DB2 Reg Prep 2 0.003 0.002 0.000 0.003
DB2 Reg Exec 2 0.000 0.000 0.000 0.000
DB2 Reg Fetch 2 0.000 0.000 0.000 0.000
Thread Wait 9584 0.786 0.000 0.000 0.351
Unknown 0.085
----------------------------------------------------------------------------
Total 1208.335 2029.5 2452368
Next, Kate looks at the API client instrumentation report.
Detailed Instrumentation statistics for
Thread: 2911 Elapsed time = 1669.061 sec
Section Actual(sec) Average(msec) Frequency used
------------------------------------------------------------------
Waiting on App 1435.153 94.8 15131
API Get Data 233.909 15.5 15131
Other 0.000 0.0 0
------------------------------------------------------------------
Detailed Instrumentation statistics for
Thread: 2902 Elapsed time = 1913.868 sec
Section Actual(sec) Average(msec) Frequency used
------------------------------------------------------------------
Waiting on App 1681.437 110.4 15224
API Get Data 232.432 15.3 15224
Other 0.000 0.0 0
------------------------------------------------------------------
Detailed Instrumentation statistics for
Thread: 2893 Elapsed time = 2093.726 sec
Section Actual(sec) Average(msec) Frequency used
------------------------------------------------------------------
Waiting on App 1926.681 127.1 15153
API Get Data 167.045 11.0 15153
Other 0.000 0.0 0
------------------------------------------------------------------
By reviewing the data from the API client instrumentation, Kate sees that the longest amount of time was spent on the Waiting on App category. This category shows the time that was spent by the application to process the restore data. After Kate reviews the API report, she determines that most of the restore time is spent waiting for the application database. As a next step, Kate could verify the performance of the disk system.
Kate contacts the storage administrator and the SAP administrator to help resolve the problem. After further investigation, the storage administrator determines that the database layout is the source of the problem, and corrects it.