Using Smarter Performance Analysisi Suite to Analyze Oracle AWR Reports
Zhendong Li 270005N4J2 Visits (4788)
The Automatic Workload Repository (AWR) report is the commonly used tool for analyzing Oracle database performance. AWR looks periodically at the system performance (by default every 60 minutes) and stores the information found (by default up to 7 days). The processes of generating an AWR report is available on Oracle website , 60 minutes is a common time sample chosen to analysis performance issue but you can adjust it according your application workload.
The Oracle snapshot plugin in Smarter Performance Analysis Suit aims to find some potential performance issues, give alerts, summary explanations and corresponding improving advises.The plugin will analyze AWR reports and check the value of each item with some predefined rules, which are based on our knowledge on Oracle database performance. Each vailation will be marked as an alert and displayed with yellow or red color. Yellow color represents warning and red color represents critical warning. Please note that the default, out-of-box rule set is designed for Online Transaction Processing (OLTP) type of workload. The default rule set is created for general purpose, as performance is very much application dependent. As always, you can define your rules to better fit your application's workload and its performance consideration.
Below we will use two real cases to demonstrate the potential of this feature.
Case 1 - I/O performance issue
We open an AWR report with performance problem with the tool and the following is the steps to diagnose that problem.
Time Model measure the time spent in the database by operation type. The most important time model statistic is database time (DB time), which is calculated by aggregating the CPU time and non idle wait times. For our scenario, we get an alert which telling us the rule definition and potential reason.
Wait Class displays the time statistics which are grouped by wait class, such as DB CPU, User I/O and so on. we found there is I/O problem for our Oracle instance.
Wait Events displays the time statistics of each wait event. We can realize the reason of specific performance issues by it. There are three alerts db file sequential read, db file scattered read, direct path read in our scenario. If you have interest you can find official description about these three events.
A sequential read is a single-block read and single block I/O are usually the result of using indexes. A scattered read is usually a multi block read and it can occur for a fast full scan (of an index) in addition to a full table scan. If those value is high >10 ms, there may be physical I/O problem.
In 10g, full table scans for "large" tables used to go through buffer cache by scattered read (by default). In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats. Direct path reads (into PGA) are faster than scattered reads and have less impact on other processes because they avoid latches. So high direct path read value may be caused by large table full scan. We found that direct path reads take over 20 percent of db time so that there probably is full table scan problem. That will bring significant influence on all kinds of disk read event, so the sequential read and scattered read are also very slow.
SQL Stats displays SQL statements that are producing the highest load on the system, based on criteria such as elapsed time, CPU time, DB time, disk reads. We can found the specific SQL statements that cause the I/O problem in previous description. For each SQL statement, if the value of the percentage of total DB time, total CPU time, total buffer gets, total disk reads is relatively high (>20%), the corresponding SQL statements should be checked by exa whether it has some room of improvements (such as changing index strategy) to reduce the total occupied proportions. The warning and critical threshold for elap
After adding the appropriate indexes, we take the AWR report of our environment again and the db_time percentage of user I/O is much less than previous, DB CPU takes most of the DB time, the full table scan problem has been solved.
Case 2 - Library latch and concurrency issue
The previous scenario is an I/O performance issue caused by inappropriate index strategy, and the following scenario is another AWR report with other kind of performance issue.
Report Summary contains overall performance info such as the cache size of each SGA components, CPU & Memory statistics, instance efficient percentages, share pool statistics and top 5 timed foreground events.
In our scenario, theare are four alerts in the efficient percentage statics, the most significant one is that the libr
Consider that four alerts together, it's obviously that this oracle instance has library cache contention and parsing problems.
In Time Model tap, we found that DB CPU only takes 8.4% of DB time but parse time elapsed takes up to 43% of DB time and hard parse elapsed time isn't very much(0.83% of DB time), obviously, there are too much soft parses. In a well tuning database, we hope that the service time (DB CPU and I/O wait) should be more than 90% of DB time, if some other event(parsing for example) take more than 5% of DB time, which may bring serious performance issue.
In Wait Class tap, we found that most of DB time time is spent by concurrency which represents the time spent on all kinds of latch & lock when sharing the memory in SGA.
Wait Event tap lists the detailed time statics of each kind wait event. We found DB time percentage of the share pool latch(should be acquired before allocating space in the share pool) and the library cache mutex(should be acquired before accessing objects in the library cache) are too high. As the same as previous tap told us, there are too much contention in the library cache, which may be caused by small SGA size or too much parsing.
In SQL Stats tap, we found that there are lots of version count alerts and sharable memory alerts.
Take the statement with 1782 version count for example, which indicates that there are 1782 child cursor under one parent cursor(with the same literal syntax). When you have unnecessary versions of a cursor, each time the cursor is executed, the parse engine has to search through the list of versions to see which is the cursor that you want. This wastes CPU cycles that you could be using on something else. High version counts can easily cause high contention for library cache latches.The statement has 431652072 bytes sharable memory indicates that single statement will take 411 MB space in the library cache, that too much for one statement and will probably result in memory shortage in library cache.
In fact, we have checked the cursor_sharing in the reprort summary tap which is require to be set with
SGA Advisory tap also indicates that SGA is in shortage, The size factor ==1.0 implies it's the current SGA size. The yellow color represents the recommend SGA size value(49152) which may get a good balance between resource and performance.