The Automatic Workload Repository (AWR) report is the commonly used tool for analyzing Oracle database performance. AWR looks periodically at the system performance (by default every 60 minutes) and stores the information found (by default up to 7 days). The processes of generating an AWR report is available on Oracle website , 60 minutes is a common time sample chosen to analysis performance issue but you can adjust it according your application workload.
The Oracle snapshot plugin in Smarter Performance Analysis Suit aims to find some potential performance issues, give alerts, summary explanations and corresponding improving advises.The plugin will analyze AWR reports and check the value of each item with some predefined rules, which are based on our knowledge on Oracle database performance. Each vailation will be marked as an alert and displayed with yellow or red color. Yellow color represents warning and red color represents critical warning. Please note that the default, out-of-box rule set is designed for Online Transaction Processing (OLTP) type of workload. The default rule set is created for general purpose, as performance is very much application dependent. As always, you can define your rules to better fit your application's workload and its performance consideration.
Below we will use two real cases to demonstrate the potential of this feature.
Case 1 - I/O performance issue
We open an AWR report with performance problem with the tool and the following is the steps to diagnose that problem.
Time Model measure the time spent in the database by operation type. The most important time model statistic is database time (DB time), which is calculated by aggregating the CPU time and non idle wait times. For our scenario, we get an alert which telling us the rule definition and potential reason.
Generally, we want SQL processing time is high, parsing and other stuff is low. If SQL elapsed time >> DB CPU, if the application is not I/O intensive, there may be wait problem, including I/O wait or concurrency wait.
Wait Class displays the time statistics which are grouped by wait class, such as DB CPU, User I/O and so on. we found there is I/O problem for our Oracle instance.
Wait Events displays the time statistics of each wait event. We can realize the reason of specific performance issues by it. There are three alerts db file sequential read, db file scattered read, direct path read in our scenario. If you have interest you can find official description about these three events.
A sequential read is a single-block read and single block I/O are usually the result of using indexes. A scattered read is usually a multi block read and it can occur for a fast full scan (of an index) in addition to a full table scan. If those value is high >10 ms, there may be physical I/O problem.
In 10g, full table scans for "large" tables used to go through buffer cache by scattered read (by default). In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats. Direct path reads (into PGA) are faster than scattered reads and have less impact on other processes because they avoid latches. So high direct path read value may be caused by large table full scan. We found that direct path reads take over 20 percent of db time so that there probably is full table scan problem. That will bring significant influence on all kinds of disk read event, so the sequential read and scattered read are also very slow.
SQL Stats displays SQL statements that are producing the highest load on the system, based on criteria such as elapsed time, CPU time, DB time, disk reads. We can found the specific SQL statements that cause the I/O problem in previous description. For each SQL statement, if the value of the percentage of total DB time, total CPU time, total buffer gets, total disk reads is relatively high (>20%), the corresponding SQL statements should be checked by exa whether it has some room of improvements (such as changing index strategy) to reduce the total occupied proportions. The warning and critical threshold for elapsed_time_per_exec is 0.5 s and 1.0 s. The long elapsed time can be caused by query statement which brings about Full Table Scan or Full Index Scan. A modification statement brings long non-idle wait (eg. IO wait) could also brings long elapse time. For tables with more than a few rows, please consider redesigning the query by adding an index for one or more of the columns tested in the WHERE clause.
total_disk_reads_percentage will sort the statements by this column, we found that the top 3 statements take most of total disk reads and elapsed time per exec is quite long, but the execution time is not large, that's highly possible that there are index missing problem for table
wochange and table
pmcomsr, after examining the execution plan of those SQLs we found it's true! We add corresponding indexes to those two tables and the I/O problem has been much improved.
After adding the appropriate indexes, we take the AWR report of our environment again and the db_time percentage of user I/O is much less than previous, DB CPU takes most of the DB time, the full table scan problem has been solved.
Case 2 - Library latch and concurrency issue
The previous scenario is an I/O performance issue caused by inappropriate index strategy, and the following scenario is another AWR report with other kind of performance issue.
Report Summary contains overall performance info such as the cache size of each SGA components, CPU & Memory statistics, instance efficient percentages, share pool statistics and top 5 timed foreground events.
In our scenario, theare are four alerts in the efficient percentage statics, the most significant one is that the library_hit_percentage is much less than 95% (the critical threshold in the rules), which indicates that SQL is being pushed out of the shared pool early (could be due to a shared pool that is too small). The new coming SQL will be parsed again(lexical analysis, semantic analysis, execution plan generating and so on) and put into the library cache, that will cost lot of CPU resource.
Non-Parse_cpu_percentage small than 95% indicates that parsing consumes too much time, and we want most of CPU time used by SQL execution but not parse.
execute_to_parse_percentage is a measure of how many times a statement was executed as opposed to being parsed. This is a measure of how many times a statement was executed as opposed to being parsed. EXECUTE to PARSE = (1- parse/execute). The closer we get to 100, the more efficient we are.In a system where we parse, and then execute the statement, and never execute it again in the same session, this ratio will be 0 (bad). On a system where we parse a statement once and then execute it 100 times, we'll get a 99% ratio. The closer we get to 100, the more efficient we are.
parse_cpu_to_parse_elapsed-percentage indicates how much of the time spent parsing was due to the parse operation itself, rather than waiting for resources, such as latches. if the ratio was 100 percent, it would imply CPU time was equal to elapsed time and we processed without any waits. If you have some bad waits parsing -- which are typically due to contention on the shared pool but can be caused by other things as well.
Consider that four alerts together, it's obviously that this oracle instance has library cache contention and parsing problems.
In Time Model tap, we found that DB CPU only takes 8.4% of DB time but parse time elapsed takes up to 43% of DB time and hard parse elapsed time isn't very much(0.83% of DB time), obviously, there are too much soft parses. In a well tuning database, we hope that the service time (DB CPU and I/O wait) should be more than 90% of DB time, if some other event(parsing for example) take more than 5% of DB time, which may bring serious performance issue.
In Wait Class tap, we found that most of DB time time is spent by concurrency which represents the time spent on all kinds of latch & lock when sharing the memory in SGA.
Wait Event tap lists the detailed time statics of each kind wait event. We found DB time percentage of the share pool latch(should be acquired before allocating space in the share pool) and the library cache mutex(should be acquired before accessing objects in the library cache) are too high. As the same as previous tap told us, there are too much contention in the library cache, which may be caused by small SGA size or too much parsing.
In SQL Stats tap, we found that there are lots of version count alerts and sharable memory alerts.
Take the statement with 1782 version count for example, which indicates that there are 1782 child cursor under one parent cursor(with the same literal syntax). When you have unnecessary versions of a cursor, each time the cursor is executed, the parse engine has to search through the list of versions to see which is the cursor that you want. This wastes CPU cycles that you could be using on something else. High version counts can easily cause high contention for library cache latches.The statement has 431652072 bytes sharable memory indicates that single statement will take 411 MB space in the library cache, that too much for one statement and will probably result in memory shortage in library cache.
In fact, we have checked the cursor_sharing in the reprort summary tap which is require to be set with
FORCE in our predefined rules to get better performance in most of scenarios. However, there are still too much version count for some sqls which would cost much library cache space and consume much CPU resource when search through the list of versions. We should check in oracle system view
v$sqlarea(info about parent cursor) and
v$sql(info about child cursor) to seek specific reason. In Oracle 11g, a new view
v$sql_shared_cursor is provided to diagnose cursor sharing problem.
SGA Advisory tap also indicates that SGA is in shortage, The size factor ==1.0 implies it's the current SGA size. The yellow color represents the recommend SGA size value(49152) which may get a good balance between resource and performance.
Oracle® Database Performance Tuning Guide 11g Release 2 (11.2)
Oracle® Database Concepts 11g Release 2 (11.2)