How to diagnose ObjectServer performance log files using UNIX filtering functions and spreadsheets
GQ4Q_Simon_Knights 060000GQ4Q Comments (2) Visits (10883)
The ObjectServer writes performance information to log files. These logs can give information the time spent running SQL queries, identify which clients are using excessive resources and show CPU usage of individual triggers. In my experience it's often useful to get a high level picture before diving into details. One way of encouraging yourself to look at the overview before delving into details is to generate a graph of key performance indicators over time and focus on areas of interest highlighted by the graph. In this blog I will describe a 'quick and dirty' way generating graphs by using the UNIX 'grep' and 'awk' shell commands to filter log files, turning the resultant output into a CSV file and loading it into a spreadsheet.
I start off by looking at the profile log file ( $NCH
First I filter for this specific message using grep.
Then I filter the results with an AWK script which separates out the time and date, total time for report period and CPU time in period and outputs them in CSV format. ( I also remove brackets and 's' from the numbers to avoid confusing the spreadsheet).
Here is a sample script, and command line to execute it.
This gives a CSV format output.
I put that into a spreadsheet and graph the data. It doesn't show a high client CPU usage, so the next step is to look at the trigger logs. The trigger statistics are written to ( $NCH
First of all, I am interested in the following message (which gives the sum time for all triggers within the profiling period).
The format of the trigger statistics log is similar to the profiler logs. The awk script needs only a few modifications to format the trigger stats into a CSV file (the column offset for 'b' and 'c' has increased by 1). I filter using grep and the awk script below.
I load the output into a spreadsheet and graph CPU over time. This is a bit more interesting. It shows the total CPU time for each report period.
The graph shows a regular spike in CPU usage, which implies a temporal trigger. It narrows down the times we are interested into a number of specific periods. I can now focus on these specific times to try and isolate if any specific triggers are using excessive CPU at these times.
Sure enough, in each period a trigger called 'slowtrigger' is taking most of the CPU.
We still need to diagnose the reasons for the trigger running slowly. However we have made our job much easier by narrowing down the likely piece of code causing the problem.
To find out more about interpreting ObjectServer performance logs and key performance indicators, see the following section on the information center.