How to diagnose ObjectServer performance log files using UNIX filtering functions and spreadsheets
GQ4Q_Simon_Knights 060000GQ4Q Comments (2) Visits (7897)
The ObjectServer writes performance information to log files. These logs can give information the time spent running SQL queries, identify which clients are using excessive resources and show CPU usage of individual triggers. In my experience it's often useful to get a high level picture before diving into details. One way of encouraging yourself to look at the overview before delving into details is to generate a graph of key performance indicators over time and focus on areas of interest highlighted by the graph. In this blog I will describe a 'quick and dirty' way generating graphs by using the UNIX 'grep' and 'awk' shell commands to filter log files, turning the resultant output into a CSV file and loading it into a spreadsheet.
I deliberately coded a trigger to increase in CPU as the row count increases and use this to show how log files can be used to pin-point the cause of the failure.
I start off by looking at the profile log file ( $NCH
First I filter for this specific message using grep.
Then I filter the results with an AWK script which separates out the time and date, total time for report period and CPU time in period and outputs them in CSV format. ( I also remove brackets and 's' from the numbers to avoid confusing the spreadsheet).
Here is a sample script, and command line to execute it.
This gives a CSV format output.
First of all, I am interested in the following message (which gives the sum time for all triggers within the profiling period).
The format of the trigger statistics log is similar to the profiler logs. The awk script needs only a few modifications to format the trigger stats into a CSV file (the column offset for 'b' and 'c' has increased by 1). I filter using grep and the awk script below.
The graph shows a regular spike in CPU usage, which implies a temporal trigger. It narrows down the times we are interested into a number of specific periods. I can now focus on these specific times to try and isolate if any specific triggers are using excessive CPU at these times.
Sure enough, in each period a trigger called 'slowtrigger' is taking most of the CPU.
We still need to diagnose the reasons for the trigger running slowly. However we have made our job much easier by narrowing down the likely piece of code causing the problem.
To find out more about interpreting ObjectServer performance logs and key performance indicators, see the following section on the information cent