IBM TRIRIGA Microsoft SQL Server database Best Practice recommendations to be reviewed by MS SQL Server DBA
Provide suggestions and recommendations for the IBM TRIRIGA MS SQL Server database environment. Also, provide monitoring and trouble resolution steps to quickly discover and prevent performance and instability degradation.
This document is in response to performance related issues occurring in the installation of the TRIRIGA Application. Key areas to address:-
MS SQL Server performance problems can be difficult to pinpoint and require inspection into many aspects of the environment. There is no magic button or specific step by step actions to locate every performance issue. What exist are guidelines for diagnosing and troubleshooting common performance problems. The purpose of this document will be to provide a general methodology for diagnosing and troubleshooting MS SQL Server database instability & performance problems in common scenarios.
Configuration, tuning and sizing issues for MS SQL Server may lead to various instability and performance issues from database end. The instructions listed below are to be reviewed, confirmed and applied by customer’s DBA whenever necessary.
TRIRIGA Customer Typical findings
There are basic best practices that should be implemented on a production MS SQL Servers.
SQL Server Best Practices
SQL Service: The account that runs the SQL Server Service should be included in the two Local Security Policy Groups below:
“Lock Pages in Memory” – Prevents SQL Server allocated memory from being swapped to disk.
“Perform volume maintenance tasks” - Enables Instant File Initialization, the system will not initialize storage it allocates. Instead, the storage will remain uninitialized until SQL Server writes data to it. Preventing the server from zero-filling storage.
1. Click Start.
2. Go to Administrative tools | Local Security Policy.
3. Click Local Policies Click User Rights Assignment.
4. Scroll down to “Lock pages in memory”, and double-click it.
5. Click Add User or Group, and enter the SQL Server service account name.
6. Scroll down to “Perform volume maintenance tasks”, and double-click it.
7. Click Add User or Group, and enter the SQL Server service account name.
Server Memory: Change “Max Server Memory” to coincide with the chart below. This setting controls how much memory can be used by the SQL Server Buffer Pool. If you don’t set an upper limit for this value, other parts of SQL Server, and the operating system can be starved for memory, which can cause instability and performance problems. These settings are for x64, on a dedicated database server, only running the DB engine,
“Max Degree Parallelism”: Set to 4. Currently set at 0, unlimited parallelism. Takes effect immediately. This will limit parallelism to 4 threads and decrease CXPACKET waits.
EXEC dbo.sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
EXEC dbo.sp_configure 'max degree of parallelism', 4;
RECONFIGURE WITH OVERRIDE;
TempDB: Add data Files to match number of cores. Running with a single tempdb data file can create unacceptable latch contention and I/O performance bottlenecks. To mitigate these problems, allocate one tempdb data file per processor core. (This is different than the practice for user-defined database files, where the recommendation is 0.5 to 1 data file per core.) Change Growth rate to a fixed increment. Turn on AutoStats.
--Simple script to generate Temp file SQL, change for your path. Creates 2GB files with 512MB Growth. (Generate results as text, copy, paste run. Takes affect after SQL restart)
set nocount on
declare @filename int
select @filename = '1'
while @filename <= 32
Select 'ALTER DATABASE [tempdb] ADD FILE ( NAME = N''tempdev' + conv
Select @filename = @filename + 1
In Windows 2008, networking should be set for “Maximize settings for networked applications”.
Database Best Practices
Autogrow: Change Autogrow on data and log files, increase growth amount to 500MB+ per file depending on rate of growth.
Auto Update Stats: Change to Async. Prevents SQL from waiting on stats to update.
The optimizer will initiate a statistics update operation when about 20% of the rows in an index have changed. Normally, whatever query was being optimized at the time will then have to wait until the statistics are updated before it can finish the optimization phase and begin executing. With the 'async' option, a separate thread will update the stats, and the query that was being optimized will continue and use the older stats when its plan is generated.
Update Statistics: Consider running on a nightly or weekly basis with fullscan.
IBM TRIRIGA specific recommendations
Increase Tridata data files to 0.5 to 1 data file per core.
All files to be exact same size and growth settings.
Suggestion to equalize the current server:
Create a new filegroup and add 4-8 files large enough to hold the largest table in Tridata.
Rebuild the table clustered index on the new filegroup with drop existing. This would allow SQL to stripe the table data and indexes across the newly created files without having to unload and load the database. Repeat as needed for similar groupings. Maybe another group for reference and configuration tables and so on until the Primary filegroup can be reduced.
Snapshot Isolation to reduce blocking:
Consider testing with Snapshot_isolation and read
ALTER DATABASE << My Database >>
ALTER DATABASE << My Database >>
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SELECT EmployeeID, LastName, FirstName, Title
Keep a close watch on unused indexes and missing indexes.
, 'CREATE NONCLUSTERED INDEX ix_IndexName ON ' + sys.objects.name COLLATE DATABASE_DEFAULT + ' ( ' + IsNu
ELSE CASE WHEN mid.
ELSE ',' END + mid.
ELSE 'INCLUDE (' + mid.
INNER JOIN sys.
INNER JOIN sys.
INNER JOIN sys.objects WITH (nolock) ON mid.OBJECT_ID = sys.
WHERE (migs.group_handle IN
SELECT TOP (500) group_handle
ORDER BY (avg
ORDER BY 2 DESC , 3 DESC
SELECT o.name, indexname=i.name, i.index_id , reads=user_seeks + user_scans + user_lookups
, writes = user_updates , rows = (SELECT SUM(p.rows) FROM sys.partitions p WHERE p.index_id = s.index_id AND s.object_id = p.object_id)
WHEN s.user_updates < 1 THEN 100
ELSE 1.00 * (s.user_seeks + s.user_scans + s.user_lookups) / s.user_updates
END AS reads_per_write
, 'DROP INDEX ' + QUOTENAME(i.name)
+ ' ON ' + QUOTENAME(c.name) + '.' + QUOT
INNER JOIN sys.indexes i ON i.index_id = s.index_id AND s.object_id = i.object_id
INNER JOIN sys.objects o on s.object_id = o.object_id
INNER JOIN sys.schemas c on o.schema_id = c.schema_id
AND s.database_id = DB_ID()
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND (SELECT SUM(p.rows) FROM sys.partitions p WHERE p.index_id = s.index_id AND s.object_id = p.object_id) > 10000
ORDER BY reads
SQL Server Monitoring
Management Data Warehouse
The management data warehouse should be turned on for monitoring sql. This tool will provide baseline information and allow for timeline snapshots of performance data. Near Real-time performance reporting and query history.
SQL Black Box Trace
Consider turning on the MS SQL server black box trace to run at startup. This trace data can be viewed when blocking occurs without having to start a new trace. Two rolling 25MB files. This data can be reviewed in profiler and also sent off to application providers for further analysis.
SELECT * FROM fn_trace_getinfo (2);
SELECT * FROM fn_trace_gettable (
'C:\Program Files\Microsoft SQL Serv
Autostart Blackbox with two rolling logs:
CREATE PROCEDURE StartBlackBoxTrace
DECLARE @TraceId int
DECLARE @maxfilesize bigint
SET @maxfilesize = 25
@options = 8,
@tracefile = NULL,
@maxfilesize = @maxfilesize
EXEC sp_trace_setstatus @TraceId, 1
The topic of chasing SQL performance issues has been documented by various SQL “Gurus” but all source the same document. This was written for SQL 2005, but nothing about the methodology had changed. This Microsoft document contains the areas of concern, how to determine the cause, and possible resolutions.
Highlights from the documentation:
Discover CPU Bottleneck, investigate further if “run
scheduler_id < 255
What running the most and recompiling, start at the top and work down:
select top 25
cross apply sys.
order by plan_generation_num desc
Create and use the sp_block proc:
create proc dbo.sp_block (@spid bigint=NULL)
-- This stored procedure is provided "AS IS" with no warranties, and
-- confers no rights.
-- Use of included script samples are subject to the terms specified at
-- T. Davidson
-- This proc reports blocks
-- 1. optional parameter @spid
'blk object' = t1.r
sys.dm_tran_locks as t1,
Check the “Analyzing operational index statistics” section in the appendix for script to evaluate index usage.
Monitor for Excessive compilation and recompilation
SQL Server: SQL Statistics: Batch Requests/sec
SQL Server: SQL Statistics: SQL Compilations/sec
SQL Server: SQL Statistics: SQL Recompilations/sec
SQL Server: Buffer Manager object
Low Buffer cache hit ratio
Low Page life expectancy
High number of Checkpoint pages/sec
High number Lazy writes/sec
PhysicalDisk Object: Avg. Disk Queue Length represents the average number of physical read and write requests that were queued on the selected physical disk during the sampling period. If your I/O system is overloaded, more read/write operations will be waiting. If your disk queue length frequently exceeds a value of 2 during peak usage of SQL Server, then you might have an I/O bottleneck.
Avg. Disk Sec/Read is the average time, in seconds, of a read of data from the disk. Any number
Less than 10 ms - very good
Between 10 - 20 ms - okay
Between 20 - 50 ms - slow, needs attention
Greater than 50 ms – Serious I/O bottleneck
Avg. Disk Sec/Write is the average time, in seconds, of a write of data to the disk. Please refer to the guideline in the previous bullet.
Physical Disk: %Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. A general guideline is that if this value is greater than 50 percent, it represents an I/O bottleneck.
Avg. Disk Reads/Sec is the rate of read operations on the disk. You need to make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.
Avg. Disk Writes/Sec is the rate of write operations on the disk. Make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.
Microsoft 3rd Party supporting Information
Troubleshooting Performance Problems in SQL Server - http
SQL 2008 Management Data Warehouse (performance monitoring) - http
SQL Server 2005 Waits and Queues & SQL Server Best Practices Article - http
Chris K 270004Y3TR Visits (8184)
Being on the IBM TRIRIGA Support team, I have seen my share of PMRs where the customer is reporting a performance issue. The intent of this blog is provide you with sufficient information so that you can at least be ahead of the ball when it comes down to determining the root cause of your IBM TRIRIGA performance issues.
First and foremost, I need to make it clear that performance can be affected by a wide range of components. Network latency, insufficient memory or CPUs on the key servers, older Java releases either being used by the application and process servers as well as on client machines, and the list goes on. We will focus only on the application in this blog post. Other components should be addressed by the appropriate IT groups like the network support, database support, and server support teams.
Second, you should review and understand the Best Practices for System Performance. The URL below will take you to a wiki page where you can get to a copy of the PDF document. Share this guide with your IT teams as it describes some detailed information regarding tuning of those other components. Ultimately, however, your IT teams may determine that their parts of the TRIRIGA infrastructure and optimally tuned. The rest of this blog post will deal with actions TRIRIGA administrators can take to identify bottlenecks and begin the process of tuning the application.
URL to Best Practices for System Performance:
Third, as a TRIRIGA Administrator, you should review performance on a regular basis. If your installation is used heavily, your performance tuning activities may need to be done on a weekly basis. If your TRIRIGA implementation is relatively small, then you might consider performance tuning activities on a monthly, quarterly, annually or some other basis. As the title of this blog post suggests, you should not simply assume that you need to do performance tuning when you first install the product and never do it again. As the database grows, performance may degrade. The application takes steps to minimize the degradation through the daily cleanup process, but that is only a small part of the tuning. Just as your car requires periodic maintenance to perform optimally, so to does the TRIRIGA application.
Your best tool for analyzing TRIRIGA performance is the performance log. In a future post, I will provide more detail on how to analyze the performance log. For now, I will simply provide instructions on how to get the data required for further analysis. Below is a step by step process to follow for gathering performance related information.
1) Login to the Admin Console
2) Click on Platform Logging in the Managed Objects section on the left side of the screen.
3) On the right side of the screen is a list of categories that have a hierarchical structure. Scroll down to the Performance Timings category.
4) Click on the check box immediately in front of Performance Timings. This will cause all of the subcategories to be checked.
5) Scroll down to the bottom of the page and click on the Apply button.
6) At this point, performance logging is turned on and the performance.log file should appear in the TRIRIGA directory structure in the log sub-directory.
7) Perform activities that users have indicated are poorly performing. This action, along with other actions taking place at the time of the testing, will get logged to the performance.log file.
8) Once you have completed the process of recreating the performance issue, log back into the Admin Console and turn off the performance logging.
9) Perform analysis of the resulting performance.log. (The details of this will be the subject of my next blog post.)