Statistics are essential for query optimizer to generate an optimal access plan. The runstats utility collects necessary statistics and optimizer makes use of these statistics during query compilation. In this blog, I would like to briefly describe different ways in which a user can collect statistics.
User can execute the runstats command manually or through admin command with appropriate option to collect statistics on a particular table and/or index. Refer to the following link to know various options supported under runstats to gather table and index statistics.
Deciding which statistics to collect for a workload is complex, and maintaining up-to-date statistics is time consuming. Out-of-date or incomplete table or index statistics might lead the optimizer to select a sub optimal plan, which slows down query execution.
To overcome this situation in version V8.2 we introduced a concept called automatic table maintenance, in short we address this as auto runstats.This feature is enabled by default in which, database manager will take care of maintaining up-to-date statistics for tables automatically.
Auto runstats evaluation happens at every 2 hrs,5 min of time interval. It examines each table under active database to consider for statistics collection . If table statistics are missing or stale it will decide to collect statistics during the next maintenance window. By default the maintenance window will be set to 24x7. User has the privilege to set maintenance window depending on the system workload. One can also control statistics collection by setting maintenance policy. For example, If you want to exclude nicknames from automatic statistics collection process, then you can set a policy to accomplish the task.
User can always deactivate auto runstats by setting db cfg parameter AUTO_RUNSTATS=OFF.
RTS (Real Time Statistics):
This feature was first introduced in version v9.5. It will collect statistics synchronously at statement compilation time. This feature is enabled by default when you create a database. Query optimizer determines how statistics are collected based on query need and table activity (amount of update/insert/delete operation). RTS collects statistics during statement compilation there by providing more timely and accurate statistics for the optimizer which results in better query execution plan.
RTS collects statistics via synchronous/fabrication and make it available immediately to optimizer by updating it in statistics cache.
RTS daemon evaluate every 5 min of time interval and process any pending request from previous runstats activity Ex: Writing statistics cache value to disk or collecting statistics asynchronously.
User can always deactivate RTS by setting db cfg parameter AUTO_STMT_STATS=OFF
learn more about automatic table maintenance process (Auto runstats and RTS) in the following link.
User can collect statistics during 'LOAD' by specifying 'STATISTICS USE PROFILE'. To collect statistics via LOAD, one has to create a profile using runstats command before executing LOAD. To know more on how to create runstats profile refer to the link mentioned under 'Manual runstats'.
Similar to LOAD command, user can collect statistics via 'REDISTRIBUTE' by specifying 'STATISTICS USE PROFILE' option. User should first create a runstats profile before executing REDISTRIBUTE command to collect statistics.
DB2 Text Search supports 2 types of server configuration - integrated and stand alone server( DB2 V10 onwards). By default Text Search installs an integrated text server. Text Search also supports a standalone text server which can be configured on an independent machine other than the DB2 server. What this allows customers is, to separate out the resource intensive text index maintenance and free resources on the DB2 server.
From DB2 V105 fp4 onwards Text Search allows users to configure multiple standalone servers for each DB2 instance. This scalability is extremely beneficial especially in cases of a multi-node DB2 setup, where DB2 Text Search customers were limited to use only one TS server for processing the data from all DB2 nodes. With support for multiple standalone servers for Text Search, TS users can decide the server for each of the TS indexes at index creation and willfully distribute the indexes. The users can distribute the TS indexes based on the index usage , index size and machine capability. This flexibility enables TS users also to scale better on a multi-node setup and not limit their text indexing workload of all DB2 nodes on a single TS server.
The below steps will guide to setup DB2 Text Search server with multiple standalone servers.
Install standalone Text Search servers on required systems.
At minimum, configure the following on the standalone Text Search server using the configTool:
configureHTTPListener: Configures the DB2 Text Search server port and host name
generateToken: Generates the authentication token and encryption key
defaultDataDirectory: Configures the parameters for the collection
The server configuration of the setup is now complete. Repeat the above steps for all standalone servers to be added to the setup.
The create index statement has a new option SERVERID to specify the the server details. The serverid should correspond to a valid standalone server , details of which should be available in SYSIBMTS.SYSTSSERVERS.
Eg: To create an index on serverid as 3 in SYSIBMTS.SYSTSSERVERS issue create index command as below:
db2ts "create index db2ext.index for text on test(text) serverid 3"
NOTE: When no value is specified for serverid option in the create index statement, serverid 1 is the default server and index will be created on this server.
serverid is not required to be specified with the update index or the search statements.
The serverid corresponding to the index can be found in the SYSIBMTS.SYSTSINDEXES.
On Feb 1st, 2012, I gave a seminar on my experience working with IBM Optim Tool. This session was well attended;
In the presentation, I talked about my expriences with the product and the way we customised the tool for our needs. The most intriguing thing about IBM Optim tool is that how well it binds with the backend database. We used DB2 for zOS but I'm sure Optim is as well integrated good enough with other databases.
We were able to implement a comprehensive Archial and Purge Solution in less than 3 months. This is a tremendous achievement keeping in mind that everyday data archival was in the range of about 0.3 million rows!
Do share your thoughts, experiences on IBM Optim tool in this blog.
Do you face issues with automatic runstats? Do you have questions on what criteria automatic runstats evaluates tables? Do you wonder how to get the current UDI (Update/Delete/Insert) counter, catalog cache and statistics cache information? If yes,then you are at the right place. In this blog I will introduce you to different db2pd options which can be used to gather information about TCB (Table Control Block), Statistics Cache and Catalog Cache.
This will generate UDI and RTSUDI counter information. These counters are referred by auto-runstats and RTS to make decision on statistics collection. You can get the TCB information by executing ..
db2pd -d test -tcbstats
Statistics collected by RTS will be stored first in statistics cache before writing into system catalog tables. If you face issues with the RTS causing incorrect stats,memory leak or crash during statistics collection/Hardening (writing the cached values to catalog tables) process, then analyzing the statistic cache will help to narrow down the issue.
db2pd -d test -statisticscache
Specify 'detail' clause to get more information which includes formatted packed descriptor output.
db2pd -d test -statisticscache detail
Statistics cache is part of catalog cache. Understanding details of catalog cache will sometimes helps to narrow down issues with statistics cache.
db2pd -d test -catalogcache
Statistics cache is part of catalog cache. Understanding details of catalog cache will sometimes helps to narrow down issues with statistics cache.
How To Map Table File Object Id with Table Name in SMS Table Space
TABLEID Column in SYSCAT.TABLES
TAB_FILE_ID Column in MON_GET_TABLE Function
TAB_FILE_ID Column in SYSIBMADM.ADMINTEMPTABLES View
db2pd -db dbname -tcbstats tbspid
How db2 get instance displays incorrect instance
How to Switch between multiple instances
Execute /home/db2inst1/sqllib/db2profile for switching instances
Issue db2 terminate when switching between multiple instances
So you are smart and have a heart of Gold. You want to use your talent for social good but do not know how or where to start.
How about we, give you a chance to showcase your smartness for a social cause and let you win dollars too!
Sounds exciting? Read on.
Natural or man-made, disasters bring chaos. developerWorks gives you a chance to put your coding and testing skills to become a part of the rescue team from miles away. Participate in the contest and help ease the communication and logistics breakdown that happens when disaster stricks. Use your own idea or copy and improvize someone elses.
How To Restrict Non Unicode clients accessing Unicode Database.
Create Connections Table based Event Monitor.
Create Stored Procedure Raising Error for Non Unicode Clients.
Update connect_proc DB CFG parameter.
How Table Size increased after Table Reorganization
Check for PCTFREE Table Attribute;
If set to 50 then after table reorganization it will double the table size.
Also, When loading 10GB of Data, it will occupy 20GB of Table Size.
From DB2 version 10.5 on wards there are few new information added to this file which are specific to pureScale on Linux and AIX platform. These information are really useful and helpful for the customer and DBA to take a decision on the method of applying fixpack. Let us look at the meaning of each row and how it will be helpful in this tutorial.
Looking at the image, you can decide what type fixpack update (online or offline ) is possible when you want to apply the image in the DB2 fixpack CD. This would help to decide on whether you need a maintenance window to take an outage of the cluster or it does not need an outage to your pureScale instance at all.
The content of the file looks like :
[image root directory on CD /server / db2]# cat spec
The architecture level is a new tag that is associated with every level of shipped DB2 code starting from DB2 10.5 ( including special builds and fixpack). This field lists architecture of db2 engine , and this number gets changed between version to version only when there are some fixes related online fixpack updates are made on the db2 server code.
Indicates the code version of the DB2 in the CD image
This number get changed between version to version only when there are changes on “how DB2 creates the static sections in the catalog”.
This flag indicates whether this fixpack can be applied online on top of the existing DB2 pureScale version directly or not. The value 'Y' indicates online fixpack update is possible.
This field indicates minimum version of DB2 on top of which this particular DB2 version in the CD can be used for an online fixpack update. Basically it indicates the minimum required base version, on top which this image can be used for an online fixpack update.
In the example show in the image , the DB2 code level is 10.5.0.5. This fixpack image can be applied online to the existing DB2 pureScale version 10.5.0.3 or 10.5.0.4 only.
This field indicates whether any pervious level of version is need to apply this fixpack online. If the value is “Y” , it indicates that there was a fix in this particular version that requires two level of codes to be installed completely. The first level of code is the toleration level (and is marked as Toleration Yes), and must be installed before the second level of code. If the value of the field is “N” means, there is no fixes in this version needs another level.
This field indicates the minimum version of DB2 on top of which this DB2 version in the CD can be used for traditional manual method of fixpack update or an offline method.
This indicate the efix level of GPFS that is bundled with the DB2 version in CD image.
GPFS version that is bundled in the CD image.
TSA level that is shipped with the DB2 in this CD image
Efix level of the TSA that is bundled along with DB2 in CD image.
Looking at gpfs_level, gpfs_efix , tsa_level and tsa_efix_build level fields, DBA can decide whether there will be an impact to the other products that are using the same TSA or GPFS. User can compare the current version of TSA and GPFS in their system and version mentioned this spec file. If there are difference, it means applying this fixpack will need GPFS and TSA in to maintenance mode apart from DB2.
For customers who are having “user managed GPFS” for their DB2 pureScale instance, these fields indicate the required version of GPFS and its efix level one needs to have before applying the DB2 fixpack.
DB2 text search indexing is a resource consuming task and it takes times based on number of documents to be indexed, the type and size of the document etc. In this blog, I would like to briefly explain how we can check the progress of ongoing DB2 Text search index updates.
To list the number of documents indexed so far in MYSCHEMA.MYTITLEIDX index:
./adminTool.sh status -configPath ~/sqllib/db2tss/config -collectionName lappanna_TSDB_TS435205_0000 CollectionName IndexSize Documents Deleted Base 300 B 0 0 lappanna_TSDB_TS435205_0000 419 B 24 0
If there was any ongoing index updates, then running the same command again after some time will show the updated number of documents.
./adminTool.sh status -configPath ~/sqllib/db2tss/config -collectionName lappanna_TSDB_TS435205_0000 CollectionName IndexSize Documents Deleted Base 300 B 0 0 lappanna_TSDB_TS435205_0000 490 B 27 0
The total number of documents that has to be indexed can be found by checking the number of rows in base table (in this case it is table books) on which index is created or if it is incremental index update that is in progress, then users can check the number of rows in the staging table to get the total number of documents that should be there in collection altogether.
Note: We can get the collection name for any text search index, by querying the SYSIBMTS.TSCOLLECTIONNAMES administrative view.
db2 "select COLLECTIONNAME from SYSIBMTS.TSCOLLECTIONNAMES where INDNAME='MYSCHEMA' and INDSCHEMA='MYTITLEIDX'"
Thus we can check the progress of DB2 Text Search index updates by comparing the number of rows in base table(on which index is created) for initial updates or in staging table for incremental updates with the number of documents in collection got by adminTool.
To list the number of documents indexed in all text search indexes:
./adminTool.sh status -configPath ~/sqllib/db2tss/config CollectionName IndexSize Documents Deleted Base 300 B 0 0 lappanna_TSDB_TS435205_0000 867 B 40 0 lappanna_TSDB_TS435409_0000 604 B 95 0
DB2 Text search event view provides information about indexing status and error events. Events like starting, committing and finishing of index update are also availabe in this event table. Users can get the event view name for a given text search index by querying the SYSIBMTS.TSINDEXES administrative view.
db2 "SELECT EVENTVIEWSCHEMA, EVENTVIEWNAME FROM SYSIBMTS.TSINDEXES WHERE INDSCHEMA='MYSCHEMA' AND INDNAME='MYTITLEIDX'"
We are all seeing it ! The databases are getting bigger and bigger and the queries are becoming more and more complex. Under these circumstances, if a particular SQL query does not perform well then it is very critical to find and remove the performance bottleneck as soon as possible. In this blog, I would like to share a systematic step-by-step method to tune the given query.
Prerequisite: We need to know the most important resources required for query processing.
A query is processed in two phases: compilation and execution. During compilation phase, the most important work done is query optimization. The optimizer considers various access plans and chooses the least expensive one for executing the given query. During execution phase, the query is actually executed by following the steps mentioned in the access plan.
For query-tuning purposes, it is very important to know the major resources required for the query compilation and execution phases. The compilation phase will use/access statement-heap, catalog-cache, package-cache, etc. The execution phase will use/access package-cache, sort-heap, shared-sort-heap, lock-list, buffer-pool, etc. The following block-diagram shows these resources. It also shows which resources are part of agent's private memory and which resources are part of database global memory.
Step 1: Monitor the actual resources consumed while DB2 was processing the given query.
By using access plan ( db2exfmt output ), DB2 Snapshots and/or DB2 SQL Monitoring Functions, we can quickly identify the heavily utilized resources for processing our SQL query. Sometimes, it is enough to know which resource was not sufficient for our query. Let us consider a simple example.
Example 1: If the db2exfmt output shows the following message, then it tells us that statement heap was not enough for query compiler: SQLCA : (Warning SQLCA from compile)
SQLCODE 437; Function SQLNO***; Message token '1'; Warning 'None'
In this case, the access plan could be sub-optimal. Increasing statement heap should help here.
However, many times, it is not enough to just know the heavily utilized resource. We also need to find out the context in which, the resource was utilized. This is where access plan analysis helps a lot !
Step 2: Analyze the access plan to get more insights.
The access plan ( generated by using db2exfmt tool ) provides good insights for query-tuning. We can find out the “context” in which , the given resource was heavily consumed. Let us take an example:
Example 2: If the snapshots/monitoring-functions show one or more of the following symptoms, then it means the sort-heap was very heavily consumed:
- sort overflows
- rows written for a SELECT query
- temporary data logical/physical reads
- hash loops
- hash join overflows
In this case, we need to use section-actuals and find out if the input cardinality of SORT/HSJOIN operation was underestimated or not. If underestimated, then we need to provide more statistics to help optimizer choose a better access plan. Otherwise, we can create an index, which may help us avoid the SORT/HSJOIN operations altogether.
In this way, we can find out the reason behind slow running query and then figure out ways to improve the query performance. In this blog, I have given a high level overview of the step-by-step query tuning method. Stay tuned for some concrete examples ! In my next blog, I will share a scenario, where a statistical view helped improve query performance.
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.
Payload method is one of the install method which is used by advanced customer, wherein you don't install DB2 using commands instead just uncompress the binaries. Below are the steps do create pureScale instance using the payload method
1)Download and install DB2
a) Log on as a root user.
b)Download the image and unzip to a location.
c)Uncompress the tar ball of each component which are present in /<image_location>/db2/platform/FILES/ directory.
Note:Only extract the signature tar of the product you want to install.
To uncompress the tar balls, run the appropriate command:
gunzip -c /<image_location>/db2/platform/FILES/filename.tar.gz | tar -xvf -
tar xzvf /<image_location>/db2/platform/FILES/filename.tar.gz
InstallPath is the full path name where you are installing:
2) Change the embedded path
where InstallPath is the full path name where you are installing.
3) Create global registry and create an entry for you installation
4) Create a directory sd in the install path(InstallPath) and copy the entire image ino the directory