S&P agent reports SQLCODE=-960, SQLSTATE=57011

Technical Blog Post

Abstract

Body

mon

Summarization and Pruning agent reports SQLCODE=-960, SQLSTATE=57011

In some cases where partitioning is running on the warehouse database and the following error is seen in the S&P logs:

== 13237562 t=work7 EXCEPTION: com.tivoli.twh.ksy.core.KSYException:
com.ibm.db2.jcc.am.SqlException: The maximum number of objects have been
created in the table spaces with the following names or identifiers
"USERSPACE1".. SQLCODE=-960, SQLSTATE=57011, DRIVER=3.64.106

Or the S&P agent is not working correctly and we see in the DB2daig.log:

FUNCTION: DB2 UDB, catalog services, sqlrlAlterPartitionOps, probe:100
MESSAGE : ZRC=0x85020007=-2063466489=SQLB_NTOK "no file token available"
DIA8307C Maximum file tokens were used.
DATA #1 : String, 9 bytes

These errors indicate that the maximum number of table objects in the table space is reached, and thus Db2 is not able to add another partition.

Once the limit is reached, it cannot be increased. The only option is to move data to another tablespace, which is below the limit or remove some partitions.

Since the S&P agent can only work with one table space it is not possible to send the data to another tables space, so the number of partitions needs to be reduced.

To clear down the database to be able to work again the empty partitions need to be removed, and older partitions need to be pruned.

One way to do this is given below but first some information on why the limit can be hit.

Why it happens:

The issue is due to the number of partitions that are needed if the data is not pruned in a realistic fashion.

Note: It would be possible to revert back to a non-partitioned database, however performance will suffer greatly.

This is especially true as time goes by due to the large amount of processing required on the Database Server's part to process all the rows.

With Partitioning we have measured a 50% improvement in performance, without Partitioning it may take DAYS to process all the table aggregations.

Also any Tivoli Common Reporting reports would also gradually take a longer and longer time to be produced.

To explain how the partition number can increase too high lets look at how the partitioning works:

The WPA and S&P are configured to use partitioning and a value for "KSY_PARTITONS_UPWARD" (Number of Partitions into the future) is set, the default is 10, and for this discussion that is the value we will be use.

So when Range Partitioning is enabled, the WPA and the KSY Agent create ten (10) Partitions in the WAREHOUS Database for the Raw Table,

For example lets use the agent attribute: KUD_DB2_Database00, and each of the Aggregation Tables (KUD_DB2_Database00_H, KUD_DB2_Database00_D, KUD_DB2_Database00_W, KUD_DB2_Database00_M, KUD_DB2_Database00_Q, KUD_DB2_Database00_Y) that are defined by the customer in for their TEP Historical Collection Configuration.

For the Detail (Raw) table KUD_DB2_Database00 the WPA will create one Partition to hold today's data, one Partition for all data received from Agents before today (also known as the _MV Partition), and ten (10) Partitions into the future. This means that initially there are twelve (12) Partitions created for the Raw Table.

So for the Weekly Aggregation table KUD_DB2_Database00_W we create one Partition (P20180121_MV) for data collected from Agents that is older than this week, one Partition to contain Agent data for this week (P20180128) and ten Partitions to hold Agent data into the future P20180204, P20180211, P20180218, P20180225, P20180304, P20180311, P20180318, P20180325, P20180401, and P20180408.

For again a total of twelve (12) Partitions.

Therefore there are twelve Partitions for each of these Aggregation tables.

As periods of time, Days, Weeks, Months, Quarters and Years, pass the KSY Agent rotates the Partitions in these tables such that the number of Partitions into the future is maintained.
So when next week arrives, what was the Partition _MV which held Agent data older than last week, becomes the new _MV Partition, the previous _MV Partition is dropped and a new future Partition is added.

Again retaining a total Partition count of twelve.

This rotation is repeated for all Raw and Aggregation tables that the customer has configured via the TEP Desktop Historical Collection Configuration dialog.

This all works fine for any particular table while the configuration for the retention of data is less than the twelve units of Partitions data initially allocated.

The configuration for the retention of data is set by the value of when the tables are pruned.
However when the configuration of Agent data retention value exceeds the Number of Future Partitions, the rotation algorithm changes.

For example in this case below of what is set for the historical collection; the Number of Partitions into the Future was set as ten (10) but the retention values in the historical collection of several tables exceed this value.

............Table : KPH06FILES, KPH_FILE_SYSTEMS
................Summarize for (hour)
................Summarize for (day)
................Summarize for (week)
................Summarize for (month)
................Summarize for (quarter)
................Summarize for (year)
................Prune (raw) table when (4 day) old
................Prune (hour) table when (60 day) old
................Prune (day) table when (180 day) old
................Prune (week) table when (12 month) old
................Prune (month) table when (60 month) old
................Prune (quarter) table when (5 year) old
................Prune (year) table when (5 year) old

Here it has been requested that KSY retains Daily aggregation data for 180 days (6 months).
Since KSY has no choice but to honour the request, KSY must retain 10 Partitions into the future, one for today and 180 into the past, for a grand total of 191 Partitions to hold all the customer requested data.
This same retention strategy is applied to the Raw as well as Aggregation tables. So when the 192^nd day arrives, then the KSY Agent will rotate the Partitions, still retaining 180 days worth of data.

This happens for every agent attribute that is saved into the warehouse, so keeping the values in check keeps the number of partitions down and so stops the database from hitting the limit.

It may be that the amount of data kept will change from agent to agent but areas to think about are:

1) What is needed for any reports that are run?

2) What data is required for TEPS queries?
3) Once the reports are run is the data still needed?
4) Once the data is summarized, is the stage lower than that needed?
for example once raw is summarized into hourly data, how far back do you need the raw data?
Once hourly data is summarized as daily data, will you just be using the daily data or do you need to review hourly, or raw data for a week, two weeks or for a month?
5) Obviously what the business requires also needs to be taken into account, but if data in any form is needed for other reasons than to service IBM Tivoli Monitoring requests then the data may need to be moved else where, to avoid the issue discussed here.

How to fix:

Basically the number of partitions needs to be reduced. Your Database Administrator may be able to do this for you or here are some steps that will help.

NOTE: As with any change, make sure there is a good backup of the database before you start.

Once the partition error is hit, you may be having issues with the WPA putting data into the warehouse and with the S&P agent running and pruning as well.
Therefore stop both of the agents for the moment. There may be more than one WPA, so stop all of them.

To make space for new partitions the first step is to check for any empty partitions and dropping these for the moment, then run maintenance on the database.

How to check for empty tables (this is assuming ITMUSER has been used):

1. Run runstats on all tables in WAREHOUS

db2 connect to WAREHOUS
db2 -x "select 'runstats on table',substr(rtrim(tabschema)||'.'||rtrim(tabname),1,50),' and indexes all;'from \ syscat.tables where type = 'T' " > runstats.out
db2 -tvf runstats.out

Then we can find which tables have 0 rows :
select tabname, card from syscat.tables where tabschema = 'ITMUSER > tablecount.out

Once tables with 0 rows are discovered then then follow these steps:

if customer has found aggregated tables (_H, _D, _W, _M. _Q, _Y) that contain no rows, then these can be dropped:

su - <db2instance name>
db2 connect to WAREHOUS

db2 drop table ITMUSER.XXXX_H;
db2 drop table ITMUSER.XXXX_D;
db2 drop table ITMUSER.XXXX_W;
db2 drop table ITMUSER.XXXX_M;
db2 drop table ITMUSER.XXXX_Q;
db2 drop table ITMUSER.XXXX_Y;

2) The maintenance steps for the warehouse are discussed here:
https://www.ibm.com/support/knowledgecenter/SSTFXA_6.3.0/com.ibm.itm.doc_6.3/install/dbdesign_six.htm

This will reduce down the number of partitions, however a fresh run of the S&P will create some if not all of these so it is important that the pruning is changed as discussed above.

3) Modify the pruning settings to reduce the amount of data that is held.

4) Start only the S&P agent and let it complete a run.
Note this may take longer than normal, as with the changed pruning values, there should be more to remove.

Once the run has complete, the correct partitions should also now be set up by the S&P agent for future runs, so the WPA (or multiple WPAs) can now be restarted so that new data can be stored.

Overtime if more agent types are added to the environment then more partitions will be needed and so values for data retention might need to be reviewed.

For example, :

****** PRODUCT CODE: KLZ ******

KLZ_Disk,Detailed,8,D,Hourly,1,M,Daily,3,M,Weekly,1,Y,Monthly,1,Y,Quarterly,-1,Y,Yearly,-1,Y,KLZDISK

For the table KLZ_Disk, the object is "KLZ_Disk"

Once aggregated tables are dropped, and entry for that table is deleted in WAREHOUSEMARKER do the following:

Next steps:

1. Create a new tablespace of pagesize 8k
su - db2apm
db2 connect to WAREHOUS
db2 "CREATE BUFFERPOOL ITMBUF8K IMMEDIATE SIZE 250 PAGESIZE 8 K"
db2 "CREATE REGULAR TABLESPACE ITMREG8K PAGESIZE 8 K MANAGED BY SYSTEM USING ('itmreg8k') BUFFERPOOL ITMBUF8K"

2. Check the tablespace "ITMREG8K" has been created
db2 list tablespaces

3.Stop S&P and all the WPA :
/opt/ibm/sy/bin/itmcmd agent stop sy

/opt/ibm/sy/bin/itmcmd agent stop hd

4a.Modify the configuration of ksy for hd and sy to create the tables in this new Tablespace.
KSY_ON_DEMAND=Y
KSY_DEFAULT_TABLE_CONTAINER=ITMREG8K
KSY_DEFAULT_INDEX_CONTAINER=ITMREG8K

Modify the configuration of khd

KHD_DEFAULT_TABLE_CONTAINER=ITMREG8K
KHD_DEFAULT_INDEX_CONTAINER=ITMREG8K

4b. Alternatively, use the Schema Tool ($CANDLEHOME/<arch>/bin/tdwschema.sh) to create tables, in the Tablespace you specify manually. This method makes it easier to create the correct SQL commands to perform the actions needed to create the tables required.

5. Restart ksy and khd
/opt/ibm/sy/bin/itmcmd agent start sy

/opt/ibm/sy/bin/itmcmd agent start hd

6.Check the tables have been recreated
db2 list tables for schema ITMUSER

7. Check that the partition was created in the new tablespace:

Example for the hourly table, replace XXX by the table name

SELECT
D.SEQNO,
D.TABNAME,
D.DATAPARTITIONNAME,
T.TBSPACE
FROM
SYSCAT.DATAPARTITIONS D,
SYSCAT.TABLESPACES T
WHERE
D.TABSCHEMA = 'ITMUSER' AND
D.TABNAME='XXX_H' AND
D.TBSPACEID= T.TBSPACEID
ORDER BY SEQNO;

You should see the TBSPACE = ITMREG8K

The index for the aggregated table should also be created in this new tablespace. The index name is the short name of the table followed by _HX for an hourly aggregated table

SELECT
T.TBSPACE,
P.DATAPARTITIONID,
P.DATAPARTITIONNAME,
I.INDNAME,
I.TABNAME
FROM
SYSCAT.INDEXPARTITIONS I,
SYSCAT.DATAPARTITIONS P,
SYSCAT.TABLESPACES T
WHERE
I.TABSCHEMA = 'ITMUSER' AND
I.INDNAME = 'KLZDISK_HX' AND
P.DATAPARTITIONID = I.DATAPARTITIONID AND
P.PARTITIONOBJECTID= I.INDPARTITIONOBJECTID AND
P.TABSCHEMA=I.TABSCHEMA AND
P.TABNAME=I.TABNAME AND
P.TBSPACEID= T.TBSPACEID
ORDER BY
I.INDNAME,
P.DATAPARTITIONID

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSVJUL","label":"IBM Application Performance Management"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}}]

UID

ibm11278148

IBM Support

Tips

S&P agent reports SQLCODE=-960, SQLSTATE=57011

Technical Blog Post

Abstract

Body

Summarization and Pruning agent reports SQLCODE=-960, SQLSTATE=57011

Why it happens:

How to fix:

UID

Share your feedback

Need support?