Technical Blog Post
Summarization and Pruning (S&P) agent reports SQLCODE=-960, SQLSTATE=57011
There have been a number of cases where partitioning has been running on the warehouse database and the following error has been seen in the S&P logs:
== 13237562 t=work7 EXCEPTION: com.tivoli.twh.ksy.core.KSYException:
com.ibm.db2.jcc.am.SqlException: The maximum number of objects have been
created in the table spaces with the following names or identifiers
"USERSPACE1".. SQLCODE=-960, SQLSTATE=57011, DRIVER=3.64.106
Or the S&P agent has not been working correctly and the following has been seen in the db2diag.log:
FUNCTION: DB2 UDB, catalog services, sqlrlAlterPartitionOps, probe:100
MESSAGE : ZRC=0x85020007=-2063466489=SQLB_NTOK "no file token available"
DIA8307C Maximum file tokens were used.
DATA #1 : String, 9 bytes
These errors indicate that the maximum number of table objects in the table space has been reached, and thus DB2 is not able to add another partition.
Once the limit has been hit it cannot be increased, so the only options are to move data to another table space that is below the limit or to remove some partitions.
Since the S&P agent can only work with one table space, it is not possible to send the data to another table space, so the number of partitions needs to be reduced.
To clear down the database to be able to work again the empty partitions need to be removed and older partitions need to be pruned.
One way to do this is given below but first some information on why the limit can be hit.
Why it happens:
The issue is due to the number of partitions that are needed when data is not pruned on a realistic schedule.
Note: It would be possible to revert to a non-partitioned database; however, performance would suffer greatly.
This is especially true as time goes by, due to the large amount of processing required on the Database Server's part to process all the rows.
With Partitioning we have measured a 50% improvement in performance; without Partitioning it may take DAYS to process all the table aggregations.
Also, any TCR (Tivoli Common Reporting) reports would gradually take longer and longer to produce.
To explain how the partition number can grow too high, let's look at how the partitioning works:
The WPA and S&P are configured to use partitioning, and a value for "KSY_PARTITIONS_UPWARD" (the Number of Partitions into the future) is set. The default is 10, and for this discussion that is the value we will use.
So when Range Partitioning is enabled, the WPA and the KSY Agent create ten (10) Partitions in the WAREHOUS Database for the Raw Table and for each of the Aggregation Tables defined by the customer in their TEP Historical Collection Configuration.
For example, let's use the agent attribute KUD_DB2_Database00 and its Aggregation Tables (KUD_DB2_Database00_H, KUD_DB2_Database00_D, KUD_DB2_Database00_W, KUD_DB2_Database00_M, KUD_DB2_Database00_Q, KUD_DB2_Database00_Y).
For the Detail (Raw) table KUD_DB2_Database00 the WPA will create one Partition to hold today's data, one Partition for all data received from Agents before today (also known as the _MV Partition),
and ten (10) Partitions into the future. This means that initially there are twelve (12) Partitions created for the Raw Table.
So for the Weekly Aggregation table KUD_DB2_Database00_W we create one Partition (P20180121_MV) for data collected from Agents that is older than this week, one Partition to contain Agent data for this week (P20180128) and ten Partitions to hold Agent data into the future P20180204, P20180211,P20180218, P20180225, P20180304, P20180311, P20180318, P20180325, P20180401, and P20180408.
Again, a total of twelve (12) Partitions.
Therefore there are twelve Partitions for each of these Aggregation tables.
As periods of time (Days, Weeks, Months, Quarters and Years) pass, the KSY Agent rotates the Partitions in these tables so that the number of Partitions into the future is maintained.
So when next week arrives, the oldest regular Partition (which held last week's Agent data) becomes the new _MV Partition, the previous _MV Partition is dropped, and a new future Partition is added.
Again retaining a total Partition count of twelve.
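The weekly rotation described above can be sketched as a small model. This is illustrative only (the partition-naming and date handling are assumptions based on the example names above, not the actual KSY Agent code):

```python
from datetime import date, timedelta

def initial_partitions(start: date, future: int = 10) -> list:
    """Build the initial partition list for a weekly table: one _MV
    partition for older data, one for the current week, and `future`
    weeks ahead -- twelve partitions in total by default."""
    parts = [f"P{start:%Y%m%d}_MV", f"P{start + timedelta(weeks=1):%Y%m%d}"]
    parts += [f"P{start + timedelta(weeks=2 + i):%Y%m%d}" for i in range(future)]
    return parts

def rotate(parts: list) -> list:
    """Weekly rotation: the old _MV partition is dropped, the oldest
    regular partition becomes the new _MV, and one future partition
    is added, so the total count is unchanged."""
    old_mv, oldest, *rest = parts
    new_mv = oldest + "_MV"
    # The new future partition is one week after the last existing one
    last = date(int(rest[-1][1:5]), int(rest[-1][5:7]), int(rest[-1][7:9]))
    new_future = f"P{last + timedelta(weeks=1):%Y%m%d}"
    return [new_mv, *rest, new_future]

parts = initial_partitions(date(2018, 1, 21))
print(len(parts))        # 12 partitions initially
parts = rotate(parts)
print(len(parts))        # still 12 after the rotation
print(parts[0])          # P20180128_MV
```

Running the rotation any number of times keeps the count at twelve, which is why a table whose retention fits inside the initial allocation never grows.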
This rotation is repeated for all Raw and Aggregation tables that the customer has configured via the TEP Desktop Historical Collection Configuration dialog.
This all works fine for any particular table as long as the configured retention of data is less than the twelve units of time covered by the initially allocated Partitions.
The retention period is set by the configured value for when each table is pruned.
However, when the configured Agent data retention value exceeds the Number of Future Partitions, the rotation algorithm changes.
For example, in the historical collection settings shown below, the Number of Partitions into the Future was set to ten (10), but the retention values of several tables exceed this value.
............Table : KPH06FILES, KPH_FILE_SYSTEMS
................Summarize for (hour)
................Summarize for (day)
................Summarize for (week)
................Summarize for (month)
................Summarize for (quarter)
................Summarize for (year)
................Prune (raw) table when (4 day) old
................Prune (hour) table when (60 day) old
................Prune (day) table when (180 day) old
................Prune (week) table when (12 month) old
................Prune (month) table when (60 month) old
................Prune (quarter) table when (5 year) old
................Prune (year) table when (5 year) old
Here it has been requested that KSY retain Daily aggregation data for 180 days (6 months).
Since KSY has no choice but to honour the request, it must retain 10 Partitions into the future, one for today, and 180 into the past, for a grand total of 191 Partitions to hold all the customer-requested data.
This same retention strategy is applied to the Raw as well as the Aggregation tables. So when the 192nd day arrives, the KSY Agent will rotate the Partitions, still retaining 180 days' worth of data.
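The partition count for a single table follows directly from the description above; a minimal sketch of the arithmetic (the formula reflects this article's explanation, not any documented KSY internals):

```python
def partitions_needed(retention_units: int, future: int = 10) -> int:
    """Partitions one table needs: one per retained past period, one
    for the current period, plus the configured future partitions.
    When the retention fits inside the initial allocation, the count
    stays at the initial floor (one _MV, one current, `future` ahead)."""
    return max(retention_units + 1 + future, 2 + future)

# Daily table pruned at 180 days, with 10 future partitions:
print(partitions_needed(180))   # 191
```

Multiplying this out across every Raw and Aggregation table for every configured agent attribute shows how quickly the table space's object limit can be approached.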
This happens for every agent attribute that is saved into the warehouse, so keeping the values in check keeps the number of partitions down and so stops the database from hitting the limit.
It may be that the amount of data kept will change from agent to agent but areas to think about are:
1) What is needed for any reports that are run?
2) What data is required for TEPS queries?
3) Once the reports are run is the data still needed?
4) Once the data is summarized, is the lower-level data still needed?
For example, once raw data is summarized into hourly data, how far back do you need the raw data?
Once hourly data is summarized into daily data, will you just be using the daily data, or do you need to review hourly or raw data for a week, two weeks, or a month?
5) Obviously what the business requires also needs to be taken into account, but if data in any form is needed for reasons other than to service ITM requests, then that data may need to be moved elsewhere to avoid the issue discussed here.
How to fix:
Basically, the number of partitions needs to be reduced. Your Database Administrator may be able to do this for you; otherwise, here are some steps that will help.
NOTE: As with any change, make sure there is a good backup of the database before you start.
Once the partition error is hit, you may also be having issues with the WPA inserting data into the warehouse and with the S&P agent running and pruning.
Therefore stop both agents for the moment. There may be more than one WPA running in the environment, so stop all of them.
To make space for new partitions, the first step is to check for any empty tables and drop them for the moment, then run maintenance on the database.
How to check for empty tables (assuming the ITMUSER schema has been used):
1. Run runstats on all tables in WAREHOUS
db2 connect to WAREHOUS
db2 -x "select 'runstats on table ' || substr(rtrim(tabschema)||'.'||rtrim(tabname),1,50) || ' and indexes all;' from syscat.tables where type = 'T'" > runstats.out
db2 -tvf runstats.out
Then we can find which tables have 0 rows:
db2 -x "select tabname, card from syscat.tables where tabschema = 'ITMUSER' and card = 0" > tablecount.out
Once tables with 0 rows are discovered these are the steps to follow:
If the customer has found aggregated tables (_H, _D, _W, _M, _Q, _Y) that contain no rows, then these can be dropped:
su - <db2instance name>
db2 connect to WAREHOUS
db2 drop table ITMUSER.XXXX_H;
db2 drop table ITMUSER.XXXX_D;
db2 drop table ITMUSER.XXXX_W;
db2 drop table ITMUSER.XXXX_M;
db2 drop table ITMUSER.XXXX_Q;
db2 drop table ITMUSER.XXXX_Y;
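If many aggregation tables turn out to be empty, the drop statements can be generated from the row-count output rather than typed by hand. A minimal sketch; the suffix list and ITMUSER schema come from the steps above, while the helper itself is illustrative (it only emits statements for aggregation tables, leaving the raw detail tables alone):

```python
# Generate "drop table" statements for empty aggregation tables only.
AGG_SUFFIXES = ("_H", "_D", "_W", "_M", "_Q", "_Y")

def drop_statements(table_names, schema="ITMUSER"):
    """Return DROP TABLE statements for aggregation tables in the
    given list; raw (detail) table names are skipped."""
    return [f"drop table {schema}.{name};"
            for name in table_names
            if name.endswith(AGG_SUFFIXES)]

# Example: names as they might appear in tablecount.out
empty = ["KUD_DB2_Database00_W", "KUD_DB2_Database00_Q"]
for stmt in drop_statements(empty):
    print(stmt)
```

The printed statements can then be saved to a file and run with `db2 -tvf`, the same way runstats.out was run earlier.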
2) The maintenance steps for the warehouse are discussed here:
This will reduce the number of partitions; however, a fresh run of the S&P agent will recreate some, if not all, of them, so it is important that the pruning settings are changed as discussed above.
3) Modify the pruning settings to reduce the amount of data that is held.
4) Start only the S&P agent and let it complete a run.
Note: this may take longer than normal, as with the changed pruning values, there should be more to remove.
Once the run has completed, the correct partitions should now be set up by the S&P agent for future runs, so the WPA (or multiple WPAs) can be restarted and new data can be stored.
Over time, if more agent types are added to the environment, more partitions will be needed, so the data retention values may need to be reviewed.