Maintenance of log files.
I have recently seen a number of PMRs where problems with log files have affected how we were able to investigate and resolve issues.
When we ask for a pdcollect, one of the main areas collected is the logs directory.
These logs can highlight a lot of different issues to us: they show how the system starts, how long it has been running, and other errors that may have been overlooked or just not shown up in day-to-day running.
If these logs are not sent correctly then we will need to ask for them again, which slows down problem resolution.
Although ITM does regulate the number of log files via configuration settings, sometimes system-wide controls are also set that can override these.
Firstly a quick word about pdcollect:
I have seen issues with the pdcollect being too large to send, and timeouts occurring when it is sent to the IBM FTP server.
The initial response to this is usually to split the compressed file and send it in sections, but this can cause issues.
Firstly, we are not always told how many sections are being sent, and if they are still uploading we may assume we have all the sections when we don't.
Then, even once we have all the sections, I have seen the uncompress still fail, which results in a new request to send logs.
Rather than splitting the full pdcollect file, it is better to run the pdcollect, uncompress it, recompress it into separate sections, and send those.
Also please let us know if you have issues, and how you are sending the files as this avoids confusion!
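As a rough illustration of the recompress approach, here is a minimal Python sketch. It assumes the pdcollect has already been uncompressed into a directory. The function names, the output file names and the size limit are my own choices for illustration and are not part of pdcollect or any ITM tool.

```python
import tarfile
from pathlib import Path


def plan_sections(sizes, max_bytes):
    """Group (name, size) pairs into sections whose total size stays within max_bytes.

    A single file bigger than max_bytes ends up in a section of its own.
    """
    sections, current, total = [], [], 0
    for name, size in sizes:
        # Start a new section when adding this file would exceed the limit.
        if current and total + size > max_bytes:
            sections.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        sections.append(current)
    return sections


def repack(extracted_dir, out_dir, max_bytes):
    """Write the files under extracted_dir as several independent .tar.gz sections."""
    root = Path(extracted_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    sections = plan_sections([(p, p.stat().st_size) for p in files], max_bytes)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, members in enumerate(sections, start=1):
        # Hypothetical naming scheme: the name states how many sections to expect.
        target = out / f"pdcollect_part{i:02d}of{len(sections):02d}.tar.gz"
        with tarfile.open(target, "w:gz") as tar:
            for p in members:
                tar.add(p, arcname=str(p.relative_to(root)))
        written.append(target)
    return written
```

Because each section is a complete archive in its own right, we can open whichever ones have arrived, and the "NNofMM" in the name tells us how many to expect. The limit is applied to the uncompressed file sizes, so the sections will normally come out smaller than the limit.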
If you have to send just the logs instead, make sure you send all the files for an invocation of the process.
We need the 01 logs as well as the latest one written to.
The <time> section as below is the same for each log in the invocation.
Now let's look at the logs directory itself.
This can be large, and is often the main contributor to the size of the pdcollect, so here are some notes on maintenance...
1) The RAS1 logs,
By default these logs rotate round 5 files; once all 5 have been written, files 02 to 05 are overwritten in turn and the 01 file is always left.
They have names of the form <server_name>_<process>_<time>_<log number>.log
What we are looking for is a complete set of files per invocation.
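To check for a complete set before sending, the naming pattern above can be used to group files by invocation. The following is only a sketch: the regular expression is my reading of the pattern, and it assumes the log number is separated from the <time> value by either "_" or "-" and that <time> is alphanumeric. The example file names in the usage note are invented.

```python
import re
from collections import defaultdict

# Assumed shape: <server_name>_<process>_<time>, a "_" or "-", then <log number>.log
LOG_NAME = re.compile(
    r"^(?P<server>.+)_(?P<process>[^_]+)_(?P<time>[0-9A-Za-z]+)[-_](?P<num>\d+)\.log$"
)


def group_by_invocation(filenames):
    """Map (server, process, time) to the set of log numbers present."""
    groups = defaultdict(set)
    for name in filenames:
        m = LOG_NAME.match(name)
        if m:  # files that do not fit the pattern (install logs etc.) are ignored
            groups[(m["server"], m["process"], m["time"])].add(int(m["num"]))
    return dict(groups)


def invocations_missing_first_log(filenames):
    """Return the invocations that have no 01 file."""
    groups = group_by_invocation(filenames)
    return sorted(key for key, nums in groups.items() if 1 not in nums)
```

For example, given the made-up names srv1_ms_4f2a1b3c-01.log, srv1_ms_4f2a1b3c-02.log, srv1_cq_4f2a1c00-02.log and srv1_cq_4f2a1c00-05.log, only the cq invocation is reported, because its 01 file is missing.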
The product has controls over the maximum number of RAS1 files overall, the number per invocation, and their location; these are set in the config file.
Refer to the Troubleshooting Guide for all the detail.
This should give a good balance between keeping logs and filling up disk space. Let your system administrators know that ITM does monitor the files and that this is the way to keep most of them under control.
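To make the rotation concrete, here is a tiny Python sketch of the order in which the files of one invocation are written, based only on the behaviour described in this post (5 files by default, with the 01 file never overwritten). The function name is hypothetical and the count is assumed to be at least 2.

```python
def rotation_sequence(fills, count=5):
    """Return the log numbers written, in order, as `fills` files fill up."""
    sequence = []
    current = 1
    for _ in range(fills):
        sequence.append(current)
        # After the last file, wrap round to 02 so that 01 is preserved.
        current = current + 1 if current < count else 2
    return sequence


print(rotation_sequence(7))  # [1, 2, 3, 4, 5, 2, 3]
```

The 01 file is written once at startup and then left alone, which is why it holds the startup information and why it is always the oldest file of the invocation.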
However, I have seen more than once that there were only one or two files on the system and no *01.log. This is usually due to an automatic job that removes files over a certain age.
I have even seen a case with only a *05 and a *02 file, where the logs had rotated round but there was no history of what had happened in between.
Having said that, also check against the Troubleshooting Guide that the KBB_RAS1_LOG variable is set correctly in the config files.
Remember that the *01.log is not written to again once it reaches its size limit, so it does age. Deleting files in the logs directory based on age alone is therefore NOT a good idea.
We expect to see all logs for at least the latest invocation of the process.
On the other hand, the RAS1 logs from this time last year or the year before might not be of any use (unless there has been a very long-running issue).
In fact, last month's logs may just be taking up space, especially if traces were set but the issue is now solved.
As with all maintenance it is a balance but please don't go too mad and delete files that may be helpful.
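If a clean-up job is needed, one way to strike that balance is to delete by invocation rather than by individual file age. The Python sketch below is only an illustration of that idea and is not an ITM feature. It never selects the most recently written invocation of a process, and only selects an older invocation when every one of its files is older than the cutoff. The file name pattern is my assumption, as is using file modification times to decide which invocation is the latest.

```python
import re
from collections import defaultdict

# Assumed shape: <server_name>_<process>_<time>, a "_" or "-", then <log number>.log
LOG_NAME = re.compile(
    r"^(?P<server>.+)_(?P<process>[^_]+)_(?P<time>[0-9A-Za-z]+)[-_](?P<num>\d+)\.log$"
)


def removable_logs(files, cutoff):
    """files: iterable of (filename, mtime). Return the names that are safe to delete."""
    invocations = defaultdict(list)
    for name, mtime in files:
        m = LOG_NAME.match(name)
        if m:  # anything that is not a RAS1-style name is left alone
            invocations[(m["server"], m["process"], m["time"])].append((name, mtime))

    # Find the most recently written invocation for each (server, process).
    newest = {}
    for (server, process, time), members in invocations.items():
        last_write = max(mtime for _, mtime in members)
        owner = (server, process)
        if owner not in newest or last_write > newest[owner][1]:
            newest[owner] = (time, last_write)

    removable = []
    for (server, process, time), members in invocations.items():
        if newest[(server, process)][0] == time:
            continue  # always keep the latest invocation, including its old 01 file
        if all(mtime < cutoff for _, mtime in members):
            removable.extend(name for name, _ in members)
    return sorted(removable)
```

The function only returns a list of names, so it can be reviewed before anything is actually deleted.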
2) Tracing on logs,
Depending on the tracing and activity these logs may be large and rotate frequently, or be smaller with fewer files in the sequence present.
One thing to remember is to set tracing off again when it is no longer required.
If tracing is requested and there is no information about when to set traces back to ERROR, ask the engineer before the PMR is closed.
Note that if all ITM processes in the RAS1 command flow are at ITM v6.2.3 Fix Pack 2 (or higher), it is possible to set traces on and off via tacmd settrace.
There are also other dynamic ways of setting most traces, and of setting them off again.
See this video blog: https://www.youtube.com/watch?v=PBBY0wCKU1A
A stop and start is not always needed to change trace settings.
3) The kuiras1 logs
On most systems there will be a number of kuiras1* logs in the directory; these are from tacmd commands.
Again, the location, the number of files in an invocation, and the overall number of files are all controlled via the process.
This is set in the KUIENV file on Windows and in the tacmd command file on Unix/Linux.
However, it is worth noting that these are defined by the user ID that runs them, so every user running tacmd commands has a separate limit.
Usually they are not large files, but if a large number of tacmd commands are run then they do build up, so it may be worth checking on the number of files you have.
4) Other logs
There are usually a number of install logs and other logs (migrates, exports); most of these are best left.
So it is worth letting your system administrators know that ITM does monitor the files and that this is the way to keep most of them under control.
I have seen systems where there was an overall system cron job that deleted files based on age which, as already stated, causes delays when trying to debug issues.