Gathering logging events |
runDataCollector extracts and reports the current state of your Control Center Director
deployment to help IBM Support troubleshoot an issue. It collects logs, configurations, and system
metrics, and backs up the database without interrupting its operation, storing the information in a
.zip archive file. You can then send the archive file to IBM Support to help diagnose and fix
problems.
|
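Conceptually, the collector's last step is bundling the gathered files into a single archive. A minimal sketch of that step, assuming hypothetical member names and contents (this is not the actual runDataCollector implementation):

```python
import zipfile

def build_support_archive(collected_files, archive):
    """Bundle collected logs/configs into a .zip archive for IBM Support.

    collected_files: mapping of archive member name -> text content
    archive: path or file-like object to write the .zip to
    (names and layout here are illustrative, not the utility's actual format)
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in collected_files.items():
            zf.writestr(name, content)
    return archive
```

The resulting archive can then be attached to a support case as-is.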
Deployment-related log files |
The log file is available at: ControlCenterDirector\log
Look for file names with the format: Engine_YYYYMMDD_TIMESTAMP
|
Control Center Director Engine
(Heartbeat)-related log files |
- The log file is available at: ControlCenterDirector\log
- Look for [NODENAME]INFO ServiceMonitor entries for further details
- Alternatively, check the CC_SERVER_COMPONENT table, which is updated with the
ServerID and a status of UP
|
License Data-related log files |
Look for the LicenseDataCollector string in the log file for more
details. |
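Pulling those entries out of a large engine log is a simple filter; a sketch in Python (the function name and line format are assumptions for illustration):

```python
def license_data_entries(log_lines):
    """Return only the log lines emitted by the license data collector."""
    return [line for line in log_lines if "LicenseDataCollector" in line]
```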
Customizing logging levels to suit logging requirements |
Set the log level to DEBUG to start recording debug logs, then issue the
stopEngine/runEngine utility for the change to take effect.
File path:
- Control Center Director Engine: conf/enginelogger.xml
|
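The exact schema of enginelogger.xml depends on your release; as an illustration only, a log4j-style logger entry switched to DEBUG might look like the following (the element and attribute names here are assumptions, not the file's verified schema):

```xml
<!-- illustrative only: check your enginelogger.xml for the actual element names -->
<logger name="com.ibm.scc" level="DEBUG"/>
```

After editing, remember to run stopEngine/runEngine as noted above.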
Agent state is down, whereas initial registration occurred via Agent.
|
The Agent polling interval is 5 minutes. Invoke pollAgent.sh to validate whether the Agent is
running on the Connect:Direct server. To troubleshoot, look for the agent.log file
available at the following log location:
\cdinstall_dir\install\logs |
How does agent communicate with Control Center Director? |
At startup, the Agent posts an OSA message to communicate that it is up. This helps Control Center
Director with server auto-discovery. During the upgrade process, the Agent sends OSA messages
notifying Control Center Director about the upgrade status.
|
How does Agent distinguish between different upgrades it performs? |
The Agent receives a unique key, the correlation-id, with each upgrade and uses it to distinguish
between upgrades; it is also prefixed to the log files generated by each upgrade. |
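The naming scheme can be pictured with a small sketch; the file-name pattern below is an assumption for illustration, and the Agent's actual pattern may differ:

```python
def upgrade_log_name(correlation_id, base_name="upgrade.log"):
    """Prefix an upgrade's log file with its correlation-id so runs stay distinguishable."""
    return f"{correlation_id}_{base_name}"
```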
Auto Discovery-related failure |
- Inspect agent.log for further diagnosis and to isolate the issue.
- One possible reason Control Center Director could not discover the
Connect:Direct server in your deployment is that it was not configured correctly.
- Another possible reason involves updates made to the osa.url field,
which ensures that Connect:Direct is auto-discovered by Control Center Director. The Agent provides
a file-polling mechanism that runs at a preconfigured polling interval (cpiPollTime) and
detects changes to the osa.url field. If the osa.url field is
modified during a polling interval, the change only takes effect at the end of the scheduled
run.
- With multiple Connect:Direct instances, you are likely to run into port conflicts unless you
allocate a unique Agent listening port per instance. It is also recommended that, having upgraded
an instance, you apply its unique port number before upgrading the next instance. This prevents
potential port-conflict errors during the upgrade process.
- Another possible reason is an incorrect certificate-based configuration; that is, either
the Connect:Direct certificate is not trusted by Control Center Director or vice versa.
|
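The file-polling behaviour described above, where a change to osa.url is only seen at the next cpiPollTime tick, can be sketched as follows; the properties-file format and function names are assumptions for illustration:

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def osa_url_changes(snapshots):
    """Simulate polling: one snapshot per cpiPollTime tick; changes surface at poll time."""
    changes = []
    last = None
    for text in snapshots:
        url = parse_properties(text).get("osa.url")
        if last is not None and url != last:
            changes.append(url)
        last = url
    return changes
```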
Will stopping a bulk server upgrade operation affect the upgrade of all servers included in
the operation? |
- Control Center Director handles a bulk server upgrade process in batches. Batch upgrading groups
servers together so that Control Center Director can execute upgrade operations in parallel.
- If an error occurs during the upgrade process, Control Center Director continues to process the
remaining upgrade operations in the batch.
- Batch size is set to 25 servers by default and can be modified by the Administrator.
- Edit the batch-size property in DeploymentService.xml,
available at: conf/services/system. Issue the
stopEngine/runEngine utility for the change to take effect.
|
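The batching described above amounts to slicing the server list into groups of batch-size; a sketch under that assumption (the function name is illustrative):

```python
DEFAULT_BATCH_SIZE = 25  # default of the batch-size property in DeploymentService.xml

def make_batches(servers, batch_size=DEFAULT_BATCH_SIZE):
    """Group servers into batches that can be upgraded in parallel."""
    return [servers[i:i + batch_size] for i in range(0, len(servers), batch_size)]
```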
Some servers under Failed Category in Job details view show up as Suspended by System.
|
- This occurs when a server was not upgraded due to multiple failures during a bulk upgrade
process.
- When an error occurs during a bulk upgrade process, Control Center Director continues to process
the remaining upgrade operations as long as the failure percentage stays within the configured
threshold.
- For example, for a batch size of 30 servers, if 15 jobs fail (50%), the server status is
displayed as Suspended by System.
- The failure percentage is set to 40% by default and can be modified by the Administrator.
- Edit the failurePercentage property in DeploymentService.xml,
available at: conf/services/system. Issue the
stopEngine/runEngine utility for the change to take effect.
|
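The suspension rule can be sketched as a simple threshold check; the exact comparison Control Center Director uses (strictly greater than versus at-least) is an assumption here:

```python
DEFAULT_FAILURE_PERCENTAGE = 40  # default of the failurePercentage property

def remaining_server_status(batch_size, failed_jobs,
                            threshold=DEFAULT_FAILURE_PERCENTAGE):
    """Suspend the rest of the batch once the failure percentage exceeds the threshold."""
    failure_pct = 100.0 * failed_jobs / batch_size
    return "Suspended by System" if failure_pct > threshold else "Continue"
```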
Server upgrade Job scheduled returns the following error: Job marked failed as OSA
from Agent not received. |
- Control Center Director is set to poll the Agent and wait up to 3600 seconds (default) for a
response to verify that the upgrade request is being processed. When this threshold is exceeded,
the Job is marked as failed with the error: Job marked failed as OSA from Agent not received.
- To modify the interval, edit the
lastOSARecivedTimeDifferenceInSecToMarkJobFailed
property in DeploymentService.xml, available
at: conf/services/system. Issue the
stopEngine/runEngine utility for the change to take effect.
|
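The timeout check can be sketched as follows; the function shape is illustrative, and only the 3600-second default and the error text come from the product documentation above:

```python
DEFAULT_OSA_TIMEOUT_SEC = 3600  # lastOSARecivedTimeDifferenceInSecToMarkJobFailed default

def upgrade_job_status(seconds_since_last_osa, timeout=DEFAULT_OSA_TIMEOUT_SEC):
    """Mark the job failed when no OSA from the Agent arrives within the timeout."""
    if seconds_since_last_osa > timeout:
        return "Job marked failed as OSA from Agent not received"
    return "In progress"
```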
Is the Agent polling interval used by the ICC Director Engine configurable? |
|