Troubleshooting deployment scenarios

Control Center Director provides tools, such as log files, for troubleshooting and recovering from errors. When you encounter an issue with your deployment, the log files should be the first place you look.

The following table describes:
  • The relationship between common deployment scenarios and their associated log files
  • FAQs related to common deployment scenarios
Table 1. FAQs and Log locations
Scenario/Question Details
Gathering logging events
Use runDataCollector to extract and report the current state of your Control Center Director deployment to help IBM Support troubleshoot an issue.

It collects logs, configurations, and system metrics, backs up the database without interrupting its operations, and stores the information in a .zip archive file. You can then send the archive file to IBM Support to help diagnose and fix problems.

Deployment-related log files
The log file is available at: ControlCenterDirector\log

Look for a file name with the format: Engine_YYYYMMDD_TIMESTAMP
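
When the log directory holds many files, a small script can pick out the most recent Engine log. The following is a minimal sketch, assuming only that Engine log file names start with Engine_ as described above; the function name and the use of the file modification time (rather than parsing the TIMESTAMP part, whose exact layout is not documented here) are illustrative choices.

```python
from pathlib import Path
from typing import Optional


def newest_engine_log(log_dir: str) -> Optional[Path]:
    """Return the most recently modified file named Engine_* in log_dir.

    Returns None when the directory contains no matching file.
    """
    candidates = [p for p in Path(log_dir).glob("Engine_*") if p.is_file()]
    if not candidates:
        return None
    # Modification time is used because the TIMESTAMP layout is not assumed.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Call it with the path to your own ControlCenterDirector log directory, for example newest_engine_log("ControlCenterDirector/log").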

Control Center Director Engine (Heartbeat)-related log files
  • The log file is available at: ControlCenterDirector\log
  • Look for [NODENAME]INFO ServiceMonitor entries for further details
  • Alternatively, check the CC_SERVER_COMPONENT table, which is updated with the ServerID and a status of UP
License Data-related log files
Look for the LicenseDataCollector string in the log file for more details.
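
Searching a large log for the ServiceMonitor or LicenseDataCollector strings can be scripted. The sketch below is a plain substring filter; the function name is hypothetical and nothing is assumed about the log line layout beyond the marker strings named above.

```python
from typing import Iterable, List


def lines_containing(lines: Iterable[str], marker: str) -> List[str]:
    """Return every line that contains marker, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


def search_log(path: str, marker: str) -> List[str]:
    """Apply lines_containing to a log file on disk."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return lines_containing(fh, marker)
```

For example, search_log(engine_log_path, "LicenseDataCollector") returns the license data entries, and the same call with "ServiceMonitor" returns the heartbeat entries.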
Customizing logging levels to suit logging requirements
Set the log level to DEBUG to start recording debug logs, then run the stopEngine/runEngine utility for the changes to take effect.

File paths:
  • Control Center Director Engine logs

    /Conf/enginelogger.xml

  • Connect:Direct Agent logs

    CDInstallationDirectory/install/agent/bin/log4j2.xml
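
The level change can also be scripted. The following is a rough sketch, assuming the Agent's log4j2.xml follows the standard Log4j 2 XML layout with a Root logger element that carries a level attribute; the actual contents of the shipped file, and whether enginelogger.xml uses the same layout, are not confirmed here. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_root_level(config_xml: str, level: str = "DEBUG") -> str:
    """Return config_xml with the level of every <Root> logger set to level.

    Assumes a standard Log4j 2 layout (Configuration > Loggers > Root).
    Raises ValueError if no <Root> element is present.
    """
    root = ET.fromstring(config_xml)
    roots = list(root.iter("Root"))  # element match is case-sensitive
    if not roots:
        raise ValueError("no <Root> logger element found")
    for elem in roots:
        elem.set("level", level)
    # Note: ElementTree drops XML comments when it re-serializes.
    return ET.tostring(root, encoding="unicode")
```

Back up the original file before overwriting it, because comments in the file are not preserved by this approach.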

Agent state is down, although initial registration occurred through the Agent.

The Agent polling interval is 5 minutes. Invoke pollAgent.sh to check whether the Agent is running on the Connect:Direct server.

To troubleshoot, look for the agent.log file at the following location:

\cdinstall_dir\install\logs
How does the Agent communicate with Control Center Director?
At startup, the Agent posts an OSA to communicate that the Agent is up. This helps Control Center Director with server auto-discovery.

During the upgrade process, the Agent sends an OSA notifying Control Center Director of the upgrade status.

How does the Agent distinguish between the different upgrades it performs?
The Agent receives a unique key, correlation-id, with each upgrade. It uses this key to distinguish between upgrades and also prefixes it to the log files generated during each upgrade.
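
Because the correlation-id is prefixed to the log files of each upgrade, the logs for one upgrade can be collected by prefix. This is a minimal sketch; the function name is hypothetical, and the exact file name layout after the prefix is not assumed.

```python
from typing import Iterable, List


def logs_for_upgrade(file_names: Iterable[str], correlation_id: str) -> List[str]:
    """Return the sorted file names that start with correlation_id."""
    if not correlation_id:
        raise ValueError("correlation_id must not be empty")
    return sorted(name for name in file_names if name.startswith(correlation_id))
```

A plain prefix match also returns files whose correlation-id merely begins with the given value, so pass the full key.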
Auto Discovery-related failure
  1. Inspect agent.log to diagnose and isolate the issue.
  2. One possible reason why Control Center Director could not discover a Connect:Direct server in your deployment is that it was not configured correctly.
  3. Another possible reason is a change to the osa.url field, which ensures that Connect:Direct is auto-discovered by Control Center Director.

    The Agent provides a file polling mechanism that runs at a preconfigured polling interval (cpiPollTime) and detects any changes to the osa.url field. If the osa.url field is modified during the polling interval, the change takes effect only at the end of the scheduled run.

  4. Another possible reason is the presence of multiple Connect:Direct instances. You are likely to run into port conflicts unless you allocate a unique Agent listening port to each instance. After you upgrade an instance, apply its unique port number before you upgrade the next instance. This prevents port-conflict errors during the upgrade process.
  5. Another possible reason is an incorrect certificate-based configuration, that is, either the Connect:Direct certificate is not trusted by Control Center Director or vice versa.
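
Two of the causes above lend themselves to a quick check. The sketch below computes when a change to osa.url would be picked up by a fixed-interval poll, and flags Agent listening ports shared by more than one instance. Both function names are hypothetical, the instance names and port numbers are supplied by the caller, and the timing model (a change is seen at the next poll tick) is an assumption about how the polling behaves.

```python
import math
from typing import Dict, List


def next_poll_time(change_time: float, poll_interval: float,
                   first_poll: float = 0.0) -> float:
    """Return the first poll tick at or after change_time.

    Ticks occur at first_poll, first_poll + poll_interval, and so on.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    if change_time <= first_poll:
        return first_poll
    ticks = math.ceil((change_time - first_poll) / poll_interval)
    return first_poll + ticks * poll_interval


def conflicting_ports(agent_ports: Dict[str, int]) -> Dict[int, List[str]]:
    """Map each port used by more than one instance to those instance names."""
    by_port: Dict[int, List[str]] = {}
    for instance, port in agent_ports.items():
        by_port.setdefault(port, []).append(instance)
    return {port: sorted(names) for port, names in by_port.items()
            if len(names) > 1}
```

An empty result from conflicting_ports means every instance has its own Agent listening port.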
If a bulk server upgrade operation is stopped, does it affect the upgrade of all servers included in the operation?
  • Control Center Director handles a bulk server upgrade process in batches. Batch upgrade groups servers together so that Control Center Director can execute upgrade operations in parallel.
  • If an error occurs during the upgrade process, Control Center Director continues to process the remaining upgrade operations in the batch.
  • Batch size is set to 25 servers by default and can be modified by the Administrator.
  • To change it, edit the batch-size property in DeploymentService.xml, available at: conf/services/system.
  • Run the stopEngine/runEngine utility for the changes to take effect.
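
The batching described above can be pictured with a short sketch. It only illustrates how a server list splits into groups of the configured size; it is not Control Center Director's own implementation, and the function name is hypothetical.

```python
from typing import List, Sequence


def split_into_batches(servers: Sequence[str], batch_size: int = 25) -> List[List[str]]:
    """Split servers into consecutive batches of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(servers[i:i + batch_size])
            for i in range(0, len(servers), batch_size)]
```

With the default batch size of 25, a bulk upgrade of 60 servers would be split into batches of 25, 25, and 10.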

Some servers under the Failed category in the Job details view show up as Suspended by System.

  • This occurs when a server was not upgraded because of multiple failures during a bulk upgrade process.
  • When an error occurs during a bulk upgrade process, Control Center Director continues to process the remaining upgrade operations as long as the percentage of failed jobs does not exceed the configured failure percentage threshold.
  • For example, for a batch size of 30 servers, if 15 jobs fail (50%), the threshold is exceeded and the server status is displayed as Suspended by System.
  • Failure percentage is set to 40% by default and can be modified by the Administrator.
  • To change it, edit the failurePercentage property in DeploymentService.xml, available at: conf/services/system.
  • Run the stopEngine/runEngine utility for the changes to take effect.
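
The threshold arithmetic in the example above can be checked with a few lines. This is a sketch only: the function name is hypothetical, and treating a failure rate exactly equal to the threshold as acceptable is an assumption, since the boundary behavior is not stated.

```python
def exceeds_failure_threshold(failed: int, batch_size: int,
                              failure_percentage: float = 40.0) -> bool:
    """Return True when failed / batch_size is above failure_percentage.

    The comparison is done without division to avoid rounding surprises.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if failed < 0 or failed > batch_size:
        raise ValueError("failed must be between 0 and batch_size")
    return failed * 100 > failure_percentage * batch_size
```

For a batch of 30 servers with the default 40% threshold, 15 failures (50%) exceed the threshold, while 10 failures (about 33%) do not.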
A scheduled server upgrade Job returns the following error: Job marked failed as OSA from Agent not received.
  • Control Center Director is set to poll the Agent and wait for a response for up to 3600 seconds (default) to verify that the upgrade request is being processed. When this threshold is exceeded, the Job is marked as failed with the error: Job marked failed as OSA from Agent not received.
  • To modify the interval, edit the lastOSARecivedTimeDifferenceInSecToMarkJobFailed property in DeploymentService.xml, available at: conf/services/system.
  • Run the stopEngine/runEngine utility for the changes to take effect.
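
The timeout rule above amounts to comparing the time since the last OSA with the configured limit. The following sketch shows that comparison; the function name is hypothetical and the timestamps are plain epoch seconds supplied by the caller.

```python
def osa_wait_exceeded(last_osa_epoch: float, now_epoch: float,
                      limit_sec: float = 3600.0) -> bool:
    """Return True when more than limit_sec has passed since the last OSA."""
    if now_epoch < last_osa_epoch:
        raise ValueError("now_epoch must not be earlier than last_osa_epoch")
    return (now_epoch - last_osa_epoch) > limit_sec
```

With the default of 3600 seconds, a Job whose last OSA arrived 3601 seconds ago would be past the limit, while one at 3600 seconds would not under this sketch's strict comparison.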
Is the interval at which Control Center Director Engine polls the Agent configurable?
  • Control Center Director Engine is set to make up to 3 attempts, 60 seconds apart, to poll the Agent and verify Agent activity.
  • To modify the polling interval and the number of attempts, edit the following parameters in DeploymentService.xml, available at: conf/services/system:
    • AgentRestCallMaxNumberOfTime
    • timeBeforeNextRestCallInSec
  • Run the stopEngine/runEngine utility for the changes to take effect.
    Note: Configuring Agent polling interval only applies to servers that are configured to be auto-discovered by Control Center Director.
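
The interaction between the two parameters can be sketched as a bounded retry loop. This is an illustration under the assumption that AgentRestCallMaxNumberOfTime is the number of attempts and timeBeforeNextRestCallInSec is the wait between them; the function and argument names are hypothetical, and the check callable stands in for whatever call the Engine actually makes.

```python
import time
from typing import Callable


def poll_agent(check: Callable[[], bool], max_attempts: int = 3,
               wait_sec: float = 60.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check up to max_attempts times, waiting wait_sec between tries.

    Returns True as soon as check succeeds, otherwise False.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if check():
            return True
        if attempt < max_attempts:
            sleep(wait_sec)  # no wait after the final attempt
    return False
```

The sleep argument is injectable so that the loop can be exercised without real delays.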