Troubleshooting deployment scenarios

Control Center Director provides tools, such as log files, for troubleshooting and recovering from errors. When you encounter an issue with your deployment, log files should be the first place you look to assist with the troubleshooting process.

The following table describes:
  • The relationship between common deployment scenarios and their associated log files
  • Common FAQs related to deployment scenarios
Table 1. FAQs and Log locations
Scenario/Question Details
Gathering logging events Use the runDataCollector utility to extract and report the current state of your Control Center Director deployment to help IBM Support troubleshoot an issue.

It collects logs, configurations, and system metrics, backs up the database without interrupting its operations, and stores the information in a .zip archive file. You can then send the archive file to IBM Support to help diagnose and fix problems.
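For example, a minimal invocation might look like the following (the installation path and script extension are assumptions; adjust them to your environment):

    # Run the data collector from the Control Center Director bin directory (UNIX shown;
    # use the .bat equivalent on Windows)
    cd /opt/IBM/ControlCenterDirector/bin
    ./runDataCollector.sh
    # Send the generated .zip archive to IBM Support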

Deployment-related log files Log files are available at: ControlCenterDirector\log

Look for file names with the format: Engine_YYYYMMDD_TIMESTAMP

Control Center Director Engine (Heartbeat)-related log files
  • Log file is available at: ControlCenterDirector\log
  • Look for [NODENAME]INFO ServiceMonitor for further details
  • Alternatively, check the CC_SERVER_COMPONENT table, which is updated with the ServerID and a status of UP
License Data-related log files Look for the LicenseDataCollector string in the log file for more details.
Customizing logging levels to suit logging requirements Set log levels to DEBUG to start recording detailed logs, then issue the stopEngine/runEngine utilities for the changes to take effect (a sample log4j2.xml fragment follows the file paths below).

File path:
  • Control Center Director Engine logs

    /Conf/enginelogger.xml

  • Connect:Direct Agent logs

    CDInstallationDirectory/install/agent/bin/log4j2.xml
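For example, to raise the Agent log level to DEBUG you would edit the Loggers section of log4j2.xml along these lines (a minimal sketch using standard Log4j 2 syntax; the appender name shown is a placeholder, so keep the names already present in your file):

    <!-- Fragment of CDInstallationDirectory/install/agent/bin/log4j2.xml -->
    <Loggers>
        <!-- Change the level to DEBUG to capture detailed Agent logging -->
        <Root level="DEBUG">
            <!-- "AgentLog" is a placeholder; reference your file's existing appender -->
            <AppenderRef ref="AgentLog"/>
        </Root>
    </Loggers>

Make the corresponding level change in enginelogger.xml for the Control Center Director Engine logs, then issue the stopEngine/runEngine utilities as noted above.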

Agent state is down, even though initial registration occurred via the Agent.

The Agent polling interval is 5 minutes. Invoke pollAgent.sh, as shown in the example below, to validate whether the Agent is running on the Connect:Direct server.

To troubleshoot, look for the agent.log file available at the following log location:

\cdinstall_dir\install\logs
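For example (the Agent bin path is an assumption based on the install layout referenced elsewhere in this table):

    # Validate whether the Agent is running on the Connect:Direct server (UNIX shown)
    cd /cdinstall_dir/install/agent/bin
    ./pollAgent.sh
    # Then review agent.log under cdinstall_dir/install/logs for the result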
How does the Agent communicate with Control Center Director? At startup, the Agent posts an OSA to communicate that the Agent is up. This helps Control Center Director with server auto-discovery.

During the upgrade process, the Agent sends an OSA notifying Control Center Director about the upgrade status.

How does the Agent distinguish between the different upgrades it performs? The Agent receives a unique key (correlation-id) with each upgrade, uses it to distinguish between upgrades, and also prefixes it to the log files generated by each upgrade.
Auto Discovery-related failure
  1. Inspect agent.log for further diagnosis and to isolate the issue.
  2. One possible reason why Control Center Director could not discover the Connect:Direct server in your deployment is that the server was not configured correctly.
  3. Another possible reason could be updates made to the osa.url field, which ensures that Connect:Direct is auto-discovered by Control Center Director.

    Agent provides a file polling mechanism that runs at a preconfigured polling interval (cpiPollTime) and detects any changes in the osa.url field. If the osa.url field is modified during the polling interval, the changes only take effect at the end of the scheduled run (see the illustrative fragment after this list).

  4. Another possible reason is multiple Connect:Direct instances: you are likely to run into port conflicts unless you allocate a unique Agent listening port per instance. It is also recommended that, after upgrading an instance, you apply its unique port number before upgrading the next instance. This prevents potential port-conflict errors during the upgrade process.
  5. Another possible reason is an incorrect certificate-based configuration, that is, either the Connect:Direct certificate is not trusted by Control Center Director or vice versa.
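The fragment below is an illustrative sketch of the Agent-side fields involved in auto-discovery; the file name, property layout, and host/port values are assumptions, so treat your installation's actual Agent configuration as authoritative:

    # Illustrative Agent configuration fragment (layout and values are examples only)
    # osa.url must point to the Control Center Director endpoint that receives OSA
    # messages; auto-discovery fails if this value is wrong or unreachable.
    osa.url=https://ccd.example.com:58083
    # cpiPollTime is the preconfigured polling interval; edits to osa.url take effect
    # only at the end of the scheduled run.
    cpiPollTime=300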
Will stopping a bulk server upgrade operation affect the upgrade of all servers included in the operation?
  • Control Center Director handles a bulk server upgrade process in batches. Batch upgrade groups servers together so that Control Center Director can execute upgrade operations in parallel.
  • If an error occurs during the upgrade process, Control Center Director will continue to process the remaining upgrade operations in the batch.
  • Batch size is set to 25 servers by default and can be modified by the Administrator.
  • Edit the batch-size property in DeploymentService.xml, available at: conf/services/system (see the illustrative fragment after this list).
  • Issue the stopEngine/runEngine utilities for the changes to take effect.
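A minimal sketch of the change, assuming a simple element-per-property layout inside DeploymentService.xml (the surrounding structure is an assumption; only the batch-size property name comes from this table):

    <!-- conf/services/system/DeploymentService.xml (illustrative fragment) -->
    <!-- Number of servers grouped into each parallel upgrade batch (default 25) -->
    <batch-size>50</batch-size>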

Some servers under the Failed category in the Job details view show up as Suspended by System.

  • This occurs when a server was not upgraded due to multiple failures during a bulk upgrade process.
  • When an error occurs during a bulk upgrade process, Control Center Director continues to process the remaining upgrade operations as long as the failure rate stays within the configured failure percentage threshold.
  • For example, for a batch size of 30 servers, if 15 jobs fail (50%), the server status is displayed as Suspended by System.
  • Failure percentage is set to 40% by default and can be modified by the Administrator.
  • Edit the failurePercentage property in DeploymentService.xml, available at: conf/services/system.
  • Issue the stopEngine/runEngine utilities for the changes to take effect.
A scheduled server upgrade Job returns the following error: Job marked failed as OSA from Agent not received.
  • Control Center Director is set to poll the Agent and wait for a response for up to 3600 seconds (default) to verify that the upgrade request is being processed. When this threshold is exceeded, the Job is marked as failed with the error: Job marked failed as OSA from Agent not received.
  • To modify the interval, edit the lastOSARecivedTimeDifferenceInSecToMarkJobFailed property in DeploymentService.xml, available at: conf/services/system.
  • Issue the stopEngine/runEngine utilities for the changes to take effect.
Is the Agent polling interval used by the Control Center Director Engine configurable?
  • The Control Center Director Engine is set to make up to 3 attempts, 60 seconds apart, to poll the Agent and verify Agent activity.
  • To modify the polling interval and the number of attempts, edit the following parameters in DeploymentService.xml, available at: conf/services/system (see the illustrative fragment after this list):
    • AgentRestCallMaxNumberOfTime
    • timeBeforeNextRestCallInSec
  • Issue the stopEngine/runEngine utilities for the changes to take effect.
    Note: Configuring Agent polling interval only applies to servers that are configured to be auto-discovered by Control Center Director.
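The preceding scenarios reference several DeploymentService.xml properties. The fragment below is a consolidated, illustrative sketch that assumes a simple element-per-property layout; only the property names and default values come from this table, so match the structure already present in your file:

    <!-- conf/services/system/DeploymentService.xml (illustrative fragment) -->
    <!-- Suspend the remaining jobs in a batch once 40% of them have failed -->
    <failurePercentage>40</failurePercentage>
    <!-- Mark a job failed if no OSA is received from the Agent within 3600 seconds -->
    <lastOSARecivedTimeDifferenceInSecToMarkJobFailed>3600</lastOSARecivedTimeDifferenceInSecToMarkJobFailed>
    <!-- Poll the Agent up to 3 times, waiting 60 seconds between attempts -->
    <AgentRestCallMaxNumberOfTime>3</AgentRestCallMaxNumberOfTime>
    <timeBeforeNextRestCallInSec>60</timeBeforeNextRestCallInSec>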
SEAS properties do not carry over from the existing setup when upgrading the IBM Sterling Control Center product from version 1.2 to 6.2 or later

External authentication properties held in the existing installer setup are not automatically reflected after the upgrade.

You can reconfigure the external authentication attributes by running the config.bat script and setting the SEAS properties at the “Config step: External Authentication Server configuration ...” prompt.

How can I change product entitlement? To change entitlement, you must upgrade to version 6.2 or later and run configureEntitlement.bat. Make sure all processes of the Control Center Director instance to be upgraded are down; otherwise, the script returns an error. While the script runs, select the required type of product to install, namely IBM Sterling Control Center Director, IBM Sterling Control Center Monitor, or All Product (Combined).
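For example, on Windows the entitlement utility might be run as follows (the installation path is an assumption; stop all Control Center Director processes first):

    REM Change product entitlement (illustrative path)
    cd C:\IBM\ControlCenterDirector\bin
    configureEntitlement.bat
    REM When prompted, select IBM Sterling Control Center Director,
    REM IBM Sterling Control Center Monitor, or All Product (Combined)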

Connect:Direct server is not dynamically discoverable, or the upgrade does not work on already discovered Connect:Direct servers, after removing the TLS v1.0 protocol on Control Center Director

Error :
 cd.admincenter.agent.osa.DispatchOsaService - Error encounterred in connecting with OSA server
javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
Append the JVM parameter as follows and restart the Agent after making the changes:
  • UNIX: Append the JVM parameter -Dcom.ibm.jsse2.overrideDefaultTLS=true in the startAgent.sh script present at path ~install/agent/bin/startAgent.sh. Sample file:
    if [ $CD_ARCH = "1" ]
    then
        $JAVA_PATH/java -cp $AGENT/bin/$PROCESS_NAME:$AGENT/bin/lib/*:$CD_INSTALL_DIR/ndm/bin/SPAdmin.jar $HPUXIT_64BIT_FLAG -Dcom.ibm.jsse2.overrideDefaultTLS=true -Dlog4j.configurationFile=$AGENT/bin/log4j2.xml cd.admincenter.agent.server.App >> ${LOG_DIR}/agent.log 2>&1 &
    else
        $JAVA_PATH/java -cp $AGENT/bin/$PROCESS_NAME:$AGENT/bin/lib/*:$CD_INSTALL_DIR/ndm/bin/SPAdmin.jar -Dcom.ibm.jsse2.overrideDefaultTLS=true -Dlog4j.configurationFile=$AGENT/bin/log4j2.xml cd.admincenter.agent.server.App >> ${LOG_DIR}/agent.log 2>&1 &
    fi
    
  • Windows: Add the JVM parameter -Dcom.ibm.jsse2.overrideDefaultTLS=true to the Lax file at path ~install\agent\bin\InstallAgent.lax. Sample file:
    lax.nl.java.option.additional=-Dcom.ibm.jsse2.overrideDefaultTLS=true 
    -Djava.library.path="C:\\Program Files\\IBM\\Connect Direct v6.1.0\\Server\\Secure+" 
    -Dlog4j.configurationFile="C:\\Program Files\\IBM\\Connect Direct v6.1.0\\install\\agent\\bin\\log4j2.xml"
High Availability Environment
Issue with client auth while setting up the Load Balancer
  • Check discovery/new-install against an individual node of the cluster with server.use-forwarded-for-with-proxy=false set. If the connection is successful, then the problem is in the Load Balancer setup.
  • Make sure the SSL client certificate is downloaded using /new-install/download-ca-public-pem.

    For more details, check the Load Balancer prerequisites section in Preparing For High Availability Environment.
When accessing from a client, you may face multiple issues:
  • In Tools > Audit Log, the Logged in from IP shows the Load Balancer IP and not the client IP
  • Connect:Direct installed using the new-install feature gets discovered with the Load Balancer IP rather than its own IP/hostname
  • In case of dynamic discovery using a wild-card IP, the server node gets discovered with the Load Balancer IP and not its own IP/hostname
  • The Base URL IP on swagger is not the Load Balancer IP
  • The Remote IP value in the <ccwebusage_currentDate>.log file shows the Load Balancer IP in place of the incoming request IP
Set server.use-forwarded-for-with-proxy to true when using a Load Balancer, as shown in the fragment below.
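A minimal sketch of that setting; the configuration file that holds this property is not named in this table, so confirm its location in the High Availability documentation:

    # Use the forwarded client IP instead of the Load Balancer IP for incoming requests
    server.use-forwarded-for-with-proxy=true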
When connected to the Load Balancer, you may intermittently get errors when accessing functionality.
  • Check whether all the nodes added to the Load Balancer are working at optimum speed.
  • Check individual nodes in the cluster and verify that each node is working fine; check whether any node is slow.
  • Either resolve the cause of the node's slow speed or detach it from the cluster.