Primary role moves between cluster caching facility hosts
The primary role is not on the same host as it was the last time the db2instance -list command was run; that is, the primary role has moved to another cluster caching facility host. This change of host indicates a problem with the cluster caching facility at some point in the past that might need to be investigated.
This sample output from the db2instance -list command shows an environment with three members and two cluster caching facilities:
ID   TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT  PARTITION_NUMBER  LOGICAL_PORT  NETNAME
--   ----    -----    ---------  ------------  -----  ----------------  ------------  -------
0    MEMBER  STARTED  hostA      hostA         NO     0                 0             hostA-ib0
1    MEMBER  STARTED  hostB      hostB         NO     0                 0             hostB-ib0
2    MEMBER  STARTED  hostC      hostC         NO     0                 0             hostC-ib0
128  CF      PEER     hostD      hostD         NO     -                 0             hostD-ib0
129  CF      PRIMARY  hostE      hostE         NO     -                 0             hostE-ib0

HOSTNAME  STATE   INSTANCE_STOPPED  ALERT
--------  -----   ----------------  -----
hostA     ACTIVE  NO                NO
hostB     ACTIVE  NO                NO
hostC     ACTIVE  NO                NO
hostD     ACTIVE  NO                NO
hostE     ACTIVE  NO                NO
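One way to notice this condition mechanically is to extract the host that currently holds the primary role and compare it with a value saved from an earlier run. The following shell sketch is illustrative only: it assumes the column layout shown above, and the state file name (~/cf_primary.last) is hypothetical.

#!/bin/sh
# Host currently holding the CF PRIMARY role (column 4, HOME_HOST,
# in the layout shown above).
current=$(db2instance -list | awk '$2 == "CF" && $3 == "PRIMARY" {print $4}')

state_file=$HOME/cf_primary.last   # hypothetical state file for this sketch
if [ -f "$state_file" ]; then
    previous=$(cat "$state_file")
    if [ "$current" != "$previous" ]; then
        echo "Primary CF role moved from $previous to $current - investigate"
    fi
fi
echo "$current" > "$state_file"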
Like members, each cluster caching facility logs information to cfdiag*.log files and dumps additional diagnostic data when required. These files reside in the directory set by the cf_diagpath database manager configuration parameter or, if cf_diagpath is not set, in the directory set by diagpath ($INSTHOME/sqllib_shared/db2dump/ $m by default). A quick way to check both settings is sketched after the following list.
- cluster caching facility Diagnostic Log Files (cfdiag-timestamp.cf_id.log)
- Each of these files keeps a log of the activities related to a cluster caching facility. Events, errors, warnings, and additional debugging information are logged here. This log has a similar appearance to the db2diag log file. A new log is created each time a cluster caching facility starts.
- Note that for each cluster caching facility there is also a single static diagnostic log name, of the format cfdiag.cf_id.log, that always points to the most current diagnostic log file for that cluster caching facility.
- cluster caching facility Output Dump Diagnostic Files (cfdump.out.cf_pid.hostname.cf_id)
- These files contain information about cluster caching facility startup and shutdown. Some additional output might also appear in these files.
- Management LWD Diagnostic Log File (mgmnt_lwd_log.cf_pid)
- This log file displays the log entries of a particular cluster caching facility's LightWeight Daemon (LWD) process. Errors in this log file indicate that the LWD did not start properly; a successful start produces no ERROR messages in the log.
- cluster caching facility stack files (CAPD.cf_pid.tid.thrstk)
- These are stack files produced by the cluster caching facility when it encounters a signal. These files are important for diagnosing a problem with the cluster caching facility.
- cluster caching facility trace files (CAPD.tracelog.cf_pid)
- A default lightweight trace is enabled for the cluster caching facility, and these trace files appear whenever the cluster caching facility terminates or stops. On their own, the trace files might only suggest a problem with the cluster caching facility; they are useful for diagnosing errors only in combination with other diagnostic data.
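To confirm where these files are written on a particular system, you can display the relevant database manager configuration parameters. This is a minimal sketch; the exact parameter labels in the output vary by Db2 version:

# Show the diagnostic directory settings; when set, cf_diagpath
# overrides diagpath for cluster caching facility diagnostics.
db2 get dbm cfg | grep -i diagpath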
A startup and initialization message is shown in the cluster caching facility dump files. For example, cfdump.out.1548476.host04.128 contains messages that indicate a successful process start:
CA Server IPC component Initialised: LWD BG buffer count: 16
Session ID: 1d
CA Server IPC component Acknowledged LWD Startup Message
Waiting for LWD to Configure Server
Processors: (4:4) PowerPC_POWER5 running at 1498 MHz
Cluster Accelerator initialized
Cluster Accelerator Object Information:
OS: AIX 64-bit
Compiler: xlC VRM (900)
SVN Revision: 7584
Built on: Oct 12 2009 at 17:00:54
Executable generated with symbols
Model Components Loaded: CACHE LIST LOCK
Transport: uDAPL
Number of HCAs: 1
Device[0]: hca0
CF Port[0]: 50638
Mgmnt Port Type: TCP/IP
Mgmnt Port: 50642
IPC Key: 0xe50003d
Total Workers: 4
Conn/Worker: 128
Notify conns: 256
Processor Speed: 1498.0000 MHz
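To check several dump files at once for a clean start, a plain text search for one of the startup lines shown above is often enough. A sketch, assuming the success message appears exactly as in this example:

# List the cfdump files that record a fully initialized CF process
grep -l "Cluster Accelerator initialized" cfdump.out.*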
In this example, cfdiag-20091109015035000037.128.log contains a record of a successful process start. If the cluster caching facility did not start properly, this log might be empty or might contain error messages. For example:
2009-11-09-01.50.37.0051837000-300 E123456789A779 LEVEL : Event
PID : 688182 TID : 1
HOSTNAME : host04
FUNCTION : CA svr_init, mgmnt_cfstart
MESSAGE : CA server log has been started.
DATA #1 :
Log Level: Error
Debugging : active
Cluster Accelerator Object Information
AIX 64-bit
Compiler: xlC VRM (900)
SVN Revision: 7584
Built on Oct 12 2009 at 17:00:59
Executable generated with symbols.
Executable generated with asserts.
Model Components Loaded: CACHE, LIST, LOCK
Transport: uDAPL
Number of HCAs: 1
Device[0]: hca0
CF Port[0]: 50638
Total Workers: 4
Conn/Worker: 128
Notify conns: 256
Processor Speed: 1498.000000 Mhz.
Allocatable Structure memory: 170 MB
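Because cfdiag entries carry a LEVEL field like db2diag entries do, that field is the quickest filter when hunting for failures. The following is a sketch only, assuming the entry layout shown above:

# Show any Error or Severe entries in the cfdiag logs for CF 128,
# with file name and line number for each match
grep -n "LEVEL : Error" cfdiag-*.128.log
grep -n "LEVEL : Severe" cfdiag-*.128.log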
Look for the relevant cluster caching facility diagnostic log files by finding the cfdiag logs that have the same CF ID as the failed cluster caching facility. For example, if CF ID 128 failed (as it did in the previous db2instance -list output), use the following command:

$ ls cfdiag*.128.log
cfdiag.128.log -> cfdiag-20091109015035000215.128.log
cfdiag-20091110023022000037.128.log
cfdiag-20091109015035000215.128.log
Note that cfdiag.128.log always points to the most current cfdiag log for CF 128.
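To see which dated log the static name currently resolves to, list the symbolic link itself:

# The link target is the most current cfdiag log for CF 128
ls -l cfdiag.128.log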
Look into cfdiag-20091109015035000037.128.log (the previous cfdiag log) and into the db2diag log file, at a time corresponding to 2009-11-10-02.30.22.000215, for errors. If the cause of the error is still unknown, you can also consult the system error log for the affected host. Log on to the host of the cluster caching facility that did not start and view the system error log by running the errpt -a command (on Linux®, look in the /var/log/messages file). In the example shown here, log in to hostD because CF 128 experienced the failure.
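These steps can be combined into a quick first pass. The following sketch uses the example values from this section (CF ID 128, failure time 2009-11-10-02.30.22, host hostD) and is illustrative rather than an official procedure; the db2diag tool's -time option narrows the db2diag log to records at or after the given timestamp:

# Filter the db2diag log to the window around the failure
db2diag -time 2009-11-10-02.30.22

# Then, on the failed CF host (hostD in this example), check the
# system error log:
#   AIX:   errpt -a
#   Linux: inspect /var/log/messages around the same time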