IBM Support

Creating a DB2 pureScale instance fails on SUSE or Red Hat Linux with multipathing enabled

Troubleshooting


Problem

Creating a DB2 pureScale instance on SUSE or Red Hat Linux with multipathing enabled might fail.

Symptom

db2icrt.log shows the following messages:
=========================
Creation of the RSCT peer domain, "db2domain_xxxxxxxxxxxxxx", succeeded on the host "test1".

ERROR: The shared file system cluster could not be created. Check the diagnostic logs for more information.
A diagnostic log has been saved to '/tmp/ibm.db2.cluster.xxxxxx'.
ERROR: DBI20027E The GPFS cluster creation failed on host "test1".
Failed command: "/ibmdb2/db2/V10.5/bin/db2cluster -create -cfs -host
test1 -domain db2cluster_xxxxxxxxxxxxxx -disk /dev/db2-disk2".

Creation of the IBM General Parallel File System (GPFS) cluster failed on the
host "test1". The instance is not created.

DBI20000E A DB2 instance could not be created as specified. A rollback of the
instance creation will be started.

Deletion of the RSCT peer domain, "db2domain_xxxxxxxxxxxxxx", succeeded.

Deletion of the IBM General Parallel File System (GPFS) cluster succeeded.

A db2cluster command has failed. Collect the logs from /tmp/ibm.db2.cluster.*.

Configuring DB2 instances :.......Failure

=========================

The file /tmp/ibm.db2.cluster.xxxxxx shows:

=========================


xxxx-xx-xx-xx.xx.xx.xxxxxx+480 I4635E502 LEVEL: Severe
PID : 28306 TID : 140673617291040 PROC : db2cluster
INSTANCE: NODE : 000
HOSTNAME: test1
FUNCTION: DB2 UDB, high avail services, sqlhaCFSCallFunctionDirect, probe:114
MESSAGE : ECF=0x9000061D=-1879046627=ECF_SQLHA_CFS_OUTPUT_NOT_RECOGNIZED
The output from the CFS call is not recognized.
DATA #1 : String, 22 bytes
DB2CFS function error.
DATA #2 : signed integer, 4 bytes
31

xxxx-xx-xx-xx.xx.xx.xxxxxx+480 I5138E364 LEVEL: Error
PID : 28306 TID : 140673617291040 PROC : db2cluster
INSTANCE: NODE : 000
HOSTNAME: test1
FUNCTION: DB2 UDB, high avail services, sqlhaUICreateCFSDomain, probe:138
MESSAGE : ECF=0x9000052E=-1879046866=ECF_SQLHA_FAILED
SQLHA API call error

xxxx-xx-xx-xx.xx.xx.xxxxxx+480 I5503E409 LEVEL: Warning
PID : 28306 TID : 140673617291040 PROC : db2cluster
INSTANCE: NODE : 000
HOSTNAME: test1
FUNCTION: DB2 UDB, oper system services, sqloMessage, probe:1
MESSAGE : Cannot obtain registry variables
DATA #1 : Hexdump, 4 bytes
0x00007FFF256211D8 : B400 0F87 ....

xxxx-xx-xx-xx.xx.xx.xxxxxx+480 E5913E363 LEVEL: Error
PID : 28306 TID : 140673617291040 PROC : db2cluster
INSTANCE: NODE : 000
HOSTNAME: test1
FUNCTION: DB2 UDB, high avail services, sqlhaUICreateDomains, probe:1168
RETCODE : ECF=0x9000052E=-1879046866=ECF_SQLHA_FAILED
SQLHA API call error

xxxx-xx-xx-xx.xx.xx.xxxxxx+480 E6277E353 LEVEL: Error
PID : 28306 TID : 140673617291040 PROC : db2cluster
INSTANCE: NODE : 000
HOSTNAME: test1
FUNCTION: DB2 UDB, high avail services, sqlhaUIMain, probe:616
MESSAGE : ECF=0x9000052E=-1879046866=ECF_SQLHA_FAILED
SQLHA API call error
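
Each ECF code in the log above is printed twice, once in hexadecimal and once in decimal. The decimal form appears to be the same 32 bits read as a signed integer. The following is a minimal sketch that checks this reading against the two codes in the log; the helper name to_signed32 is hypothetical and is not part of DB2.

```shell
# Hypothetical helper: interpret an unsigned 32-bit value (hex or decimal)
# as a signed 32-bit integer and print the result.
to_signed32() {
    value=$(( $1 ))
    if [ "$value" -ge 2147483648 ]; then
        value=$(( value - 4294967296 ))
    fi
    echo "$value"
}

to_signed32 0x9000061D   # prints -1879046627 (ECF_SQLHA_CFS_OUTPUT_NOT_RECOGNIZED)
to_signed32 0x9000052E   # prints -1879046866 (ECF_SQLHA_FAILED)
```

Either form can therefore be used when searching the diagnostic log for a given code.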

Cause

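This is caused by a GPFS configuration issue. Without the nsddevices user exit script, GPFS cannot detect certain block devices. A typical example is the DM-MP device names that Red Hat creates under /dev/mapper/mpath*.

As of GPFS 3.4.0.3, GPFS recognizes the following devices: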

For Linux

Device name   GPFS device type   Description
dm-           dmm                Device-Mapper Multipath (DMM)
vpath         vpath              IBM virtual path disk
sd/hd         generic            Device having no unique failover or multipathing characteristic (predominantly Linux devices)
emcpower      powerdisk          EMC power path disk

In the example above, the multipath device /dev/db2-disk2 has a user-defined name that is not in this list, so GPFS cannot detect it.
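
The device table can be read as a prefix lookup on the device name. The following is a simplified sketch of that lookup, not the actual GPFS discovery logic; the function name gpfs_device_type and the sample names dm-3 and sdb are hypothetical, and plain prefix matching is an assumption drawn from the table.

```shell
# Hypothetical helper: map a device name (relative to /dev) to the GPFS
# device type listed in the table. Prints the type and returns 0 on a
# match; returns 1 when the name matches none of the listed prefixes.
gpfs_device_type() {
    case "$1" in
        dm-*)      echo "dmm" ;;
        vpath*)    echo "vpath" ;;
        emcpower*) echo "powerdisk" ;;
        sd*|hd*)   echo "generic" ;;
        *)         return 1 ;;
    esac
}

for name in dm-3 sdb db2-disk2; do
    if dtype=$(gpfs_device_type "$name"); then
        echo "$name: recognized as $dtype"
    else
        echo "$name: not recognized, needs an nsddevices entry"
    fi
done
```

With these inputs, only db2-disk2 falls through to the last branch, which matches the failure described in this document.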

Environment

Operating system: Linux

Diagnosing The Problem

1. Create a file, for example /tmp/nodefile, with the following content:
test1:manager-quorum

2. As root on test1, run:
/usr/lpp/mmfs/bin/mmcrcluster -C cluster1 -N /tmp/nodefile -p test1

3. Create a file, for example /tmp/disk2, with the following content:
/dev/db2-disk2

4. As root on test1, run:
/usr/lpp/mmfs/bin/mmcrnsd -F /tmp/disk2
echo $?

The output shows:
/usr/lpp/mmfs/bin/mmcrnsd -F /tmp/disk2
mmcrnsd: Processing disk db2-disk2
mmcrnsd: db2-disk2 was not found in /proc/partitions.
mmcrnsd: Failed while processing disk descriptor /dev/db2-disk2 on node test1.
mmcrnsd: Command failed. Examine previous error messages to determine cause.
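
The mmcrnsd message indicates that the device name was looked up in /proc/partitions. Before running mmcrnsd, the same lookup can be approximated by hand. The following is a minimal sketch, assuming the usual four-column /proc/partitions layout (major, minor, #blocks, name); the function name in_partitions is hypothetical.

```shell
# Hypothetical helper: succeed if the device name appears in the name
# column of a /proc/partitions-style file. The second argument is
# optional and defaults to /proc/partitions.
in_partitions() {
    file="${2:-/proc/partitions}"
    [ -r "$file" ] || return 1
    awk -v name="$1" '$4 == name { found = 1 } END { exit (found ? 0 : 1) }' "$file"
}

if in_partitions db2-disk2; then
    echo "db2-disk2 is listed in /proc/partitions"
else
    echo "db2-disk2 is not listed in /proc/partitions"
fi
```

If the name is not listed, mmcrnsd can be expected to fail in the way shown above until GPFS is told about the device.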

Resolving The Problem

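Because /dev/db2-disk2 is a multipath device name, use the nsddevices script to help GPFS discover the device:

1) Copy the nsddevices sample file to /var/mmfs/etc:

# cp /usr/lpp/mmfs/samples/nsddevices.sample /var/mmfs/etc/nsddevices

2) Change the file permissions to make the script executable:

# chmod +x /var/mmfs/etc/nsddevices

3) Edit the nsddevices script, changing only the section for the Linux operating system, as follows: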

if [[ $osName = Linux ]]
then
  # Device name relative to /dev, followed by the GPFS device type
  echo "db2-disk2 generic"
fi

# To continue with the GPFS disk discovery steps,
return 1

4) After that, running mmcrnsd or creating the pureScale instance will succeed.
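
When several user-defined multipath devices are involved, the echo lines for the Linux section of nsddevices can be generated rather than typed by hand. The following is a minimal sketch; the function name nsd_entries is hypothetical, and it assumes every device passed in should be reported with the generic type, as db2-disk2 is in step 3.

```shell
# Hypothetical helper: for each device path given, print the line that
# the Linux section of nsddevices would echo: the device name relative
# to /dev, followed by the GPFS device type "generic".
nsd_entries() {
    for dev in "$@"; do
        echo "$(basename "$dev") generic"
    done
}

nsd_entries /dev/db2-disk2   # prints: db2-disk2 generic
```

Each output line corresponds to one echo statement inside the if block of the edited script.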

Product: Db2 for Linux, UNIX and Windows
Component: High Availability - PureScale
Platform: Linux
Version: 9.8; 10.1; 10.5

Document Information

Modified date:
16 June 2018

UID

swg21687403