Troubleshooting
Problem
This document shows what "normal" looks like in a GLVM-replicated Enterprise Edition cluster and how to troubleshoot stale partitions in the replicated volume group.
Symptom
Stale partitions in the replicated volume group
Cause
WAN outage, remote site outage, remote node down
Environment
AIX 7.1 with PowerHA 7.1.3
Resolving The Problem
We will show what "normal" looks like in a GLVM-replicated Enterprise Edition cluster,
covering the following areas:
I. The basic configuration of the cluster
A) HA configuration
1) topology
a) HA topology
b) Site configuration
2) resources
a) Site relationship
b) Service IP labels
c) GLVM-replicated VG
B) GLVM configuration
1) RPV client hdisks
2) RPV server devices
II. Example output from a working cluster
A) Resource group offline on both local and remote nodes (no cross-site mirroring)
B) Resource group online on the local node with mirroring to the remote node stopped
C) Resource group online on the remote node with mirroring to the local node stopped
D) Resource group online on the local node with mirroring to the remote node running
E) Resource group online on the remote node with mirroring to the local node running
F) RPV device attributes
G) What happens when you stop HA
III. Basic troubleshooting
A) Determine whether HA can bring the GLVM replication online automatically
B) How to manually start the RPV devices, LVM and filesystem components outside of PowerHA control.
-----------------------------------------------------------------------------------------------
I. Basic Cluster Configuration
A) HA Configuration
1) Topology
You need a minimum of one XD_data network for GLVM replication.
An XD_ip network can be configured for additional heartbeating.
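As a quick cross-check, the cluster topology, including the network types, can also be
summarized with the cltopinfo utility (the exact output format varies by PowerHA level):
halabc1 /# cltopinfo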
a) HA topology
The HA topology can be viewed in the cllsif -p output:
cllsif -p
Adapter Type Network Net Type Attribute Node IP Address Interface Name Netmask Length
halabc1 boot net_ether_01 XD_data public halabc1 10.10.20.30 en0 255.255.255.0 24
abc1service service net_ether_01 XD_data public halabc1 10.10.30.30 255.255.255.0 24
c1persist persistent net_ether_01 XD_data public halabc1 10.10.30.31 255.255.255.0 24
halabk1 boot net_ether_01 XD_data public halabk1 10.10.20.31 en0 255.255.255.0 24
abc1service service net_ether_01 XD_data public halabk1 10.10.30.30 255.255.255.0 24
k1persist persistent net_ether_01 XD_data public halabk1 10.10.30.41 255.255.255.0 24
Here is a diagram:
______________________ ______________________
| halabc1 | | halabk1 |
| | | |
| | | |
| boot en0 | | en0 boot |
| halabc1 |---- net_ether_01 ----| halabk1 |
| 10.10.20.30/24 | (XD_data) | 10.10.20.31/24 |
| | public | |
| | | |
| | | |
|______________________| |______________________|
shared
ip_label: abc1service
ip_address: 10.10.30.30
network: net_ether_01
b) Site Configuration
The site configuration can be seen in the cllssite output:
---------------------------------------------------
Sitename Site Nodes Dominance Protection Type
---------------------------------------------------
siteC halabc1 yes NONE
siteK halabk1 no NONE
This shows that there are two sites with one node per site.
SiteC is set to be "dominant", which makes it the primary site.
The ramifications of being the primary site are discussed under the
"Site Relationship" options below.
2) Resources
The resource configuration can be viewed in the output from clshowres:
clshowres
Resource Group Name RG_A
Participating Node Name(s) halabc1 halabk1
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Never Fallback
Site Relationship Online On Either Site
Node Priority
Service IP Label abc1service
Filesystems ALL
Filesystems Consistency Check fsck
Filesystems Recovery Method sequential
Filesystems/Directories to be exported (NFSv3)
Filesystems/Directories to be exported (NFSv4)
Filesystems to be NFS mounted
Network For NFS Mount
Filesystem/Directory for NFSv4 Stable Storage
Volume Groups dbvg
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary true
Disks
Raw Disks
Disk Error Management? no
GMVG Replicated Resources dbvg
<snip>
A few things to note in the resource group configuration:
a) Site Relationship
The site relationship causes behavior similar to the startup and fallback policies,
except that it applies to sites rather than nodes.
The options and their rough node-level equivalents are:
ignore = "Online On All Available Nodes" (cannot be used with a non-concurrent RG)
Prefer Primary Site = "Online on Home Node Only" plus "Fallback to Higher Priority
Node"
Online On Either Site = start on either site and "Never Fallback"
Online On Both Sites = "Online On All Available Nodes"
The main issue to be aware of is "Prefer Primary Site": it overrides a Fallback Policy of
"Never Fallback", so the resources move back to the primary site when a node at the
primary site is started, even though the Fallback Policy is set to "Never Fallback".
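The resource group policies, including the site relationship, can also be queried with
the clmgr utility (a hedged example; the exact attribute names in the output vary by
PowerHA level):
halabc1 /# clmgr query resource_group RG_A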
b) Service IP Labels and GLVM-Replicated volume groups
The service IP address is handled as in a Standard Edition cluster.
The label placed in this field resolves to an IP address that is
configured on a boot interface when the resources are acquired.
Enterprise Edition also has an option for site-specific service addresses, where the
address for a site is configured only when the resources reside at that site.
However, site-specific service addresses must be in different subnets,
which is not the case in the single-subnet configuration used in this example.
c) The GLVM-replicated volume group is placed in the resource group similarly to any
shared VG. If the shared volume group is also the GMVG replicated volume group
(and it is assumed this will almost always be true), the name of the VG is placed in
both the "Volume Groups" and "GMVG Replicated Resources" fields.
The shared volume group is called dbvg:
lsvg dbvg
VOLUME GROUP: dbvg VG IDENTIFIER: 00c2d43a00004c00000001530ee9eff0
VG STATE: active PP SIZE: 8 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1534 (12272 megabytes)
MAX LVs: 256 FREE PPs: 1334 (10672 megabytes)
LVs: 1 USED PPs: 200 (1600 megabytes)
OPEN LVs: 1 QUORUM: 1 (Disabled)
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 1 STALE PPs: 2
ACTIVE PVs: 1 AUTO ON: no
Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
VG Mode: Non-Concurrent
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512 CRITICAL VG: no
B) GLVM Configuration
GLVM replication works by allowing us to use LVM mirroring in a volume group
where one copy of the data is placed on a local disk and another copy of the data is
placed on a remote disk.
One way to conceptualize GMVG replication is to consider only writes for now.
With LVM mirroring, we are writing to both disks in the mirrored logical volume.
One disk is a local "physical" hdisk while the other disk is an RPV client hdisk.
The write to the local "physical" disk occurs in the usual fashion.
The write to the remote physical disk involves writing to the RPV client hdisk,
which encapsulates the block(s) received from LVM into IP packets and
sends them to the designated IP address used by the RPV server device at the
other site.
At the remote site, the RPV server device strips off the IP packet header and trailer
and writes the data to the physical disk whose PVID is
stored on the RPV server device.
The direction of the mirrored writes (local to remote or remote to local) is controlled
by placing the devices in either the available or defined state in order to turn them
on or off.
If an RPV client hdisk is in the available state, it is capable of sending a copy of the
LVM writes to the other site.
If an RPV server device (rpvserverN) is in the available state, it is capable of
accepting writes from the other site and in turn writing the data to its physical disk.
This control is necessary to prevent data divergence.
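A quick way to see which direction the mirroring is currently enabled on a given node
is to list the RPV device states (a convenience check; the same information appears in
the examples in section II below):
halabc1 /# lsdev -c disk | grep "Remote Physical Volume Client"
halabc1 /# lsdev -C | grep rpvserver
If the RPV client hdisk is available, this node can send mirror writes to the other site;
if the rpvserver device is available, this node can receive them.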
II. Example output from a working cluster
A) Resource group offline on both local and remote nodes with no cross site mirroring:
halabc1 /# clRGinfo
Cluster IPC error: The cluster manager on node halabc1 is in ST_INIT or NOT_CONFIGURED state and cannot process the IPC request.
halabc1 /# lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
halabk1 /# lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
halabc1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
halabc1 /# lsdev -c disk | grep hdisk23
hdisk23 Defined Remote Physical Volume Client
halabk1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
halabk1 /# lsdev -c disk | grep hdisk23
hdisk23 Defined Remote Physical Volume Client
(This is the preferred state before starting HA or bringing the resource group online.)
B) Resource group online on the local node (halabc1) but mirroring to the remote node (halabk1) is
stopped:
halabc1 /# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
RG_A ONLINE halabc1@siteC
OFFLINE halabk1@siteK
halabc1 /# lssrc -ls clstrmgrES | head
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
i_local_nodeid 0, i_local_siteid 1, my_handle 1
ml_idx[1]=0 ml_idx[3]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 15
local node vrmf is 7134
cluster fix level is "4"
halabk1 /# lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
halabc1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
halabc1 /# lsdev -c disk | grep hdisk23
hdisk23 Available Remote Physical Volume Client
halabk1 /# lsdev -c disk | grep hdisk23
hdisk23 Defined Remote Physical Volume Client
halabk1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
C) Resource group online on the remote node (halabk1) but mirroring to the local node (halabc1)
stopped by stopping PowerHA:
halabk1 /# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
RG_A OFFLINE halabc1@siteC
ONLINE halabk1@siteK
halabk1 /# lssrc -ls clstrmgrES | head
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
i_local_nodeid 1, i_local_siteid 2, my_handle 3
ml_idx[1]=0 ml_idx[3]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 15
local node vrmf is 7134
cluster fix level is "4"
halabc1 /# lssrc -ls clstrmgrES
Current state: ST_INIT
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
halabc1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
halabc1 /# lsdev -c disk | grep hdisk23
hdisk23 Defined Remote Physical Volume Client
halabk1 /# lsdev -c disk | grep hdisk23
hdisk23 Available Remote Physical Volume Client
halabk1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
D) Resource group online on the local node (halabc1) with mirroring to the remote node (halabk1)
running:
halabc1 /# clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
RG_A ONLINE halabc1@siteC
ONLINE SECONDARY halabk1@siteK
halabc1 /# lssrc -ls clstrmgrES | head
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
i_local_nodeid 0, i_local_siteid 1, my_handle 1
ml_idx[1]=0 ml_idx[3]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 15
local node vrmf is 7134
cluster fix level is "4"
halabk1 /# lssrc -ls clstrmgrES | head
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
i_local_nodeid 1, i_local_siteid 2, my_handle 3
ml_idx[1]=0 ml_idx[3]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 15
local node vrmf is 7134
cluster fix level is "4"
halabc1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
halabc1 /# lsdev -c disk | grep hdisk23
hdisk23 Available Remote Physical Volume Client
halabk1 /# lsdev -c disk | grep hdisk23
hdisk23 Defined Remote Physical Volume Client
halabk1 /# lsdev -C | grep rpv
rpvserver0 Available Remote Physical Volume Server
E) Resource group online on the remote node (halabk1) with mirroring to the local node (halabc1)
running:
halabk1 /# clRGinfo
----------------------------------------------------------------------------
Group Name Group State Node
----------------------------------------------------------------------------
RG_A ONLINE SECONDARY halabc1@siteC
ONLINE halabk1@siteK
halabc1 /# lssrc -ls clstrmgrES | head
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
i_local_nodeid 0, i_local_siteid 1, my_handle 1
ml_idx[1]=0 ml_idx[3]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 15
local node vrmf is 7134
cluster fix level is "4"
halabk1 /# lssrc -ls clstrmgrES | head
Current state: ST_STABLE
sccsid = "@(#)36 1.135.1.119 src/43haes/usr/sbin/cluster/hacmprd/main.C,hacmp.pe,61haes_r713,1509A_hacmp713 9/11/1"
build = "Sep 30 2015 20:26:58 1527C_hacmp713"
i_local_nodeid 1, i_local_siteid 2, my_handle 3
ml_idx[1]=0 ml_idx[3]=1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 15
local node vrmf is 7134
cluster fix level is "4"
halabc1 /# lsdev -C | grep rpv
rpvserver0 Available Remote Physical Volume Server
halabc1 /# lsdev -c disk | grep hdisk23
hdisk23 Defined Remote Physical Volume Client
halabk1 /# lsdev -C | grep rpv
rpvserver0 Defined Remote Physical Volume Server
halabk1 /# lsdev -c disk | grep hdisk23
hdisk23 Available Remote Physical Volume Client
F) RPV device attributes:
The RPV client hdisks have an additional state besides merely available or defined.
These devices can also be in the available but stopped state as an added precaution
against data divergence. This is discussed in more detail below.
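If an RPV client hdisk shows as available but is in the stopped state, it can be resumed
with the same chdev command used in section III.B below:
chdev -l hdisk23 -a resume=yes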
The RPV devices contain the information needed to receive data and direct their output.
In the case of the RPV server device, it receives its data from an IP address
and sends its output (after stripping off the IP packet headers) to its local hdisk.
The client_addr is the IP address used to reach the RPV client hdisk at the other
site. HA routinely removes this address if there is an error condition as part of
resource processing.
The RPV server directs its output to the hdisk which has the PVID as indicated
in the rpvs_pvid field.
halabc1 /# lsattr -El rpvserver0
auto_online n Configure at System Boot True
client_addr 10.10.30.41 Client IP Address True
rpvs_pvid 00c2d43a0eaf5af80000000000000000 Physical Volume Identifier True
In a somewhat similar fashion, the RPV client hdisk contains both the local IP
address as well as the IP address used to reach the RPV server at the other site:
halabc1 /# lsattr -El hdisk23
io_timeout 360 I/O Timeout Interval True
local_addr 10.10.30.31 Local IP Address (Network 1) True
local_addr2 none Local IP Address (Network 2) True
local_addr3 none Local IP Address (Network 3) True
local_addr4 none Local IP Address (Network 4) True
pvid 00c2d43a7be3b8260000000000000000 Physical Volume Identifier True
server_addr 10.10.30.41 Server IP Address (Network 1) True
server_addr2 none Server IP Address (Network 2) True
server_addr3 none Server IP Address (Network 3) True
server_addr4 none Server IP Address (Network 4) True
halabk1 /# lsattr -El rpvserver0
auto_online n Configure at System Boot True
client_addr 10.10.30.31 Client IP Address True
rpvs_pvid 00c2d43a7be3b8260000000000000000 Physical Volume Identifier True
halabk1 /# lsattr -El hdisk23
io_timeout 360 I/O Timeout Interval True
local_addr 10.10.30.41 Local IP Address (Network 1) True
local_addr2 none Local IP Address (Network 2) True
local_addr3 none Local IP Address (Network 3) True
local_addr4 none Local IP Address (Network 4) True
pvid 00c2d43a0eaf5af80000000000000000 Physical Volume Identifier True
server_addr 10.10.30.31 Server IP Address (Network 1) True
server_addr2 none Server IP Address (Network 2) True
server_addr3 none Server IP Address (Network 3) True
server_addr4 none Server IP Address (Network 4) True
Note that the rpvserver device on halabk1 points (via the PVID) to its local physical disk
(hdisk3 with ...b826):
halabk1 /# lspv | grep dbvg
hdisk3 00c2d43a7be3b826 dbvg active
hdisk23 00c2d43a0eaf5af8 dbvg active
Also note that the lspv output shows that the RPV client hdisk (hdisk23) has the
PVID of the opposite site's physical disk:
halabc1 /# lspv | grep dbvg
hdisk3 00c2d43a0eaf5af8 dbvg active
hdisk23 00c2d43a7be3b826 dbvg active
When the RPV devices are available and running, the RPV client hdisk will show
as active in the lsvg -p vgname output:
lsvg -p dbvg
dbvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk23 active 767 667 154..53..153..153..154
hdisk3 active 767 667 154..53..153..153..154
G) What happens when you stop HA
PowerHA places the RPV device(s) in the available or defined state via built-in pre-
and post-events during resource processing.
If HA is stopped on the remote node first, the RPV client hdisk (hdisk23)
in the shared VG, dbvg, remains available, but LVM marks it as missing from the volume
group because the remote RPV server device is no longer accepting writes:
lsvg -p dbvg
dbvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk3 active 767 667 154..53..153..153..154
hdisk23 missing 767 667 154..53..153..153..154
The logical volume shows stale partitions because the mirroring cannot be done:
lsvg -l dbvg
dbvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
dblv1 jfs2 100 200 2 open/stale /db
The dblv1 logical volume must be configured as superstrict so that the LVM mirroring
also accomplishes the remote replication (a note on changing an existing LV follows the
lslv output below):
lslv dblv1
LOGICAL VOLUME: dblv1 VOLUME GROUP: dbvg
LV IDENTIFIER: 00c2d43a00004c00000001530ee9eff0.1 PERMISSION: read/write
VG STATE: active/complete LV STATE: opened/stale
TYPE: jfs2 WRITE VERIFY: off
MAX LPs: 512 PP SIZE: 8 megabyte(s)
COPIES: 2 SCHED POLICY: parallel
LPs: 100 PPs: 200
STALE PPs: 2 BB POLICY: relocatable
INTER-POLICY: maximum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 32
MOUNT POINT: /db LABEL: /db
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes (superstrict)
Serialize IO ?: NO
INFINITE RETRY: no PREFERRED READ: 0
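If an existing logical volume was not created as superstrict, the allocation policy can
usually be changed with chlv (a hedged sketch using the upper bound shown above;
confirm the correct strictness and upper bound values for GLVM against the
documentation for your AIX level before changing a production LV):
chlv -s s -u 32 dblv1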
Two utilities, rpvstat and gmvgstat, are provided to check the state and performance
of the GLVM replication:
rpvstat
Remote Physical Volume Statistics:
Comp Reads Comp Writes Comp KBRead Comp KBWrite Errors
RPV Client cx Pend Reads Pend Writes Pend KBRead Pend KBWrite
------------------ -- ----------- ----------- ------------ ------------ ------
hdisk23 0 0 0 0 0 0
0 0 0 0
The gmvgstat utility shows the TOTAL PPs and that we have 2 stale PPs:
gmvgstat
GMVG Name PVs RPVs Tot Vols St Vols Total PPs Stale PPs Sync
--------------- ---- ---- -------- -------- ---------- ---------- ----
dbvg 1 1 2 1 1534 2 99%
If the cluster is working properly, starting PowerHA cluster services on both the
local and the remote node should result in the remote mirroring starting, and the
stale partition count will eventually be reduced to zero.
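To watch the resynchronization progress after cluster services are started, the stale
partition count can be checked periodically, for example:
lsvg dbvg | grep -i stale
gmvgstat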
III. Basic Troubleshooting
A) Determine whether HA can bring the GLVM replication online automatically
Before starting PowerHA, ensure that all rpvserverN devices and RPV client hdisks are
in the defined state.
PowerHA EXPECTS the RPV devices to be in the defined state when it begins
to manage them:
rmdev -l hdisk23
rmdev -l rpvserver0
(remember NOT to use the -d option as this will remove the IP addresses and/or
PVID from the RPV devices)
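A hedged sketch of placing every RPV device on a node into the defined state before
starting PowerHA (adjust the grep patterns if your device descriptions differ):
# RPV client hdisks - keep the ODM definition, so no -d option
for d in $(lsdev -C -c disk | grep "Remote Physical Volume Client" | awk '{print $1}')
do
  rmdev -l $d
done
# RPV server devices
for r in $(lsdev -C | grep "Remote Physical Volume Server" | awk '{print $1}')
do
  rmdev -l $r
done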
If we start HA on the remote node, HA brings the RPV server device back to
the available state while leaving the RPV client hdisk in the defined state
so that this node can receive the data for the remote mirror:
rpvserver0 Available Remote Physical Volume Server
The RPV client hdisk is in the defined state since it is not used while this node is receiving the mirror copy data:
hdisk23 Defined Remote Physical Volume Client
Then, when we start HA on the local node, (halabc1), the devices are in the opposite
available/defined states:
rpvserver0 Defined Remote Physical Volume Server
hdisk23 Available Remote Physical Volume Client
Using lsvg -p vgname and gmvgstat, we should be able to determine if the replication
is working.
B) How to manually start the RPV devices, LVM, and filesystem components outside of PowerHA control.
Very careful consideration should be given before starting RPV devices outside of HA
control, because it is quite easy to cause data divergence or even catastrophic
data corruption if the replication is started from the wrong side.
Starting the mirror from the wrong side can write old data over newer data,
which results in the loss of the newer data.
Find out which node last had the resources.
If this is unknown, it may be possible to force the volume group varyon (varyonvg -f,
with force needed because the RPV client hdisk is missing) and mount the filesystem.
If this is carefully done at both nodes, the application owner may be able to determine
which node (and consequently which site's local disk) contains the most recent data.
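A hedged sketch of inspecting one node's local copy (assuming the filesystem is /db as
in this example; inspect read-only and undo the varyon afterwards):
halabc1 /# varyonvg -f dbvg
halabc1 /# mount /db
(inspect the data, then clean up)
halabc1 /# umount /db
halabc1 /# varyoffvg dbvg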
In this example we will start node halabc1 first:
halabc1 /# lsattr -El hdisk23
io_timeout 360 I/O Timeout Interval True
local_addr 10.10.30.31 Local IP Address (Network 1) True
local_addr2 none Local IP Address (Network 2) True
local_addr3 none Local IP Address (Network 3) True
local_addr4 none Local IP Address (Network 4) True
pvid 00c2d43a7be3b8260000000000000000 Physical Volume Identifier True
server_addr 10.10.30.41 Server IP Address (Network 1) True
server_addr2 none Server IP Address (Network 2) True
server_addr3 none Server IP Address (Network 3) True
server_addr4 none Server IP Address (Network 4) True
Verify that the local_addr, server_addr, and pvid fields are populated.
Do the same for the RPV server device at the other site:
halabk1 /# lsattr -El rpvserver0
auto_online n Configure at System Boot True
client_addr 10.10.30.31 Client IP Address True
rpvs_pvid 00c2d43a7be3b8260000000000000000 Physical Volume Identifier True
Documentation made when the cluster was working properly may be needed to
determine which IP address to configure if it is missing.
If needed, this is done via the chdev command:
chdev -l hdisk23 -a server_addr=10.10.30.41
The same general format is used for any of the other fields on either RPV client
or RPV server devices.
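For example, to restore a missing client address on the RPV server device at the other
site (using the addresses from this example configuration):
halabk1 /# chdev -l rpvserver0 -a client_addr=10.10.30.31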
Generally, it is better to start the RPV server device on the opposite node first:
halabk1 /# mkdev -l rpvserver0
rpvserver0 Available
Then, bring the RPV client hdisk to the available state:
halabc1 /# mkdev -l hdisk23
hdisk23 Available
Check whether you can read from the RPV client hdisk; if so, it is communicating
with the RPV server device at the opposite site, which is in turn communicating
with its physical disk:
halabc1 /# lquerypv -h /dev/hdisk23 80 10
00000080 00C2D43A 7BE3B826 00000000 00000000 |...:{..&........|
If there is no output from the lquerypv command even though the RPV client hdisk is in
the available state, do the following to start (resume) the RPV client hdisk:
halabc1 /# chdev -l hdisk23 -a resume=yes
hdisk23 changed
If all is well, the various devices and network/WAN will be working and the
lquerypv command will return output.
If the volume group is not already varied on, you can now do so:
varyonvg dbvg
Verify that LVM is working properly:
halabc1 /# lsvg -p dbvg
dbvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk3 active 767 667 154..53..153..153..154
hdisk23 active 767 667 154..53..153..153..154
If the volume group was left varied on while the RPV client hdisk went into the defined
state, LVM will mark the volume as missing:
halabc1 /# lsvg -p dbvg
dbvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk3 active 767 667 154..53..153..153..154
hdisk23 missing 767 667 154..53..153..153..154
Inform LVM the disk should be active:
halabc1 /# chpv -va hdisk23
halabc1 /# varyonvg dbvg
lsvg -p dbvg should now show both PVs in the active state (as shown in the "working properly" example above) and the filesystem(s) can be mounted.
Since the volume group is varied on in serial (non-concurrent) mode when varyonvg is run
with no options, it must be varied off and the RPV devices placed back in the defined
state before PowerHA is started, as shown below.
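For example (a hedged sketch, assuming the filesystem is /db and the devices were
started as in this example):
halabc1 /# umount /db
halabc1 /# varyoffvg dbvg
halabc1 /# rmdev -l hdisk23
halabk1 /# rmdev -l rpvserver0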
Document Information
Modified date:
17 June 2018
UID
isg3T1024726