Integrating IBM GPFS on IBM AIX with IBM Storwize V7000 Unified Global Mirror and IBM FlashCopy

A disaster recovery solution replicating GPFS structures on AIX

The author presents a guide on how to use the advanced copy features of the new IBM® Storwize® V7000 Unified storage system to build a disaster recovery (DR) solution for an IBM General Parallel File System (IBM GPFS™) cluster on the IBM AIX® operating system. The concepts of integrating storage-based replication with the GPFS software are introduced and illustrated with examples.

Rogerio de Rizzio (rrizzio@br.ibm.com), Advisory Software Engineer, IBM

Rogerio de Rizzio is an Advisory Software Engineer working at the IBM Brazil Software Lab in the Core Engineering team for IBM Smart Analytics Systems. He holds a B.S. degree in Physics from the Universidade de São Paulo (USP), a master's degree in Computer Networks, and an MBA in information technology. He has 15 years of experience working with UNIX and Linux, storage, high availability and disaster recovery (HA/DR), and databases. You can reach him at rrizzio@br.ibm.com.



17 August 2012


Solution overview

We can consider a typical GPFS cluster, in which a pair of nodes are the redundant managers of a given file system and all the other servers in the cluster mount that file system as clients. In Figure 1, only the GPFS managers for the primary and secondary sites are represented. The data that needs to be replicated from all the servers in the primary GPFS cluster is stored in a shared GPFS file system mounted on the /repvol directory on each server. The servers named server03 and server04 are the GPFS managers of this file system and also the Network Shared Disk (NSD) servers of the underlying shared disk. The underlying shared disk is built using a zero percent thin-provisioned virtual disk (VDisk) inside the Storwize V7000 Unified storage system.

To keep the configuration symmetric between the primary and secondary sites, two thin-provisioned VDisks of 200 GB are created inside the Storwize V7000 Unified system at each site. At the primary site, one of the VDisks is kept idle, while the other VDisk is presented to the servers server03 and server04 through the storage area network (SAN); the /repvol GPFS file system is built on it. This VDisk is replicated to the secondary site using the Storwize V7000 Unified Global Mirror feature, which is an asynchronous type of replication. The peer VDisk at the secondary site is never presented to the servers server05 and server06. Instead, it is used as the source of an incremental IBM Tivoli® Storage FlashCopy® Manager image, which is refreshed from time to time and mounted on the /repvol directory on these servers. The FlashCopy image is created using the second thin-provisioned VDisk of 200 GB. Every time the /repvol directory is mounted at the secondary site, the data it contains becomes available to all the servers in that cluster. The physical connection between the primary and secondary sites is made by configuring E_Ports in the SAN switches and interconnecting them through a low-latency link, such as dense wavelength division multiplexing (DWDM) or dark fiber.

Figure 1. A representation of the configuration used throughout this article

Names used in the procedures

To make the following procedures easier to follow, specific names are used instead of generic placeholders:

Primary Storwize V7000 name: V7000_7
Primary Storwize V7000 VDisk names: mdisk1_vdiskP1, mdisk1_vdiskP2
Primary Foundation Modules host names: server03 server04

Secondary Storwize V7000 name: V7000_6
Secondary Storwize V7000 VDisk names: mdisk1_vdiskS1, mdisk1_vdiskS2
Secondary Foundation Modules host names: server05 server06

In the procedures that follow, also pay attention to the prompts: the strings in the prompts show where you have to run each command.

Creating the fundamental components

This procedure creates the necessary components (the VDisks inside the Storwize V7000 Unified storage systems at the primary and secondary sites), presents them to the respective servers, and creates the GPFS NSD and file system that will be replicated.

Recalling from the Storwize V7000 Unified documentation, a managed disk (MDisk) is a logical entity representing physical storage capacity, which can be an array of direct-attached drives configured in a given Redundant Array of Independent Disks (RAID) format or logical unit numbers (LUNs) from another storage system that are accessible through the SAN. In order to use the storage capacity provided by an MDisk, it has to be inserted into a storage pool (MDisk group). The main property of a given storage pool is that it divides all the MDisks that belong to it into extents of the same size. Extent sizes can be 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, or 8192 MB. A Storwize V7000 Unified system can manage 2^22 extents. For example, with a 16 MB extent size, the system can manage up to 16 MB multiplied by 4,194,304 = 64 TB of storage.
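The same arithmetic applies to the other extent sizes. As a quick illustration (plain ksh, no storage commands involved), the following snippet prints the maximum manageable capacity for each supported extent size, using the 2^22-extent limit stated above:

#!/bin/ksh
# Maximum manageable capacity = extent size * 2^22 extents.
# 2^22 = 4,194,304 = 4 * 1024 * 1024, so capacity in TB = extent size in MB * 4.
for ext_mb in 16 32 64 128 256 512 1024 2048 4096 8192
do
    (( capacity_tb = ext_mb * 4 ))
    print "extent size ${ext_mb} MB -> up to ${capacity_tb} TB"
done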

The extents from a storage pool are grouped to form a volume, also named VDisk. The Storwize V7000 Unified system maps the VDisks to the hosts (servers). MDisks and drives are never seen by the hosts.

So, assuming that we have already defined an MDisk group (storage pool), represented by the ID 1, and that it has free extents available, we proceed to create the VDisks at the size we require.
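If you want to confirm the pool's extent size and remaining capacity first, you can list the storage pools and then the detailed view of pool 1. This is only a minimal check; inspect the extent_size and free_capacity fields reported on your code level:

IBM_2076:V7000_7:superuser> lsmdiskgrp
IBM_2076:V7000_7:superuser> lsmdiskgrp 1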

At the primary site, create the VDisks and map them to the hosts:

IBM_2076:V7000_7:superuser> mkvdisk -mdiskgrp 1 -iogrp 0 -size 200
-unit gb -rsize 0% -autoexpand -name mdisk1_vdiskP1
                                  
IBM_2076:V7000_7:superuser> mkvdisk -mdiskgrp 1 -iogrp 0 -size 200
-unit gb -rsize 0% -autoexpand -name mdisk1_vdiskP2
                                     
IBM_2076:V7000_7:superuser> mkvdiskhostmap -force -host server03
mdisk1_vdiskP2
                
IBM_2076:V7000_7:superuser> mkvdiskhostmap -force -host server04
mdisk1_vdiskP2

The VDisks are created with a size of 200 GB, but no extent is allocated at the moment of creation, as specified by the argument -rsize 0%. The extents are allocated on demand, but the hosts see these volumes as if they had the full capacity. This behavior is what characterizes a thin-provisioned volume.
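To observe the on-demand allocation, the VDisk details can be listed on the storage system. This is a minimal check, assuming that on this code level the detailed lsvdisk view reports the virtual capacity along with per-copy fields such as used_capacity and real_capacity (verify the exact field names on your system); the capacity field should show the full 200 GB, while the real capacity grows only as extents are allocated:

IBM_2076:V7000_7:superuser> lsvdisk mdisk1_vdiskP2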

As we intend to create a GPFS NSD using mdisk1_vdiskP2, this VDisk has to be accessible simultaneously from server03 and server04, and therefore the -force option is necessary in the mkvdiskhostmap command.

After the mapping inside the Storwize V7000 Unified system is done, AIX on each server has to rescan its devices in order to identify the new disk.

server03# cfgmgr -s                
server04# cfgmgr -s

To discover which AIX hdisk corresponds to the mapped VDisk, first we need to list the unique ID of the VDisk:

IBM_2076:V7000_7:superuser>lsvdisk mdisk1_vdiskP2
vdisk_UID 60050768028282292800000000000003

Then, we can use a simple script (as shown in the following code) to find the referred vdisk_UID among all the hdisks identified by the AIX operating system:

*********************

#!/bin/ksh
# get_hdisk_ids.ksh
# Lists the unique_id attribute of a range of hdisks so that a VDisk UID
# can be matched to its corresponding AIX hdisk.

USAGE="usage: get_hdisk_ids.ksh <first hdisk number> <last hdisk number>"

if (($# != 2))
then
    print "$USAGE"
    exit 1
fi

i=$1
last=$2

while ((i <= last))
do
    print -n "hdisk$i "
    /usr/sbin/lsattr -El hdisk$i -a unique_id | awk '{print $2}'
    ((i += 1))
done
*********************
                
server03# ./get_hdisk_ids.ksh 2 125 | grep 60050768028282292800000000000003
hdisk101 33213 60050768028282292800000000000003 04214503IBMfcp

So, at server03, mdisk1_vdiskP2 is recognized as hdisk101. At server04, mdisk1_vdiskP2 is recognized as hdisk103.

We proceed the same way at the secondary site, creating the VDisks, mapping them to the hosts, and rescanning the devices:

IBM_2076:V7000_6:superuser>mkvdisk -mdiskgrp 1 -iogrp 0 -size 200
-unit gb -rsize 0% -autoexpand -name mdisk1_vdiskS1

IBM_2076:V7000_6:superuser>mkvdisk -mdiskgrp 1 -iogrp 0 -size 200
-unit gb -rsize 0% -autoexpand -name mdisk1_vdiskS2

IBM_2076:V7000_6:superuser> mkvdiskhostmap -force -host server05
mdisk1_vdiskS2

IBM_2076:V7000_6:superuser> mkvdiskhostmap -force -host server06
mdisk1_vdiskS2
server05# cfgmgr -s
server06# cfgmgr -s

At server05, mdisk1_vdiskS2 is recognized as hdisk142. At server06, mdisk1_vdiskS2 is recognized as hdisk158.

At the primary site, create the GPFS NSD and file system, using a disk descriptor file named DescFile as shown in the following code:

server03# cat DescFile
hdisk101:server03,server04::dataAndMetadata::nsd_repvol:
 
server03# mmcrnsd -F DescFile                
 
server03# cp DescFile DescFileFS
server03# cat DescFileFS
# hdisk101:server03,server04::dataAndMetadata::nsd_repvol:
nsd_repvol:server03:server04:dataAndMetadata:4011::system
 
server03# mmcrfs /repvol repvolfs -F DescFileFS -B 64k
 
server03# mmmount repvolfs -N "server03,server04"

At this point, we have a GPFS file system created and mounted on the /repvol directory.
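Optionally, the state of the new file system can be verified from either NSD server before replication is set up. A minimal sketch using standard GPFS administration commands (output omitted):

server03# mmlsnsd -f repvolfs     # NSD-to-server mapping for the file system
server03# mmlsdisk repvolfs       # disk availability should be "up", status "ready"
server03# mmdf repvolfs           # capacity and free space
server03# mmlsmount repvolfs -L   # nodes that have the file system mounted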

Establishing the relationship between the primary and secondary sites for the first time

The servers at the secondary site (server05, server06) have to be reachable from the servers at the primary site (server03, server04) through an IP network.

  1. Verify that the /repvol directory is mounted on both primary servers:
    server03# mmlsnsd
    File system   Disk name    NSD servers
    ---------------------------------------------
    repvolfs      nsd_repvol   server03,server04

    server03# mmlsmount repvolfs -L
    File system repvolfs is mounted on 2 nodes:
      172.23.1.15   server04
      172.23.1.14   server03

  2. Propagate the information related to the GPFS repvolfs file system to the servers at the secondary site, which are listed in the file targetNodesFile:
    server03# mmfsctl repvolfs syncFSconfig -n targetNodesFile
    server03# cat targetNodesFile
    server05
    server06

    The Storwize V7000 Unified storage units at the primary and secondary sites must reach each other through the extended SAN:
    IBM_2076:V7000_6:superuser>lspartnershipcandidate
    id configured name
    00000200A0208A2C no V7000_7             
                    
    IBM_2076:V7000_7:superuser>lspartnershipcandidate
    id configured name
    00000200A0A08A4A no V7000_6
                    
    IBM_2076:V7000_6:superuser>mkpartnership -bandwidth 2000 V7000_7
    
    IBM_2076:V7000_6:superuser>lspartnership
    id name location partnership bandwidth
    00000200A0A08A4A V7000_6 local
    00000200A0208A2C V7000_7 remote fully_configured 2000
                    
    IBM_2076:V7000_7:superuser>mkpartnership -bandwidth 2000 V7000_6
                    
    IBM_2076:V7000_7:superuser>lspartnership
    id name location partnership bandwidth
    00000200A0208A2C  V7000_7 local
    00000200A0A08A4A V7000_6 remote   fully_configured 2000
  3. Establish a Global Mirror replication between a VDisk in the primary site and a VDisk in the secondary site:
    IBM_2076:V7000_7:superuser>mkrcrelationship -master mdisk1_vdiskP2 -aux 
    mdisk1_vdiskS1 -cluster V7000_6 -global
    IBM_2076:V7000_7:superuser>startrcrelationship rcrel1
  4. Create the incremental FlashCopy image at the secondary Storwize V7000 Unified system:
    IBM_2076:V7000_6:superuser> mkfcmap -source mdisk1_vdiskS1 -target
    mdisk1_vdiskS2 -copyrate 0 -cleanrate 0
  5. At the primary site, flush and suspend the I/O on the /repvol file system:
    server03# mmfsctl repvolfs suspend
  6. At the secondary site, verify that the replication is in the consistent_synchronized state (a scripted way to poll for this state is sketched after this list):
    IBM_2076:V7000_6:superuser>lsrcrelationship rcrel1
    id 5
    name rcrel1
    master_cluster_id 00000200A0A08A4A
    master_cluster_name V7000_7
    master_vdisk_id 5
    master_vdisk_name mdisk1_vdiskP2
    aux_cluster_id 00000200A0208A2C
    aux_cluster_name V7000_6
    aux_vdisk_id 5
    aux_vdisk_name mdisk1_vdiskS1
    primary master
    consistency_group_id
    consistency_group_name
    state consistent_synchronized
    bg_copy_priority 50
    progress
    copy_type global
    cycling_mode none
  7. At the secondary site, start the FlashCopy image:
    IBM_2076:V7000_6:superuser> startfcmap -prep fcmap1
    IBM_2076:V7000_6:superuser>lsfcmap

    The state must be copying. The copy progress remains at zero percent
    because this FlashCopy mapping was created with a copy rate of zero
    (a snapshot-style mapping with no background copy).
  8. At the primary site, resume the I/O for the /repvol file system:
    server03# mmfsctl repvolfs resume
  9. At the secondary site, update the GPFS NSD information:
    server05# mmchnsd "nsd_repvol:server05,server06"
    server05# mmmount repvolfs -o ro -N "server05,server06"

    At this point, it is possible to access the data in /repvol at the secondary site for the first time.
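The state check in step 6 lends itself to scripting, so that later steps run only after the Global Mirror relationship reports consistent_synchronized. The following ksh sketch is illustrative only; it assumes that password-less (key-based) ssh access from an AIX node to the secondary Storwize V7000 CLI as superuser has been set up, and it uses the relationship name rcrel1 from this article:

#!/bin/ksh
# Poll the secondary Storwize V7000 until the Global Mirror relationship
# reaches the consistent_synchronized state.
# Assumption: key-based ssh to the Storwize CLI is configured for superuser.
SEC_V7000="superuser@V7000_6"
RCREL="rcrel1"

while true
do
    state=$(ssh $SEC_V7000 lsrcrelationship $RCREL | awk '$1 == "state" {print $2}')
    if [[ "$state" = "consistent_synchronized" ]]
    then
        print "Relationship $RCREL is consistent_synchronized."
        break
    fi
    print "Relationship $RCREL state is '$state'; waiting..."
    sleep 30
done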

Periodic refreshing of the data at the secondary site

When both sites are active, the replicated data can be updated for use at the secondary site by performing the following steps; a scripted version of this sequence is sketched after the list.

  1. Unmount the /repvol at the secondary site:
    server05# mmumount repvolfs -N "server05,server06"
  2. At the primary site, flush and suspend the I/O on the /repvol file system:
      server03# mmfsctl repvolfs suspend

    At the secondary site, verify that the replication is in the consistent_synchronized state:
    IBM_2076:V7000_6:superuser>lsrcrelationship rcrel1
    id 5
    name rcrel1
    master_cluster_id 00000200A0A08A4A
    master_cluster_name V7000_7
    master_vdisk_id 5
    master_vdisk_name mdisk1_vdiskP2
    aux_cluster_id 00000200A0208A2C
    aux_cluster_name V7000_6
    aux_vdisk_id 5
    aux_vdisk_name mdisk1_vdiskS1
    primary master
    consistency_group_id
    consistency_group_name
    state consistent_synchronized
    bg_copy_priority 50
    progress
    copy_type global
    cycling_mode none
  3. At the secondary site, refresh the incremental FlashCopy image:
    IBM_2076:V7000_6:superuser>stopfcmap fcmap1
    IBM_2076:V7000_6:superuser>startfcmap -prep fcmap1
  4. At the primary site, resume the I/O on the /repvol file system:
    server03# mmfsctl repvolfs resume
  5. At the secondary site, mount /repvol:
    server05# mmmount repvolfs -o ro -N "server05,server06"
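Because the whole refresh sequence is command driven, it can be wrapped in a single script. The following ksh sketch is illustrative only; it assumes password-less (key-based) ssh from the node where it runs to server03, to server05, and to the secondary Storwize V7000 CLI as superuser, and it uses the object names (repvolfs, rcrel1, fcmap1) adopted in this article. Error handling is intentionally minimal.

#!/bin/ksh
# Periodic refresh of the replicated /repvol data at the secondary site.
# Illustrative sketch; assumes key-based ssh to the GPFS nodes and to the
# secondary Storwize V7000 CLI, and the object names used in this article.

PRI_GPFS="root@server03"          # primary GPFS/NSD server
SEC_GPFS="root@server05"          # secondary GPFS/NSD server
SEC_V7000="superuser@V7000_6"     # secondary Storwize V7000 CLI

# 1. Unmount the file system at the secondary site
ssh $SEC_GPFS mmumount repvolfs -N server05,server06

# 2. Flush and suspend I/O at the primary site
ssh $PRI_GPFS mmfsctl repvolfs suspend

#    Wait until the Global Mirror relationship is consistent_synchronized
until ssh $SEC_V7000 lsrcrelationship rcrel1 | grep consistent_synchronized > /dev/null
do
    sleep 30
done

# 3. Refresh the incremental FlashCopy image at the secondary site
ssh $SEC_V7000 stopfcmap fcmap1
ssh $SEC_V7000 startfcmap -prep fcmap1

# 4. Resume I/O at the primary site
ssh $PRI_GPFS mmfsctl repvolfs resume

# 5. Mount the file system read-only at the secondary site
ssh $SEC_GPFS mmmount repvolfs -o ro -N server05,server06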

Accessing the data at the secondary site when the primary site is inactive

If the primary site becomes inactive, unmount the /repvol at the secondary site:

server05# mmumount repvolfs -N "server05,server06"

At the secondary site, even if the replication is in the consistent_synchronized state, this does not mean that the files in the replicated volume are consistent. If the primary site became inactive unexpectedly, some files that were being written to the /repvol file system at that moment might be corrupted. Because the replication occurs at the physical level (at the VDisk level), the VDisks at the primary and secondary sites might be equal sector by sector, but their logical contents, the files stored on them, might be inconsistent. Therefore, file integrity has to be verified before use.

At the secondary site, refresh the incremental FlashCopy image:

IBM_2076:V7000_6:superuser>stopfcmap fcmap1
IBM_2076:V7000_6:superuser>startfcmap -prep fcmap1

At the secondary site, mount the /repvol file system:

server05# mmmount repvolfs -o ro -N "server05,server06"

Now, before using the files stored in /repvol, check their consistency by using the tools relevant to the respective applications. For instance, if you are replicating archived database transaction logs, the database usually provides tools to verify the integrity of those files.
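As an illustration of the pattern only, the sketch below loops over the replicated files and calls a verification tool; verify_archived_log and the directory path are hypothetical placeholders for whatever tool and layout your application actually provides:

#!/bin/ksh
# Generic integrity-check loop for replicated files (illustrative only).
# verify_archived_log is a hypothetical placeholder for the application's
# own verification utility; replace it with the real tool for your data.
REPDIR=/repvol/archived_logs      # hypothetical subdirectory of /repvol

for f in $REPDIR/*
do
    if verify_archived_log "$f" > /dev/null 2>&1
    then
        print "OK:     $f"
    else
        print "FAILED: $f (do not use this file for recovery)"
    fi
done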

Product versions

The IBM product versions used to create this article were:

  • AIX 7.1
  • GPFS 3.4.0.9
  • Storwize V7000 Unified storage – code_level 6.3.0.0
