Creating new primary-secondary filesets, failover and failback to old primary

This use case describes creating new primary-secondary filesets, checking RPO snapshot creation after the RPO timeout, failing over to the secondary, and failing back to the old primary.

For this scenario, the secondary's file system must be remotely mounted on the primary cluster through a cross-cluster mount.
Note: Parallel data transfer does not function during failover and failback.

You can run md5sum or any other third-party utility to check the consistency of the migrated files.
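A minimal consistency check with md5sum might look like the following sketch. The real paths would be the fileset junctions (for example, the primary fileset path and the cross-cluster mount of the secondary); here temporary directories stand in so the sketch is self-contained.

```shell
# Hedged sketch: compare per-file md5 checksums between the primary and
# the remote-mounted secondary. Temporary directories are placeholders
# for the real fileset paths.
pri=$(mktemp -d)   # placeholder for the primary fileset path
sec=$(mktemp -d)   # placeholder for the remote-mounted secondary path

# Simulate replicated content on both sides.
echo "sample data" > "$pri/file_pri_1"
cp "$pri/file_pri_1" "$sec/file_pri_1"

# Compare checksums file by file and record any mismatch.
status=OK
for f in "$pri"/*; do
    name=$(basename "$f")
    a=$(md5sum "$f" | awk '{print $1}')
    b=$(md5sum "$sec/$name" | awk '{print $1}')
    [ "$a" = "$b" ] || status="MISMATCH: $name"
done
echo "$status"
rm -rf "$pri" "$sec"
```

On a live system, the same loop run against the primary path and the remote mount of the secondary reports any file whose content differs.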

  1. Create the primary fileset by using the mmcrfileset command:

    mmcrfileset fs1 drp12 --inode-space=new -p afmTarget=gpfs:///gpfs/remotefs1/drs12 -p afmMode=primary --inode-limit=1024000 -p afmAsyncDelay=15 -p afmRPO=15

    
        Fileset drp12 created with id 23 root inode 7864323.
        Primary Id (afmPrimaryID) 14228043454022319638-C0A8037555D2994D-23
    
    
        Primary:
    
    mmlsfileset fs1 drp12 -L --afm
        Filesets in file system 'fs1':
    
        Attributes for fileset drp12:
        ==============================
        Status                                  Unlinked
        Path                                    --
        Id                                      23
        Root inode                              7864323
        Parent Id                               --
        Created                                 Tue Aug 25 01:16:06 2015
        Comment
        Inode space                             15
        Maximum number of inodes                1024000
        Allocated inodes                        500736
        Permission change flag                  chmodAndSetacl
        afm-associated                          Yes
        Target                                  gpfs:///gpfs/remotefs1/drs12
        Mode                                    primary
        Async Delay                             15
        Recovery Point Objective                15 minutes
        Last pSnapId                            0
        Number of Gateway Flush Threads         4
        Primary Id                              14228043454022319638-C0A8037555D2994D-23
    
  2. Create secondary on the secondary cluster:

    /usr/lpp/mmfs/bin/mmcrfileset fs1 drs12 --inode-space=new --inode-limit=1024000 -p afmMode=secondary -p afmPrimaryID=14228043454022319638-C0A8037555D2994D-23

    
        Fileset drs12 created with id 43 root inode 11010051.
    
    
    /usr/lpp/mmfs/bin/mmlinkfileset fs1 drs12 -J /fs1/drs12
        Fileset drs12 linked at /fs1/drs12
    
    
    /usr/lpp/mmfs/bin/mmlsfileset fs1 drs12 -L --afm
    
        Filesets in file system 'fs1':
    
        Attributes for fileset drs12:
        ==============================
        Status                                  Linked
        Path                                    /fs1/drs12
        Id                                      43
        Root inode                              11010051
        Parent Id                               0
        Created                                 Tue Aug 25 01:26:51 2015
        Comment
        Inode space                             21
        Maximum number of inodes                1024000
        Allocated inodes                        501504
        Permission change flag                  chmodAndSetacl
        afm-associated                          Yes
        Associated Primary ID                   14228043454022319638-C0A8037555D2994D-23
        Mode                                    secondary
        Last pSnapId                            0
    
  3. Link primary to create psnap0:

    mmlinkfileset fs1 drp12 -J /fs1/drp12

    
        Fileset drp12 linked at /fs1/drp12
        First snapshot name is psnap0-rpo-C0A8037555D2994D-23
        Flushing dirty data for snapshot drp12::psnap0-rpo-C0A8037555D2994D-23...
        Quiescing all file system operations.
        Snapshot drp12::psnap0-rpo-C0A8037555D2994D-23 created with id 36.
    
        Primary State:
    
    
    mmafmctl fs1 getstate -j drp12
        Fileset Name  Fileset Target              Cache State    Gateway Node   Queue Length   Queue numExec
        ------------  --------------             -------------   ------------   ------------   -------------
        drp12        gpfs:///gpfs/remotefs1/drs12   Active         c3m3n06          0              2
    
  4. Create data on the primary and verify that it reaches the secondary:

    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drp12/file_pri_1

    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 24599.12 Kbytes/sec, thread utilization 0.979
    
    
    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drp12/file_pri_2
    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 25954.27 Kbytes/sec, thread utilization 0.999
    
        Primary contents:
    
    
    ls -l /fs1/drp12
    
        total 204800
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
        Secondary contents:
    
    
    ls -l /fs1/drs12
    
        total 409600
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
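Before relying on the secondary's copy, it can help to confirm that the gateway queue has drained. The following sketch parses getstate output with awk; the heredoc reproduces the sample output shown in step 3, and on a live cluster `getstate_sample` would be replaced by a run of `mmafmctl fs1 getstate -j drp12`.

```shell
# Hedged sketch: pull Cache State (column 3) and Queue Length (column 5)
# for one fileset out of `mmafmctl getstate` output. The heredoc stands
# in for: /usr/lpp/mmfs/bin/mmafmctl fs1 getstate -j drp12
getstate_sample() {
cat <<'EOF'
Fileset Name  Fileset Target                Cache State  Gateway Node  Queue Length  Queue numExec
------------  --------------                -----------  ------------  ------------  -------------
drp12         gpfs:///gpfs/remotefs1/drs12  Active       c3m3n06       0             2
EOF
}

# Select the fileset's row and print the state and queue length.
state=$(getstate_sample | awk '$1 == "drp12" {print $3}')
qlen=$(getstate_sample  | awk '$1 == "drp12" {print $5}')
echo "state=$state qlen=$qlen"
```

A polling loop that waits until the state is Active and the queue length is 0 is a natural extension before a planned cutover.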
  5. Check for the RPO snapshot after the RPO interval elapses:

    mmlssnapshot fs1 -j drp12

    
        Snapshots in file system fs1:
        Directory                SnapId    Status  Created                   Fileset
        psnap0-rpo-C0A8037555D2994D-23 36        Valid   Tue Aug 25 01:27:45 2015  drp12
        psnap-rpo-C0A8037555D2994D-23-15-08-25-01-41-51 37        Valid   Tue Aug 25 01:41:51 2015  drp12
    
        Create more data at primary:
    
    
    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drp12/file_pri_3
    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 21322.77 Kbytes/sec, thread utilization 0.978
    
    
    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drp12/file_pri_4
    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 21617.21 Kbytes/sec, thread utilization 1.000
    
        Primary State:
    
    
    mmafmctl fs1 getstate -j drp12
    
        Fileset Name   Fileset Target                Cache State     Gateway Node   Queue Length   Queue numExec
        ------------   --------------              -------------    ------------    ------------   -------------
        drp12        gpfs:///gpfs/remotefs1/drs12       Dirty           c3m3n06            4           40963
    
    Note: The second RPO snapshot has not been triggered yet; file_pri_3 and file_pri_4 are therefore not captured in any RPO snapshot.
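The RPO snapshot names above appear to embed the primary ID followed by a YY-MM-DD-HH-MM-SS creation timestamp. A small, hypothetical helper to recover that timestamp from a snapshot name (the "20" century prefix is an assumption):

```shell
# Hedged sketch: extract the trailing YY-MM-DD-HH-MM-SS timestamp from an
# RPO snapshot name. The name is copied from the mmlssnapshot output
# above; prefixing "20" to the year is an assumption.
snap="psnap-rpo-C0A8037555D2994D-23-15-08-25-01-41-51"
ts=$(echo "$snap" | sed -E \
    's/.*-([0-9]{2})-([0-9]{2})-([0-9]{2})-([0-9]{2})-([0-9]{2})-([0-9]{2})$/20\1-\2-\3 \4:\5:\6/')
echo "$ts"
```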
  6. Unlink the primary to simulate the primary going down:

    /usr/lpp/mmfs/bin/mmunlinkfileset fs1 drp12 -f

    Fileset drp12 unlinked.
  7. Failover: convert the secondary to an acting primary. Run this command on the secondary cluster:

    /usr/lpp/mmfs/bin/mmafmctl fs1 failoverToSecondary -j drs12 --restore

    
        mmafmctl: failoverToSecondary restoring from psnap psnap-rpo-C0A8037555D2994D-23-15-08-25-01-41-51
        [2015-08-25 01:48:07] Restoring fileset "drs12" from snapshot "psnap-rpo-C0A8037555D2994D-23-15-08-25-01-41-51" of filesystem "/dev/f1"
    
        [2015-08-25 01:48:09] Scanning inodes, phase 1 ...
        [2015-08-25 01:48:25] 11511552 inodes have been scanned, 50% of total.
        [2015-08-25 01:48:41] 23023104 inodes have been scanned, 100% of total.
        [2015-08-25 01:48:41] Constructing operation list, phase 2 ...
        [2015-08-25 01:48:41] 0 operations have been added to list.
        [2015-08-25 01:48:41] 2 operations have been added to list.
        [2015-08-25 01:48:41] Deleting the newly created files, phase 3 ...
        [2015-08-25 01:48:42] Deleting the newly created hard links, phase 4 ...
        [2015-08-25 01:48:43] Splitting clone files, phase 5 ...
        [2015-08-25 01:48:43] Deleting the newly created clone files, phase 6 ...
        [2015-08-25 01:48:44] Moving files, phase 7 ...
        [2015-08-25 01:48:45] Reconstructing directory tree, phase 8 ...
        [2015-08-25 01:48:46] Moving files back to their correct positions, phase 9 ...
        [2015-08-25 01:48:46] Re-creating the deleted files, phase 10 ...
        [2015-08-25 01:48:47] Re-creating the deleted clone parent files, phase 11 ...
        [2015-08-25 01:48:48] Re-creating the deleted clone child files, phase 12 ...
        [2015-08-25 01:48:49] Re-creating the deleted hard links, phase 13 ...
        [2015-08-25 01:48:50] Restoring the deltas of changed files, phase 14 ...
        [2015-08-25 01:48:50] Restoring the attributes of files, phase 15 ...
        [2015-08-25 01:48:51] Restore completed successfully.
        [2015-08-25 01:48:51] Clean up.
        Primary Id (afmPrimaryID) 5802564250705647455-C0A8286F55B0E3EE-43
        Fileset drs12 changed.
        Promoted fileset drs12 to Primary
    
        Acting primary (note that the target is blank in the mmlsfileset output):
    
    
    /usr/lpp/mmfs/bin/mmlsfileset fs1 drs12 -L --afm
    
        Filesets in file system 'fs1':
    
        Attributes for fileset drs12:
        ==============================
        Status                                  Linked
        Path                                    /fs1/drs12
        Id                                      43
        Root inode                              11010051
        Parent Id                               0
        Created                                 Tue Aug 25 01:26:51 2015
        Comment
        Inode space                             21
        Maximum number of inodes                1024000
        Allocated inodes                        501504
        Permission change flag                  chmodAndSetacl
        afm-associated                          Yes
        Target                                  --
        Mode                                    primary
        Async Delay                             15 (default)
        Recovery Point Objective                disable
        Last pSnapId                            0
        Number of Gateway Flush Threads         4
        Primary Id                              5802564250705647455-C0A8286F55B0E3EE-43
    
        Acting primary contents after failover:
    
    
    ls -l /fs1/drs12
        total 409600
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
  8. Create sample data on the acting primary:

    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drs12/file_actingpri_1

    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 48282.25 Kbytes/sec, thread utilization 0.991
    
    
    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drs12/file_actingpri_2
    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 58104.41 Kbytes/sec, thread utilization 0.999
    
        Contents from acting primary:
    
    
    ls -l /fs1/drs12/
    
        total 812544
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
  9. Link the old primary when it comes back:

    mmlinkfileset fs1 drp12 -J /fs1/drp12

                 Fileset drp12 linked at /fs1/drp12
  10. Run failback start on the old primary:

    mmafmctl fs1 failbackToPrimary -j drp12 --start

    
        Fileset drp12 changed.
        mmafmctl: failbackToPrimary restoring from psnap psnap-rpo-C0A8037555D2994D-23-15-08-25-01-41-51
        [2015-08-25 01:50:52] Restoring fileset "drp12" from snapshot "psnap-rpo-C0A8037555D2994D-23-15-08-25-01-41-51" of filesystem "/dev/f1"
    
        [2015-08-25 01:50:54] Scanning inodes, phase 1 ...
        [2015-08-25 01:51:03] 8365056 inodes have been scanned, 50% of total.
        [2015-08-25 01:51:12] 16730112 inodes have been scanned, 100% of total.
        [2015-08-25 01:51:12] Constructing operation list, phase 2 ...
        [2015-08-25 01:51:12] 0 operations have been added to list.
        [2015-08-25 01:51:12] 2 operations have been added to list.
        [2015-08-25 01:51:12] Deleting the newly created files, phase 3 ...
        [2015-08-25 01:51:13] Deleting the newly created hard links, phase 4 ...
        [2015-08-25 01:51:13] Splitting clone files, phase 5 ...
        [2015-08-25 01:51:14] Deleting the newly created clone files, phase 6 ...
        [2015-08-25 01:51:15] Moving files, phase 7 ...
        [2015-08-25 01:51:16] Reconstructing directory tree, phase 8 ...
        [2015-08-25 01:51:16] Moving files back to their correct positions, phase 9 ...
        [2015-08-25 01:51:17] Re-creating the deleted files, phase 10 ...
        [2015-08-25 01:51:18] Re-creating the deleted clone parent files, phase 11 ...
        [2015-08-25 01:51:18] Re-creating the deleted clone child files, phase 12 ...
        [2015-08-25 01:51:19] Re-creating the deleted hard links, phase 13 ...
        [2015-08-25 01:51:20] Restoring the deltas of changed files, phase 14 ...
        [2015-08-25 01:51:21] Restoring the attributes of files, phase 15 ...
        [2015-08-25 01:51:21] Restore completed successfully.
        [2015-08-25 01:51:21] Clean up.
    
        Primary contents:
    
    
    ls -l /fs1/drp12
    
        total 204800
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
  11. Create more data on the acting primary:

    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drs12/file_actingpri_3

    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 51110.26 Kbytes/sec, thread utilization 0.996
    
  12. Run applyUpdates to synchronize the old primary with the acting primary:

    mmafmctl fs1 applyUpdates -j drp12

    
        [2015-08-25 01:51:39] Getting the list of updates from the acting Primary...
        [2015-08-25 01:52:21] Applying the 9 updates...
        [2015-08-25 01:52:25] 9 updates have been applied, 100% of total.
        mmafmctl: Creating the failback psnap locally. failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38
        Flushing dirty data for snapshot drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38...
        Quiescing all file system operations.
        Snapshot drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38 created with id 38.
    
        Primary contents:
    
    
    ls -l /fs1/drp12
    
        total 512000
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:51 file_actingpri_3
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
  13. Create more data on the acting primary to show that applications can continue to run, and then stop the applications:

    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drs12/file_actingpri_4

    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 54504.44 Kbytes/sec, thread utilization 0.991
    
        Acting primary contents:
    
    
    ls -l /fs1/drs12
    
        total 1227264
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:51 file_actingpri_3
        -rw-r--r-- 1 root root 104857600 Aug 25 01:52 file_actingpri_4
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
  14. Run the final applyUpdates on the old primary after the applications stop:

    mmafmctl fs1 applyUpdates -j drp12

    
        [2015-08-25 01:52:43] Getting the list of updates from the acting Primary...
        [2015-08-25 01:53:25] Applying the 3 updates...
        [2015-08-25 01:53:26] 3 updates have been applied, 100% of total.
        mmafmctl: Creating the failback psnap locally. failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-52-42
        Flushing dirty data for snapshot drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-52-42...
        Quiescing all file system operations.
        Snapshot drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-52-42 created with id 39.
        mmafmctl: Deleting the old failback psnap. failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38
        Invalidating snapshot files in drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38...
        Deleting files in snapshot drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38...
         100.00 % complete on Tue Aug 25 01:53:26 2015  (    500736 inodes with total          0 MB data processed)
        Invalidating snapshot files in drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38/F/...
        Delete snapshot drp12::failback-psnap-rpo-C0A8037555D2994D-23-15-08-25-01-51-38 complete, err = 0
    
        Primary contents:
    
    
    ls -l /fs1/drp12
    
        total 614400
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:51 file_actingpri_3
        -rw-r--r-- 1 root root 104857600 Aug 25 01:52 file_actingpri_4
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
    
  15. Complete failback process on old primary:

    mmafmctl fs1 failbackToPrimary -j drp12 --stop

                 Fileset drp12 changed.
  16. Convert the acting primary back to secondary and establish the relationship again:

    /usr/lpp/mmfs/bin/mmunlinkfileset fs1 drs12 -f

    
        Fileset drs12 unlinked.
    
    
    /usr/lpp/mmfs/bin/mmchfileset fs1 drs12 -p afmMode=secondary,afmPrimaryID=14228043454022319638-C0A8037555D2994D-23
    
        Fileset drs12 changed.
    
    
    /usr/lpp/mmfs/bin/mmlinkfileset fs1 drs12 -J /fs1/drs12
    
        Fileset drs12 linked at /fs1/drs12
    
        Primary contents:
    
    
    ls -l /fs1/drp12
    
        total 614400
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:51 file_actingpri_3
        -rw-r--r-- 1 root root 104857600 Aug 25 01:52 file_actingpri_4
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
        Secondary contents:
    
    
    ls -l /fs1/drs12
    
        total 1228800
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:51 file_actingpri_3
        -rw-r--r-- 1 root root 104857600 Aug 25 01:52 file_actingpri_4
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
    
  17. Create data on the failed-back primary:

    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drp12/file_pri_5

    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 21235.12 Kbytes/sec, thread utilization 0.985
    
    
    /usr/lpp/mmfs/samples/perf/gpfsperf create seq /fs1/drp12/file_pri_6
    
          recSize 10K nBytes 100M fileSize 100M
          nProcesses 1 nThreadsPerProcess 1
          file cache flushed before test
          not using direct I/O
          offsets accessed will cycle through the same file segment
          not using shared memory buffer
          not releasing byte-range token after open
          no fsync at end of test
            Data rate was 22658.18 Kbytes/sec, thread utilization 1.000
    
        Primary contents:
    
    
    ls -l /fs1/drp12
    
        total 819200
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:51 file_actingpri_3
        -rw-r--r-- 1 root root 104857600 Aug 25 01:52 file_actingpri_4
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:55 file_pri_5
        -rw-r--r-- 1 root root 104857600 Aug 25 01:55 file_pri_6
    
        Secondary contents:
    
    
    ls -l /fs1/drs12
    
        total 1638400
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:49 file_actingpri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:51 file_actingpri_3
        -rw-r--r-- 1 root root 104857600 Aug 25 01:52 file_actingpri_4
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_1
        -rw-r--r-- 1 root root 104857600 Aug 25 01:29 file_pri_2
        -rw-r--r-- 1 root root 104857600 Aug 25 01:55 file_pri_5
        -rw-r--r-- 1 root root 104857600 Aug 25 01:55 file_pri_6