Troubleshooting failed tape replace

This topic describes how to troubleshoot a failed eeadm tape replace command and describes an alternative recovery action.

Confirm the step where failure occurred

The eeadm tape replace command in IBM Storage® Archive Enterprise Edition first runs tape reconcile tasks. These run as subtasks of the tape replace task. When a tape reconcile subtask fails, the tape replace task also fails.

After the tape reconcile subtasks succeed, the reclaim subtasks run.

The reclaim_sub subtasks also run as subtasks of the tape replace task. One reclaim_sub subtask is created for each tape, and the subtasks are processed one at a time.

The eeadm task show -v command displays messages that you can use to determine which step of the tape replace task failed.
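When the output is long, it can help to reduce it to one line per task. The following is a minimal shell sketch that assumes the field labels shown in the examples in this topic (Task ID:, Task Type:, Result:); summarize_subtasks is a hypothetical helper name, not an eeadm feature.

```shell
# Hypothetical helper: print "<Task ID> <Task Type> <Result>" for the task and
# every subtask in "eeadm task show <id> -v" output read from stdin.
# Assumes the "Label:   value" layout shown in the examples in this topic.
summarize_subtasks() {
    awk '
        # Return the value part of a "Label:   value" line, trimmed.
        function value(line) {
            sub(/^[^:]*:[ \t]*/, "", line)
            sub(/[ \t]+$/, "", line)
            return line
        }
        $1 == "Task" && $2 == "ID:"   { id = value($0) }
        $1 == "Task" && $2 == "Type:" { type = value($0) }
        $1 == "Result:"               { print id, type, value($0) }
    '
}
```

For example, piping the output of eeadm task show 5135 -v into summarize_subtasks prints one line each for tasks 5135, 5136, and 5137, so the deepest failed subtask is easy to spot.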

Failure at tape_reconcile subtask

When the failure occurs in the tape_reconcile subtask, identify the error message in the tape_reconcile subtask output and refer to Troubleshooting a failed reconcile.

The following is the output of the eeadm task show -v command when the tape_replace task fails at the tape_reconcile subtask.


=== Task Information ===
Task ID:              5135
Task Type:            tape_replace
Command Parameters:   eeadm tape replace JD0009JD -p JP1 -l JLIB1
Status:               completed
Result:               failed
Accepted Time:        Mon May 16 17:35:51 2022 (+0900)
Started Time:         Mon May 16 17:35:51 2022 (+0900)
Completed Time:       Mon May 16 17:36:56 2022 (+0900)
Workload:             0 tapes
Progress:             0/0 tapes completed
Result Summary:       (GLESL757E) Failed to reconcile for replace.
Messages:
 2022-05-16 17:35:51.463115 GLESL755I: Start a reconcile before starting a replace against 1 tapes.
 2022-05-16 17:36:55.076089 GLESS098E: Reconciling tape JD0009JD failed because orphan files are found.
 2022-05-16 17:36:56.581187 GLESL757E: Failed to reconcile for replace.


    --- Subtask(level 1) Info ---
    Task ID:              5136
    Task Type:            tape_reconcile
    Status:               completed
    Result:               failed
    Accepted Time:        Mon May 16 17:35:51 2022 (+0900)
    Started Time:         Mon May 16 17:35:51 2022 (+0900)
    Completed Time:       Mon May 16 17:36:56 2022 (+0900)
    Work load:            1 tapes
    Progress:             Phase: complete (1/1 tapes completed.)
    Result Summary:
    Messages:
     2022-05-16 17:35:51.494182 GLESS016I: Reconciliation requested.
     2022-05-16 17:35:52.246002 GLESS050I: GPFS file systems involved: /mnt/gpfs .
     2022-05-16 17:35:52.246634 GLESS210I: Valid tapes in the pool: JD0006JD JD0005JD JD0009JD JD0004JD JD0001JD .
     2022-05-16 17:35:52.246869 GLESS049I: Tapes to reconcile: JD0009JD .
     2022-05-16 17:35:52.247058 GLESS134I: Reserving tapes.
     2022-05-16 17:35:52.248972 GLESS269I: JD0009JD is mounted. Moving to homeslot.
     2022-05-16 17:35:52.369285 GLESS135I: Reserved tapes: JD0009JD .
     2022-05-16 17:35:52.438789 GLESS054I: Creating GPFS snapshots:
     2022-05-16 17:35:52.438969 GLESS055I: Deleting the previous reconcile snapshot and creating a new one for /mnt/gpfs ( fsd ).
     2022-05-16 17:35:53.927497 GLESS056I: Searching GPFS snapshots:
     2022-05-16 17:35:53.935386 GLESS057I: Searching GPFS snapshot of /mnt/gpfs ( fsd ).
     2022-05-16 17:36:49.763496 GLESS060I: Processing the file lists:
     2022-05-16 17:36:49.799062 GLESS061I: Processing the file list for /mnt/gpfs ( fsd ).
     2022-05-16 17:36:53.864704 GLESS141I: Removing stale DMAPI attributes:
     2022-05-16 17:36:53.864930 GLESS142I: Removing stale DMAPI attributes for /mnt/gpfs ( fsd ).
     2022-05-16 17:36:54.019848 GLESS063I: Reconciling the tapes:
     2022-05-16 17:36:54.055975 GLESS248I: Reconcile tape JD0009JD.
     2022-05-16 17:36:55.104043 GLESS249I: Releasing reservation of tape JD0009JD.
     2022-05-16 17:36:55.104337 GLESS058I: Removing GPFS snapshots:
     2022-05-16 17:36:55.104511 GLESS059I: Removing GPFS snapshot of /mnt/gpfs ( fsd ).


        --- Subtask(level 2) Info ---
        Task ID:              5137
        Task Type:            reconcile_tape
        Status:               completed
        Result:               failed
        Accepted Time:        Mon May 16 17:36:54 2022 (+0900)
        Started Time:         Mon May 16 17:36:54 2022 (+0900)
        Completed Time:       Mon May 16 17:36:55 2022 (+0900)
        Workload:             -
        Progress:             Tape: JD0009JD Phase: reconcile tape complete
        Result Summary:       Tape: JD0009JD Result: (GLESS098E) Reconciling tape JD0009JD failed because orphan files are found.
        Messages:
         2022-05-16 17:36:55.073969 GLESS098E: Reconciling tape JD0009JD failed because orphan files are found

Failure at reclaim_sub subtask

When the failure occurs in a reclaim_sub subtask, you can list the files that failed to process by running the eeadm task show command with the -r failed option.

By default, the list of processed files is kept only for the last tape processed by the most recent tape_replace task. To keep file lists for more tapes, add the variable RECLAIM_FILELIST_GENERATION <number> to the .ltfsee/config/ltfsee.config file.
Note: The file lists are stored under the .ltfsee/tmp/reclaim directory. When you increase the number of file lists, ensure that the directory has enough space to hold them.
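As a minimal sketch, the following shell functions add or update the RECLAIM_FILELIST_GENERATION entry and report the space used by the reclaim file lists. The function names are hypothetical, the config file path is passed in rather than assumed, and the "NAME value" line syntax is taken from the description above.

```shell
# Hypothetical helper: set RECLAIM_FILELIST_GENERATION in the given
# ltfsee.config file, replacing an existing entry or appending a new one.
set_filelist_generation() {
    config=$1 count=$2
    case $count in
        ''|*[!0-9]*) echo "count must be a whole number" >&2; return 2 ;;
    esac
    cp -p -- "$config" "$config.bak" || return 1    # keep a backup copy
    if grep -q '^RECLAIM_FILELIST_GENERATION[[:space:]]' "$config"; then
        # Replace the existing entry, keeping every other line unchanged.
        sed "s/^RECLAIM_FILELIST_GENERATION[[:space:]].*/RECLAIM_FILELIST_GENERATION $count/" \
            "$config.bak" > "$config"
    else
        # Append a new entry, adding a newline first if the file lacks a final one.
        [ -n "$(tail -c 1 -- "$config")" ] && echo >> "$config"
        printf 'RECLAIM_FILELIST_GENERATION %s\n' "$count" >> "$config"
    fi
}

# Hypothetical helper: show the space used by the kept file lists and the
# free space in that file system. Takes the GPFS mount point (for example,
# /mnt/gpfs in this topic, which is an assumption for other setups).
reclaim_filelist_usage() {
    dir=$1/.ltfsee/tmp/reclaim
    du -sh -- "$dir"
    df -h -- "$dir"
}
```

Assuming the GPFS file system is mounted at /mnt/gpfs as in the examples in this topic, set_filelist_generation /mnt/gpfs/.ltfsee/config/ltfsee.config 5 would keep file lists for five tapes, and reclaim_filelist_usage /mnt/gpfs shows whether the directory has room for them.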

The following is the output of the eeadm task show -v command when the tape_replace task fails at the reclaim_sub subtask.


=== Task Information ===
Task ID:              5202
Task Type:            tape_replace
Command Parameters:   eeadm tape replace -p JP1 -l JLIB1 JD0010JD
Status:               completed
Result:               failed
Accepted Time:        Wed May 18 08:21:20 2022 (+0900)
Started Time:         Wed May 18 08:21:20 2022 (+0900)
Completed Time:       Wed May 18 08:24:41 2022 (+0900)
Workload:             1 tapes
Progress:             1/1 tapes completed
Result Summary:       (GLESL750E) Tape replace for JD0010JD failed (4035).
Messages:
 2022-05-18 08:21:20.849250 GLESL755I: Start a reconcile before starting a replace against 1 tapes.
 2022-05-18 08:22:22.691675 GLESS002I: Reconciling tape JD0010JD complete.
 2022-05-18 08:22:24.091778 GLESL756I: Reconcile before replace finished.
 2022-05-18 08:22:24.106062 GLESL753I: Starting tape replace for JD0010JD.
 2022-05-18 08:22:24.106306 GLESL754I: Found a target tape for tape replace (JD0009JD).
 2022-05-18 08:24:41.179256 GLESL750E: Tape replace for JD0010JD failed (4035).


    --- Subtask(level 1) Info ---
    Task ID:              5203
    Task Type:            tape_reconcile
    Status:               completed
    Result:               succeeded
    Accepted Time:        Wed May 18 08:21:20 2022 (+0900)
    Started Time:         Wed May 18 08:21:20 2022 (+0900)
    Completed Time:       Wed May 18 08:22:24 2022 (+0900)
    Work load:            1 tapes
    Progress:             Phase: complete (1/1 tapes completed.)
    Result Summary:
    Messages:
     2022-05-18 08:21:20.882741 GLESS016I: Reconciliation requested.
     2022-05-18 08:21:21.636996 GLESS050I: GPFS file systems involved: /mnt/gpfs .
     2022-05-18 08:21:21.638212 GLESS210I: Valid tapes in the pool: JD0009JD JD0005JD JD0004JD JD0010JD JD0001JD .
     2022-05-18 08:21:21.638439 GLESS049I: Tapes to reconcile: JD0010JD .
     2022-05-18 08:21:21.638763 GLESS134I: Reserving tapes.
     2022-05-18 08:21:21.639333 GLESS269I: JD0010JD is mounted. Moving to homeslot.
     2022-05-18 08:21:21.771097 GLESS135I: Reserved tapes: JD0010JD .
     2022-05-18 08:21:21.869173 GLESS054I: Creating GPFS snapshots:
     2022-05-18 08:21:21.869376 GLESS055I: Deleting the previous reconcile snapshot and creating a new one for /mnt/gpfs ( fsd ).
     2022-05-18 08:21:23.277241 GLESS056I: Searching GPFS snapshots:
     2022-05-18 08:21:23.283855 GLESS057I: Searching GPFS snapshot of /mnt/gpfs ( fsd ).
     2022-05-18 08:22:17.564552 GLESS060I: Processing the file lists:
     2022-05-18 08:22:17.605262 GLESS061I: Processing the file list for /mnt/gpfs ( fsd ).
     2022-05-18 08:22:21.672697 GLESS141I: Removing stale DMAPI attributes:
     2022-05-18 08:22:21.672932 GLESS142I: Removing stale DMAPI attributes for /mnt/gpfs ( fsd ).
     2022-05-18 08:22:21.856841 GLESS063I: Reconciling the tapes:
     2022-05-18 08:22:21.916855 GLESS248I: Reconcile tape JD0010JD.
     2022-05-18 08:22:22.970135 GLESS249I: Releasing reservation of tape JD0010JD.
     2022-05-18 08:22:22.970689 GLESS058I: Removing GPFS snapshots:
     2022-05-18 08:22:22.970855 GLESS059I: Removing GPFS snapshot of /mnt/gpfs ( fsd ).


        --- Subtask(level 2) Info ---
        Task ID:              5204
        Task Type:            reconcile_tape
        Status:               completed
        Result:               succeeded
        Accepted Time:        Wed May 18 08:22:21 2022 (+0900)
        Started Time:         Wed May 18 08:22:21 2022 (+0900)
        Completed Time:       Wed May 18 08:22:22 2022 (+0900)
        Workload:             -
        Progress:             Tape: JD0010JD Phase: reconcile tape complete
        Result Summary:       Tape: JD0010JD Result: (GLESS002I) Reconciling tape JD0010JD complete.
        Messages:
         2022-05-18 08:22:22.691022 GLESS002I: Reconciling tape JD0010JD complete.


    --- Subtask(level 1) Info ---
    Task ID:              5205
    Task Type:            reclaim_sub
    Status:               completed
    Result:               failed
    Accepted Time:        Wed May 18 08:22:24 2022 (+0900)
    Started Time:         Wed May 18 08:22:24 2022 (+0900)
    Completed Time:       Wed May 18 08:24:40 2022 (+0900)
    Workload:             from source_tape: JD0010JD to target_tape: JD0009JD
    Result Summary:       (GLESR261E) A subtask ended because there are files whose status cannot be determined by the reclaim process. The "tape reconcile" command is required to determine the status of the files.
    Messages:
     2022-05-18 08:24:40.419454 GLESR261E: A subtask ended because there are files whose status cannot be determined by the reclaim process. The "tape reconcile" command is required to determine the status of the files.

The following shows the command and its output for the failed task 5202.

# eeadm task show 5202 -r failed

[Source Tape: JD0010JD]
Result    Failure Code  Failed time               i-node     -- File name
Fail      GLESR119E     -                         310794     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M24.bin
Fail      GLESR119E     -                         310795     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M25.bin
Fail      GLESR119E     -                         310796     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M26.bin
Fail      GLESR119E     -                         310797     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M27.bin
Fail      GLESR119E     -                         310798     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M28.bin
Fail      GLESR119E     -                         310800     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M29.bin
Fail      GLESR119E     -                         310801     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M30.bin
Fail      GLESR119E     -                         310804     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M31.bin
Fail      GLESR119E     -                         310805     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M32.bin
Fail      GLESR119E     -                         310806     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M33.bin
Fail      GLESR119E     -                         310807     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M34.bin
Fail      GLESR119E     -                         310808     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M35.bin
Fail      GLESR119E     -                         310809     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M36.bin
Fail      GLESR119E     -                         310811     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M37.bin
Fail      GLESR119E     -                         310812     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M38.bin
Fail      GLESR119E     -                         310813     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M39.bin
Fail      GLESR119E     -                         310814     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M40.bin
Fail      GLESR119E     -                         310815     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M41.bin
Fail      GLESR119E     -                         310816     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M42.bin
Fail      GLESR119E     -                         310817     -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M43.bin
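When many files fail, a per-code summary shows at a glance which failure codes dominate. The following is a minimal shell sketch that assumes the column layout shown above (Result, Failure Code, Failed time, i-node, then the file name after " -- "); count_failure_codes and failed_file_names are hypothetical helper names.

```shell
# Hypothetical helper: count the "Fail" lines per failure code in
# "eeadm task show <id> -r failed" output read from stdin.
count_failure_codes() {
    # Field 2 of a "Fail" line is the Failure Code column.
    awk '$1 == "Fail" { n[$2]++ } END { for (c in n) print c, n[c] }' | sort
}

# Hypothetical helper: print only the file names of the "Fail" lines.
failed_file_names() {
    # The file name is everything after the first " -- ", so names that
    # contain spaces stay intact.
    awk '$1 == "Fail" { i = index($0, " -- "); if (i) print substr($0, i + 4) }'
}
```

For the output above, eeadm task show 5202 -r failed | count_failure_codes would print a single line, GLESR119E 20.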

Failure at full tape capacity

When no tape in the pool has enough remaining capacity to store all of the files on the source tape, the tape replace task fails because the target tape becomes full. To move the remaining files to another target tape, rerun the eeadm tape replace command against the same source tape.

The following is the output of the eeadm task show -v command when the tape_replace task fails because the target tape is full.

=== Task Information ===
Task ID:              5274
Task Type:            tape_replace
Command Parameters:   eeadm tape replace -p JP3 JC0002JC -l JLIB1
Status:               completed
Result:               failed
Accepted Time:        Sun Jun 19 18:15:11 2022 (+0900)
Started Time:         Sun Jun 19 18:15:11 2022 (+0900)
Completed Time:       Sun Jun 19 18:16:41 2022 (+0900)
Workload:             1 tapes
Progress:             1/1 tapes completed
Result Summary:       (GLESL750E) Tape replace for JC0002JC failed (4001).
Messages:
 2022-06-19 18:15:11.135065 GLESL755I: Start a reconcile before starting a replace against 1 tapes.
 2022-06-19 18:16:14.191030 GLESS002I: Reconciling tape JC0002JC complete.
 2022-06-19 18:16:15.482323 GLESL756I: Reconcile before replace finished.
 2022-06-19 18:16:15.491467 GLESL753I: Starting tape replace for JC0002JC.
 2022-06-19 18:16:15.491707 GLESL754I: Found a target tape for tape replace (JC0003JC).
 2022-06-19 18:16:41.537064 GLESL750E: Tape replace for JC0002JC failed (4001).


    --- Subtask(level 1) Info ---
    Task ID:              5275
    Task Type:            tape_reconcile
    Status:               completed
    Result:               succeeded
    Accepted Time:        Sun Jun 19 18:15:11 2022 (+0900)
    Started Time:         Sun Jun 19 18:15:11 2022 (+0900)
    Completed Time:       Sun Jun 19 18:16:15 2022 (+0900)
    Work load:            1 tapes
    Progress:             Phase: complete (1/1 tapes completed.)
    Result Summary:
    Messages:
     2022-06-19 18:15:11.182602 GLESS016I: Reconciliation requested.
     2022-06-19 18:15:11.949432 GLESS050I: GPFS file systems involved: /mnt/gpfs .
     2022-06-19 18:15:11.950141 GLESS210I: Valid tapes in the pool: JC0003JC JC0002JC .
     2022-06-19 18:15:11.950334 GLESS049I: Tapes to reconcile: JC0002JC .
     2022-06-19 18:15:11.950503 GLESS134I: Reserving tapes.
     2022-06-19 18:15:11.951001 GLESS269I: JC0002JC is mounted. Moving to homeslot.
     2022-06-19 18:15:12.277501 GLESS135I: Reserved tapes: JC0002JC .
     2022-06-19 18:15:12.379113 GLESS054I: Creating GPFS snapshots:
     2022-06-19 18:15:12.379316 GLESS055I: Deleting the previous reconcile snapshot and creating a new one for /mnt/gpfs ( fsd ).
     2022-06-19 18:15:13.824331 GLESS056I: Searching GPFS snapshots:
     2022-06-19 18:15:13.846176 GLESS057I: Searching GPFS snapshot of /mnt/gpfs ( fsd ).
     2022-06-19 18:16:08.884717 GLESS060I: Processing the file lists:
     2022-06-19 18:16:08.914783 GLESS061I: Processing the file list for /mnt/gpfs ( fsd ).
     2022-06-19 18:16:13.163567 GLESS141I: Removing stale DMAPI attributes:
     2022-06-19 18:16:13.163800 GLESS142I: Removing stale DMAPI attributes for /mnt/gpfs ( fsd ).
     2022-06-19 18:16:13.325858 GLESS063I: Reconciling the tapes:
     2022-06-19 18:16:13.360462 GLESS248I: Reconcile tape JC0002JC.
     2022-06-19 18:16:14.409094 GLESS249I: Releasing reservation of tape JC0002JC.
     2022-06-19 18:16:14.409571 GLESS058I: Removing GPFS snapshots:
     2022-06-19 18:16:14.409770 GLESS059I: Removing GPFS snapshot of /mnt/gpfs ( fsd ).


        --- Subtask(level 2) Info ---
        Task ID:              5276
        Task Type:            reconcile_tape
        Status:               completed
        Result:               succeeded
        Accepted Time:        Sun Jun 19 18:16:13 2022 (+0900)
        Started Time:         Sun Jun 19 18:16:13 2022 (+0900)
        Completed Time:       Sun Jun 19 18:16:14 2022 (+0900)
        Workload:             -
        Progress:             Tape: JC0002JC Phase: reconcile tape complete
        Result Summary:       Tape: JC0002JC Result: (GLESS002I) Reconciling tape JC0002JC complete.
        Messages:
         2022-06-19 18:16:14.188964 GLESS002I: Reconciling tape JC0002JC complete.


    --- Subtask(level 1) Info ---
    Task ID:              5277
    Task Type:            reclaim_sub
    Status:               completed
    Result:               failed
    Accepted Time:        Sun Jun 19 18:16:15 2022 (+0900)
    Started Time:         Sun Jun 19 18:16:15 2022 (+0900)
    Completed Time:       Sun Jun 19 18:16:40 2022 (+0900)
    Workload:             from source_tape: JC0002JC to target_tape: JC0003JC
    Result Summary:       (GLESR278I) The target tape became full and operations stopped. Rerun the command to continue the process.
    Messages:
     2022-06-19 18:16:40.708479 GLESR278I: The target tape became full and operations stopped. Rerun the command to continue the process.
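The three failure cases above can be told apart by the message IDs in the eeadm task show -v output. The following hypothetical helper is a minimal shell sketch that covers only the message IDs shown in these examples; any other message needs to be investigated by hand.

```shell
# Hypothetical helper: suggest a next step from "eeadm task show <id> -v"
# output read from stdin. Only the message IDs from the examples in this
# topic are recognized.
suggest_next_step() {
    out=$(cat)
    case $out in
        *GLESL757E*|*GLESS098E*)
            echo "tape_reconcile failed: see Troubleshooting a failed reconcile" ;;
        *GLESR278I*)
            echo "target tape full: rerun eeadm tape replace for the same tape" ;;
        *GLESR261E*)
            echo "reclaim_sub failed: list the files with eeadm task show <id> -r failed" ;;
        *)
            echo "no known pattern: inspect the output manually" ;;
    esac
}
```

For example, eeadm task show 5274 -v | suggest_next_step would report the target-full case for the output above.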

Alternative recovery action for tapes in require_replace or need_replace status

The require_replace or need_replace tape status is set when write errors or read errors are detected on the tape. A tape drive can report write or read errors for many reasons, and those errors might recur on the same tape. When read errors occur on the tape, the tape replace task might also be unable to read the files on the tape. In that case, you can use the following alternative recovery method.
Note: This procedure retrieves to disk all files from a tape that has the require_replace or need_replace status. It requires sufficient disk space, and the recalled files must be migrated again afterward. This procedure does not include that remigration step; recalled files can be migrated in the same manner as newly created files.
  1. Create a list of files from the output of the eeadm task show command for the failed tape_replace task.
    Example of a command that saves the output of the eeadm task show command to a list file:
    # eeadm task show 5202 -r failed > ./5202_fail.lst

    When the eeadm tape replace command was run with two or more tapes, the output can contain files from multiple tapes. Use a text editor to remove the lines for the other tapes, leaving only the lines for files on the tape to be processed.

  2. The number of lines in the list file is the number of files that this alternative action processes. When there is enough disk space to store all of the files on the tape, proceed to Step 3.

    When there is not enough disk space left and you want to process the files in batches, proceed to Step 4 (substeps a, b, and c).

  3. When there is enough disk space, recall all of the files on the tape at once.
    a. Use the eeadm tape unassign command with the --safe-unassign option to recall all required files from the tape and make them resident. The command then removes the tape from the pool. Migrate the recalled files back to the appropriate target pool as described in the note above.
      Example of a command that runs the tape_unassign task:
      # eeadm tape unassign --safe-unassign JD0010JD -p JP1
    b. When the eeadm tape unassign command completes without error, all files on the tape have been recalled to disk and their state is resident. The tape is removed from the pool, and the alternative recovery action is complete.
      When the eeadm tape unassign command fails, some files could not be read from the tape and remain on it. Use the eeadm task show command with the -v option to find the task ID of the selective_recall subtask.
      Example of eeadm task show -v output for an eeadm tape unassign --safe-unassign command:
      
      === Task Information ===
      Task ID:              5228
      Task Type:            tape_unassign
      Command Parameters:   eeadm tape unassign JD0005JD --safe-unassign -p JP1 -l JLIB1
      Status:               completed
      Result:               succeeded
      Accepted Time:        Wed May 18 17:04:30 2022 (+0900)
      Started Time:         Wed May 18 17:04:30 2022 (+0900)
      Completed Time:       Wed May 18 17:06:30 2022 (+0900)
      Workload:             1 tapes
      Progress:             1/1 tapes completed
      Result Summary:       (GLESL875I) Tape unassign is requested, empty-check: disk, on-remaining: safe-unassign.
                            Tape: JD0005JD Result: (GLESL359I) Unassigned tape JD0005JD from pool JP1 successfully.
      Messages:
       2022-05-18 17:04:30.319443 GLESL875I: Tape unassign is requested, empty-check: disk, on-remaining: safe-unassign.
       2022-05-18 17:04:31.960785 GLESL600I: Searching the GPFS file systems to find migrated/saved objects in tape JD0005JD.
       2022-05-18 17:05:29.575684 GLESL605I: Tape JD0005JD has files to be recovered. The list is saved to /mnt/gpfs/.ltfsee/statesave/active/5228/subtask.5229/ltfs81.9762.mnt.gpfs.recoverlist. (num=1)
       2022-05-18 17:06:30.544842 GLESL603I: Searching for the non-IBM Storage Archive EE objects in tape JD0005JD.
       2022-05-18 17:06:30.571865 GLESL610I: Recovery of tape JD0005JD was successful. 1 files were recovered. The list is saved under statesave directory of task id = 5229 with ".recoverlist" extension.
       2022-05-18 17:06:30.572194 GLESL879I: non-IBM Storage Archive EE files are not found on tape JD0005JD.
       2022-05-18 17:06:30.836833 GLESL359I: Unassigned tape JD0005JD from pool JP1 successfully.
      
      
          --- Subtask(level 1) Info ---
          Task ID:              5229
          Task Type:            tape_unassign
          Status:               completed
          Result:               succeeded
          Accepted Time:        Wed May 18 17:04:31 2022 (+0900)
          Started Time:         Wed May 18 17:04:31 2022 (+0900)
          Completed Time:       Wed May 18 17:06:30 2022 (+0900)
          Workload:             1 tapes
          Progress:             1/1 tapes completed
          Result Summary:       Tape: JD0005JD Result: (GLESL359I) Unassigned tape JD0005JD from pool JP1 successfully.
          Messages:
           2022-05-18 17:04:31.958669 GLESL600I: Searching the GPFS file systems to find migrated/saved objects in tape JD0005JD.
           2022-05-18 17:05:29.571486 GLESL605I: Tape JD0005JD has files to be recovered. The list is saved to /mnt/gpfs/.ltfsee/statesave/active/5228/subtask.5229/ltfs81.9762.mnt.gpfs.recoverlist. (num=1)
           2022-05-18 17:05:32.029623 GLESL602I: Searching for the remaining objects migrated/saved in tape JD0005JD.
           2022-05-18 17:06:30.542159 GLESL603I: Searching for the non-IBM Storage Archive EE objects in tape JD0005JD.
           2022-05-18 17:06:30.571725 GLESL610I: Recovery of tape JD0005JD was successful. 1 files were recovered. The list is saved under statesave directory of task id = 5229 with ".recoverlist" extension.
           2022-05-18 17:06:30.572037 GLESL879I: non-IBM Storage Archive EE files are not found on tape JD0005JD.
           2022-05-18 17:06:30.836619 GLESL359I: Unassigned tape JD0005JD from pool JP1 successfully.
      
      
              --- Subtask(level 2) Info ---
              Task ID:              5230
              Task Type:            selective_recall
              Status:               completed
              Result:               succeeded
              Accepted Time:        Wed May 18 17:05:29 2022 (+0900)
              Started Time:         Wed May 18 17:05:29 2022 (+0900)
              Completed Time:       Wed May 18 17:05:32 2022 (+0900)
              Workload:             10 file(s), 0 bytes to process.
              Progress:             10 completed (or failed) files / 10 total files.
              Result Summary:       -
              Messages:
               2022-05-18 17:05:32.025866 GLESL839I: All 10 file(s) has been successfully processed.
               2022-05-18 17:05:32.026276 GLESL873W: 0 files have inconsistent file hash but the errors have been regarded as warning.
               2022-05-18 17:05:32.026448 GLESL872I:   Succeeded: 10 resident, 0 already_resident, 0 recalled_but_inconsistent_file_hash
      
      
                  --- Subtask(level 3) Info ---
                  Task ID:              5231
                  Task Type:            selective_recall
                  Status:               completed
                  Result:               succeeded
                  Accepted Time:        Wed May 18 17:05:29 2022 (+0900)
                  Started Time:         Wed May 18 17:05:29 2022 (+0900)
                  Completed Time:       Wed May 18 17:05:31 2022 (+0900)
                  Workload:             10 file(s), 10,485,760 bytes (10.4 MB) to process.
                  Progress:             10 completed (or failed) files / 10 total files.
                  Result Summary:       -
                  Messages:
                   No messages

      In this example, the task ID of the selective_recall subtask is 5230. Proceed to Step 5.

  4. When there is not enough disk space, recall the files in batches.
    a. Create file lists. Separate the lines that have either of the following two failure codes from the other lines: GLESR137E and GLESR138E.
      Example of a command that selects the lines for files without those two failure codes:
      # grep '^Fail' ./5202_fail.lst | grep -v -e 'GLESR137E' -e 'GLESR138E' > ./5202_fail_filtered.lst
      Example of commands that select the lines with those two failure codes:

      # grep 'GLESR137E' ./5202_fail.lst > ./5202_fail_137E.lst
      # grep 'GLESR138E' ./5202_fail.lst > ./5202_fail_138E.lst

      If either of these two files contains lines, the files in those lines cannot be found by their file names. Use the i-node number to locate each file and confirm its current file name. Make a list of the files that are found, using the file names found by the i-node search. In the following example, this list file is 5202_found.lst. Note that the i-node search finds no file if the file was already removed from disk. In that case, the file is no longer required and does not need to be included in the list. For more information about the list file, see eeadm recall.

    b. Recall the files and make them resident. At this step, you can split a file list into several smaller file lists. Example of commands that recall the files in the lists created in Step 4a:

      # cat ./5202_fail_filtered.lst | eeadm recall --resident
      # cat ./5202_found.lst | eeadm recall --resident    (only if 5202_found.lst was created)
    c. When the eeadm recall command completes without error, migrate the recalled files back to their target pools to free disk space again. Repeat this process until all of the split list files are processed. When the eeadm recall command fails for a list, skip that list and continue with the next one until all of the list files are processed.

      When all list files are processed without error, proceed to Step 7 to remove the tape from the pool. If the eeadm recall command failed for one or more list files, proceed to Step 5.

  5. Identify the files that failed by running the eeadm task show -r command for the selective_recall subtask of the eeadm tape unassign command, or for the failed selective_recall task started by the eeadm recall command. Then try reading each failed file in a different way, by using a file system command (for example, head, od, or hexdump). The following is an example of eeadm task show -r output for a recall that includes failures:
    Result    Failure Code  Failed time               Node -- File name
    Fail      GLESL255E     2022/05/18T14:07:02          8 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M24.bin
    Fail      GLESL255E     2022/05/18T14:07:02          8 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M25.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M26.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M27.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M28.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M29.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M30.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M31.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M32.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M33.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M34.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M35.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M36.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M37.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M38.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M39.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M40.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M41.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M42.bin
    Success   -             -                            - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M43.bin
    Example of a file system command to read the file. 
    
    # hexdump -n 128 /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M24.bin
    0000000 28d1 7005 6c3f e0e8 0385 23e5 2b78 bfd6
    0000010 b672 f919 6fca e162 1600 7860 d93f 7660
    0000020 a3e3 6d9b 1cf3 2c31 feaf f35a 3259 5d93
    0000030 24c8 7c5e 4369 e0c0 f54b 7db4 42f8 3e1a
    0000040 7243 585a 1468 412c c01a 48c4 4210 05ad
    0000050 34da 91be 2c94 bc8f e96c f9ef c7d3 9e4d
    0000060 547c e6d9 17d3 1e5c 08c2 b3e1 44a1 5f9e
    0000070 414a ec6a 6d35 90b8 3dd5 2214 b54e 06ce
    0000080
    # 
    When many files are in the failed state, a script can reduce the work of the recovery process.
    Note: When the file was migrated with a migrate command that specified two or three pools with the -p option, so that copies of the file were written to two or three pools, an attempt to read the file with a file system command succeeds in most cases. When the file was migrated to a single pool, the file system command cannot read the file successfully.
  6. For the files that could be read in Step 5, create a file list (for example, 5202_recalled.lst) and run the eeadm recall --resident command again. Example of a command that recalls the files in 5202_recalled.lst:
    # cat ./5202_recalled.lst | eeadm recall --resident 

    All of the files are now in the resident state. To remove the tape from the pool, proceed to Step 7. If the read attempt in Step 5 failed again, this procedure has no further steps to attempt. To remove a tape in the require_replace or need_replace state from the pool, you must delete the files that could not be recalled from the tape. If it is acceptable to remove those files from disk, remove them and proceed to Step 7. Otherwise, contact IBM support to determine whether there is any other way to recover the files from the failing tape.

  7. Run either of the following commands to remove the require_replace or need_replace tape from the pool:
    • Rerun the eeadm tape replace command. For more information, see eeadm tape replace.
    • Run the eeadm tape unassign command with the --safe-unassign option. For more information, see eeadm tape unassign.
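The steps above involve a fair amount of text filtering. As a minimal sketch, assuming GNU coreutils and findutils (stat -c, find -printf) and the output formats shown in this topic, the following hypothetical shell functions cover Steps 1 through 5. The function names, the default batch size of 1000, and the HOOK argument are illustrative and not part of IBM Storage Archive; eeadm is called only as in the examples above (eeadm recall --resident reading a list from standard input).

```shell
# Step 1: keep only the "Fail" lines listed under "[Source Tape: <tape>]".
tape_lines() {
    awk -v hdr="[Source Tape: $1]" '
        /^\[Source Tape: / { keep = (index($0, hdr) == 1); next }
        keep && $1 == "Fail"'
}

# Step 2: total logical size, in bytes, of the files named in a list file.
# stat reads metadata only, so it does not read file data from tape.
list_bytes() {
    total=0
    while IFS= read -r line; do
        case $line in Fail*) ;; *) continue ;; esac
        f=${line#*" -- "}                       # file name after the first " -- "
        size=$(stat -c %s -- "$f" 2>/dev/null) || size=0
        total=$((total + size))
    done < "$1"
    echo "$total"
}

# Step 3b: ID of the first selective_recall subtask in "eeadm task show -v" output.
selective_recall_id() {
    awk '$1 == "Task" && $2 == "ID:" { id = $3 }
         $1 == "Task" && $2 == "Type:" && $3 == "selective_recall" { print id; exit }'
}

# Step 4a: look up the i-node numbers (field 4) of a list in one find pass and
# print each list line with its file name replaced by the path found.
# .snapshots is assumed to be the GPFS snapshot directory and is skipped.
find_by_inode() {
    root=$1 list=$2
    find "$root" -xdev -path "$root/.snapshots" -prune -o -printf '%i\t%p\n' 2>/dev/null |
    awk 'NR == FNR {
             if ($1 == "Fail") prefix[$4] = substr($0, 1, index($0, " -- ") - 1)
             next
         }
         {
             t = index($0, "\t"); inum = substr($0, 1, t - 1)
             if (inum in prefix) print prefix[inum] " -- " substr($0, t + 1)
         }' "$list" -
}

# Steps 4b and 4c: split a list into batches and recall them one at a time,
# skipping failed batches. HOOK is a user-supplied command (hypothetical),
# run with each recalled batch file, for example to migrate it back.
recall_batches() {
    list=$1 size=${2:-1000} hook=${3:-true}
    dir=$(mktemp -d) || return 1
    split -l "$size" -- "$list" "$dir/part."
    rc=0
    for part in "$dir"/part.*; do
        [ -e "$part" ] || continue
        if eeadm recall --resident < "$part"; then
            "$hook" "$part" || rc=1
        else
            echo "recall failed, skipped: $part" >&2
            rc=1
        fi
    done
    echo "batch files kept in $dir"
    return $rc
}

# Step 5: try to read the first 128 bytes of each failed file, as the hexdump
# example does. Readable lines go to stdout, usable as the Step 6 list.
probe_reads() {
    while IFS= read -r line; do
        case $line in Fail*) ;; *) continue ;; esac
        f=${line#*" -- "}
        if head -c 128 -- "$f" > /dev/null 2>&1; then
            printf '%s\n' "$line"
        else
            echo "unreadable: $f" >&2
        fi
    done
}
```

For example, eeadm task show 5202 -r failed | tape_lines JD0010JD > ./5202_fail.lst builds the Step 1 list, and comparing list_bytes ./5202_fail.lst with the free space that df reports for the file system helps with the Step 2 decision. Reading a migrated file with head triggers a recall in the same way as the hexdump example, so run probe_reads only on the files that failed.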