Troubleshooting failed tape replace
This topic describes how to troubleshoot a failure of the eeadm tape replace command and the alternative recovery actions that you can take.
Confirm the step where failure occurred
The eeadm tape replace command in IBM Storage® Archive Enterprise Edition first runs tape reconcile tasks. These tasks run as subtasks of the tape replace task. When a tape reconcile subtask fails, the tape replace task also fails.
After the tape reconcile subtasks succeed, the reclaim subtasks follow.
The reclaim subtasks also run as subtasks of the tape replace task. A reclaim_sub subtask is created for each tape, and the subtasks are processed one by one. The eeadm task show -v command displays messages that you can use to determine which step of the tape replace task failed.
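When the verbose output is long, it can help to reduce it to one line per task before reading the messages. The following is a minimal sketch, not a product feature. It assumes that the output was saved to a file and that the Task ID, Task Type, and Result labels appear as they do in the examples in this topic. The function name summarize_tasks is hypothetical.

```shell
# Print "<task id> <task type> <result>" for every task and subtask found in
# saved "eeadm task show -v" output (read from the named files or from stdin).
# Assumption: the labels "Task ID:", "Task Type:", and "Result:" start a line,
# optionally after leading whitespace, as in the examples in this topic.
summarize_tasks() {
    awk '
        /^[ \t]*Task ID:/   { id = $3 }
        /^[ \t]*Task Type:/ { type = $3 }
        /^[ \t]*Result:/    { print id, type, $2 }   # "Result Summary:" does not match
    ' "$@"
}
```

For example, after the verbose output of a task is redirected to a file, summarize_tasks prints a line such as "5136 tape_reconcile failed" for each task in that file, so the failed subtasks can be identified at a glance.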
Failure at tape_reconcile subtask
When the failure occurs at the tape_reconcile subtask, identify the error message in the tape_reconcile subtask output and refer to Troubleshooting a failed reconcile.
The following example shows the output of the eeadm task show -v command when the tape_replace task fails at the tape_reconcile subtask.
=== Task Information ===
Task ID: 5135
Task Type: tape_replace
Command Parameters: eeadm tape replace JD0009JD -p JP1 -l JLIB1
Status: completed
Result: failed
Accepted Time: Mon May 16 17:35:51 2022 (+0900)
Started Time: Mon May 16 17:35:51 2022 (+0900)
Completed Time: Mon May 16 17:36:56 2022 (+0900)
Workload: 0 tapes
Progress: 0/0 tapes completed
Result Summary: (GLESL757E) Failed to reconcile for replace.
Messages:
2022-05-16 17:35:51.463115 GLESL755I: Start a reconcile before starting a replace against 1 tapes.
2022-05-16 17:36:55.076089 GLESS098E: Reconciling tape JD0009JD failed because orphan files are found.
2022-05-16 17:36:56.581187 GLESL757E: Failed to reconcile for replace.
--- Subtask(level 1) Info ---
Task ID: 5136
Task Type: tape_reconcile
Status: completed
Result: failed
Accepted Time: Mon May 16 17:35:51 2022 (+0900)
Started Time: Mon May 16 17:35:51 2022 (+0900)
Completed Time: Mon May 16 17:36:56 2022 (+0900)
Work load: 1 tapes
Progress: Phase: complete (1/1 tapes completed.)
Result Summary:
Messages:
2022-05-16 17:35:51.494182 GLESS016I: Reconciliation requested.
2022-05-16 17:35:52.246002 GLESS050I: GPFS file systems involved: /mnt/gpfs .
2022-05-16 17:35:52.246634 GLESS210I: Valid tapes in the pool: JD0006JD JD0005JD JD0009JD JD0004JD JD0001JD .
2022-05-16 17:35:52.246869 GLESS049I: Tapes to reconcile: JD0009JD .
2022-05-16 17:35:52.247058 GLESS134I: Reserving tapes.
2022-05-16 17:35:52.248972 GLESS269I: JD0009JD is mounted. Moving to homeslot.
2022-05-16 17:35:52.369285 GLESS135I: Reserved tapes: JD0009JD .
2022-05-16 17:35:52.438789 GLESS054I: Creating GPFS snapshots:
2022-05-16 17:35:52.438969 GLESS055I: Deleting the previous reconcile snapshot and creating a new one for /mnt/gpfs ( fsd ).
2022-05-16 17:35:53.927497 GLESS056I: Searching GPFS snapshots:
2022-05-16 17:35:53.935386 GLESS057I: Searching GPFS snapshot of /mnt/gpfs ( fsd ).
2022-05-16 17:36:49.763496 GLESS060I: Processing the file lists:
2022-05-16 17:36:49.799062 GLESS061I: Processing the file list for /mnt/gpfs ( fsd ).
2022-05-16 17:36:53.864704 GLESS141I: Removing stale DMAPI attributes:
2022-05-16 17:36:53.864930 GLESS142I: Removing stale DMAPI attributes for /mnt/gpfs ( fsd ).
2022-05-16 17:36:54.019848 GLESS063I: Reconciling the tapes:
2022-05-16 17:36:54.055975 GLESS248I: Reconcile tape JD0009JD.
2022-05-16 17:36:55.104043 GLESS249I: Releasing reservation of tape JD0009JD.
2022-05-16 17:36:55.104337 GLESS058I: Removing GPFS snapshots:
2022-05-16 17:36:55.104511 GLESS059I: Removing GPFS snapshot of /mnt/gpfs ( fsd ).
--- Subtask(level 2) Info ---
Task ID: 5137
Task Type: reconcile_tape
Status: completed
Result: failed
Accepted Time: Mon May 16 17:36:54 2022 (+0900)
Started Time: Mon May 16 17:36:54 2022 (+0900)
Completed Time: Mon May 16 17:36:55 2022 (+0900)
Workload: -
Progress: Tape: JD0009JD Phase: reconcile tape complete
Result Summary: Tape: JD0009JD Result: (GLESS098E) Reconciling tape JD0009JD failed because orphan files are found.
Messages:
2022-05-16 17:36:55.073969 GLESS098E: Reconciling tape JD0009JD failed because orphan files are found
Failure at reclaim_sub subtask
When the failure occurs at the reclaim_sub subtask, you can create the list of files that failed to process by running the eeadm task show command with the -r failed option.
The following example shows the output of the eeadm task show -v command when the tape_replace task fails at the reclaim_sub subtask.
=== Task Information ===
Task ID: 5202
Task Type: tape_replace
Command Parameters: eeadm tape replace -p JP1 -l JLIB1 JD0010JD
Status: completed
Result: failed
Accepted Time: Wed May 18 08:21:20 2022 (+0900)
Started Time: Wed May 18 08:21:20 2022 (+0900)
Completed Time: Wed May 18 08:24:41 2022 (+0900)
Workload: 1 tapes
Progress: 1/1 tapes completed
Result Summary: (GLESL750E) Tape replace for JD0010JD failed (4035).
Messages:
2022-05-18 08:21:20.849250 GLESL755I: Start a reconcile before starting a replace against 1 tapes.
2022-05-18 08:22:22.691675 GLESS002I: Reconciling tape JD0010JD complete.
2022-05-18 08:22:24.091778 GLESL756I: Reconcile before replace finished.
2022-05-18 08:22:24.106062 GLESL753I: Starting tape replace for JD0010JD.
2022-05-18 08:22:24.106306 GLESL754I: Found a target tape for tape replace (JD0009JD).
2022-05-18 08:24:41.179256 GLESL750E: Tape replace for JD0010JD failed (4035).
--- Subtask(level 1) Info ---
Task ID: 5203
Task Type: tape_reconcile
Status: completed
Result: succeeded
Accepted Time: Wed May 18 08:21:20 2022 (+0900)
Started Time: Wed May 18 08:21:20 2022 (+0900)
Completed Time: Wed May 18 08:22:24 2022 (+0900)
Work load: 1 tapes
Progress: Phase: complete (1/1 tapes completed.)
Result Summary:
Messages:
2022-05-18 08:21:20.882741 GLESS016I: Reconciliation requested.
2022-05-18 08:21:21.636996 GLESS050I: GPFS file systems involved: /mnt/gpfs .
2022-05-18 08:21:21.638212 GLESS210I: Valid tapes in the pool: JD0009JD JD0005JD JD0004JD JD0010JD JD0001JD .
2022-05-18 08:21:21.638439 GLESS049I: Tapes to reconcile: JD0010JD .
2022-05-18 08:21:21.638763 GLESS134I: Reserving tapes.
2022-05-18 08:21:21.639333 GLESS269I: JD0010JD is mounted. Moving to homeslot.
2022-05-18 08:21:21.771097 GLESS135I: Reserved tapes: JD0010JD .
2022-05-18 08:21:21.869173 GLESS054I: Creating GPFS snapshots:
2022-05-18 08:21:21.869376 GLESS055I: Deleting the previous reconcile snapshot and creating a new one for /mnt/gpfs ( fsd ).
2022-05-18 08:21:23.277241 GLESS056I: Searching GPFS snapshots:
2022-05-18 08:21:23.283855 GLESS057I: Searching GPFS snapshot of /mnt/gpfs ( fsd ).
2022-05-18 08:22:17.564552 GLESS060I: Processing the file lists:
2022-05-18 08:22:17.605262 GLESS061I: Processing the file list for /mnt/gpfs ( fsd ).
2022-05-18 08:22:21.672697 GLESS141I: Removing stale DMAPI attributes:
2022-05-18 08:22:21.672932 GLESS142I: Removing stale DMAPI attributes for /mnt/gpfs ( fsd ).
2022-05-18 08:22:21.856841 GLESS063I: Reconciling the tapes:
2022-05-18 08:22:21.916855 GLESS248I: Reconcile tape JD0010JD.
2022-05-18 08:22:22.970135 GLESS249I: Releasing reservation of tape JD0010JD.
2022-05-18 08:22:22.970689 GLESS058I: Removing GPFS snapshots:
2022-05-18 08:22:22.970855 GLESS059I: Removing GPFS snapshot of /mnt/gpfs ( fsd ).
--- Subtask(level 2) Info ---
Task ID: 5204
Task Type: reconcile_tape
Status: completed
Result: succeeded
Accepted Time: Wed May 18 08:22:21 2022 (+0900)
Started Time: Wed May 18 08:22:21 2022 (+0900)
Completed Time: Wed May 18 08:22:22 2022 (+0900)
Workload: -
Progress: Tape: JD0010JD Phase: reconcile tape complete
Result Summary: Tape: JD0010JD Result: (GLESS002I) Reconciling tape JD0010JD complete.
Messages:
2022-05-18 08:22:22.691022 GLESS002I: Reconciling tape JD0010JD complete.
--- Subtask(level 1) Info ---
Task ID: 5205
Task Type: reclaim_sub
Status: completed
Result: failed
Accepted Time: Wed May 18 08:22:24 2022 (+0900)
Started Time: Wed May 18 08:22:24 2022 (+0900)
Completed Time: Wed May 18 08:24:40 2022 (+0900)
Workload: from source_tape: JD0010JD to target_tape: JD0009JD
Result Summary: (GLESR261E) A subtask ended because there are files whose status cannot be determined by the reclaim process. The "tape reconcile" command is required to determine the status of the files.
Messages:
2022-05-18 08:24:40.419454 GLESR261E: A subtask ended because there are files whose status cannot be determined by the reclaim process. The "tape reconcile" command is required to determine the status of the files.
The following example shows the eeadm task show command with the -r failed option, and its output, for the failed task 5202.
# eeadm task show 5202 -r failed
[Source Tape: JD0010JD]
Result Failure Code Failed time i-node -- File name
Fail GLESR119E - 310794 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M24.bin
Fail GLESR119E - 310795 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M25.bin
Fail GLESR119E - 310796 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M26.bin
Fail GLESR119E - 310797 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M27.bin
Fail GLESR119E - 310798 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M28.bin
Fail GLESR119E - 310800 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M29.bin
Fail GLESR119E - 310801 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M30.bin
Fail GLESR119E - 310804 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M31.bin
Fail GLESR119E - 310805 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M32.bin
Fail GLESR119E - 310806 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M33.bin
Fail GLESR119E - 310807 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M34.bin
Fail GLESR119E - 310808 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M35.bin
Fail GLESR119E - 310809 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M36.bin
Fail GLESR119E - 310811 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M37.bin
Fail GLESR119E - 310812 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M38.bin
Fail GLESR119E - 310813 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M39.bin
Fail GLESR119E - 310814 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M40.bin
Fail GLESR119E - 310815 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M41.bin
Fail GLESR119E - 310816 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M42.bin
Fail GLESR119E - 310817 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M43.bin
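A quick count of failure codes shows which kinds of failures the list contains before any recovery is attempted. The following sketch assumes the column layout shown in the preceding output, where the first column is the result and the second column is the failure code. The function name count_failure_codes is hypothetical.

```shell
# Count the "Fail" lines per failure code in "eeadm task show <id> -r failed"
# output (read from the named files or from stdin).
# Assumption: column 1 is the result and column 2 is the failure code.
count_failure_codes() {
    awk '$1 == "Fail" { count[$2]++ }
         END { for (code in count) print code, count[code] }' "$@" | sort
}
```

The output has one line per failure code followed by the number of files that failed with that code.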
Failure at full tape capacity
When no tape in the pool has enough remaining capacity to store all of the files on the source tape, the tape replace task fails because the target tape becomes full. To move the remaining files to another target tape, rerun the eeadm tape replace command with the same source tape. The following example shows the output of the eeadm task show -v command for this case.
=== Task Information ===
Task ID: 5274
Task Type: tape_replace
Command Parameters: eeadm tape replace -p JP3 JC0002JC -l JLIB1
Status: completed
Result: failed
Accepted Time: Sun Jun 19 18:15:11 2022 (+0900)
Started Time: Sun Jun 19 18:15:11 2022 (+0900)
Completed Time: Sun Jun 19 18:16:41 2022 (+0900)
Workload: 1 tapes
Progress: 1/1 tapes completed
Result Summary: (GLESL750E) Tape replace for JC0002JC failed (4001).
Messages:
2022-06-19 18:15:11.135065 GLESL755I: Start a reconcile before starting a replace against 1 tapes.
2022-06-19 18:16:14.191030 GLESS002I: Reconciling tape JC0002JC complete.
2022-06-19 18:16:15.482323 GLESL756I: Reconcile before replace finished.
2022-06-19 18:16:15.491467 GLESL753I: Starting tape replace for JC0002JC.
2022-06-19 18:16:15.491707 GLESL754I: Found a target tape for tape replace (JC0003JC).
2022-06-19 18:16:41.537064 GLESL750E: Tape replace for JC0002JC failed (4001).
--- Subtask(level 1) Info ---
Task ID: 5275
Task Type: tape_reconcile
Status: completed
Result: succeeded
Accepted Time: Sun Jun 19 18:15:11 2022 (+0900)
Started Time: Sun Jun 19 18:15:11 2022 (+0900)
Completed Time: Sun Jun 19 18:16:15 2022 (+0900)
Work load: 1 tapes
Progress: Phase: complete (1/1 tapes completed.)
Result Summary:
Messages:
2022-06-19 18:15:11.182602 GLESS016I: Reconciliation requested.
2022-06-19 18:15:11.949432 GLESS050I: GPFS file systems involved: /mnt/gpfs .
2022-06-19 18:15:11.950141 GLESS210I: Valid tapes in the pool: JC0003JC JC0002JC .
2022-06-19 18:15:11.950334 GLESS049I: Tapes to reconcile: JC0002JC .
2022-06-19 18:15:11.950503 GLESS134I: Reserving tapes.
2022-06-19 18:15:11.951001 GLESS269I: JC0002JC is mounted. Moving to homeslot.
2022-06-19 18:15:12.277501 GLESS135I: Reserved tapes: JC0002JC .
2022-06-19 18:15:12.379113 GLESS054I: Creating GPFS snapshots:
2022-06-19 18:15:12.379316 GLESS055I: Deleting the previous reconcile snapshot and creating a new one for /mnt/gpfs ( fsd ).
2022-06-19 18:15:13.824331 GLESS056I: Searching GPFS snapshots:
2022-06-19 18:15:13.846176 GLESS057I: Searching GPFS snapshot of /mnt/gpfs ( fsd ).
2022-06-19 18:16:08.884717 GLESS060I: Processing the file lists:
2022-06-19 18:16:08.914783 GLESS061I: Processing the file list for /mnt/gpfs ( fsd ).
2022-06-19 18:16:13.163567 GLESS141I: Removing stale DMAPI attributes:
2022-06-19 18:16:13.163800 GLESS142I: Removing stale DMAPI attributes for /mnt/gpfs ( fsd ).
2022-06-19 18:16:13.325858 GLESS063I: Reconciling the tapes:
2022-06-19 18:16:13.360462 GLESS248I: Reconcile tape JC0002JC.
2022-06-19 18:16:14.409094 GLESS249I: Releasing reservation of tape JC0002JC.
2022-06-19 18:16:14.409571 GLESS058I: Removing GPFS snapshots:
2022-06-19 18:16:14.409770 GLESS059I: Removing GPFS snapshot of /mnt/gpfs ( fsd ).
--- Subtask(level 2) Info ---
Task ID: 5276
Task Type: reconcile_tape
Status: completed
Result: succeeded
Accepted Time: Sun Jun 19 18:16:13 2022 (+0900)
Started Time: Sun Jun 19 18:16:13 2022 (+0900)
Completed Time: Sun Jun 19 18:16:14 2022 (+0900)
Workload: -
Progress: Tape: JC0002JC Phase: reconcile tape complete
Result Summary: Tape: JC0002JC Result: (GLESS002I) Reconciling tape JC0002JC complete.
Messages:
2022-06-19 18:16:14.188964 GLESS002I: Reconciling tape JC0002JC complete.
--- Subtask(level 1) Info ---
Task ID: 5277
Task Type: reclaim_sub
Status: completed
Result: failed
Accepted Time: Sun Jun 19 18:16:15 2022 (+0900)
Started Time: Sun Jun 19 18:16:15 2022 (+0900)
Completed Time: Sun Jun 19 18:16:40 2022 (+0900)
Workload: from source_tape: JC0002JC to target_tape: JC0003JC
Result Summary: (GLESR278I) The target tape became full and operations stopped. Rerun the command to continue the process.
Messages:
2022-06-19 18:16:40.708479 GLESR278I: The target tape became full and operations stopped. Rerun the command to continue the process.
Alternative recovery action for tapes in require_replace or need_replace status
1. Create a list of files from the output of the eeadm task show command for the failed tape_replace task.
Example of a command that saves the output of the eeadm task show command to a list file:
# eeadm task show 5202 -r failed > ./5202_fail.lst
When the eeadm tape replace command is run with two or more tapes to replace, the output can contain files from multiple tapes. Use a text editor to remove the lines that belong to the other tapes, leaving only the lines for the files on the tape to be processed.
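As an alternative to editing the list by hand, the lines for one tape can be selected with awk. This is a sketch under an assumption that is not confirmed by this topic: that output for several tapes repeats the "[Source Tape: ...]" header once per tape, as it appears once in the single-tape example. The function name filter_by_source_tape and the output file name are hypothetical.

```shell
# Print only the "Fail" lines that are listed under "[Source Tape: <tape>]".
# Usage: filter_by_source_tape <tape> [list-file ...]
# Assumption: each tape's lines follow their own "[Source Tape: ...]" header.
filter_by_source_tape() {
    tape=$1
    shift
    awk -v tape="$tape" '
        /^\[Source Tape: / { keep = (index($0, "[Source Tape: " tape "]") == 1); next }
        keep && $1 == "Fail"
    ' "$@"
}

# Example (not executed here):
#   filter_by_source_tape JD0010JD ./5202_fail.lst > ./5202_fail_JD0010JD.lst
```

The header lines are dropped from the result, so the output contains only lines that start with Fail.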
2. The number of files to be processed by this alternative action is represented by the number of lines in the list file. When the disk has enough free space to store all of the files on the tape, proceed to Steps 3a and 3b.
When the disk might not have enough free space and you want to process the files in stages, proceed to Steps 4a, 4b, and 4c.
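To judge whether the disk can hold the files, the total size of the listed files can be compared with the free space of the file system. The following sketch makes several assumptions: GNU stat is available (stat -c %s), the file system reports the original file size for migrated files, and stat reads only metadata. The function names are hypothetical, and the result is only an estimate.

```shell
# Sum the sizes, in bytes, of the files named after " -- " on each "Fail" line.
# Files that no longer exist are skipped. Assumption: GNU stat (stat -c %s).
list_total_bytes() {
    awk '$1 == "Fail"' "$@" | while IFS= read -r line; do
        path=${line#* -- }          # text after the first " -- "
        stat -c %s -- "$path" 2>/dev/null
    done | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Print the space, in bytes, that is available on the file system holding <dir>.
free_bytes() {
    df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# Example (not executed here):
#   need=$(list_total_bytes ./5202_fail.lst)
#   have=$(free_bytes /mnt/gpfs)
#   [ "$need" -lt "$have" ] && echo "enough free space for all listed files"
```

Leave a margin when comparing the two numbers, because other activity on the file system also consumes space.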
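3. When the disk has enough free space for all of the files on the tape, complete the following substeps.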
a. Use the eeadm tape unassign command with the --safe-unassign option to retrieve all required files from the tape and make them resident. After that, the command removes the tape from the pool. Migrate the recalled files back to the appropriate target pool afterward.
Example of a command that runs the tape_unassign task:
# eeadm tape unassign --safe-unassign JD0010JD -p JP1
b. When the eeadm tape unassign command completes without error, all files on the tape are recalled to disk and the file states change to resident. The tape is removed from the pool and the alternative recovery action is complete.
When the eeadm tape unassign command fails, some files could not be read from the tape and remain on it. Use the eeadm task show command with the -v option to find the task ID of the selective_recall subtask.
Example of eeadm task show -v output for the eeadm tape unassign --safe-unassign command.
=== Task Information ===
Task ID: 5228
Task Type: tape_unassign
Command Parameters: eeadm tape unassign JD0005JD --safe-unassign -p JP1 -l JLIB1
Status: completed
Result: succeeded
Accepted Time: Wed May 18 17:04:30 2022 (+0900)
Started Time: Wed May 18 17:04:30 2022 (+0900)
Completed Time: Wed May 18 17:06:30 2022 (+0900)
Workload: 1 tapes
Progress: 1/1 tapes completed
Result Summary: (GLESL875I) Tape unassign is requested, empty-check: disk, on-remaining: safe-unassign. Tape: JD0005JD Result: (GLESL359I) Unassigned tape JD0005JD from pool JP1 successfully.
Messages:
2022-05-18 17:04:30.319443 GLESL875I: Tape unassign is requested, empty-check: disk, on-remaining: safe-unassign.
2022-05-18 17:04:31.960785 GLESL600I: Searching the GPFS file systems to find migrated/saved objects in tape JD0005JD.
2022-05-18 17:05:29.575684 GLESL605I: Tape JD0005JD has files to be recovered. The list is saved to /mnt/gpfs/.ltfsee/statesave/active/5228/subtask.5229/ltfs81.9762.mnt.gpfs.recoverlist. (num=1)
2022-05-18 17:06:30.544842 GLESL603I: Searching for the non-IBM Storage Archive EE objects in tape JD0005JD.
2022-05-18 17:06:30.571865 GLESL610I: Recovery of tape JD0005JD was successful. 1 files were recovered. The list is saved under statesave directory of task id = 5229 with ".recoverlist" extension.
2022-05-18 17:06:30.572194 GLESL879I: non-IBM Storage Archive EE files are not found on tape JD0005JD.
2022-05-18 17:06:30.836833 GLESL359I: Unassigned tape JD0005JD from pool JP1 successfully.
--- Subtask(level 1) Info ---
Task ID: 5229
Task Type: tape_unassign
Status: completed
Result: succeeded
Accepted Time: Wed May 18 17:04:31 2022 (+0900)
Started Time: Wed May 18 17:04:31 2022 (+0900)
Completed Time: Wed May 18 17:06:30 2022 (+0900)
Workload: 1 tapes
Progress: 1/1 tapes completed
Result Summary: Tape: JD0005JD Result: (GLESL359I) Unassigned tape JD0005JD from pool JP1 successfully.
Messages:
2022-05-18 17:04:31.958669 GLESL600I: Searching the GPFS file systems to find migrated/saved objects in tape JD0005JD.
2022-05-18 17:05:29.571486 GLESL605I: Tape JD0005JD has files to be recovered. The list is saved to /mnt/gpfs/.ltfsee/statesave/active/5228/subtask.5229/ltfs81.9762.mnt.gpfs.recoverlist. (num=1)
2022-05-18 17:05:32.029623 GLESL602I: Searching for the remaining objects migrated/saved in tape JD0005JD.
2022-05-18 17:06:30.542159 GLESL603I: Searching for the non-IBM Storage Archive EE objects in tape JD0005JD.
2022-05-18 17:06:30.571725 GLESL610I: Recovery of tape JD0005JD was successful. 1 files were recovered. The list is saved under statesave directory of task id = 5229 with ".recoverlist" extension.
2022-05-18 17:06:30.572037 GLESL879I: non-IBM Storage Archive EE files are not found on tape JD0005JD.
2022-05-18 17:06:30.836619 GLESL359I: Unassigned tape JD0005JD from pool JP1 successfully.
--- Subtask(level 2) Info ---
Task ID: 5230
Task Type: selective_recall
Status: completed
Result: succeeded
Accepted Time: Wed May 18 17:05:29 2022 (+0900)
Started Time: Wed May 18 17:05:29 2022 (+0900)
Completed Time: Wed May 18 17:05:32 2022 (+0900)
Workload: 10 file(s), 0 bytes to process.
Progress: 10 completed (or failed) files / 10 total files.
Result Summary: -
Messages:
2022-05-18 17:05:32.025866 GLESL839I: All 10 file(s) has been successfully processed.
2022-05-18 17:05:32.026276 GLESL873W: 0 files have inconsistent file hash but the errors have been regarded as warning.
2022-05-18 17:05:32.026448 GLESL872I: Succeeded: 10 resident, 0 already_resident, 0 recalled_but_inconsistent_file_hash
--- Subtask(level 3) Info ---
Task ID: 5231
Task Type: selective_recall
Status: completed
Result: succeeded
Accepted Time: Wed May 18 17:05:29 2022 (+0900)
Started Time: Wed May 18 17:05:29 2022 (+0900)
Completed Time: Wed May 18 17:05:31 2022 (+0900)
Workload: 10 file(s), 10,485,760 bytes (10.4 MB) to process.
Progress: 10 completed (or failed) files / 10 total files.
Result Summary: -
Messages:
No messages
In this example, the task ID of the selective_recall subtask to use is 5230. Proceed to Step 5 for the subsequent operations.
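4. When the disk might not have enough free space and the files are processed in stages, complete the following substeps.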
a. Create file lists. Filter out the lines with the following two failure codes from the list: GLESR137E and GLESR138E.
Example of a command that selects the lines for files without those two failure codes:
# cat ./5202_fail.lst | grep '^Fail' | grep -v 'GLESR137E' | grep -v 'GLESR138E' > ./5202_fail_filtered.lst
Example of commands that select the lines with those two failure codes:
# cat ./5202_fail.lst | grep 'GLESR137E' > ./5202_fail_137E.lst
# cat ./5202_fail.lst | grep 'GLESR138E' > ./5202_fail_138E.lst
When these two files contain lines, the files named in those lines cannot be found by file name. Use the inode number to locate each file and confirm its file name. After the files are found, make a list of them that uses the file names returned by the inode search. In the following example, that list file is 5202_found.lst. Note that an inode search finds nothing when the file was already removed from disk. In that case, the file is no longer required and does not need to be included in the list. For more information about the list file, see eeadm recall.
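The inode search can be done with the find command and its -inum test. The following sketch assumes that the i-node number is the fourth column of each line, as in the eeadm task show -r failed output in this topic, and it rewrites each line with the path that find returns. The function name find_by_inode is hypothetical. Each lookup walks the whole directory tree, so it can be slow on a large file system.

```shell
# For each "Fail" line in <list>, search <root> for the i-node number in
# column 4 and print the line again with the path that was found.
# Lines whose i-node is not found produce no output.
# Usage: find_by_inode <root-directory> <list-file>
find_by_inode() {
    root=$1
    list=$2
    awk '$1 == "Fail" { print $2, $3, $4 }' "$list" |
    while read -r code when inum; do
        # -xdev keeps the search on the file system that contains <root>.
        find "$root" -xdev -inum "$inum" -print 2>/dev/null |
        while IFS= read -r path; do
            printf 'Fail %s %s %s -- %s\n' "$code" "$when" "$inum" "$path"
        done
    done
}

# Example (not executed here):
#   find_by_inode /mnt/gpfs ./5202_fail_137E.lst >  ./5202_found.lst
#   find_by_inode /mnt/gpfs ./5202_fail_138E.lst >> ./5202_found.lst
```

A file with several hard links is printed once per path, so review the resulting list before it is used.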
b. Recall the files and make them resident. At this step, you can split a file list into multiple file lists that each contain a small number of files.
Example of commands that recall the files in the lists that were created in Step 4a. Run the second command only when 5202_found.lst was created.
# cat ./5202_fail_filtered.lst | eeadm recall --resident
# cat ./5202_found.lst | eeadm recall --resident
c. When the eeadm recall command completes without error, migrate the recalled files back into the target pools so that disk space is available again. Repeat this process until all of the split list files are processed. When the eeadm recall command fails for a list, skip that list and proceed to the next list until all of the list files are processed.
When all list files are processed without error, proceed to Step 7 to remove the tape from the pool. When the eeadm recall command failed for one or more list files, proceed to Step 5.
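Splitting a list and processing it chunk by chunk can be scripted with the split command. The following is a sketch only. The function name process_in_chunks is hypothetical, and the sketch assumes that the command run for each chunk returns a nonzero exit status on failure, which this topic does not confirm for eeadm recall. The migration of recalled files back to tape between chunks is not shown and must be supplied by the user.

```shell
# Split <list> into chunks of at most <lines> lines, then run the given
# command once per chunk with the chunk on standard input. Chunks for which
# the command fails are reported so that they can be revisited later.
# The chunk files are kept in a temporary directory for that purpose.
# Usage: process_in_chunks <list-file> <lines> <command> [args ...]
process_in_chunks() {
    list=$1
    lines=$2
    shift 2
    workdir=$(mktemp -d) || return 1
    split -l "$lines" "$list" "$workdir/part."
    for part in "$workdir"/part.*; do
        [ -f "$part" ] || continue      # nothing to do for an empty list
        if ! "$@" < "$part"; then
            printf 'failed chunk: %s\n' "$part"
        fi
    done
}

# Example (not executed here):
#   process_in_chunks ./5202_fail_filtered.lst 100 eeadm recall --resident
```

Passing a wrapper function instead of eeadm recall directly makes it possible to recall a chunk and then migrate its files back before the next chunk starts.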
5. Identify which files failed by running the eeadm task show -r command for the selective_recall subtask of the eeadm tape unassign command, or for the failed selective_recall task that was initiated by the eeadm recall command. As another attempt to read each file, use a file system command (for example, head, od, or hexdump).
Example of the output of the eeadm task show -r command for a recall that includes failures.
Result Failure Code Failed time Node -- File name
Fail GLESL255E 2022/05/18T14:07:02 8 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M24.bin
Fail GLESL255E 2022/05/18T14:07:02 8 -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M25.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M26.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M27.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M28.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M29.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M30.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M31.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M32.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M33.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M34.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M35.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M36.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M37.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M38.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M39.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M40.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M41.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M42.bin
Success - - - -- /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M43.bin
Example of a file system command to read the file.
# hexdump -n 128 /mnt/gpfs/testdata/large-lto/dualmmm/20220418/d001/d09/d_1M24.bin
0000000 28d1 7005 6c3f e0e8 0385 23e5 2b78 bfd6
0000010 b672 f919 6fca e162 1600 7860 d93f 7660
0000020 a3e3 6d9b 1cf3 2c31 feaf f35a 3259 5d93
0000030 24c8 7c5e 4369 e0c0 f54b 7db4 42f8 3e1a
0000040 7243 585a 1468 412c c01a 48c4 4210 05ad
0000050 34da 91be 2c94 bc8f e96c f9ef c7d3 9e4d
0000060 547c e6d9 17d3 1e5c 08c2 b3e1 44a1 5f9e
0000070 414a ec6a 6d35 90b8 3dd5 2214 b54e 06ce
0000080
When many files are in a failed state, a script can reduce the work of the recovery process.
Note: When the migrate command that migrated the file specified two or three pools with the -p option, so that the file was written to two or three different pools, the attempt to read the file with a file system command succeeds in most cases. When the migrate command wrote the file to a single pool, the file system command cannot read the file successfully.
6. When the recall attempt in Step 5 succeeds, create a file list with those files (for example, 5202_recalled.lst) and run the eeadm recall --resident command again.
Example of a command that recalls the files in the list 5202_recalled.lst:
# cat ./5202_recalled.lst | eeadm recall --resident
At this point, all of the files are in the resident state. To remove the tape from the pool, proceed to Step 7. When the recall attempt in Step 5 fails again, this procedure has no further steps to try. To remove a tape in the require_replace or need_replace state from the pool, the files that failed to be recalled from the tape must be deleted. If it is acceptable to remove those files from the disk, remove them and proceed to Step 7. Otherwise, contact IBM Support to find out whether the files can be recovered from the failing tape in another way.
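The read attempt described in Step 5 can be scripted when many files failed. The following sketch reads the Fail lines of saved eeadm task show -r output, tries to read the first bytes of each file with head, and prints the lines whose files could be read, in their original format. The function name try_read_failed_files and the saved output file name are hypothetical, and the 128-byte read size simply mirrors the hexdump example.

```shell
# Try to read the first 128 bytes of every file named on a "Fail" line.
# Lines whose file can be read are printed unchanged on standard output;
# files that still cannot be read are reported on standard error.
# Usage: try_read_failed_files [task-show-output-file ...]
try_read_failed_files() {
    awk '$1 == "Fail"' "$@" | while IFS= read -r line; do
        path=${line#* -- }          # text after the first " -- "
        if head -c 128 -- "$path" > /dev/null 2>&1; then
            printf '%s\n' "$line"
        else
            printf 'still unreadable: %s\n' "$path" >&2
        fi
    done
}

# Example (not executed here):
#   try_read_failed_files ./recall_result.txt > ./5202_recalled.lst
```

The lines that are reported on standard error identify the files for which no further recovery step exists in this procedure.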
7. Run either of the following commands to remove the require_replace or need_replace tape from the pool.
- Rerun the eeadm tape replace command. For more information, see eeadm tape replace.
- Run the eeadm tape unassign command with the --safe-unassign option. For more information, see eeadm tape unassign.