Managing Data Integrity

How to manage data integrity of files with IBM Storage Archive Enterprise Edition.

Starting with version 1.3.3.0, IBM Storage Archive Enterprise Edition can detect file corruption by using a file hash value that is calculated with the MD5 algorithm. During migration, it calculates the hash value of the file contents and stores it in both a DMAPI attribute of the stub file on disk and an extended attribute on tape. During recall, it validates data integrity by comparing the stored value with a newly calculated value.
Note: This function applies only to files that are migrated after IBM Storage Archive Enterprise Edition is upgraded to 1.3.3.0 or later.

The file hash process during the migration

IBM Storage Archive Enterprise Edition calculates the file hash value of the target file while it reads the file from disk and transports it to the tape drive. After the entire file is transported, it adds the calculated file hash value as an extended attribute named ltfs.hash.md5sum. For example:

[root@ltfsee_vm]/ltfs/VTAP15L5/.LTFSEE_DATA getfattr -d ./4965006092968729555-7288246275357059264-234252992-29383-0
# file: 4965006092968729555-7288246275357059264-234252992-29383-0
user.ibm.ltfsee.gpfs.objtype="regfile"
user.ibm.ltfsee.gpfs.path="/ibm/fs1/10mb_0"
user.ibm.ltfsee.objgen="38"
user.ltfs.hash.md5sum="26537607147eee48f569915794bb39b5"

After it writes the extended attribute for all the replicas, it adds a DMAPI attribute named IBMMD5 to the stub file. For example:

[root@ltfsee_vm]/ibm/fs1 mmlsattr -L -d 10mb_0 | grep IBMMD5
dmapi.IBMMD5:         "26537607147eee48f569915794bb39b5"
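
The idea of calculating the hash while the file is being transported, rather than in a separate pass, can be illustrated with a minimal Python sketch. This is not the product's implementation: the function name copy_with_md5 and the chunk size are hypothetical, and in-memory streams stand in for the disk file and the tape drive.

```python
import hashlib
import io


def copy_with_md5(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks and return the MD5 hex digest of the data.

    Hypothetical helper: it only illustrates hashing during the transfer.
    """
    digest = hashlib.md5()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # hash the same bytes that are being written
        dst.write(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    source = io.BytesIO(b"example file contents")  # stands in for the disk file
    target = io.BytesIO()                          # stands in for the tape
    print(copy_with_md5(source, target))
```

Because the digest is updated chunk by chunk, the file is read only once, and the resulting value is available as soon as the last chunk is written.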

The file hash process during the recall

IBM Storage Archive Enterprise Edition calculates the file hash value of the target file while it reads the file from tape and transports it to the server. After the entire file is transported, it compares the file hash value that is stored in IBMMD5 with the file hash value that is stored in ltfs.hash.md5sum. If they differ, the recall fails. A mismatch indicates that incorrect data was read or written. Run the eeadm task show command for the task to confirm that the recall failed because of an inconsistent file hash value. In the following example, the Result Summary is Failed (Inconsistent file hash).

[root@ltfsee_vm]/ibm/fs1 eeadm task show 7378
=== Task Information ===
Task ID:              7378
Task Type:            transparent_recall
Command Parameters:   dsmrecalld
Status:               completed
Result:               failed
Accepted Time:        Thu Apr 14 23:37:07 2022 (-0400)
Started Time:         Thu Apr 14 23:37:07 2022 (-0400)
Completed Time:       Thu Apr 14 23:37:07 2022 (-0400)
Workload:             102400 bytes to process. (File name: /ibm/fs1/eatf/all_fail_case/all_fail_case0.bin, inode: 42546)
Progress:             -
Result Summary:       Failed (Inconsistent file hash)
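
When many tasks need to be checked, the output shown above can be inspected by a script. The following Python sketch assumes that the output keeps the "Key: value" layout of this example, which might vary between versions. The function names parse_task_show and failed_on_hash_mismatch are hypothetical.

```python
def parse_task_show(output):
    """Parse 'Key: value' lines of eeadm task show output into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only; time stamps contain more colons.
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the banner line
            info[key.strip()] = value.strip()
    return info


def failed_on_hash_mismatch(info):
    """Return True if the task failed because of an inconsistent file hash."""
    return (info.get("Result") == "failed"
            and "Inconsistent file hash" in info.get("Result Summary", ""))


if __name__ == "__main__":
    sample = """=== Task Information ===
Task ID:              7378
Task Type:            transparent_recall
Result:               failed
Result Summary:       Failed (Inconsistent file hash)
"""
    task = parse_task_show(sample)
    print(task["Task ID"], failed_on_hash_mismatch(task))
```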

Next action after a recall fails due to an inconsistent file hash

If the failed recall is a selective recall and the file was migrated to more than one replica, try a transparent recall of the file. A transparent recall can recall the file from another replica. If the transparent recall also fails, contact IBM support.

Options of eeadm cluster set to enable and disable data integrity checking

The eeadm cluster set command has two options for data integrity checking: filehash_enable and filehash_verify_on_read. When filehash_enable is set to yes, IBM Storage Archive Enterprise Edition calculates and stores the file hash value during migration. When filehash_verify_on_read is set to yes, it validates the file hash value during recall. To see the current settings, use the eeadm cluster show command. Both options are enabled by default. Disable them only if the performance of migration and recall is more important than data integrity.

Example output of eeadm cluster show:

[root@ltfsee_vm]/ibm/fs1 eeadm cluster show | grep filehash
filehash_enable           yes
filehash_verify_on_read   yes
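
A monitoring script might confirm that both options are still enabled. The following Python sketch assumes the two-column "name value" layout of the output above. The function names filehash_settings and integrity_checking_enabled are hypothetical.

```python
def filehash_settings(output):
    """Collect the filehash_* settings from eeadm cluster show output."""
    settings = {}
    for line in output.splitlines():
        parts = line.split()
        # Keep only two-column lines whose name starts with "filehash".
        if len(parts) == 2 and parts[0].startswith("filehash"):
            settings[parts[0]] = parts[1]
    return settings


def integrity_checking_enabled(settings):
    """Return True only if hashing on migration and verification on recall are both on."""
    return (settings.get("filehash_enable") == "yes"
            and settings.get("filehash_verify_on_read") == "yes")


if __name__ == "__main__":
    sample = "filehash_enable           yes\nfilehash_verify_on_read   yes\n"
    print(integrity_checking_enabled(filehash_settings(sample)))
```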