IBM Storage Protect data storage model
When using IBM Storage Protect for backing up IBM Storage Scale file systems, the IBM Storage Protect data storage architecture and its implications need to be considered.
An IBM Storage Scale file system stores two kinds of information for each file:
- The file data, which is the content stored in the file.
- The file metadata, which includes all attributes that are related to the file. For example:
  - Create, access, and modify times
  - Size of the file, size that is occupied in the file system, and number of blocks used
  - Inode information, owner user ID, owning group ID, and mode bits
  - POSIX rights or access control lists (ACLs)
  - Flags to indicate whether the file is immutable or mutable, read only, or append only
  - Extended attributes (EAs)
IBM Storage Protect stores the same file data and metadata, but it stores them differently. The file content is stored in an IBM Storage Protect storage pool, for example on disk or on tape, while some of the metadata is stored in the IBM Storage Protect database. The primary reason for storing metadata in the IBM Storage Protect database is to provide fast access to the information that is needed to process backup requests.
However, not all metadata is stored in the IBM Storage Protect database. Access control lists (ACLs) and extended attributes (EAs) are stored with the file content in the storage pool (media depends on storage pool type). This has the following implications:
- When the ACL or EA of a file changes, the next backup job backs up the whole file again. This occurs because the file content, ACL, and EA are stored together in the IBM Storage Protect data pool, for example on tape, and they need to be updated as one entity.
  Note: ACLs are inherited in the IBM Storage Scale file system. Therefore, an ACL on a top-level directory object can be inherited by all the descendant objects. ACL changes to a top-level directory object are therefore propagated down through the object tree hierarchy, rippling the change through all objects that inherited the original ACL. The number of files to be backed up increases even though nothing else in these files has changed. A seemingly small change to the ACL of a top-level directory object can therefore induce a surprisingly large backup workload.
  When the IBM Storage Protect for Space Management capability is enabled, such an ACL change also applies to objects that are currently migrated to offline storage. These files must then be recalled during the next backup cycle so that the updated ACL can be stored together with the file data again in the IBM Storage Protect backup storage pool.
- Renaming of a file also leads to a backup of the whole file because the IBM Storage Protect database is indexed by file object path name.
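The two implications above can be illustrated with a small toy model. This is a sketch for reasoning about the behavior, not the IBM Storage Protect implementation: the names `FileState` and `backup_action` are hypothetical, the database is reduced to a dictionary keyed by path name, and the "metadata" outcome assumes that attributes held only in the database can be updated without resending the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    path: str          # the toy database is indexed by this path name
    content_hash: str  # stands in for the file data
    acl: str           # stored with the file data in the storage pool
    ea: str            # extended attributes, also stored with the file data
    mode: int          # example of metadata kept in the database (assumption)


def backup_action(db: dict, current: FileState) -> str:
    """Return 'full', 'metadata', or 'skip' for one file in this toy model."""
    previous = db.get(current.path)
    if previous is None:
        # Unknown path name: a renamed file looks like a new object.
        return "full"
    stored_together_before = (previous.content_hash, previous.acl, previous.ea)
    stored_together_now = (current.content_hash, current.acl, current.ea)
    if stored_together_before != stored_together_now:
        # Content, ACL, and EA form one entity in the storage pool.
        return "full"
    if previous != current:
        # Only database-resident metadata differs (toy-model assumption).
        return "metadata"
    return "skip"
```

In this model, changing only `acl` or only `path` yields "full" even though `content_hash` is unchanged, which mirrors the behavior described in the list above.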
You can use the following approaches to reduce the size of the backup workload when widely inherited ACLs are likely to be changed frequently:
- Avoid renaming directories that are close to the file system root.
- Avoid ACL and EA changes in migrated files as much as possible.
- Consider using the skipacl or skipaclupdatecheck options of the IBM Storage Protect client.
  Important: Be certain to note the implications of using these options by referring to Client options reference in the Backup-archive Client options and commands section of IBM Storage Protect documentation.
  Note: Using the skipacl option also omits EAs from the backup data store in the IBM Storage Protect backup pool. Consider this option when static ACL structures are used that can be reestablished through another tool or operation external to the IBM Storage Protect restore operation. If you use this approach, ensure that the ACL is restored either manually or automatically, by inheritance, to prevent an unauthorized user from getting access to a file or a directory after it is restored.
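Before changing an ACL on, or renaming, a directory close to the file system root, it can help to gauge how much data might be backed up again. The following is a minimal sketch and not part of IBM Storage Protect. The function name `estimate_rebackup` is hypothetical, and the sketch assumes that every regular file below the directory is affected by the change, so the result is an upper bound.

```python
import os
import stat


def estimate_rebackup(root: str) -> tuple[int, int]:
    """Count regular files under root and sum their sizes in bytes."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symbolic links.
                info = os.lstat(full_path)
            except OSError:
                # The file vanished or is unreadable; leave it out.
                continue
            if stat.S_ISREG(info.st_mode):
                file_count += 1
                total_bytes += info.st_size
    return file_count, total_bytes
```

Calling `estimate_rebackup` on the directory whose ACL is about to change returns the number of files and the total logical size that the next backup might have to process under the stated assumption. On a large file system a full tree walk can itself take a long time, so treat this as an occasional planning aid.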