Snapshot support
Native HDFS can create a snapshot of a single directory. IBM Storage Scale supports two kinds of snapshots: file system snapshots (global snapshots) and independent fileset snapshots.
Before HDFS Transparency 2.7.3-1, HDFS Transparency implemented the snapshot interface from Hadoop as a global snapshot, and creating snapshots from a remote mounted file system was not supported.
HDFS Transparency 2.7.3-2 and later support creating snapshots from a remote mounted file system.
The snapshot interface from the Hadoop shell is as follows:
hadoop fs -createSnapshot /path/to/directory <snapshotname>
For /path/to/directory, HDFS Transparency checks the parent directories from right to left. If a directory is linked with an IBM Storage Scale fileset (see the "Path" column in the output of mmlsfileset <fs-name>) and that fileset is an independent fileset, HDFS Transparency creates the snapshot against that independent fileset. For example, if /path/to/directory is the link point of fileset1 and fileset1 is an independent fileset, the above command creates the snapshot against fileset1. Otherwise, Transparency checks /path/to, then /path, and finally / (which is /gpfs.mnt.dir/gpfs.data.dir in the IBM Storage Scale file system). If Transparency cannot find any independent fileset linked along the path /path/to/directory, Transparency creates <snapshotname> against the fileset root in the IBM Storage Scale file system.
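The right-to-left lookup described above can be sketched as follows. This is a minimal illustration, not the actual Transparency code; `independent_fileset_links` is a hypothetical mapping from a fileset's link point to its name, which in practice would be derived from `mmlsfileset <fs-name>` output.

```python
import os

def resolve_snapshot_fileset(path, independent_fileset_links):
    """Walk the path from right to left and return the first independent
    fileset whose link point covers the path. Falls back to the fileset
    root when no independent fileset is found along the path."""
    current = path
    while current != "/":
        if current in independent_fileset_links:
            return independent_fileset_links[current]
        current = os.path.dirname(current)
    # No independent fileset found: the snapshot targets the fileset root.
    return "root"

# Hypothetical layout: /path/to/directory is the link point of fileset1.
links = {"/path/to/directory": "fileset1"}
print(resolve_snapshot_fileset("/path/to/directory/sub/dir", links))  # fileset1
print(resolve_snapshot_fileset("/other/dir", links))                  # root
```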
- Do not create snapshots frequently (for example, do not create more than one snapshot per hour) because creating a snapshot suspends all in-flight I/O operations. One independent fileset in an IBM Storage Scale file system supports only 256 snapshots. When deleting snapshots, delete from the oldest snapshot to the latest.
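The recommended oldest-first deletion order can be sketched as follows, using a hypothetical snapshot inventory of (name, creation timestamp) pairs:

```python
# Hypothetical snapshot inventory: (snapshot name, creation timestamp).
snapshots = [("snap3", 1700000300), ("snap1", 1700000100), ("snap2", 1700000200)]

# Delete from the oldest snapshot to the latest, as recommended above.
deletion_order = [name for name, _ in sorted(snapshots, key=lambda s: s[1])]
print(deletion_order)  # ['snap1', 'snap2', 'snap3']
```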
- At the IBM Storage Scale level, only the root user and the owner of the linked directory of an independent fileset can create a snapshot of an IBM Storage Scale fileset. On the HDFS interface from HDFS Transparency, only super group users (users belonging to the groups defined by gpfs.supergroup in /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml and dfs.permissions.superusergroup in /usr/lpp/mmfs/hadoop/etc/hadoop/hdfs-site.xml) and the owner of the directory can create a snapshot against /path/to/directory.
For example, if userA is the owner of /path/to/directory, and /path/to/directory is either the linked directory of an independent fileset or a child directory under the linked directory of an independent fileset, userA can create a snapshot against /path/to/directory.
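The permission rule above can be sketched as a simple check. This is an illustration only, with hypothetical function and parameter names; the actual check is performed inside HDFS Transparency using the groups configured in gpfs-site.xml and hdfs-site.xml.

```python
def can_create_snapshot(user, user_groups, supergroups, dir_owner):
    """Sketch of the snapshot permission rule: members of the super groups
    (gpfs.supergroup / dfs.permissions.superusergroup) and the directory
    owner may create a snapshot; everyone else is denied."""
    if any(group in supergroups for group in user_groups):
        return True
    return user == dir_owner

# userA owns /path/to/directory, so userA may create the snapshot.
print(can_create_snapshot("userA", ["staff"], {"hadoop"}, "userA"))  # True
# userB is neither the owner nor a super group member.
print(can_create_snapshot("userB", ["staff"], {"hadoop"}, "userA"))  # False
```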
- Currently, Transparency caches all fileset information at startup. Filesets created after Transparency is started are not detected automatically. Run /usr/lpp/mmfs/hadoop/bin/gpfs dfsadmin -refresh hdfs://<namenode-hostname>:8020 refreshGPFSConfig to pick up the newly created filesets, or restart HDFS Transparency.
- Do not use nested filesets, such as /gpfs/dependent_fileset1/independent_fileset1/dependent_fileset2/independent_fileset2. Transparency creates the snapshot against the first independent fileset found when checking the path from right to left. Also, snapshots of independent filesets are independent of one another; for example, a snapshot of independent_fileset1 has no relationship with any other independent fileset.
- hadoop fs -renameSnapshot is not supported.
- Do not run hadoop dfs -createSnapshot or hadoop dfs -deleteSnapshot from within the .snapshots directory in the IBM Storage Scale file system. Otherwise, an error such as Could not determine current working directory occurs. For example:
[root@dn01 .snapshots]# hadoop fs -deleteSnapshot / snap1
Error occurred during initialization of VM
java.lang.Error: Properties init: Could not determine current working directory.
        at java.lang.System.initProperties(Native Method)
        at java.lang.System.initializeSystemClass(System.java:1166)
- With HDFS Transparency, you do not need to run the hdfs dfsadmin -allowSnapshot or hdfs dfsadmin -disallowSnapshot commands.
- Snapshots are supported similarly for multiple IBM Storage Scale file systems.
- Snapshots of a remote mounted file system are not supported if gpfs.remotecluster.autorefresh (in /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml) is set to false. By default, it is true.
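For illustration, the default setting corresponds to a property entry in gpfs-site.xml of the following form (shown in the standard Hadoop site-file format; the property name and default are from the text above):

```xml
<!-- /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml -->
<property>
  <name>gpfs.remotecluster.autorefresh</name>
  <!-- Default is true; setting this to false disables snapshots
       on remote mounted file systems. -->
  <value>true</value>
</property>
```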
- HDFS Transparency supports only the Hadoop snapshot create and delete functions. The Hadoop snapshot list command lists only the path of the snapshot directory name, not the snapshot contents, because the snapshots are created by IBM Storage Scale snapshot commands and are stored in the Scale snapshot root directory, which the Hadoop environment cannot access.
- To perform snapshot restores, see the following topics under the Administering
section in IBM® Storage Scale documentation:
- Restoring a file system from a snapshot topic under the Creating and maintaining snapshots of file systems section
- Restoring a subset of files or directories from a local file system snapshot topic under the Managing file systems section
- Restoring a subset of files or directories from local snapshots using the sample script topic under the Managing file systems section