I'm trying to optimize my policy rules. The rules will be used by mmbackup to distribute all files of the filesystem across different TSM servers. Some of the old rules were of the form:
RULE EXTERNAL LIST ...
RULE 'BackupRule' LIST ... WHERE (
    ( '/GPFS/dir/0003' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0003')) ) OR
    ( '/GPFS/dir/0005' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0005')) ) OR
    ( '/GPFS/dir/0008' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0008')) ) OR
    ( '/GPFS/dir/0007' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0007')) ) OR
    ( '/GPFS/dir/0009' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0009')) ) OR
    ( '/GPFS/dir/0006' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0006')) ) OR
    ( '/GPFS/dir/0004' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0004')) )
)
Starting with GPFS 3.5, it is possible to use the REGEX function, which would make this more elegant:
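A minimal sketch of what such a rule could look like for the same seven directories. It assumes that REGEX takes a POSIX extended regular expression, and that the pattern is wrapped in an extra pair of square brackets so that the m4 preprocessing done by the policy parser leaves the character class intact. The elided parts ("...") are the same as in the old rule:

```
RULE EXTERNAL LIST ...
RULE 'BackupRule' LIST ... WHERE
    /* ^ anchors the match at the start of PATH_NAME, mirroring the
       SUBSTR prefix test; the character class covers 0003 through 0009 */
    REGEX(PATH_NAME, ['^/GPFS/dir/000[3-9]'])
```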
This new form is obviously shorter and easier to read, but since we have nearly 200 million files I'd like to know whether it is faster or slower.
I tried running mmapplypolicy with several versions of my policy rules file on subtrees of 4M and 16M files, but the results were not conclusive. There seems to be too much variation between runs, and the sort that happens during the inodeScan phase seems to dominate the runtime for subtrees of this size.
When I ran the version that was fastest in the subtree tests on the complete filesystem, it was almost 2.5 times slower than what I had before. Running mmapplypolicy on the complete filesystem for every iteration of the optimisation process would take too much time (a single 'mmapplypolicy -I test' runs for 5 hours).
Any tips on writing efficient policy rules, or techniques for tuning them, are welcome. Is it possible to isolate the policy evaluation from the directory scan? I used '-d 07' to get timing information, but this does not separate the sort from the policy evaluation itself. Is there a more accurate way?