We recently migrated some "legacy" NFS content onto our GPFS clusters. Along with this migration we set up a daily rsync job to sync changes from the Production cluster to a DR cluster at another location. Since making these changes I've noticed a significant decrease in performance while the rsync process is running, in some cases slowing down NFS access for other clients. I suspect the rsync process is thrashing my cache, and indeed when I look at mmdiag --stats I see 0 hits:
stats: inserts 119636775 steals 57796289 hits 0 expands 52009607 revokes 1180813 uses 47485999
I also see a lot of waiters like the following appearing while the rsync is running:
0x2B285C04D500 waiting 0.001814000 seconds, CleanFileThread: on ThMutex 0x18012748B10 (0xFFFFC20012748B10) (CacheReplacementListMutex)
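In case it's useful to anyone looking at this, a loop like the following (a minimal sketch; /usr/lpp/mmfs/bin is the usual GPFS command path) is enough to watch the waiters pile up in real time during the rsync window:

# Refresh the waiter list every 5 seconds while the rsync runs
watch -n 5 '/usr/lpp/mmfs/bin/mmdiag --waiters'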
We are on RHEL 5.9 64-bit with GPFS v3.5. maxFilesToCache (MFTC) is set to 100K, and maxStatCache (MSC) is at its default of MFTC*4, so 400K. There are 3 nodes in the cluster, with the rsync process running on just one of them.
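For reference, these parameters can be checked with mmlsconfig and raised with mmchconfig. The values and node name below are illustrative only, not a tested recommendation, and changes to these parameters only take effect after the GPFS daemon is restarted on the affected node:

# Confirm the current cache settings
mmlsconfig maxFilesToCache maxStatCache
# Example only: raise the limits on just the node running the rsync
# ("rsyncnode" is a placeholder). Each cached entry consumes some
# non-pagepool memory, so size this against available RAM.
mmchconfig maxFilesToCache=1000000,maxStatCache=4000000 -N rsyncnode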
I've been reviewing the composition of the directory structures in this content. There are about 3.2 million files involved, and I've found at least one single directory that contains over 500K files on its own. I suspect this is what is contributing to the poor cache performance when the rsync process runs.
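For what it's worth, a GNU find one-liner along these lines is one way to tally the per-directory counts (/gpfs/legacy is a placeholder for the real fileset path):

# Count files per containing directory and show the 20 largest
find /gpfs/legacy -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -20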
I'm considering breaking the rsync processing down into smaller chunks, but I think I also need to look at adjusting my cache configuration, given that the number of files in a single directory exceeds my current cache size. I'm looking for any input or suggestions on ways to mitigate these effects.
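The chunking I have in mind would look something like this sketch (host, paths, and rsync options are placeholders, not a tested recipe):

# Sync each top-level directory in its own rsync pass so no single run
# has to walk all 3.2M inodes at once; --bwlimit (KB/s) throttles the
# I/O to leave some headroom for the NFS clients.
cd /gpfs/legacy || exit 1
for d in */ ; do
    rsync -a --delete --bwlimit=20000 "$d" drhost:/gpfs/legacy/"$d"
done
# (any files sitting directly in the top level would need one extra pass)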