Today we tried to delete files using a GPFS policy.
The filesystem has 268Mio preallocated inodes, of which 189Mio are used. The FS is defined in a cluster of 72 NSD servers serving it to several thousand clients.
While the scan phase of our delete policy job completed nicely and fast (running on all 72 NSD servers, it took 4 minutes to scan the FS and about 10 more minutes to identify the candidates for deletion), the actual delete of the 79Mio candidates progressed at a mere (250..300) per second.
This appears much too little to me; it means each NSD server deletes no more than about 4 files per second on average.
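The arithmetic behind that estimate can be checked with a short sketch. It uses only the figures quoted in this post (79Mio candidates, 72 NSD servers, 250 to 300 deletes per second); the function names are made up for illustration. It gives roughly 3.5 to 4.2 deletes per second per server and about 73 to 88 hours for the whole job.

```python
# Figures quoted in the post above.
CANDIDATES = 79_000_000           # files selected for deletion
NSD_SERVERS = 72                  # nodes running the policy
RATE_LOW, RATE_HIGH = 250, 300    # observed deletes per second, cluster-wide


def per_server(rate, servers):
    """Average deletes per second handled by a single server."""
    return rate / servers


def hours_to_finish(total, rate):
    """Wall-clock hours needed to delete `total` files at `rate` per second."""
    return total / rate / 3600


for rate in (RATE_LOW, RATE_HIGH):
    print(
        f"{rate}/s cluster-wide -> "
        f"{per_server(rate, NSD_SERVERS):.1f}/s per server, "
        f"{hours_to_finish(CANDIDATES, rate):.0f} h in total"
    )
```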
We cancelled that job, which was planned in preparation of an mmrestripefs -b; the restripe is running now and processes about 4500 inodes per second.
We suspect that the file deletion by tsapolicy is done uncoordinated between the participating nodes, so that they could compete for access to directories and thus block each other, but that is just a guess. How is that process actually done? I suppose the inode space is divided into equal buckets (here: [0..(268Mio/72)-1], [268Mio/72..2*(268Mio/72)-1], ...), and each of the nodes works one of those buckets for the data scan. However, when it comes to deletion -- would each node still stick to "its" bucket, or would these deletion jobs be redistributed according to the directory structure so that no two nodes would interfere?
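To make the supposed bucket scheme concrete, here is a minimal sketch of splitting an inode space into contiguous, near-equal ranges. It only illustrates the guess made in this post; it is not a description of how tsapolicy actually partitions its work, and `inode_buckets` is a hypothetical helper name.

```python
def inode_buckets(total_inodes, nodes):
    """Split [0, total_inodes) into `nodes` contiguous inclusive ranges.

    The first `total_inodes % nodes` buckets get one extra inode so that
    the ranges cover the whole space without gaps or overlap.
    """
    base, extra = divmod(total_inodes, nodes)
    buckets = []
    start = 0
    for i in range(nodes):
        size = base + (1 if i < extra else 0)
        buckets.append((start, start + size - 1))
        start += size
    return buckets


# 268Mio preallocated inodes spread over 72 nodes, as in the post.
buckets = inode_buckets(268_000_000, 72)
print(buckets[0], buckets[1], buckets[-1])
```

Because such ranges are drawn by inode number and not by pathname, files from the same directory can easily land in different buckets, which is the situation the question above is about.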
7 replies Latest Post - 2013-03-26T10:08:56Z by ufa
Pinned topic low performance of policy-controlled deletion.
SystemAdmin 110000D4XK 2092 Posts ACCEPTED ANSWER
Re: low performance of policy-controlled deletion. 2013-03-08T23:21:23Z in response to ufa
Your hunch about directory lock thrashing is a good one. But there is also the matter of updating the metadata that tracks the free disk space.
Off hand, I don't know which one "hurts more" ;-(
You can use the mmapplypolicy "-I prepare -f /filelist-dir" options to stop the policy command and output the list of files to be deleted.
You can sort the filelist by pathname and then restart the policy command with the sorted filelist using the option -r /filelist-dir/some-filelist-name.
And see how much that helps.
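The sorting step could look roughly like the sketch below. It assumes that each record in the generated filelist carries some leading fields, then the separator " -- ", then the full pathname; check the actual files before relying on that. The function names are hypothetical.

```python
def pathname_of(record):
    """Return the pathname part of one filelist record.

    Assumed layout: "<leading fields> -- <full pathname>". Records without
    the separator are compared as a whole.
    """
    _, sep, path = record.partition(" -- ")
    return path if sep else record


def sort_filelist(src, dst):
    """Write the records of `src` to `dst`, ordered by pathname."""
    with open(src, encoding="utf-8", errors="surrogateescape") as f:
        records = [line.rstrip("\n") for line in f]
    records.sort(key=pathname_of)
    with open(dst, "w", encoding="utf-8", errors="surrogateescape") as f:
        for record in records:
            f.write(record + "\n")
    return len(records)
```

This version holds the whole list in memory, so for a list with tens of millions of records an external sort is probably the more practical choice; the point here is only the sort key.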
In future we may provide an easier way for you to cluster deletes together by pathname instead of by WEIGHT and/or inode number.
I know you are also working this issue through IBM support since they contacted me earlier today.
They may have some other suggestions for you. Among which... if metadata scans and operations are bogging you down, buy some SSD storage and migrate your GPFS metadata to the SSD volumes/LUNs. That can help a lot!
Re: low performance of policy-controlled deletion. 2013-03-09T21:43:30Z in response to SystemAdmin
Marc,
I guess we'll try to investigate that in future.
And, AFAIK, there is another German customer actually suffering from slow scans -- they contacted support (A.R., who then talked to you) -- we are another case (not completely, but still different :-)
We already considered splitting the task: let all nodes do the scan and just save the file list, then start the delete on the generated candidate list in our own way. Let's see if that makes a difference.
BTW: while the delete-by-policy would have taken until Monday, our restripe finished in about 12 hours (64 pitWorkers)...
Re: low performance of policy-controlled deletion. 2013-03-12T17:32:39Z in response to ufa
Hi, to let you know:
a colleague of mine tried the deferred approach, manually assigning the candidates found to the NSD servers. The deletion rate rose to more than 12 Mio per hour (i.e. faster by more than a factor of 10). It is nice to hear that there are plans to improve the system here. To support this we are going to have a PMR opened (at the request of our customer).
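The post does not say how the candidates were assigned to the servers. One possible way, sketched here purely as an assumption, is to keep all files of a directory on the same node so that no two nodes delete from the same directory at once. The helper name and the node names are hypothetical.

```python
import os
from collections import defaultdict


def assign_by_directory(paths, nodes):
    """Give every directory's files to exactly one node, balancing file counts."""
    by_dir = defaultdict(list)
    for path in paths:
        by_dir[os.path.dirname(path)].append(path)

    load = {node: 0 for node in nodes}
    plan = {node: [] for node in nodes}
    # Largest directories first, each one to the currently least-loaded node.
    for directory in sorted(by_dir, key=lambda d: len(by_dir[d]), reverse=True):
        target = min(nodes, key=lambda n: load[n])
        plan[target].extend(by_dir[directory])
        load[target] += len(by_dir[directory])
    return plan


candidates = ["/fs/a/1", "/fs/a/2", "/fs/a/3", "/fs/b/1", "/fs/c/1"]
for node, files in assign_by_directory(candidates, ["nsd01", "nsd02"]).items():
    print(node, files)
```

For scale: 12 Mio per hour is roughly 3300 deletes per second, against the 250 to 300 per second seen with the original policy run.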
SystemAdmin 110000D4XK 2092 Posts ACCEPTED ANSWER
Re: low performance of policy-controlled deletion. 2013-03-13T10:55:54Z in response to ufa
Yes, go ahead and make a formal request for IBM to examine and hopefully improve this area.
In the meanwhile, I am re-visiting this, "on the side". To tell you the truth, heretofore we have not gotten serious negative feedback on mmapplypolicy DELETE performance, so I haven't looked at this area since the early days of what is now `mmapplypolicy`. As I indicated above, we have to try to see how much the mmapplypolicy command can do, and how much is in the underlying implementation of unlink(2), which is what the policy command ultimately invokes on each pathname to be deleted.
Re: low performance of policy-controlled deletion. 2013-03-26T10:08:56Z in response to SystemAdmin
Thanks,
I suppose the issue becomes more serious the more nodes you use to run the policy. Thus, if comparable runs are done on just a few nodes, one would not detect it. I admit that what we did meant a heavy load for the tool (big FS, relatively high number of inodes, high number of candidates), but there's always room for improvement :-) and scenarios in which a larger number of nodes runs such a policy against a large number of candidates may become more common in future.
HajoEhlers 0100001U0A 251 Posts ACCEPTED ANSWER
Re: low performance of policy-controlled deletion. 2013-03-13T08:17:30Z in response to SystemAdmin
In our environment I sometimes use a fileset as a target for temporary files. Just an unlink and all files are gone.
Regarding the OP's problem of deleting files: I would suggest that GPFS have a hidden area in every fileset to which directories and files could be MOVED (within the same fileset) via a policy and then deleted by a GPFS process, as is done for the deletion of a fileset. Thus a "deletion" is a two-stage process.
1) Move the dir/file out of sight.
2) Free the space in the background - I would assume that the same logic as for a fileset could work.
This would mean that the deletion of a directory tree with millions of files is done within a second.
Of course some logic could be used to find the highest directory entry which shall be deleted and move that one, and/or to keep a counter in the hidden area which allows moving (deleting) directories with the same name.
Example: dir3 shall be deleted from the following directory structures.
I assume that the same logic would help in case where only files will be deleted.
In case the OP has time for some testing: why not check whether the "move" of a file/directory is faster than a "delete"?
If it is then you could apply the above logic on a standard GPFS.
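A minimal sketch of such a test is given below. It creates empty files in a scratch directory, then times renaming half of them into a second directory against unlinking the other half. The directory names "src" and "hidden" and the function names are made up for illustration, and `workdir` would have to point to a directory on the file system under test. Run from a single node it says nothing about contention between several nodes.

```python
import os
import tempfile
import time


def time_op(op, names):
    """Apply `op` to every name and return the elapsed seconds."""
    start = time.perf_counter()
    for name in names:
        op(name)
    return time.perf_counter() - start


def compare_move_and_delete(workdir, count=1000):
    """Time `count` renames against `count` unlinks of empty files."""
    src = os.path.join(workdir, "src")
    hidden = os.path.join(workdir, "hidden")
    os.makedirs(src)
    os.makedirs(hidden)

    names = [f"f{i:06d}" for i in range(2 * count)]
    for name in names:
        open(os.path.join(src, name), "w").close()

    to_move, to_delete = names[:count], names[count:]
    t_move = time_op(
        lambda n: os.rename(os.path.join(src, n), os.path.join(hidden, n)),
        to_move,
    )
    t_delete = time_op(lambda n: os.unlink(os.path.join(src, n)), to_delete)
    return t_move, t_delete


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        moved, deleted = compare_move_and_delete(scratch)
        print(f"move: {moved:.3f} s, delete: {deleted:.3f} s")
```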
Re: low performance of policy-controlled deletion. 2013-03-26T10:01:47Z in response to HajoEhlers
Hi,
thanks for your suggestions. But if the problem is the concurrent access to directories from multiple NSD servers, the move would be slow in our case too, as it is also a write to the directories that the files to be moved are in.
Mind: we are not deleting entire directory trees or subtrees but individual (plain) files, selected by their time of last access.
And: at the time we faced the issue we were not interested in merely getting the files "out of sight" but in really freeing up the blocks.