Topic
  • 3 replies
  • Latest Post - ‏2012-05-04T11:06:18Z by SystemAdmin
pfo
23 Posts

Pinned topic Killing mmapplypolicy doesn't stop migration

‏2012-05-02T01:37:52Z |
I have an incident involving a killed mmapplypolicy that was running with `-N all` and a globally reachable path for `-g`. After I sent a SIGTERM to the mmapplypolicy, tsapolicy, and mhelp-apolicy processes, the tsapolicy processes became zombies, while the rest went away. Apparently the migrations (pool to pool only) are still happening, and I can't stop them without forcing it by issuing `mmshutdown` or rebooting all nodes that have the fs mounted. The reason for stopping was that mmapplypolicy reported a predicted negative disk space usage on a pool, but I only noticed that when it was already running. Should I be worried, or will `mmrestripefs` fix things? The system is running build branch "3.3.0.16" on SLES 11 SP0.
Updated on 2012-05-04T11:06:18Z at 2012-05-04T11:06:18Z by SystemAdmin
  • SystemAdmin
    2092 Posts

    Re: Killing mmapplypolicy doesn't stop migration

    ‏2012-05-02T19:53:04Z  
    Firstly, you need to know how to kill mmapplypolicy, according to the latest documentation:
    Note: To terminate mmapplypolicy, use the kill command to send a SIGTERM signal to the process
    group running mmapplypolicy.
    For example, on Linux if you wanted to terminate mmapplypolicy on a process group whose ID is 3813,
    you would enter the following:
    kill -s SIGTERM -- -3813

    That kills the mmapplypolicy process and all of its children, but it may leave some mhelp-apolicy processes on other nodes. Those will eventually time out and exit when they notice that the originating tsapolicy process has broken its TCP connections.
    There's no harm in doing the killing some other way, but you may be left with some zombies that aren't reaped, particularly on AIX, which does not reap zombies the same way as Linux.
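
As a rough sketch of the documented procedure, the helper below looks up the process group of a given PID and sends SIGTERM to the whole group. The function name `terminate_group` is hypothetical and not part of GPFS; it assumes a POSIX `ps` and `kill`, and that you already know the PID of the mmapplypolicy process on the node where it was started.

```shell
#!/bin/sh
# Hypothetical helper (not a GPFS command): send SIGTERM to the whole
# process group that the given PID belongs to.
terminate_group() {
    pid="$1"
    # "pgid=" suppresses the header; ps pads the value with spaces, so strip them.
    pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')
    if [ -z "$pgid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # A negative target addresses the process group; "--" ends option parsing
    # so the leading "-" is not read as an option.
    kill -s TERM -- "-$pgid"
}
```

Calling `terminate_group <pid>` with the PID of mmapplypolicy has the same effect as the `kill -s SIGTERM -- -3813` example when 3813 is that process's group ID. It only signals processes on the local node, so the remarks above about helpers on other nodes still apply.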

    I suppose it is possible that some data migrations that started running before the kill may continue running in the GPFS daemon process. There may be no way to stop those short of mmshutdown, which I do not recommend. Rather, let them continue; eventually all migrations should complete.

    Predicted negative pool disk utilization is not necessarily a problem, but it could be an indication that some files have "ill-placed" file data blocks.

    And yes, mmrestripefs can be used to "fix" ill-placed blocks within files.
  • pfo
    23 Posts

    Re: Killing mmapplypolicy doesn't stop migration

    ‏2012-05-03T21:22:04Z  
    Thanks marc, you guys are really awesome. I guess the migrations were really taking this long since they were moving largish files!
  • SystemAdmin
    2092 Posts

    Re: Killing mmapplypolicy doesn't stop migration

    ‏2012-05-04T11:06:18Z  
    We developed a PTF for 3.4 a while ago that divides any long data migration done by mmchattr or mmapplypolicy into a series of shorter tasks/calls into the GPFS kernel and daemon. This was done primarily to avoid blocking commands like mmdelsnapshot, but it should also make it possible to kill long migrations more quickly. I don't recall the PTF number, but it was probably delivered in a PTF for 3.4 that shipped in or after July 2011.