1 reply Latest Post - ‏2013-09-23T15:52:27Z by dlmcnabb
Ponch
6 Posts

MMDIAG Indicates Thrashing

2013-09-21T01:22:43Z

While investigating long locks around our cluster, I noticed the output below and was hoping for some insight. It is curious that one of our file system managers is called out for thrashing. Is this something we should be concerned about? Also, can some context around this message be provided?

 

[user@currentclustermgr ~]# mmdiag --tokenmgr
 
=== mmdiag: tokenmgr ===
  Token Domain nawest1
    There is 1 active token server in this domain:
      8.254.143.XXX
  Token Domain nawest2
    There is 1 active token server in this domain:
      8.254.140.XXX
 
    Server stats: requests 2509316549 ServerSideRevokes 40871
           nTokens 128828 nranges 131993
           designated mnode appointed 3625 mnode thrashing detected 3625
  Token Domain internal1
    There is 1 active token server in this domain:
      8.254.139.XXX
  Token Domain naeast1
    There is 1 active token server in this domain:
      8.254.136.XXX
  Token Domain tmp-tus
    There is 1 active token server in this domain:
      8.254.143.XXX
  Token Domain tmp-wdc
    There is 1 active token server in this domain:
      8.254.136.XXX
 
  • dlmcnabb
    1012 Posts
    ACCEPTED ANSWER

    Re: MMDIAG Indicates Thrashing

    ‏2013-09-23T15:52:27Z  in response to Ponch

    "designated mnode appointed 3625 mnode thrashing detected 3625"

    "mnode thrashing" means the token manager has detected that many nodes are trying to become the metanode (mnode) for the same file, and has decided that things would work much better if the token manager itself became the "designated mnode". This can happen when many nodes try to create a file in the same directory, or when many nodes try to open/create the same file, all at the same time. When each node first looked up the file or directory, it did not find an existing entry, so each one tried to become the mnode in order to do the create.

    The token manager does this takeover to prevent the directory or file metadata from thrashing through the disk. After the takeover, all metadata reads and updates flow over the network instead of through disk.
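    To track how often this takeover happens, the two counters quoted above can be pulled out of saved `mmdiag --tokenmgr` output. The following is only a rough sketch: the function name `parse_tokenmgr` is made up for illustration, the line format is taken solely from the output in this thread (it may differ between GPFS releases), and it assumes the counters belong to the most recent "Token Domain" header printed before them.

```python
import re

# Excerpt of the output posted in this thread, used as sample input.
SAMPLE = """\
=== mmdiag: tokenmgr ===
  Token Domain nawest1
    There is 1 active token server in this domain:
      8.254.143.XXX
  Token Domain nawest2
    There is 1 active token server in this domain:
      8.254.140.XXX

    Server stats: requests 2509316549 ServerSideRevokes 40871
           nTokens 128828 nranges 131993
           designated mnode appointed 3625 mnode thrashing detected 3625
  Token Domain internal1
    There is 1 active token server in this domain:
      8.254.139.XXX
"""

DOMAIN_RE = re.compile(r"\s*Token Domain (\S+)")
MNODE_RE = re.compile(
    r"designated mnode appointed (\d+) mnode thrashing detected (\d+)"
)


def parse_tokenmgr(text):
    """Return {domain: {"appointed": n, "thrashing": n}} for domains with counters.

    Assumption: a counter line belongs to the last "Token Domain" header seen.
    """
    stats = {}
    domain = None
    for line in text.splitlines():
        header = DOMAIN_RE.match(line)
        if header:
            domain = header.group(1)
            continue
        counters = MNODE_RE.search(line)
        if counters and domain is not None:
            stats[domain] = {
                "appointed": int(counters.group(1)),
                "thrashing": int(counters.group(2)),
            }
    return stats


if __name__ == "__main__":
    for name, counts in parse_tokenmgr(SAMPLE).items():
        print(name, counts)
```

    Comparing the counters between two runs taken some time apart shows whether thrashing is still being detected or whether the numbers are left over from an earlier workload.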

    The only way to mitigate this is to make the application more "distributed file system" aware: let one node do the massive file creates or mkdir operations before all the other nodes open the files or directories they need in the common directory.
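    The create-then-open pattern described above might look roughly like the sketch below. Everything in it is illustrative: `open_shared_files` and `demo` are hypothetical names, and `barrier` stands in for whatever cross-node synchronization the application already has (for example a barrier supplied by the job framework). The demo uses threads in a single process only so the sketch can run anywhere; it does not exercise GPFS itself.

```python
import os
import tempfile
import threading


def open_shared_files(directory, names, is_creator, barrier):
    """One participant creates all files; everyone opens them afterwards.

    `barrier` is any zero-argument callable that blocks until every
    participant has reached it (an assumption of this sketch).
    """
    if is_creator:
        os.makedirs(directory, exist_ok=True)
        for name in names:
            # All creates come from a single participant.
            with open(os.path.join(directory, name), "a"):
                pass
    barrier()  # nobody looks up the files until they all exist
    # The files already exist, so no participant attempts a create here.
    return [open(os.path.join(directory, name), "r+") for name in names]


def demo(n_workers=4):
    """Run the pattern with threads standing in for cluster nodes."""
    directory = os.path.join(tempfile.mkdtemp(), "shared")
    names = ["part-%d.dat" % i for i in range(3)]
    barrier = threading.Barrier(n_workers)
    opened = []

    def worker(rank):
        handles = open_shared_files(directory, names, rank == 0, barrier.wait)
        opened.append(len(handles))
        for handle in handles:
            handle.close()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(os.listdir(directory)), opened


if __name__ == "__main__":
    print(demo())
```

    The point of the design is that only one participant ever finds a missing entry and performs the create, so the other nodes never compete to become the mnode for a file that does not exist yet.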