oester
171 Posts

Understanding "File Heat" distribution

‏2014-02-25T15:36:05Z |

It's easy to print each file's heat value using a simple policy, but harder to understand the overall distribution. For example, this will list all the files sorted by heat value:

 

define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
 
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT) show('++' DISPLAY_NULL(FILE_HEAT) )

 

But what I really want is a sort of "bucket" distribution of the files: upper 10%, next 10%, and so on, all the way down to zero. Is there a way to do this using a policy file? I couldn't find an expression to do what I want.

 

Thanks.

  • FredStockatIBM
    57 Posts

    Re: Understanding "File Heat" distribution

    ‏2014-02-27T02:44:13Z  

    A fellow developer suggested the following.

    Use mmapplypolicy LIST rule(s) to collect the file attribute values. To save the results as a file list, use the `-I defer -f /where-to-put-the-lists` options.
    You may want to filter the lists with `cut`, `awk`, or your favorite tool to make them acceptable for ingestion
    into your favorite spreadsheet, database, statistics package, or whatever, and then generate the plots, histograms, and reports.
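
The filtering step could also be done in Python instead of `cut` or `awk`. The following is only a minimal sketch. It assumes each record in the deferred list file looks like `inode generation snapid show-text -- path`, and that the show text carries the `++` marker and `_NULL_` placeholder from the rule in the first post. `parse_heat_line` is a hypothetical helper name, not part of GPFS. Check a few lines of your own list file before relying on it.

```python
import re

# Assumed record layout (verify against your own list file):
#   <inode> <generation> <snapid> <show-text> -- <path>
LINE_RE = re.compile(r"^\s*\d+\s+\d+\s+\d+\s+(?P<show>.*?)\s+--\s+(?P<path>.*)$")


def parse_heat_line(line):
    """Return (heat, path), or None if the line does not fit the assumed layout.

    heat is None when the value is not numeric, e.g. the '_NULL_' placeholder.
    """
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    show = m.group("show")
    pos = show.find("++")  # marker emitted by SHOW('++' ...)
    if pos < 0:
        return None
    fields = show[pos + 2:].split()
    if not fields:
        return None
    try:
        heat = float(fields[0])
    except ValueError:
        heat = None
    return heat, m.group("path")
```

Calling it on every line of the list file and keeping the non-`None` results gives a list of `(heat, path)` pairs that is easy to write out as CSV or summarize further.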

     

  • oester
    171 Posts

    Re: Understanding "File Heat" distribution

    ‏2014-02-27T16:59:01Z  

    Thanks - I was hoping for a more elegant solution. The file heat values range from zero to "some very large numbers", an almost logarithmic spread. It would be nice if GPFS could tell me which percentage of files are the most active, rather than just using a policy to move files into a new storage pool until it fills up.
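
With values spread over that many orders of magnitude, one option is to count files per power of ten once the heat values have been extracted from the list file. This is a rough sketch, not a GPFS feature: `magnitude_histogram` is a hypothetical name, and it assumes the heat values are already available as non-negative Python floats.

```python
import math
from collections import Counter


def magnitude_histogram(heats):
    """Count heat values per power of ten; zero heat gets its own 'zero' bucket.

    Key k means 10**k <= heat < 10**(k + 1).
    """
    counts = Counter()
    for h in heats:
        if h <= 0:
            counts["zero"] += 1
        else:
            counts[math.floor(math.log10(h))] += 1
    return counts
```

Printing the counter sorted by key gives a quick view of how many files are cold, lukewarm, or very hot without loading everything into a spreadsheet.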

  • marc_of_GPFS
    35 Posts

    Re: Understanding "File Heat" distribution

    ‏2014-06-02T16:49:02Z  
    WEIGHT(FILE_HEAT) or WEIGHT(-FILE_HEAT) will sort the files that appear in the LIST.

    Besides SHOWing FILE_HEAT, you probably want to also SHOW(  ....  || varchar(KB_ALLOCATED) || ...  )

    Then, with a simple filter, script, and/or report generator, you can process the file list from mmapplypolicy and show or summarize what you're looking for.

    Or load the data into a spreadsheet and "play" with it or plot it. Of course, if we're talking about millions of files, you probably want to do some summary filter-reduction of the file list before loading it into a spreadsheet.
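
The "upper 10%, next 10%" summary asked for in the first post could be produced by such a script. The following is a minimal sketch that assumes the heat values have already been pulled out of the list file into a Python list. `decile_buckets` is a hypothetical name, and the buckets are formed by file count, not by KB_ALLOCATED.

```python
def decile_buckets(heats, n_buckets=10):
    """Rank files hottest-first and split them into n_buckets groups of near-equal size.

    Returns a list of (bucket_number, file_count, hottest, coldest) tuples,
    where bucket 1 holds the hottest files. Empty buckets are skipped.
    """
    ranked = sorted(heats, reverse=True)
    total = len(ranked)
    report = []
    for b in range(n_buckets):
        lo = b * total // n_buckets
        hi = (b + 1) * total // n_buckets
        chunk = ranked[lo:hi]
        if chunk:
            report.append((b + 1, len(chunk), chunk[0], chunk[-1]))
    return report
```

Each tuple gives the heat range covered by one tenth of the files, which is usually a small enough summary to read directly or paste into a spreadsheet.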