3 replies Latest Post - ‏2014-06-02T16:49:02Z by marc_of_GPFS
66 Posts

Pinned topic Understanding "File Heat" distribution

2014-02-25T15:36:05Z

It's easy to print out the FILE_HEAT value using a simple policy, but harder to understand the overall distribution. For example, this will list all the files sorted by heat value:


define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT) show('++' || DISPLAY_NULL(FILE_HEAT))


But what I really want is a sort of "bucket" distribution of the files: upper 10%, next 10%, etc., all the way down to zero. Is there a way to do this using a policy file? I couldn't find an expression to do what I want.



  • FredStockatIBM
    42 Posts

    Re: Understanding "File Heat" distribution

    ‏2014-02-27T02:44:13Z  in response to oester

    A fellow developer suggested the following.

    Use mmapplypolicy LIST rule(s) to collect the file attribute values. To save the results as file lists, use the `-I defer -f /where-to-put-the-lists` options.
    You may want to filter the lists with `cut`, `awk`, or your favorite tool to make them suitable for ingestion
    into your favorite spreadsheet, database, or statistics package, and then generate the plots, histograms, and reports.
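As a rough illustration of that filtering step, here is a minimal Python sketch that pulls the heat value and path out of one list-file line. It assumes each record looks like `inode generation snapid  ++<heat> -- /full/path`, which is what the `'++'` SHOW prefix from the original post would produce; check this against your own list files before relying on it. The function name `parse_list_line` is made up for the example, and any path escaping that mmapplypolicy applies is not handled.

```python
from typing import Optional, Tuple


def parse_list_line(line: str, marker: str = "++") -> Optional[Tuple[Optional[float], str]]:
    """Return (heat, path) for one list-file record, or None if it does not match.

    Assumed layout (verify against your own output):
        inode generation snapid  ++<heat> -- /full/path
    A heat of None means the policy printed '_NULL_' for FILE_HEAT.
    """
    # Everything after the first " -- " is taken to be the path.
    head, sep, path = line.rstrip("\n").partition(" -- ")
    if not sep:
        return None
    # The SHOW text is the token that starts with the marker.
    for token in head.split():
        if token.startswith(marker):
            value = token[len(marker):]
            heat = None if value == "_NULL_" else float(value)
            return heat, path
    return None
```

Calling it on every line of the deferred list file gives you (heat, path) pairs that can be written out as CSV or summarized directly.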


    • oester
      66 Posts

      Re: Understanding "File Heat" distribution

      ‏2014-02-27T16:59:01Z  in response to FredStockatIBM

      Thanks - I was hoping for a more elegant solution. The file heat values range from zero to "some very large numbers", almost logarithmic. It would be nice if GPFS could tell me what % of files are the most active, rather than just using a policy to move files into a new storage pool until it fills up.
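For values spread over a range like that, one option outside of GPFS is to count files per power of ten. The sketch below is only an illustration, not something GPFS provides: `log10_histogram` is a hypothetical helper, and it assumes the heat values have already been extracted from the file list as floats.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def log10_histogram(heats: Iterable[float]) -> Dict[str, int]:
    """Count heat values per decade; zero or negative heat goes into 'zero'."""
    counts: Counter = Counter()
    for heat in heats:
        if heat <= 0:
            counts["zero"] += 1
        else:
            # Label "1eN" covers the half-open range [10**N, 10**(N+1)).
            exponent = math.floor(math.log10(heat))
            counts[f"1e{exponent}"] += 1
    return dict(counts)
```

This shows how many files sit in each order of magnitude, which tends to be easier to read than a linear histogram when a few files are far hotter than the rest.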

      • marc_of_GPFS
        33 Posts

        Re: Understanding "File Heat" distribution

        ‏2014-06-02T16:49:02Z  in response to oester

        WEIGHT(FILE_HEAT) or WEIGHT(-FILE_HEAT) will sort the files that appear in the LIST.

        Besides SHOWing FILE_HEAT, you probably want to also SHOW(  ....  || varchar(KB_ALLOCATED) || ...  )

        Then, with a simple filter, script, and/or report generator, you can process the file list from mmapplypolicy and tell/show/summarize what you're looking for.

        Or load the data into a spreadsheet and "play" with/plot it. Of course, if we're talking about millions of files, you probably want to do some summary filter-reduction of the file list before loading it into a spreadsheet.
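A summary reduction along those lines might look like the following Python sketch, which ranks files by heat and reports equal-count buckets (upper 10%, next 10%, and so on). It assumes the file list has already been parsed into `(heat, kb_allocated)` pairs; `decile_summary` and its field names are invented for this example.

```python
from typing import Dict, List, Sequence, Tuple


def decile_summary(files: Sequence[Tuple[float, int]], buckets: int = 10) -> List[Dict[str, float]]:
    """Summarize (heat, kb_allocated) pairs into equal-count buckets, hottest first."""
    ranked = sorted(files, key=lambda item: item[0], reverse=True)
    total = len(ranked)
    summary = []
    for index in range(buckets):
        # Integer arithmetic spreads any remainder across the buckets.
        start = index * total // buckets
        end = (index + 1) * total // buckets
        chunk = ranked[start:end]
        if not chunk:
            continue  # fewer files than buckets
        summary.append({
            "bucket": index + 1,          # 1 = hottest tenth
            "files": len(chunk),
            "max_heat": chunk[0][0],
            "min_heat": chunk[-1][0],
            "kb": sum(kb for _, kb in chunk),
        })
    return summary
```

The `kb` total per bucket gives a rough idea of how much space the hottest tenth of files would need in a faster pool, and the result is only ten rows, so it loads into a spreadsheet regardless of how many files were scanned.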