Topic
14 replies Latest Post - 2013-02-26T11:55:33Z by PaulDW
PaulDW
25 Posts

Pinned topic How to optimize policy rules

2013-02-08T14:19:40Z
Hi,

I'm trying to optimize my policy rules. The rules will be used by mmbackup to distribute all files of the filesystem across different TSM servers. Some of the old rules were of the form:

RULE EXTERNAL LIST ...
RULE 'BackupRule' LIST ... WHERE (
   ('/GPFS/dir/0003' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0003')))  OR
   ('/GPFS/dir/0005' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0005')))  OR
   ('/GPFS/dir/0008' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0008')))  OR
   ('/GPFS/dir/0007' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0007')))  OR
   ('/GPFS/dir/0009' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0009')))  OR
   ('/GPFS/dir/0006' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0006')))  OR
   ('/GPFS/dir/0004' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0004')))
)

Starting with GPFS 3.5 it is possible to use the REGEX function, which makes this more elegant:

REGEX(PATH_NAME,['^/GPFS/dir/000[3-9]'])

This new form is obviously shorter and easier to read, but since we have nearly 200 million files I'd like to know if this is faster/slower.
I tried running mmapplypolicy with several versions of my policy rules file on subtrees of 4M and 16M files, but the results are not conclusive. There just seems to be too much variation, and the sort happening during the inodeScan phase seems to weigh too heavily for these subtrees.
When I tried the version that was fastest in the subtree tests on the complete filesystem, it was almost 2.5 times slower than what I had before. Running mmapplypolicy on the complete filesystem for every iteration of the optimisation process would take too much time (a single 'mmapplypolicy -I test' runs for 5 hours).

Any tips on writing efficient policy rules, or techniques to tune them, are welcome. Is it possible to isolate the policy evaluation from the directory scan? I used '-d 07' to get timing information, but it does not separate the sort from the policy evaluation itself. Is there a more accurate way?

Thanks,
Paul.
Updated on 2013-02-26T11:55:33Z by PaulDW
  • SystemAdmin
    2092 Posts

    Re: How to optimize policy rules

    2013-02-08T15:12:00Z  in response to PaulDW
    Paul,

    Thanks for your question! It's good to know that you are keeping up with features we're adding to GPFS/policy and that you find them useful or interesting!

    As you've noticed, the "noise" or variance in the other computations and IO going on during a policy scan can overwhelm minor tweaks to the rules proper.

    To get an idea of how much a particular rule "costs" I would make two policy files:

    In the first, just one LIST...WHERE rule:

    RULE 'x' LIST 'x' WHERE (expression-I-want-to-test)
    AND USER_ID = -1234 /* this condition should always fail, but the SQL prepare phase does not know that */

    In the second, 101 copies of the same rule, that is:
    RULE 'x' LIST 'x' WHERE ...
    RULE 'x' LIST 'x' WHERE ...
    /* 101 copies of a test rule with the final failure condition
    we want each rule to fail so that all of the rules are evaluated and no output records are written
    ... */
    RULE 'x' LIST 'x' WHERE ...
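    For concreteness, a sketch (illustrative only) of how the two test files could be generated, reusing the REGEX expression from the first post as the expression under test; the file names are arbitrary:

    # build a 1-rule file and a 101-rule file from the same test rule
    rule="RULE 'x' LIST 'x' WHERE (REGEX(PATH_NAME,['^/GPFS/dir/000[3-9]'])) AND USER_ID = -1234"
    echo "$rule" > /tmp/rules.1
    i=0
    : > /tmp/rules.101
    while [ $i -lt 101 ]; do
        echo "$rule" >> /tmp/rules.101
        i=$((i + 1))
    done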

    This is an old trick you might recognize, but let me keep going with tips for policy tests...

    As you've noted, -d 02 will give you timing for each phase.
    The difference between running 1 rule and 101 rules against the same set of files will let you know the incremental cost of the rules themselves.

    To do this faster, more accurately and cut out much of the noise:

    Use the `-i experts-file-list` option to skip the directory phase, and the `-I test` option to skip the execution phase.

    In your experts-file-list you can have just 10000 copies of the same record:

    inodes-number:0:pathlength!path_name

    I'm pretty sure that will effectively keep just about everything in cache during the policy evaluation phase; there should be almost no disk activity!
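    As an illustration of that record layout (the inode number 12345 is made up, the path reuses the example directory from the first post, and 23 is just the length of that path string), the experts-file-list would simply contain 10000 copies of a line like:

    12345:0:23!/GPFS/dir/0003/testfile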

    Use the simple, serial form of the command, and use the "internal" form to be sure that -N is not provided for you from the defaultHelperNodes config param.
    That is:

    tsapolicy fsname -i experts-file-list -P policy-rules-file -d 02 -L 1 -I test

    Run this several times with the one rule file and several times with the 101 rule file, and let us know how you make out!
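    If it helps, a small driver along these lines (a sketch; the run count and file names are arbitrary, and fsname stands for your filesystem as in the command above) keeps the -d 02 timing output of each run for comparison:

    # run the serial command three times with each rules file and keep the output
    for rules in /tmp/rules.1 /tmp/rules.101; do
        n=1
        while [ $n -le 3 ]; do
            tsapolicy fsname -i experts-file-list -P $rules -d 02 -L 1 -I test \
                > /tmp/timing.$(basename $rules).$n 2>&1
            n=$((n + 1))
        done
    done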
    • PaulDW
      25 Posts

      Re: How to optimize policy rules

      2013-02-08T15:39:15Z  in response to SystemAdmin
      Thanks, Marc, for this quick response!
      I'll give it a go and let you know how it worked out.

      A quick unrelated question you must know the answer to: I use the MM_SORT_CMD="LC_ALL=C /usr/linux/bin/sort" variable to use gnusort instead of the standard AIX sort, but while looking at the '-d 07' debug output I noticed that only the node executing the mmapplypolicy command picked up the gnusort. The other nodes were still using the AIX sort. How can I avoid this?
      • PaulDW
        25 Posts

        Re: How to optimize policy rules

        2013-02-11T15:54:29Z  in response to PaulDW
        FYI, I solved my unrelated question by using the "-E" option of mmstartup on the other nodes.
      • SystemAdmin
        2092 Posts

        Re: How to optimize policy scans - gnusort

        2013-02-11T17:08:35Z  in response to PaulDW
        BTW - If mmapplypolicy is burning a lot of time in sort on AIX systems, gnusort may well be worth a try.

        It is semi-officially supported in IBM's "AIX Toolbox for Linux Applications" (http://www-03.ibm.com/systems/power/software/aix/linux/index.html), buried in the coreutils package.
        Gnusort generally will "win" on very large files because it makes much more effective use of (virtual) memory, and can
        greatly reduce the number of "external" merge passes the sort requires.

        To see this in action, try sorting 10 million or more 100-byte records and look at the /tmp directory as the sort is running!
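        A rough way to try that (sizes, paths, and the record layout here are only illustrative):

        # generate ~10 million 100-byte records (a 10-digit random key plus padding), then sort
        awk 'BEGIN { srand(); for (i = 0; i < 10000000; i++) printf "%010d%089d\n", int(rand()*1000000000), i }' > /tmp/big.in
        LC_ALL=C /usr/linux/bin/sort -T /tmp /tmp/big.in > /tmp/big.out &
        ls -l /tmp      # the sort's temporary merge files show up here while it runs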

        Paul can comment on how much it helped in his situation.
        • PaulDW
          25 Posts

          Re: How to optimize policy scans - gnusort

          2013-02-15T11:00:13Z  in response to SystemAdmin
          I agree with Marc that gnusort is well worth a try. I did some tests more than a year ago, and depending on the file size the difference can be big. For example: sorting a large number of large temporary policy files took the AIX sort 1 hr 3 min; gnusort completed the same job in only 28 min. For smaller files the AIX sort is comparable, if not faster.
          In our case the sorting activity is the most time-consuming part of mmapplypolicy after the directory scan. So, it does matter.

          There are more recent gnusort versions that can do parallel sorts, which is even faster if you have enough CPUs, but even after trying several versions I could not get it to work in the context of mmapplypolicy: it just hung at some point. I gave up further attempts, but if someone knows how to get it to work on AIX, I'm interested.
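          (For reference, the parallel capability mentioned here is GNU sort's --parallel option in newer coreutils releases; whether a given AIX Toolbox build includes it, and whether it behaves under mmapplypolicy, is exactly the open question. Standalone it would look something like:)

          LC_ALL=C /usr/linux/bin/sort --parallel=8 -T /tmp input-file > output-file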

          In my previous post I said I was able to get mmapplypolicy to use the gnusort on the remote nodes by using the "-E" option of mmstartup, but I spoke too fast. I had used
          
          mmstartup -E MM_SORT_CMD="LC_ALL=C /usr/linux/bin/sort"
          

          Even though I used double quotes, the final MM_SORT_CMD environment variable contained only the first part "LC_ALL=C". I tried several combinations of single quotes, double quotes, backslashes, but none of them worked.
          Then I added the MM_SORT_CMD variable definition to /etc/environment and rebooted the server hoping mmfsd64 would finally pick up the correct value, but then it contained just the second part "/usr/linux/bin/sort". Does anyone have more ideas?

          Thanks,
          Paul.
          • truongv
            68 Posts

            Re: How to optimize policy scans - gnusort

            2013-02-15T12:59:38Z  in response to PaulDW
            A white space in the environment string will not work. I think you have two options here:

            1) write a wrapper script. Something like /usr/local/bin/mysort:
            
            #!/bin/ksh
            LC_ALL=C /usr/linux/bin/sort "$@"

            Then point GPFS at the wrapper:

            mmstartup -E MM_SORT_CMD=/usr/local/bin/mysort
            


            2) Pass both environment variables to mmstartup:
            
            mmstartup -E LC_ALL=C -E MM_SORT_CMD=/usr/linux/bin/sort
            


            The side effect of #2 is that every command will run under LC_ALL=C.
            Once you have it working, you may want to set things up so that GPFS picks up your environment variables every time without the use of -E:

            mmchconfig envVar="VAR1 VALUE1 VAR2 VALUE2 ..."

            e.g.: mmchconfig envVar="LC_ALL C MM_SORT_CMD /usr/linux/bin/sort"
            
            • PaulDW
              25 Posts

              Re: How to optimize policy scans - gnusort

              2013-02-15T13:21:19Z  in response to truongv
              Thanks truongv for your quick and interesting response.

              I like your second option best. (I'm a little reluctant to use a wrapper script as I don't know how GPFS will use the sort and what the side effects will be. It might for example sort something using a pipe.)
              Do you have an idea how this "LC_ALL=C" could affect GPFS?
              • truongv
                68 Posts

                Re: How to optimize policy scans - gnusort

                2013-02-15T13:42:43Z  in response to PaulDW
                You are probably OK with that, as GPFS retains the LC_ALL environment variable. If you export LC_ALL in your environment on every node, you don't need to pass it to mmstartup again; GPFS will keep LC_ALL.

                So this is your 3rd option:
                
                export LC_ALL=C
                mmstartup -E MM_SORT_CMD=/usr/linux/bin/sort
                
                • PaulDW
                  25 Posts

                  Re: How to optimize policy scans - gnusort

                  2013-02-15T14:16:03Z  in response to truongv
                  Thanks truongv, I'm gonna try option 2 in the non-production cluster first
              • SystemAdmin
                2092 Posts

                Re: How to optimize policy scans - gnusort

                2013-02-15T19:55:11Z  in response to PaulDW
                FYI,

                • The sort command is run in a popen() style pipe within mmapplypolicy. Still, wrapping should be okay (if you do it right!).

                • The LC_ALL=C is there because we found that using any other collating sequence can slow the sort down considerably. But the environment on some (many?) systems does specify a different collating sequence.

                • We are aware that sorting can bog down mmapplypolicy. In the current release there is a --choice-algorithm=fast option which will help in some situations, except you lose the default precise WEIGHT THEN INODE sorting going into the execution phase, something that some uses, notably mmbackup, depend upon (example below).
                So the research agenda includes a parallelized sort that should work "on top of" the standard sort utilities.
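                (Illustrative only: an invocation along these lines, reusing the fsname and policy-rules-file placeholders from earlier in the thread and the option spelling given above:)

                mmapplypolicy fsname -P policy-rules-file -I test --choice-algorithm=fast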
                • PaulDW
                  25 Posts

                  Re: How to optimize policy scans - gnusort

                  2013-02-25T15:33:35Z  in response to SystemAdmin
                  Hi Marc,

                  With the solution to the sort in place (truongv's 3rd option) I started experimenting with your suggestions to tune the policy rules.

                  I didn't want to use a large number of copies of identical records, as this might influence the result. For example, I'd like to figure out whether it would be better to write
                  
                  WHERE (PATH_NAME LIKE '/dirA/%') OR (PATH_NAME LIKE '/dirB/%')
                  

                  than to write
                  
                  WHERE (PATH_NAME LIKE '/dirB/%') OR (PATH_NAME LIKE '/dirA/%')
                  

                  Depending on the number of files in dirA and dirB, one or the other may be faster. If I tested with 10K identical records in the experts-file-list, it could point me in the wrong direction. This is why I created an "experts-file-list" containing real files from the filesystem.

                  I ran the tsapolicy command with this complete file list (188M files) and it ran for more than 22 hours (even using 4 nodes), which is not a practical amount of time. Even with a subset of the full file list, these tests take rather long. When mmapplypolicy runs as part of mmbackup, the inode scan takes between 15 and 30 minutes depending on the policy rules. There clearly isn't the same parallelism, or something else is different... or I'm not doing it right.

                  Besides that, my first tests seem to indicate that some new rules file is faster than the old, but when using it in mmbackup, the inodescan phase takes twice as long.

                  Isn't there a way to let mmapplypolicy do the same thing as in mmbackup, but skipping the directory scan? Suppose I save the temporary files; can't I just execute the inode scan?

                  Thanks,
                  Paul.
                  • SystemAdmin
                    2092 Posts

                    Re: How to choose the most economical/fastest SQL exprs for GPFS policy

                    2013-02-25T19:23:15Z  in response to PaulDW
                    It seems we are not understanding one another. I thought you were interested in discovering the relative speeds of different SQL expressions.
                    For that purpose, there is no need to test for more than a minute or three at a time, no reason to run different files, and no reason to run in parallel on multiple nodes.

                    The SQL interpreter we use in policy is very straightforward. There is a prepare (think simple compiler) phase that does constant propagation and a few other optimizations.
                    • SystemAdmin
                      2092 Posts

                      Re: How to choose the most economical/fastest SQL exprs for GPFS policy

                      2013-02-25T19:31:45Z  in response to SystemAdmin
                      Your guess is correct that OR takes a shortcut if the left part is true. But that does not mean you have to test with millions of files for many minutes to see the difference. In your dirA vs dirB example, just make up an "expert list" with a few thousand copies of a ...dirA... pathname.
                      Run the SQL with dirA OR dirB, then with dirB OR dirA, and see the difference... use the "repeat the rules with USER_ID = -1234" trick to force the SQL interpreter to evaluate multiple times for each record loaded. Again, the point is to measure the SQL interpreter; otherwise your measurements are likely to be overwhelmed by the noise/jitter/variance in IO.
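                      A sketch of how those two test inputs might be built (names, paths and counts are placeholders; each file repeats its one rule so the interpreter does the work many times per record):

                      i=0
                      : > /tmp/rules.AB; : > /tmp/rules.BA
                      while [ $i -lt 100 ]; do
                          echo "RULE 'x' LIST 'x' WHERE ((PATH_NAME LIKE '/dirA/%') OR (PATH_NAME LIKE '/dirB/%')) AND USER_ID = -1234" >> /tmp/rules.AB
                          echo "RULE 'x' LIST 'x' WHERE ((PATH_NAME LIKE '/dirB/%') OR (PATH_NAME LIKE '/dirA/%')) AND USER_ID = -1234" >> /tmp/rules.BA
                          i=$((i + 1))
                      done

                      Then run each file against the same all-dirA expert list with the serial tsapolicy command from earlier in the thread and compare the -d 02 timings.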
                      • PaulDW
                        25 Posts

                        Re: How to choose the most economical/fastest SQL exprs for GPFS policy

                        2013-02-26T11:55:33Z  in response to SystemAdmin
                        Marc,

                        In the end the goal is to write an optimal rules file that will be used by mmbackup. My most important criteria are speed and simplicity. Simplicity, because I want to avoid the complexity of the rules leading to mistakes that result in files not being backed up, or being backed up by the wrong TSM server.

                        The rules of my first post in this thread are a good example of what I'm trying to achieve. In that example the rule using REGEX is certainly easier to read and maintain, so it appeared to be a major improvement at first, but with the new REGEX-style rules the inode scan phase completed only after about 32 minutes, where it used to be done in 17 minutes. So I am interested in the relative speeds of different expressions, but I think the file I'm testing these expressions against might influence the result.
                        For example: if I put a file /GPFS/dir/0003/testfile in the expert list, I expect that the rule
                        
                        RULE#1: WHERE (
                           ('/GPFS/dir/0003' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0003')))  OR    (A)
                           ('/GPFS/dir/0005' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0005')))  OR    (B)
                           ('/GPFS/dir/0008' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0008')))  OR    (C)
                           ('/GPFS/dir/0007' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0007')))  OR    (D)
                           ('/GPFS/dir/0009' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0009')))  OR    (E)
                           ('/GPFS/dir/0006' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0006')))  OR    (F)
                           ('/GPFS/dir/0004' = SUBSTR(PATH_NAME,1,LENGTH('/GPFS/dir/0004')))        (G)
                        )
                        

                        evaluates faster than rule
                        
                        RULE#2: REGEX(PATH_NAME,['^/GPFS/dir/000[3-9]'])
                        

                        because only the expression on the line marked (A) needs to be evaluated.
                        (Note: in RULE#1 I ordered the OR operands according to the number of files in each of the subtrees.)
                        For a file /GPFS/dir/0004/testfile, RULE#2 may be faster than RULE#1, because now (A) and (B) and ... and (G) all need to be evaluated. Suppose this is true; then the number of files in each of the subtrees /GPFS/dir/0003, /GPFS/dir/0004, ... might be important. Many files in /GPFS/dir/0004 in the expert list might favor the REGEX version, while many files in /GPFS/dir/0003 might favor RULE#1.

                        It makes sense to start with some synthetic tests and get a feel of how fast certain constructs are, but in the end I'm only interested in the execution speed during mmbackup. I thought using a realistic expert list would get me closer to the mmbackup context.

                        The tests that I've done so far were not so useful. I tried your suggestion with 101 copies of the rules: the inode scan took 2031 seconds for the REGEX version and 2379 seconds for the other version. So the REGEX version was supposed to be faster according to these tests, but as part of mmbackup the REGEX version was significantly slower. With 2000 seconds per test, and given that it pointed me in the wrong direction, I'm looking for another approach.

                        Thanks for your input,
                        Paul.

                        PS. With the debug output you get timing information like "2013-02-23@16:30:10.427 (+844.073,+0.020,+0.010): inodeScan done".
                        What's the meaning of these three times between parentheses?