9 replies Latest Post - ‏2012-11-28T11:57:12Z by BillWoodger
PeteInOxford
25 Posts

Pinned topic Another performance oddity. Anyone know why?

‏2012-10-23T07:42:29Z |
Folks,

In line with the thread about subscripts vs. indexes, here's a thing to consider: the problem may lie in the tests of those fields, not the arithmetic. Here's a little demo program, with the compiler line numbers and other information:


LineID  PL SL  ----+-*A-1-B--+----2----+----3----+----4----+----5----+----6----+----7-|--+----8 Map and Cross Reference
000011         000515/                                                                 00051500
000012         000516 Data                            Division.                        00051600
000013         000517*                                                                 00051700
000014         000518 Working-Storage                 Section.                         00051800
000015         000519*                                                                 00051900
000016         000520 01  WA-Zoned-Numeric                Pic 9(008).                  00052019 BLW=00000+000         8C
000017         000911*                                                                 00091119
000018         000912 01  WA-Work-Fullword-as-X.                                       00091219 BLW=00000+008         0CL4
000019         000913     05   WA-Work-Fullword           Pic S9(08) Comp-5 Sync.      00091319 BLW=00000+008,0000000 1F
000020         000914*                                                                 00091419
000021         000915 01  WA-Work-Halfword-as-X.                                       00091519 BLW=00000+010         0CL2
000022         000916     05   WA-Work-Halfword           Pic S9(04) Comp-5 Sync.      00091619 BLW=00000+010,0000000 1H
000023         000917*                                                                 00091719
000024         000918 01  Filler                          Pic X(001).                  00091819 BLW=00000+018         1C
000025         000919     88  WA-Case-1                              Value '1'.        00091919
000026         000920     88  WA-Case-2                              Value '2'.        00092019
000027         000930     88  WA-Case-3                              Value '3'.        00093019
000028         000940     88  WA-Case-4                              Value '4'.        00094019
000029         001100/                                                                 00110000
000030         001200 Procedure                       Division.                        00120000
000031         001300*                                                                 00130000
000032         001400 0000-Mainline                   Section.                         00140000
000033         001658*                                                                 00165800
000034         001662     Accept WA-Zoned-Numeric                                      00166219 16
000035         001665*                                                                 00166500
000036         001666     Move WA-Zoned-Numeric           to  WA-Work-Fullword-as-X    00166619 16 18
000037         001667                                         WA-Work-Halfword-as-X    00166719 21
000038         001668*                                                                 00166819
000039         001669     If WA-Work-Halfword is greater than 256                      00166919 22
000040      1  001680         Set WA-Case-1               to  True                     00168019 25
000041         001718     End-If                                                       00171819
000042         001719*                                                                 00171919
000043         001720     If WA-Work-Halfword-as-X is greater than X'0100'             00172019 21
000044      1  001721         Set WA-Case-2               to  True                     00172119 26
000045         001723     End-If                                                       00172319
000046         001724*                                                                 00172419
000047         001725     If WA-Work-Fullword is greater than 4096                     00172519 19
000048      1  001726         Set WA-Case-3               to  True                     00172619 27
000049         001728     End-If                                                       00172819
000050         001729*                                                                 00172919
000051         001730     If WA-Work-Fullword-as-X is greater than X'00001000'         00173019 18
000052      1  001731         Set WA-Case-4               to  True                     00173119 28
000053         001740     End-If                                                       00174019
000054         001780*                                                                 00178006
000055         001800     Stop Run.                                                    00180000


If we take a look at the LIST output, we can see that the comparison of the halfword with a literal is fast whether we describe the two bytes of the halfword as binary or character:


000039  IF
   0002C8  4840 3010               LH    4,16(0,3)               WA-WORK-HALFWORD
   0002CC  4940 A018               CH    4,24(0,10)              PGMLIT AT +12
   0002D0  47D0 B0A0               BC    13,160(0,11)            GN=8(0002D8)
000040  SET
   0002D4  92F1 3018               MVI   24(3),X'F1'             (BLW=0)+24
   0002D8                 GN=8     EQU   *
000043  IF
   0002D8  D501 3010 A024          CLC   16(2,3),36(10)          WA-WORK-HALFWORD-AS-X             PGMLIT AT +24
   0002DE  47D0 B0AE               BC    13,174(0,11)            GN=9(0002E6)
000044  SET
   0002E2  92F2 3018               MVI   24(3),X'F2'             (BLW=0)+24
   0002E6                 GN=9     EQU   *

But the difference for the fullword COMP-5 field is pronounced (this is the output for an optimized compile, too):

000047  IF
   0002E6  5840 3008               L     4,8(0,3)                WA-WORK-FULLWORD
   0002EA  8E40 0020               SRDA  4,32(0)
   0002EE  5D40 C000               D     4,0(0,12)               SYSLIT AT +0
   0002F2  4E50 D118               CVD   5,280(0,13)             TS2=16
   0002F6  F154 D108 D11B          MVO   264(6,13),283(5,13)     TS2=0                             TS2=19
   0002FC  4E40 D118               CVD   4,280(0,13)             TS2=16
   000300  9110 D10D               TM    269(13),X'10'           TS2=5
   000304  D204 D10D D11B          MVC   269(5,13),283(13)       TS2=5                             TS2=19
   00030A  4780 B0DA               BC    8,218(0,11)             GN=20(000312)
   00030E  9601 D111               OI    273(13),X'01'           TS2=9
   000312                 GN=20    EQU   *
   000312  F952 D10C A09C          CP    268(6,13),156(3,10)     TS2=4                             PGMLIT AT +144
   000318  47D0 B0E8               BC    13,232(0,11)            GN=10(000320)
000048  SET
   00031C  92F3 3018               MVI   24(3),X'F3'             (BLW=0)+24
   000320                 GN=10    EQU   *
000051  IF
   000320  D503 3008 A020          CLC   8(4,3),32(10)           WA-WORK-FULLWORD-AS-X             PGMLIT AT +20
   000326  47D0 B0F6               BC    13,246(0,11)            GN=11(00032E)
000052  SET
   00032A  92F4 3018               MVI   24(3),X'F4'             (BLW=0)+24
   00032E                 GN=11    EQU   *

It all seems a bit over-the-top to me (but what do I know) - I was tuning a program and was very surprised to find this (a Divide is one of the more expensive operations, after all). I was able to make significant savings by going down the character-compare route. Can anyone tell me why the compiler does it that way? This is with the 4.2 release of Enterprise COBOL for z/OS.

Cheers,

Pete.
Updated on 2012-11-28T11:57:12Z at 2012-11-28T11:57:12Z by BillWoodger
  • BillWoodger
    75 Posts

    Re: Another performance oddity. Anyone know why?

    ‏2012-10-23T12:37:59Z  in response to PeteInOxford
    Try unsigned.
    • PeteInOxford
      25 Posts

      Re: Another performance oddity. Anyone know why?

      ‏2012-10-23T13:21:37Z  in response to BillWoodger
      Bill,

      Thanks for the reply. The result's undoubtedly shorter, but it still makes no real sense to me I'm afraid - I simply can't understand why the generated code (in the first, signed, case) is not using a plain vanilla compare (OpCode X'59') instruction when the whole point of COMP-5 is to be a "native" binary format. Here's the unsigned case (same line numbers):
      
      000047  IF
         0002EA  5850 3008               L     5,8(0,3)                WA-WORK-FULLWORD
         0002EE  1F44                    SLR   4,4
         0002F0  9045 D108               STM   4,5,264(13)             TS2=0
         0002F4  D507 D108 A08C          CLC   264(8,13),140(10)       TS2=0                             PGMLIT AT +128
         0002FA  47D0 B0CA               BC    13,202(0,11)            GN=10(000302)
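
      As a rough illustration of what the unsigned sequence seems to do (a sketch in Python, not compiler output): the fullword is widened to eight bytes with a zero high half (SLR/STM) and then compared byte by byte (CLC). The assumption that the literal at PGMLIT +128 is an 8-byte 4096 is a guess, and the function name is made up for this example.

```python
import struct

def unsigned_greater_via_doubleword(value: int, limit: int) -> bool:
    """Widen an unsigned fullword to 8 bytes and compare the bytes.

    Mirrors the shape of the listing: zero high word, value in the low
    word, then a byte-wise (CLC-like) compare against an 8-byte literal.
    """
    widened = struct.pack(">II", 0, value)   # high word cleared, low word = value
    literal = struct.pack(">Q", limit)       # assumed 8-byte form of the limit
    return widened > literal                 # bytes compare as unsigned, left to right

for v in (5000, 100, 0xFFFFFFFF):
    print(v, unsigned_greater_via_doubleword(v, 4096))
```

      For unsigned big-endian values, the byte-wise result agrees with the arithmetic one, which is why a CLC is acceptable once the sign is out of the picture.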
      


      Cheers,

      Pete.
      • SystemAdmin
        403 Posts

        Re: Another performance oddity. Anyone know why?

        ‏2012-10-23T17:31:58Z  in response to PeteInOxford
        As to your question about why the compiler generates the code it does, I have no special insight into the minds of those who wrote that code. But it can be fun to guess.

        Note that the current Enterprise COBOL 4.2 compiler does not have an ARCH option like C and PL/I. The current compiler must generate code that runs on all supported architectures, and thus cannot take advantage of newer instructions. Having said that, CLC is hardly new.

        I suspect that the comparison of what is only known to the compiler as a character literal with a COMP-5 field is a contributing factor. However, if you change the definition to COMP I think you'll see an interesting difference, even without optimization, so maybe not.

        COMP-5 may be called "native binary", but it suffers from the same difficulties as those documented for TRUNC(BIN); see pages 9 and 33 of the PDF at the linked page.

        I am not a tester for the v.Next compiler; perhaps it would generate more optimal code in the instances you specify.
      • BillWoodger
        75 Posts

        Re: Another performance oddity. Anyone know why?

        ‏2012-10-23T19:15:11Z  in response to PeteInOxford
        Sorry, I only had a few seconds on the way out earlier.

        If your COMP-5 is signed, then the compiler has to deal with the sign. In no way can it do that with a CLC, as in that case all negative values would be greater than 256, 4096, whatever.

        To put it another way, your code comparing to an X'...' literal is not correct IF the COMP-5 can contain a negative value.

        You have the advantage over the compiler, as you know the field can't hold a negative value in this program. But it is signed, so the compiler acts as though it can.

        For the halfword, the CH is operating on a signed field.

        For the fullword, the generated code correctly deals with potentially negative values.
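
        To make the signed-versus-CLC point concrete, here is a minimal Python sketch (not compiler output). It assumes a 4-byte big-endian two's-complement fullword; the helper names clc_greater and signed_greater are invented for the illustration.

```python
import struct

LIMIT = 4096
LITERAL = bytes.fromhex("00001000")  # the X'00001000' literal from the demo

def signed_greater(value: int, limit: int) -> bool:
    """Arithmetic comparison, which is what a signed field requires."""
    return value > limit

def clc_greater(value: int, literal: bytes) -> bool:
    """CLC-style comparison: the field's bytes are compared as unsigned."""
    return struct.pack(">i", value) > literal

for v in (5000, 100, -1):
    print(v, signed_greater(v, LIMIT), clc_greater(v, LITERAL))
```

        The two tests agree for 5000 and 100 but disagree for -1, whose bytes are X'FFFFFFFF' and therefore compare high against any positive literal.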

        As Craig indicated, "Native Binary" is just TRUNC(BIN) writ large. I think its use is for genuine cases where a "binary" value is able to, correctly, exceed the Cobol PICture. CICS and SQL do it. You might need it for fields from external sources where a "binary" value of potential magnitude greater than a possible PICture may arrive in the program.

        What I'd do with these is make them "conform to PICture" as soon as reasonably possible and use TRUNC(OPT) for any processing.

        Just because they are "Native Binary" does not confer upon them (at least yet, given Craig's point about the new compiler) any performance advantage. Indeed, in general, it is slower than TRUNC(OPT).
        • PeteInOxford
          25 Posts

          Re: Another performance oddity. Anyone know why?

          ‏2012-10-24T09:02:10Z  in response to BillWoodger
          Craig, Bill,

          Firstly, thanks for taking the time to reply.

          I was over-swift in posing the example, it seems - the point about signed vs. unsigned is quite correct and well made; thank you.

          But I think the central thrust of the question is still valid and must remain - given that a PIC S9(09) COMP-5 field is a "native binary" format, why the (very expensive) divide and subsequent mucking about? There is an instruction for the comparison of signed fullwords. Surely it's not unreasonable to expect that such an instruction would be used? I can well understand that there must be significant overhead in the modification of COMP-5 fields - that's the nature of the beast - but to see such an expensive code sequence generated for a test is very disappointing. I've checked and this only occurs for relational tests; equality checks are still a bit odd but not as, well, surprising as this one.

          Perhaps an RFE is called for. Anyone think it's an appropriate thing to request? I'll just point out in passing that SQLCODE is defined as S9(09) COMP-5.

          Cheers,

          Pete.
          • BillWoodger
            75 Posts

            Re: Another performance oddity. Anyone know why?

            ‏2012-10-24T11:00:10Z  in response to PeteInOxford
            It is certainly an interesting collection of code :-)

            Your SYNCs should be irrelevant, as they are defined on double-word boundaries due to the 01s.

            I thought to make one not on a boundary. Identical code (given the different address) is generated. Perhaps it is a "works wherever defined" solution? Even with the unsigned value, the copy to temporary storage before the CLC is interesting.

            Since presumably Enterprise Cobol will be used for many years into the future (it seems Cobol II is still being used... oh, you know that :-) ), despite the new Cobol coming, perhaps it is worth raising the issue, even if the reply just illuminates why it is done that way.

            It is good to know that the overhead exists. I think that many people expect "native binary" should perform "better" than other binary definitions.

            Personally, I really like stuff to conform to PICture (where possible).

            I'm thinking to do a bit more exploring...
            • PeteInOxford
              25 Posts

              Re: Another performance oddity. Anyone know why?

              ‏2012-10-24T11:51:37Z  in response to BillWoodger
              > Your SYNCs should be irrelevant, as they are defined on double-word boundaries due to the 01s.

              Yes. They are. Force of habit.

              > I thought to make one not on a boundary. Identical code (given the different address) is generated. Perhaps it is a "works wherever defined" solution? Even with the
              > unsigned value, the copy to temporary storage before the CLC is interesting.

              The "specification error" abend hasn't been seen for a while :-) Binary fields work anywhere but may go a bit faster if aligned correctly. Less chance of going across a cache line, allegedly.

              > Since presumably Enterprise Cobol will be used for many years into the future (it seems Cobol II is still being used... oh, you know that :-) ), despite the new Cobol coming, perhaps it is worth raising the issue, even if the reply just illuminates why it is done that way.

              I'd love to know, believe me.

              > It is good to know that the overhead exists. I think that many people expect "native binary" should perform "better" than other binary definitions.

              Yes. And Gods, no.

              > Personally, I really like stuff to conform to PICture (where possible).

              By golly, yes. Wouldn't life be easier!

              > I'm thinking to do a bit more exploring...

              Start with the comparison of indexes. If you compare two indexes that are defined on the same table, they get "normalized" even though it isn't necessary. The standard essentially says that they should behave "as if normalized" (i.e. as if converted to ordinal slot numbers); the code actually does the expensive part of the conversion even though it's needless. Here:

              
              000011         000515/                                                                 00051500
              000012         000516 Data                            Division.                        00051600
              000013         000517*                                                                 00051700
              000014         000518 Working-Storage                 Section.                         00051800
              000015         000530*                                                                 00053000
              000016         000540 01  Control-Card.                                                00054003 BLW=00000+000         0CL10
              000017         000541*                                                                 00054103
              000018         000550     05  Filler                      Pic X(04).                   00055003 BLW=00000+000,0000000 4C
              000019         000552         88  Control-End                       Value '****'.      00055203
              000020         000553*                                                                 00055303
              000021         000554     05  Control-Number              Pic 9(06).                   00055403 BLW=00000+004,0000004 6C
              000022         001071/                                                                 00107103
              000023         001072 01  WB-Work-Array-Area.                                          00107203 BLW=00000+010         0CL1008
              000024         001073*                                                                 00107300
              000025         001074     05  WA-Work-Array-Limit         Pic S9(08) Binary Sync       00107403 BLW=00000+010,0000000 1F
              000026         001075                                                Value +1000.      00107503
              000027         001076*                                                                 00107603
              000028         001077     05  WA-Work-Array-Length        Pic S9(08) Binary Sync       00107703 BLW=00000+014,0000004 1F
              000029         001078                                                Value Zero.       00107803 IMP
              000030         001079*                                                                 00107903
              000031         001080     05  WA-Work-Array-Entry     Occurs 0 to 1000 Times           00108003 BLW=00000+018,0000008 1C
              000032         001081                                 Depending on                     00108103
              000033         001082                                     WA-Work-Array-Length         00108203 28
              000034         001083                                 Indexed by                       00108303
              000035         001084                                     WA-Work-Array-Index          00108403 IDX=00001+000
              000036         001085                                     WA-Work-Array-Limit-Index    00108504 IDX=00002+000
              000037         001086*                                                                 00108604
              000038         001087                                     Pic X(001).                  00108704
              000039         001088/                                                                 00108803
              000040         001089 Procedure                       Division.                        00108903
              000041         001090*                                                                 00109003
              000042         001091 0000-Mainline                   Section.                         00109103
              000043         001092*                                                                 00109203
              000044         001093     Accept Control-Card.                                         00109303 16
              000045         001094*                                                                 00109403
              000046         001095     Set WA-Work-Array-Limit-Index                                00109503 36
              000047         001096                                 to  WA-Work-Array-Limit          00109603 25
              000048         001100*                                                                 00110000
              000049         001200     Set WA-Work-Array-Index     to  Control-Number               00120003 35 21
              000050         001210*                                                                 00121003
              000051         001300     If WA-Work-Array-Index is GREATER than                       00130003 35
              000052         001310                                 WA-Work-Array-Limit-Index        00131003 36
              000053         001410*                                                                 00141000
              000054      1  001500         Display 'Number ' Control-Number ' out of range.'        00150003 21
              000055         001510*                                                                 00151003
              000056         001600     End-If                                                       00160003
              000057         001620*                                                                 00162000
              000058         001630     Stop Run.                                                    00163000
              


              Nice, simple stuff. But look at the nastiness that emerges from the comparison of the indexes:

              
              000051  IF
                 00032E  5840 9134               L     4,308(0,9)              WA-WORK-ARRAY-INDEX
                 000332  8E40 0020               SRDA  4,32(0)
                 000336  5D40 A014               D     4,20(0,10)              PGMLIT AT +8
                 00033A  5860 9138               L     6,312(0,9)              WA-WORK-ARRAY-LIMIT-INDEX
                 00033E  8E60 0020               SRDA  6,32(0)
                 000342  5D60 A014               D     6,20(0,10)              PGMLIT AT +8
                 000346  1957                    CR    5,7
                 000348  47D0 B0E8               BC    13,232(0,11)            GN=8(000356)
              


              I ran into this little nasty when I was tuning a big, table-heavy program and wondered why a perform epilog was taking all the time. What galls me about it is that if you define the limit index independently, like this, and modify the program accordingly:

              
              000011         000515/                                                                 00051500
              000012         000516 Data                            Division.                        00051600
              000013         000517*                                                                 00051700
              000014         000518 Working-Storage                 Section.                         00051800
              000015         000530*                                                                 00053000
              000016         000540 01  Control-Card.                                                00054003 BLW=00000+000         0CL10
              000017         000541*                                                                 00054103
              000018         000550     05  Filler                      Pic X(04).                   00055003 BLW=00000+000,0000000 4C
              000019         000552         88  Control-End                       Value '****'.      00055203
              000020         000553*                                                                 00055303
              000021         000554     05  Control-Number              Pic 9(06).                   00055403 BLW=00000+004,0000004 6C
              000022         001071/                                                                 00107103
              000023         001072 01  WB-Work-Array-Area.                                          00107203 BLW=00000+010         0CL1012
              000024         001073*                                                                 00107300
              000025         001074     05  WA-Work-Array-Limit         Pic S9(08) Binary Sync       00107403 BLW=00000+010,0000000 1F
              000026         001075                                                Value +1000.      00107503
              000027         001076*                                                                 00107605
              000028         001077     05  WA-Work-Array-Limit-Index              Index Sync.       00107705 BLW=00000+014,0000004 1F
              000029         001078*                                                                 00107805
              000030         001081     05  WA-Work-Array-Length        Pic S9(08) Binary Sync       00108105 BLW=00000+018,0000008 1F
              000031         001082                                                Value Zero.       00108205 IMP
              000032         001083*                                                                 00108305
              000033         001084     05  WA-Work-Array-Entry     Occurs 0 to 1000 Times           00108405 BLW=00000+01C,000000C 1C
              000034         001085                                 Depending on                     00108505
              000035         001086                                     WA-Work-Array-Length         00108605 30
              000036         001087                                 Indexed by                       00108705
              000037         001088                                     WA-Work-Array-Index          00108805 IDX=00001+000
              000038         001090*                                                                 00109005
              000039         001091                                     Pic X(001).                  00109105
              000040         001092/                                                                 00109205
              000041         001093 Procedure                       Division.                        00109305
              000042         001094*                                                                 00109405
              000043         001095 0000-Mainline                   Section.                         00109505
              000044         001096*                                                                 00109605
              000045         001097     Set WA-Work-Array-Index     to  WA-Work-Array-Limit          00109705 37 25
              000046         001098     Set WA-Work-Array-Limit-Index                                00109805 28
              000047         001099                                 to  WA-Work-Array-Index          00109905 37
              000048         001100*                                                                 00110005
              000049         001101     Accept Control-Card.                                         00110105 16
              000050         001110*                                                                 00111000
              000051         001200     Set WA-Work-Array-Index     to  Control-Number               00120003 37 21
              000052         001210*                                                                 00121003
              000053         001300     If WA-Work-Array-Index is GREATER than                       00130003 37
              000054         001310                                 WA-Work-Array-Limit-Index        00131003 28
              000055         001410*                                                                 00141000
              000056      1  001500         Display 'Number ' Control-Number ' out of range.'        00150003 21
              000057         001510*                                                                 00151003
              000058         001600     End-If                                                       00160003
              000059         001620*                                                                 00162000
              000060         001630     Stop Run.                                                    00163000
              


              then the compiler generates fast code after all:

              
              000053  IF
                 000334  5950 9134               C     5,308(0,9)              WA-WORK-ARRAY-INDEX
                 000338  47B0 B0D8               BC    11,216(0,11)            GN=8(000346)
              


              It should not be beyond the wit of man to get the compiler to say (in essence) Aha! A Comparison of indexes, and we have either the same slot sizes or an independent index; I'll use the fast sequence. And, look - the compiler authors do use the C instruction after all! :-)
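
              A rough Python sketch of why the normalization is redundant for two indexes on the same table. It assumes an index holds the byte displacement (occurrence - 1) * entry length, which is how the generated code above appears to treat it; all function names here are hypothetical.

```python
def to_index(occurrence: int, entry_length: int) -> int:
    """Assumed index encoding: byte displacement of the occurrence."""
    return (occurrence - 1) * entry_length

def normalized_greater(idx_a: int, idx_b: int, entry_length: int) -> bool:
    """Slow path: divide both displacements back to slot numbers, then compare."""
    return idx_a // entry_length > idx_b // entry_length

def raw_greater(idx_a: int, idx_b: int) -> bool:
    """Fast path: compare the displacements directly."""
    return idx_a > idx_b

def find_mismatches(entry_length: int, max_occurrence: int = 50) -> list:
    """Return every occurrence pair where the two paths disagree."""
    bad = []
    for a in range(1, max_occurrence + 1):
        for b in range(1, max_occurrence + 1):
            ia, ib = to_index(a, entry_length), to_index(b, entry_length)
            if normalized_greater(ia, ib, entry_length) != raw_greater(ia, ib):
                bad.append((entry_length, a, b))
    return bad

mismatches = [m for length in (1, 4, 7) for m in find_mismatches(length)]
print("mismatches:", mismatches)
```

              Under that assumption both displacements are multiples of the same entry length, so dividing each by it cannot change their order, and the two divides buy nothing.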

              This one should (I think) be the subject of an RFE 'coz it penalizes only the people who are really trying for faster code, as far as I can see.

              Thoughts welcomed.

              Cheers,

              Pete.
              • BillWoodger
                75 Posts

                Re: Another performance oddity. Anyone know why?

                ‏2012-10-24T13:02:37Z  in response to PeteInOxford
                Long ago and far away, when people were concerned about code generation for performance, I was told that comparing indexes to non-indexes was not a good thing to do. So, I've always (like your SYNC) used USAGE INDEX to hold the maximum index value. Seems that these days, though it matters less, there may be even more that it saves :-) I don't even use multiple indexes on the same table, in general.

                Since you are able to use an index defined on one table to reference another table (not such good practice in normal use), I'm surprised that the index-to-index comparison is so lengthy as well. I've just never come across it, as I just don't do it :-) Again, nice to know.
  • BillWoodger
    75 Posts

    Re: Another performance oddity. Anyone know why?

    ‏2012-11-28T11:57:12Z  in response to PeteInOxford
    A little something to mention, as it is related.

    This I looked at from R. J. Arellanes' Performance Tuning papers.

    If using TRUNC(OPT), COMP PIC 9(9) is slower than COMP PIC 9(10) for arithmetic. For a fullword binary, specify PIC 9(8) if you don't need nine digits, or PIC 9(10) if you need nine digits.

    The reason is that the 9(9) will be converted to a doubleword, the arithmetic carried out in doubleword, then the result converted back to fullword. The 9(10) will just all be done in doubleword, saving two conversions.

    The effect can be seen even by just ADD 1...