In his 2002 paper "IBM Enterprise COBOL Version 3 Release 1 Performance Tuning" (http://www-01.ibm.com/support/docview.wss?uid=swg27001475&aid=1), R. J. Arellanes emphasized that indexes perform better than subscripts.
He gave remarkably concrete figures; I quote from page 32:
"Performance considerations for indexes vs subscripts (PIC S9(8)):
| using binary data items (COMP) to address a table is 30% slower than using indexes
| using packed decimal data items (COMP-3) to address a table is 300% slower than using indexes
| using DISPLAY data items to address a table is 450% slower than using indexes"
We tried to verify these statistics, but not a single one of them appeared to be valid. Not even close. (We use IBM Enterprise COBOL for z/OS 3.4.1, btw.)
Can anybody help to explain this phenomenon?
Test conditions: Using a COBOL table with "OCCURS 1000" and five table elements per line, we fill all 1,000 lines in a PERFORM loop with five MOVEs each. This logic is wrapped in another PERFORM loop of 50,000 iterations, adding up to a total of 250,000,000 MOVE statements. Roughly like this:
PERFORM 50000 TIMES
    PERFORM VARYING IND FROM 1 BY 1 UNTIL IND > 1000
        MOVE "A" TO FIELD-1(IND)
        MOVE "B" TO FIELD-2(IND)
        MOVE "C" TO FIELD-3(IND)
        MOVE ZERO TO FIELD-4(IND)
        MOVE 1 TO FIELD-5(IND)
    END-PERFORM
END-PERFORM
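For anyone who wants to reproduce this, a minimal self-contained sketch of such a test could look like the following. It is not the original program: the PICTURE clauses of the five fields and the names TEST-TABLE, TABLE-LINE and SUB-BIN4 are assumptions, and the timing itself is left out.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IDXTEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Assumed layout: only "OCCURS 1000" and five elements per
      * line are known; the PICTURE clauses are illustrative.
       01  TEST-TABLE.
           05  TABLE-LINE OCCURS 1000 TIMES INDEXED BY IND.
               10  FIELD-1         PIC X.
               10  FIELD-2         PIC X.
               10  FIELD-3         PIC X.
               10  FIELD-4         PIC 9(4).
               10  FIELD-5         PIC 9(4).
      * One subscript variant from the results list (hypothetical
      * name); change PICTURE/USAGE to test the other variants.
       01  SUB-BIN4                PIC S9(4) BINARY.
       PROCEDURE DIVISION.
      * Variant 1: address the table through its index.
           PERFORM 50000 TIMES
               PERFORM VARYING IND FROM 1 BY 1 UNTIL IND > 1000
                   MOVE "A"  TO FIELD-1(IND)
                   MOVE "B"  TO FIELD-2(IND)
                   MOVE "C"  TO FIELD-3(IND)
                   MOVE ZERO TO FIELD-4(IND)
                   MOVE 1    TO FIELD-5(IND)
               END-PERFORM
           END-PERFORM
      * Variant 2: the same loop with a binary subscript.
           PERFORM 50000 TIMES
               PERFORM VARYING SUB-BIN4 FROM 1 BY 1
                       UNTIL SUB-BIN4 > 1000
                   MOVE "A"  TO FIELD-1(SUB-BIN4)
                   MOVE "B"  TO FIELD-2(SUB-BIN4)
                   MOVE "C"  TO FIELD-3(SUB-BIN4)
                   MOVE ZERO TO FIELD-4(SUB-BIN4)
                   MOVE 1    TO FIELD-5(SUB-BIN4)
               END-PERFORM
           END-PERFORM
           GOBACK.
```

In a real measurement each variant would be run and timed separately, so that one run does not influence the other.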
The results were:
5.59 seconds - when using the table's index
3.18 seconds - when using subscript PIC S9(4) binary
5.68 seconds - when using subscript PIC S9(9) binary
5.45 seconds - when using subscript PIC S9(4) packed-decimal
5.31 seconds - when using subscript PIC S9(5) packed-decimal
6.17 seconds - when using subscript PIC S9(9) packed-decimal
6.04 seconds - when using subscript PIC S9(4) display with sign
7.09 seconds - when using subscript PIC 9(4) display without sign
6.70 seconds - when using subscript PIC 9(9) display without sign
So the test tended to confirm some of the statements: Using variables with a length of 9 digits is slower than a length of 4 or 5 digits; using display data items is (usually) a bad decision, and using unsigned variables is even worse.
BUT the foremost statement seems to have been proven wrong: contrary to the performance tuning guide, using a subscript with PIC S9(4) binary was considerably faster than using the table's index.
Besides that, none of the quoted percentages ("30%", "300%", even "450%") came anywhere near being confirmed.
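To put the gap in numbers, the relative differences against the index run can be computed directly from the timings listed above, taking the index run (5.59 seconds) as the baseline. Four representative cases:

```latex
\begin{align*}
\text{S9(4) binary:}            \quad & \frac{3.18 - 5.59}{5.59} \approx -43.1\% \\
\text{S9(9) binary:}            \quad & \frac{5.68 - 5.59}{5.59} \approx +1.6\%  \\
\text{S9(9) packed-decimal:}    \quad & \frac{6.17 - 5.59}{5.59} \approx +10.4\% \\
\text{9(4) display, unsigned:}  \quad & \frac{7.09 - 5.59}{5.59} \approx +26.8\%
\end{align*}
```

So the largest slowdown in this test is roughly 27%, far from the quoted 300% or 450%.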
The differences between the performance guide's statements and our own test results can have several reasons, of course:
- The paper is based on COBOL V3R1, we use V3R4.1. Maybe there have been some vast improvements for subscripts since V3R1, and this part of the guide is outdated.
- Maybe we use compiler settings that disadvantage the use of indexes.
- Maybe we misinterpreted the statements, and what we tested was something other than what R. J. Arellanes meant. (In this case, I would tend to say that this section of the guide was written too ambiguously and should be more precise.)
- Maybe we undermined the advantages of indexes by not handling them properly throughout the experiment, e.g. by not using "SET IND UP BY 1", but simply modifying the index in the PERFORM phrase.
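The last point could be checked with a variant that steps the index explicitly with SET instead of PERFORM VARYING. The following is only a sketch under assumptions: the field layout and the paragraph names MAIN-PARA and FILL-LINE are hypothetical, and the out-of-line PERFORM of FILL-LINE adds some overhead of its own that the original loop does not have.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IDXSET.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Assumed layout; the PICTURE clauses are illustrative.
       01  TEST-TABLE.
           05  TABLE-LINE OCCURS 1000 TIMES INDEXED BY IND.
               10  FIELD-1         PIC X.
               10  FIELD-2         PIC X.
               10  FIELD-3         PIC X.
               10  FIELD-4         PIC 9(4).
               10  FIELD-5         PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM 50000 TIMES
      * Explicit index handling: SET instead of PERFORM VARYING.
      * The index is stepped before each use, so it never leaves
      * the range 1 through 1000.
               SET IND TO 1
               PERFORM FILL-LINE
               PERFORM 999 TIMES
                   SET IND UP BY 1
                   PERFORM FILL-LINE
               END-PERFORM
           END-PERFORM
           GOBACK.
       FILL-LINE.
      * Fill the line currently addressed by IND.
           MOVE "A"  TO FIELD-1(IND)
           MOVE "B"  TO FIELD-2(IND)
           MOVE "C"  TO FIELD-3(IND)
           MOVE ZERO TO FIELD-4(IND)
           MOVE 1    TO FIELD-5(IND).
```

If this version times about the same as the PERFORM VARYING version, the way the index is incremented is probably not the explanation.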
In any case, we would be glad to hear your opinions about that matter, and perhaps you can explain some of the discrepancies.
Thank you in advance. :)