As promised, here is one more item in my toolkit that may benefit some of you. This tool is still in is early stages, but has already been very helpful to me when helping customers achieve peak performance out of the accelerator. The purpose of this tool is to help you figure out how to best set the distribution keys and organizing keys on your accelerator tables. As you probably know, these settings can have a large impact on query performance.
As of right now, here are the input files that you need to provide to this tool:
1) A file with a list of all the tables to tune. All these tables must reside in the same DB2 subsystem or group. Put one table per line in the format <schema>.<table>.
2) A file with a list of the join conditions among these tables. The join conditions don't have to be listed in any particular order. Put one join condition on each line in the format...
<schema>.<table>.<col> = <schema>.<table>.<col>
We only look at equi-joins, so don't bother entering any join conditions that use a comparison operator other than equals. If the equijoin has multiple columns, just put 1 on each line.
3) A file with a list of the predicates that are used in your queries. Again, there's no particular order. Put one predicate on each line in the format...
<schema>.<table>.<col> <operator> ?
For example: TPCH.CUSTOMER.C_CUSTKEY >= ?
The tuner program doesn't care what the constant value in the predicate is, so you can always just write the predicate as though it was a parameter marker.
Once you have these 3 files, you can run the tool like this:
java -classpath IDAATuner.jar IDAATuner <file with list of tables> <file with list of joins> <file with list of predicates> <output filename> <accelerator name>
Follow the prompts. This program will then do a number of things.
1) It will gather the cardinality of each table from DB2 real time statistics.
2) It will look at each column used in an equijoin and see if it would make a decent distribution key or not. This is done by looking at the cardinality of the column and by looking at the data skew of the column. These calculations do not rely on runstats. They are calculated by executing queries directly against the tables themselves. The tool does force these queries to run on the accelerator.
3) It will use a combination of table cardinality, column cardinality, join conditions, and data skew to determine a set of distribution keys for the tables. It's not guaranteed to be the best possible set of distribution keys, but the goal is that it should result in good performance.
4) The tool will look at the list of predicates and the associated column cardinalities and pick a set of organizing keys that seems likely to benefit performance. The tool has the same goal here as it does for the distribution keys.
5) Lastly, the tool generates a SQL script that can be executed to set all of the distribution keys and organizing keys to the values recommended by the tool. Note that the tool may decide that some tables should be randomly distributed. This is OK.
It's worth pointing out one other feature of the tool. We frequently recommend that you set your organizing keys to match the primary key of the table when you are using incremental update. This is because it makes it faster for incremental update to handle updates and deletes. However, sometimes this isn't the right thing to do. For example, maybe your replicated workload is all inserts. Maybe the volume of replicated data is low and query performance is much more critical.
If you want the tool to generate organizing keys based on the predicates in your query, follow the directions above. If you would rather that the tool set the organizing keys such that they match the primary keys, just put one line in the predicates file and on that line put...
USE PRIMARY KEYS
That's it for now. Obviously this tool would be much more user friendly if it could just pull your SQL from the accelerator cache (see the GetQueries tool in my last post), parse your SQL and generate all the necessary input files itself. Unfortunately, I don't have access to a java parser for DB2 z/OS, and anyone familiar with writing parsers knows how much of a challenge that is to create. If someone is aware of a java parser for DB2 z/OS SQL, please let me know, and I'd be happy to stitch the pieces together.
BTW - if you are trying to use the accelerator with SAP BW, SAP always generates SQL statements in the same format. This allows us to skip the SQL parsing, generate the input files automatically, and still catch almost all of the join conditions and predicates. I do have a way to do this. If you are interested, please contact me.
As with my last post... Use at your own risk. There is no support, warranties, or guarantees. Please do contact me though if you have questions, thoughts, or bugs. Response will be on a best effort basis. Source code is once again provided so that you can modify to suit your needs. Thanks!
IDAATuner.jarView DetailsIDAATunerSource.jarView Details