Query performance for common SQL statements
A number of performance improvements have been made to improve the speed of many queries. These improvements are automatic. There are no configuration settings or changes to the SQL statements required.
Partial early distinct (PED)
An efficient hashing function will now be used to partially remove duplicates early in the processing of the query. This may not remove all duplicates, but will reduce the amount of data that must be processed later in the query evaluation. Removing some of the initial duplicate rows will speed up the query, and reduce the chance that it will run out of sort heap memory, thereby eliminating the need to use relatively slow disk space for temporary storage in these cases. This improvement is termed partial early distinct (PED).
- ARGUMENT_TYPE column = UNIQUE
- ARGUMENT_VALUE column can now also have the value:
HASHED PARTIAL
which indicates that the new feature has been used
HASHED
PARTIAL
in its output, as shown in the following example:6) UNIQUE: (Unique)
Cumulative Total Cost: 132.519
Cumulative CPU Cost: 1.98997e+06
...
...
Arguments:
---------
JN INPUT: (Join input leg)
INNER
UNIQKEY : (Unique Key columns)
1: Q1.C22
UNIQKEY : (Unique Key columns)
2: Q1.C21
pUNIQUE : (Uniqueness required flag)
HASHED PARTIAL
Partial early aggregation (PEA)
Similar to partial early distinct (PED), partial early aggregation (PEA) is an attempt to do a partial aggregation of data early in the processing of the query. While it is unlikely that all aggregation can take place at this point, it will at least reduce the amount of data that must be processed later in the query evaluation.
- ARGUMENT_TYPE column = AGGMODE
- ARGUMENT_VALUE column can now also have the value:
HASHED PARTIAL
which indicates that this new feature has been used
HASHED
PARTIAL
in its output for GRPBY
sections,
along with a pGRPBY
in the tree view, if this new
functionality has been applied within that part of the query.Hash join now selected by the query optimizer for a wider range of SQL queries
- Data type mismatches
- A hash join will now be considered even if the two columns in the join are not the same data type. This is the case in all but the most extreme situations.
- Expressions used in join predicate
- Join predicates that contain an expression no longer restrict
the join method to a nested loop join. In this release a hash join
is considered in cases where the WHERE clause contains an expression,
like:
WHERE T1.C1 = UPPER(T1.C3)
Improved cost estimates of network communication traffic generated by a query
The query optimizer relies on a range of information to choose an access plan that is efficient as possible. The estimated communication costs of queries has now been improved, enabling the optimizer to more accurately consider and compare all of the CPU, IO, and communication costs. In many cases this will result in faster query performance.
- If there is more than one network adapter involved, the cumulative communication cost for the adapter with the highest value is returned. In previous releases the total number of frames transmitted throughout the entire network was returned.
- The values only include the costs of network traffic between physical machines. They do not include the virtual communication costs between node partitions on the same physical machine in a partitioned database environment.