Cache recommendations (Data Virtualization)

Based on a given set of queries, Data Virtualization recommends a set of data caches that can improve the overall performance of these queries.

Cache recommendations are generated on-demand, based on configuration parameters for query workloads that are defined by the service Admin. The service Admin can then add data caches from these recommendations.

Data Virtualization provides an engine to generate a ranked list of recommendations. How a recommendation is ranked is determined by the execution time of influenced queries and the frequency of those queries in the input workload. Additionally, the engine does not recommend the creation of duplicated caches.

The process of generating cache recommendations consists of five stages, as depicted in the following image:

1. Collect

The Data Virtualization recommendation engine collects the information, such as query text, execution time, cardinality, timestamp, and frequency for the appropriate time period.

The following image shows how queries from historical workload are filtered to arrive to the final input set of queries for the recommendation engine: Cache configuration settings are defined by the Data Virtualization Admin, see Configure cache recommendations for details.

2. Extract

The recommendation engine selects potential cache candidates of the query workload.

3. Translate

The recommendation engine translates the extracted candidates to reference virtual objects. Each candidate is unique and passes all Db2® restrictions.

4. Evaluate

The engine evaluates the translated definitions by matching each cache candidate against the input query workload.

As a result of this evaluation, the recommendation engine generates a match score for each cache candidate. The evaluation of each candidate is based on the following criteria:

Matchability: Number of queries that match the cache candidate.
Diversity: Different queries that match the cache candidate.
Cardinality: Size of result set that the cache candidate fetched.
Performance: Speed of queries that the cache candidate matched.

5. Rank and sort

The recommendation engine ranks and sorts the cache candidates to generate a final list of recommendations. The final list of recommendations is created based on the following criteria:

Sort candidates by using a weighted metric of matched query execution time and frequency.
Any tie between candidates is broken by using query frequency and cardinality.