Referential integrity constraints help reduce the number of statistical views

Statistical views can be used to obtain good cardinality estimates for the Db2® query optimizer, but they also require resources to maintain and process. However, by using referential integrity constraints, you can cut down on the number of statistical views used to gather statistics.

If you want to collect accurate statistics for a star join query that joins different sets of tables, you can create multiple statistical views or one statistical view by using referential integrity constraints.

For example, consider a schema which has tables D1, D2, D3, D4, D5, and F:
create table D1 (D1_PK int not null primary key, 
                                  D1_C1 int, 
                                  D1_C2 int, 
                                  D1_C3 int, 
                                  D1_FK int not null);
create table D2 (D2_PK int not null primary key, 
                                  D2_C1 int, 
                                  D2_C2 int, 
                                  D2_C3 int, 
                                  D2_FK int not null);
create table D3 (D3_PK int not null primary key, 
                                  D3_C1 int, 
                                  D3_C2 int, 
                                  D3_C3 int, 
                                  D3_FK int not null);
create table D4 (D4_PK int not null primary key, 
                                  D4_C1 int, 
                                  D4_C2 int, 
                                  D4_C3 int, 
                                  D4_FK int not null);
create table D5 (D5_PK int not null primary key, 
                                  D5_C1 int, 
                                  D5_C2 int, 
                                  D5_C3 int, 
                                  D5_FK int not null);
create table F  (F_FK1 int not null, 
                                F_FK2 int not null, 
                                F_FK3 int not null, 
                                F_FK4 int not null, 
                                F_FK5 int not null, 
                                F_C1 int, F_C2 int);
Consider that table F is altered in the following way:
alter table F add foreign key (F_FK1) 
 references D1 (D1_PK) on delete cascade;
alter table F add foreign key (F_FK2) 
 references D2 (D2_PK) on delete cascade;
alter table F add foreign key (F_FK3) 
 references D3 (D3_PK) on delete cascade;
alter table F add foreign key (F_FK4) 
 references D4 (D4_PK) on delete cascade;
alter table F add foreign key (F_FK5) 
 references D5 (D5_PK) on delete cascade;
Also, consider that you want to provide statistics for the following query:
select distinct * from F, D1, D2, D3 where F_FK1 = D1_PK and F_FK2
   = D2_PK and F_FK3 = D3_PK and D1_C1='ON' and D2_C2>='2009-01-01';
To gather accurate statistics you can create the complete set of views, as follows:
create view SV1 as (select D1.* from F, D1 where   F_FK1 = D1_PK);
alter view SV1 enable query optimization;

create view SV2 as(select D2.* from F, D2 where  F_FK2 = D2_PK);
alter view SV2 enable query optimization;

create view SV3 as(select D3.* from F, D3 where  F_FK3 = D3_PK);
alter view SV3 enable query optimization;

create view SV4 as(select D1.*, D2.*, D3.* from F, D1, D2, D3 where  
  F_FK1 = D1_PK and F_FK2 = D2_PK and F_FK3 = D3_PK);
alter view SV4 enable query optimization;
You can reduce the number of statistical views created to obtain accurate statistics if referential integrity constraints exist between join columns. This reduction in the number of statistical views needed, saves you time in creating, updating, and maintaining statistical views. For this example, the following single statistical view would be sufficient to obtain the same statistics as the complete set of statistical views created earlier:
create view SV5 as (select D1.*, D2.*, D3.*, D4.*, D5.* from F, D1, D2, D3, D4, D5 
   where  
      F_FK1 = D1_PK and F_FK2 = D2_PK and F_FK3 = D3_PK 
      and F_FK4 = D4_PK and F_FK5 = D5_PK
);
alter view SV5 enable query optimization;
The statistics for SV4, SV3, SV2, and SV1 are inferred from SV5 based on referential integrity constraints. The referential integrity constraints between F, D1, D2, D3, D4, and D5 ensure that the joins among them are lossless. These lossless joins let us infer that the following cardinalities are the same:
  • The cardinality of the join result between F and D1
  • The cardinality of SV1
  • The cardinality of the statistical view SV5
The same reasoning applies to the joins between F and D2, F and D3, and F and D4. This reasoning shows that just one statistical view SV5 is required by the Db2 optimizer to obtain improved cardinality estimates for the earlier query.

For referential integrity constraints that permit NULL values, the percentage of NULL values that are in foreign key values can affect statistics inferred from a statistical view.

If the NULL values are a significant percentage of the foreign key values, the statistics and cardinality estimates inferred by the Db2 optimizer by using the constraints can be inaccurate. Prevent these inaccuracies by preventing the referential integrity constraints from being used to infer statistics. To prevent the constraints from being used to infer statistics, disable query optimization by using the DISABLE QUERY OPTIMIZATION option of the ALTER TABLE command.

You can detect this situation by using the explain facility. The explain facility shows the proportion of NULL values in the foreign key by producing an explain diagnostic message if the number of NULL values is possibly too high and if statistics are being inferred from a statistical view.