We all know how DB2 BLU delivers extreme performance for analytical applications working with terabytes of data.
Its "load and go" concept works really well at such large data volumes.
However, what about startups and small and medium-sized businesses whose applications are driven by databases of just a few gigabytes?
Especially applications with a mixed workload, where the same database serves both transactional and reporting purposes. How would DB2 BLU perform on such smaller-than-ideal data volumes?
Having this question in mind, I had the opportunity to test one such application recently. The application drives a mixed workload: a decent amount of reporting is done by its reporting module, which contains some complex SQL statements. The overall size of the database is about 12 GB (relatively small compared to the terabytes of data used in analytical workloads). The database has one set of tables that is actively used and populated by the application, and another set used only by the reporting module.
We began by creating an equivalent DB2 BLU columnar database to see whether a columnar database would yield any gain for this application.
As usual, the first step was to set the DB2 registry variable DB2_WORKLOAD to ANALYTICS, telling DB2 that we want databases optimized for an analytical workload.
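For reference, the variable is set with the db2set command, typically followed by a restart of the instance so that the change takes effect:
db2set DB2_WORKLOAD=ANALYTICS
db2stop
db2start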
We had the choice to either convert the existing database into a DB2 BLU columnar database or create a new, equivalent DB2 BLU columnar database
and populate it with the same data as the relational database. We chose the latter because:
1. It would allow us to load data into the empty tables and see the compression capability of DB2 BLU.
2. Keeping the relational database unchanged would allow us to perform additional tests whose results could then be compared against the equivalent columnar database.
So, we then created the columnar database using the CREATE DATABASE command:
CREATE DATABASE COLDB USING CODESET UTF-8 TERRITORY US
The next step was to create the database objects using the existing DDL script, which was obtained by running the db2look command against the relational database.
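Assuming the existing relational database is named ROWDB (a placeholder name), something like the following extracts its DDL into a script file:
db2look -d ROWDB -e -o ddl.sql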
Please note that if the DDL script has an ORGANIZE BY ROW clause appended to the end of a CREATE TABLE statement, it is necessary to change it to ORGANIZE BY COLUMN to create columnar tables. A simple find and replace should do this for you; see the one-liner below. Any table created without an explicit clause will be columnar anyway, because the registry variable DB2_WORKLOAD is set to ANALYTICS.
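On Linux, a sed one-liner along these lines can handle the replacement (assuming the script file is the ddl.sql generated above):
sed -i 's/ORGANIZE BY ROW/ORGANIZE BY COLUMN/g' ddl.sql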
One question that might arise: what about the tables used by the transactional module of the application, and their performance?
It must be remembered that even though the default setting creates tables in columnar format, one can continue to create row-based tables for the transactional modules of the application, ensuring that the transactional side of the application remains unaffected.
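For instance, a transactional table (the table below is purely hypothetical) can be kept row-organized simply by stating the clause explicitly:
CREATE TABLE ORDERS (
    ORDER_ID    INTEGER NOT NULL PRIMARY KEY,
    CUSTOMER_ID INTEGER,
    STATUS      VARCHAR(20)
) ORGANIZE BY ROW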
We then loaded all the tables using the data from flat files.
One can easily generate these flat files by executing the db2move export command against the existing relational DB2 database.
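Assuming again that the source database is named ROWDB, the export and the subsequent load into the columnar database come down to two commands (db2move writes one data file per table into the current directory):
db2move ROWDB export
db2move COLDB load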
After loading all the tables, we calculated the database size by invoking the GET_DBSIZE_INFO stored procedure as below:
call GET_DBSIZE_INFO(?, ?, ?, -1);
We noticed that the database had shrunk by nearly a factor of 3.5: the original relational database occupied approximately 12 GB, while the newly created equivalent DB2 columnar database took only about 3.4 GB of disk space.
After this, we began to execute the complex SQL statements that were invoked by the reporting module of this application.
These queries involved complex joins over multiple tables, created views over multiple tables before returning results, and so on.
For reasons of simplicity and accuracy, we chose to execute the queries using DB2's query performance benchmarking tool, db2batch, as shown below:
db2batch -d COLDB -f workload.sql
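If you want the per-query timings captured in a file, db2batch accepts a result file via its -r option (results.txt is just a placeholder name):
db2batch -d COLDB -f workload.sql -r results.txt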
We executed the set of complex queries repeatedly and noted the timings. We saw sub-second response times for the majority of the queries, while the whole set executed in less than 7 seconds, as against nearly 400 seconds on the equivalent relational database.
So, what does it prove?
To me, it is apparent that DB2 BLU works well not only for databases with terabytes of data but also for relatively small databases serving analytical workloads.
Want to try it yourself? Here you go: http://www-01.ibm.com/software/data/db2/linux-unix-windows/db2-blu-acceleration/