What she meant was that the data warehouse keeps growing at a high rate and nothing ever gets deleted.
- Large table spaces
  These alone allow tables to reach TB sizes.
- Large row IDs
  Large row IDs allow more rows per table.
- Range partitioning
  This feature allows you to compose a table out of partitions, where each partition can reside in a different table space.
- Data partitioning
  This feature allows DB2 to hash-partition tables across multiple nodes.
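As a rough sketch of what that last feature looks like in DDL (the table and column names here are made up for illustration; range partitioning is shown in full further below):

```sql
-- Data (hash) partitioning: rows are spread across the members of a
-- database partition group by hashing the distribution key.
CREATE TABLE sales(sale_id BIGINT NOT NULL,
                   amount  DECIMAL(8, 2))
  DISTRIBUTE BY HASH (sale_id);
```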
Being a pack rat does have downsides, however: storage is expensive.
The PC that I use to write this article uses a solid-state drive (SSD).
It's pricey, which is why I use it only for the OS, where I really appreciate the drive's fast random I/O.
In addition I have a much larger regular hard drive (HDD), which is cheaper and slower and stores my programs and documents.
Finally, there is a network drive; it is my slowest storage, since it's accessed via my wireless router.
My personal setup somewhat reflects what you see in IT shops, except that the price tags are much higher.
The question is: How do you make the best of what you can afford?
Multi temperature data
What you do then is place those "cold" tables on cheaper, slower storage while placing the central "hot" tables on faster, more expensive storage.
Prior to DB2 10 you did this by creating hot, warm, and cold table spaces and then defining tables within those table spaces.
DB2 10 introduces storage groups, which make this management much easier.
SELECT SUBSTR(STORAGE_GROUP_NAME, 1, 30) AS NAME,
       SUBSTR(DB_STORAGE_PATH, 1, 30) AS PATH
  FROM TABLE(ADMIN_GET_STORAGE_PATHS(NULL, NULL));

NAME                           PATH
------------------------------ ------------------------------
IBMSTOGROUP                    D:
So this is my HDD.
So the first thing I do is rename the storage group to reflect this.
Furthermore, I want to record some specifications I have for the drive: the average latency is 4.17 ms, and it is said to handle 300 MB/sec.
(I have no clue whether my bus can handle this; hardware is not my strength.)
RENAME STOGROUP IBMSTOGROUP TO WARM;
ALTER STOGROUP WARM OVERHEAD 4.17
DEVICE READ RATE 300;
CREATE STOGROUP HOT ON 'C:' OVERHEAD 0.1 DEVICE READ RATE 200;
CREATE STOGROUP COLD ON 'G:'
DEVICE READ RATE 6;
Then I define my tables.
CREATE TABLESPACE H_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP HOT;
CREATE TABLESPACE W_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP WARM;
CREATE TABLESPACE C_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP COLD;
CREATE TABLE H(c1 INT) IN H_tsp;
CREATE TABLE W(c1 INT) IN W_tsp;
CREATE TABLE C(c1 INT) IN C_tsp;
We could have achieved the same using the storage paths of CREATE TABLESPACE.
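For the record, that alternative could look something like the statement below, which pins a table space to a device with an explicit container; the container path and size are made up, and OVERHEAD and TRANSFERRATE here play the role that OVERHEAD and DEVICE READ RATE play on the storage group:

```sql
-- DMS table space with an explicit container on the slow drive;
-- the path and size are invented for illustration.
CREATE TABLESPACE C_tsp_dms
  MANAGED BY DATABASE
  USING (FILE 'G:\db2\cold_container.dat' 10 M)
  OVERHEAD 8.0
  TRANSFERRATE 0.9;
```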
Let's add another wrinkle.
Range partitioning and multi temperature data
The environment above is a simplistic one, however.
Every time you do online banking you are likely to look at your recent transactions as well as pay some bills.
Perhaps some company claims you didn't pay a bill on time and you want to see what happened.
Those old transactions must still be available, but there is no expectation that they can be retrieved as quickly as your recent history.
If we use range partitioning this can be achieved:
CREATE TABLESPACE Q4_2010_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP COLD;
CREATE TABLESPACE Q1_2011_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP WARM;
CREATE TABLESPACE Q2_2011_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP WARM;
CREATE TABLESPACE Q3_2011_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP WARM;
CREATE TABLESPACE Q4_2011_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP HOT;
CREATE TABLESPACE Q1_2012_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP HOT;
CREATE TABLE transactions(tx_id BIGINT NOT NULL,
                          amount DECIMAL(8, 2),
                          stamp TIMESTAMP,
                          txt VARCHAR(20),   -- this and "notes" are assumed names;
                          account INTEGER,   -- the original listing omitted these columns
                          notes VARCHAR(80))
  PARTITION BY RANGE(stamp)
  (PART Q4_2010 STARTING ('2010-10-01') ENDING ('2011-01-01') EXCLUSIVE IN Q4_2010_tsp,
   PART Q1_2011 STARTING ('2011-01-01') ENDING ('2011-04-01') EXCLUSIVE IN Q1_2011_tsp,
   PART Q2_2011 STARTING ('2011-04-01') ENDING ('2011-07-01') EXCLUSIVE IN Q2_2011_tsp,
   PART Q3_2011 STARTING ('2011-07-01') ENDING ('2011-10-01') EXCLUSIVE IN Q3_2011_tsp,
   PART Q4_2011 STARTING ('2011-10-01') ENDING ('2012-01-01') EXCLUSIVE IN Q4_2011_tsp,
   PART Q1_2012 STARTING ('2012-01-01') ENDING ('2012-04-01') EXCLUSIVE IN Q1_2012_tsp);
BEGIN
  DECLARE i INTEGER DEFAULT 0;
  WHILE i < 1000000 DO
    INSERT INTO transactions VALUES
      (i,
       RAND() * 1000,
       TIMESTAMP('2010-10-01') + INTEGER(RAND() * 17) MONTHS
                               + INTEGER(RAND() * 30) DAYS
                               + INTEGER(RAND() * 3600) SECONDS,
       REPEAT(CHR(33 + RAND() * 80), RAND() * 20),
       RAND() * 10000,
       REPEAT(CHR(33 + RAND() * 80), RAND() * 80));
    SET i = i + 1;
    -- Commit every 100 rows (integer division truncates).
    IF i / 100 * 100 = i THEN COMMIT; END IF;
  END WHILE;
  COMMIT;
END;
CREATE INDEX i1 ON transactions(account, stamp);
We test how many times per five-second interval we can sum up all the amounts for a random account within a given quarter.
CREATE OR REPLACE PROCEDURE drive(p_from TIMESTAMP, p_to TIMESTAMP)
BEGIN
  DECLARE vi INTEGER DEFAULT 0;
  DECLARE vstart TIMESTAMP;
  DECLARE vstop TIMESTAMP;
  DECLARE vsum DECFLOAT;
  DECLARE vcount INTEGER;
  DECLARE vaccount INTEGER;
  WHILE vi < 5 DO
    SET vcount = 0;
    SET vstart = CURRENT TIMESTAMP;
    WHILE vstart + 5 SECONDS > CURRENT TIMESTAMP DO
      SET vaccount = RAND() * 1000;
      SELECT SUM(amount) INTO vsum
        FROM transactions
       WHERE stamp BETWEEN p_from AND p_to
         AND account = vaccount;
      SET vcount = vcount + 1;
    END WHILE;
    CALL DBMS_OUTPUT.PUT_LINE('Time: ' || vstart || ' Count: ' || vcount);
    SET vi = vi + 1;
  END WHILE;
END;
/
To get good measurements we must disable self-tuning of the buffer pool and keep it fairly small.
ALTER BUFFERPOOL IBMDEFAULTBP SIZE 250;
SET SERVEROUTPUT ON;
CALL drive('2011-10-01', '2011-12-31');
Time: 2012-04-30-08.04.29.088000 Count: 120
Time: 2012-04-30-08.04.34.167000 Count: 120
Time: 2012-04-30-08.04.39.181000 Count: 120
Time: 2012-04-30-08.04.44.213000 Count: 118
Time: 2012-04-30-08.04.49.219000 Count: 121
DB250000I: The command completed successfully.
How does this compare to a quarter on warm storage?
CALL drive('2011-07-01', '2011-09-30');
Time: 2012-04-30-08.05.29.309000 Count: 37
Time: 2012-04-30-08.05.34.340000 Count: 38
Time: 2012-04-30-08.05.39.470000 Count: 37
Time: 2012-04-30-08.05.44.523000 Count: 38
Time: 2012-04-30-08.05.49.631000 Count: 37
DB250000I: The command completed successfully.
And how about a quarter on cold storage?
CALL drive('2010-10-01', '2010-12-31');
Time: 2012-04-30-08.06.25.366000 Count: 4
Time: 2012-04-30-08.06.31.747000 Count: 4
Time: 2012-04-30-08.06.38.350000 Count: 4
Time: 2012-04-30-08.06.44.257000 Count: 3
Time: 2012-04-30-08.06.49.389000 Count: 3
DB250000I: The command completed successfully.
This is what we wanted: the hot quarter is served an order of magnitude faster than the cold one.
But just like in fashion, what's hot now is not so hot in three months.
Managing the data life cycle
Space on the SSD drive is limited.
So we need to make room.
This is where storage groups really shine, and where DB2 10 provides so much value.
Since DB2 has slipped an abstraction layer between the storage path and the table space, it is possible to move a table space from one storage group to another.
And best of all, this is done online and in the background with a simple ALTER TABLESPACE statement.
ALTER TABLESPACE Q4_2011_tsp USING STOGROUP WARM;
COMMIT;
CALL drive('2011-10-01', '2011-12-31');
Time: 2012-04-30-08.18.46.655000 Count: 21
Time: 2012-04-30-08.18.51.695000 Count: 37
Time: 2012-04-30-08.18.56.718000 Count: 37
Time: 2012-04-30-08.19.01.768000 Count: 38
Time: 2012-04-30-08.19.06.845000 Count: 38
Housekeeping slowed us down for the first few seconds, and then performance was already where it should be for WARM data.
Perhaps we will have more luck spotting the overhead when pushing Q1 of 2011 out to COLD storage.
ALTER TABLESPACE Q1_2011_tsp USING STOGROUP COLD;
COMMIT;
CALL drive('2011-01-01', '2011-03-31');
Time: 2012-04-30-08.24.55.692000 Count: 1
Time: 2012-04-30-08.25.06.636000 Count: 1
Time: 2012-04-30-08.25.12.455000 Count: 1
Time: 2012-04-30-08.25.22.821000 Count: 1
Time: 2012-04-30-08.25.39.185000 Count: 1
DB250000I: The command completed successfully.
Clearly the REBALANCE that is happening in the background has an impact.
The fact that my hard drive has only a single spindle, and that this is a PC to begin with, exaggerates the overhead.
Luckily the effect doesn't last long.
CALL drive('2011-01-01', '2011-03-31');
Time: 2012-04-30-08.26.19.831000 Count: 4
Time: 2012-04-30-08.26.26.132000 Count: 4
Time: 2012-04-30-08.26.32.664000 Count: 4
Time: 2012-04-30-08.26.38.970000 Count: 3
Time: 2012-04-30-08.26.43.971000 Count: 4
DB250000I: The command completed successfully.
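Rather than inferring the rebalance from query throughput, you can watch it directly. DB2 10 provides a monitoring table function for this; the query below is a sketch, and the exact column set may differ on your fix pack level:

```sql
-- Progress of any table space rebalance currently running
-- (NULL = all table spaces, -2 = all members).
SELECT SUBSTR(TBSP_NAME, 1, 20) AS TBSP_NAME,
       REBALANCER_MODE,
       REBALANCER_STATUS
  FROM TABLE(MON_GET_REBALANCE_STATUS(NULL, -2));
```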
The free room on the HOT storage can now be re-used to ATTACH a new partition for a new quarter.
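A sketch of that last step, assuming the new quarter's data has already been loaded into a staging table (all names below are illustrative, and you should check the exact ATTACH and SET INTEGRITY requirements for your table):

```sql
-- Table space for the incoming quarter on the fast storage.
CREATE TABLESPACE Q2_2012_tsp MANAGED BY AUTOMATIC STORAGE USING STOGROUP HOT;

-- transactions_q2_2012 is assumed to be a staging table with the same
-- definition, created in Q2_2012_tsp and loaded with the new quarter.
ALTER TABLE transactions
  ATTACH PARTITION Q2_2012
  STARTING ('2012-04-01') ENDING ('2012-07-01') EXCLUSIVE
  FROM transactions_q2_2012;

-- After ATTACH the table is in set-integrity-pending state;
-- validate the new rows to make them visible to queries.
SET INTEGRITY FOR transactions IMMEDIATE CHECKED;
```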