CSI: DB2 - Part One
Historical Data Forensics On Demand for Audit Defense
Imagine that you get audited by the tax department for claiming a large business loss due to your online retail business facing some unforeseen competition. You try to recall the details of all your business expenses, such as the times you used your car and home for business purposes. You wish you had kept log records of all your business activities neatly organized and indexed on your computer for quick analysis. Instead, you attempt to cobble the details together to put forth some semblance of proof. Every detail that you cannot prove costs you money.
Now imagine you are the Risk Officer at a $30 billion/year enterprise that processes some of the most sensitive transactional data in the world: Social Security numbers, medical records and lab results, credit card numbers, account balances. Changes to this data are under constant scrutiny by regulatory bodies in each industry sector. Many organizations devote significant financial and technical resources to risk management. For example, internal governance rules may require housing 20+ years of historical records in case of a lawsuit. Audits related to government regulations (HIPAA, SEC Rule 17a-4) may require not only maintenance of historical data, but also a view of all data changes. In order to do this, organizations may:
- Transform all transactional ‘Update’ operations into ‘Insert’ and ‘Delete’ pairs to retain before-and-after images of records
- Employ procedural code (e.g. triggers) to keep track of changes
- Create copies of the historical data on external systems, which may increase the liability of data breach and lead to additional costs related to data copying, transformation, storage, and maintenance
Performing these tasks may cost millions per year in additional transactions, additional procedural computing, increased storage, copying data to external environments, and so on. But what if there were a way to:
- Keep an entire history of changes to the data without manually changing the transactions themselves (i.e. without requiring code to transform updates into insert/delete pairs)
- Automatically maintain beginning and end timestamps for each row of data where the timestamps indicate the “life” of the data (i.e. without requiring procedural code)
- Access and analyze this data via the transactional systems (without impacting resources on these transactional systems)
- Create a snapshot of the data as it existed at any point in time or range(s) of time with massive parallelism (without creating separate data connections and credentials)
All of these "data forensic" enabling features are made possible on IBM Z through two technologies. The first is a capability within DB2 for z/OS called Temporal Tables. The second is through a technology called the IBM DB2 Analytics Accelerator (The Accelerator). Part one of this blog will discuss Temporal Tables, and Part two will cover the Accelerator.
DB2 for z/OS temporal support allows you to insert, update, delete, and query/snapshot data on demand in the past, the present, and the future while keeping a complete history of what you knew and when you knew it. When temporal versioning support is added to a DB2 for z/OS table, DB2 automatically:
- tracks when changes (Inserts/Updates/Deletes) are made to the state of the table
- preserves deleted records in a history table with period timestamps
- maintains period timestamps for inserted records
- allows for partitioning and archiving of the temporal data
- makes the data available online for analysis
DB2 for z/OS supports three kinds of temporal tables:
- System-period temporal table (STT) with SYSTEM_TIME period: DB2 transparently keeps a history of old rows that have been updated or deleted over time. With new constructs in the SQL language standard, users can go back in time and query the database at any chosen point in the past.
- Application-period temporal table (ATT) with BUSINESS_TIME period: Applications supply dates or timestamps to describe the business validity of their data.
- Bi-temporal table (BTT) with both ATT and STT features combined: Bi-temporal tables combine all the capabilities of system-period and application-period temporal tables. This combination enables applications to manage the business validity of their data while DB2 keeps a full history of any updates and deletes.
What makes temporal tables different from regular base tables is the period concept. A period is represented by a pair of date and time columns in DB2. The start column stores the start time of the period, and the end column stores the end time of the period. DB2 supports two kinds of periods: SYSTEM_TIME period and BUSINESS_TIME period.
- SYSTEM_TIME, sometimes referred to as “transaction time”, tracks when changes (Inserts/Updates/Deletes) are made to the state of a table, such as when an insurance policy is modified or a loan is created. The SYSTEM_TIME period columns are populated and controlled by the system.
- BUSINESS_TIME, sometimes referred to as “valid time” or “application time”, tracks the effective dates of certain business conditions, such as the terms of an insurance policy or the interest rate of a loan. The BUSINESS_TIME period columns are populated and controlled by the application.
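To make the BUSINESS_TIME side concrete, here is a sketch of an application-period temporal table; the table and column names are illustrative and not from this post's examples:

```sql
-- Hypothetical ATT: the application sets the validity dates.
-- WITHOUT OVERLAPS prevents two versions of the same policy
-- from claiming overlapping business-time periods.
CREATE TABLE POLICY
  (POLICY_ID  INTEGER       NOT NULL,
   PREMIUM    DECIMAL(15,2),
   BUS_START  DATE          NOT NULL,
   BUS_END    DATE          NOT NULL,
   PERIOD BUSINESS_TIME (BUS_START, BUS_END),
   PRIMARY KEY (POLICY_ID, BUSINESS_TIME WITHOUT OVERLAPS));
```

With this in place, the application can insert several rows for the same POLICY_ID as long as their business-time periods do not overlap, and DB2 enforces that rule for it.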
Adding Temporal System Time Versioning:
Step 1: The LOAN DB2 table before adding temporal support:
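As a stand-in for the "before" view, a minimal LOAN definition consistent with the columns used in the examples below might look like the following (the data types and the DB2.TS1 tablespace are assumptions; only ACCOUNT_NO and BALANCE appear in the original examples):

```sql
-- Hypothetical starting point: a plain base table with no
-- temporal columns yet. Types and tablespace are assumed.
CREATE TABLE LOAN
  (ACCOUNT_NO INTEGER NOT NULL,
   BALANCE    DECIMAL(15,2))
  IN DB2.TS1;
```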
Step 2: Add columns required for System-period Temporal Table (transaction start id column can be NULLABLE)
ALTER TABLE LOAN ADD COLUMN SYS_START TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN;
ALTER TABLE LOAN ADD COLUMN SYS_END TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END;
ALTER TABLE LOAN ADD COLUMN TRANS_ID TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID;
Step 3: Add System_Time period required for System-Period temporal table
ALTER TABLE LOAN ADD PERIOD SYSTEM_TIME (SYS_START, SYS_END);
Step 4: Create the history table required for system-time data versioning
CREATE TABLE LOAN_HIST LIKE LOAN IN DB2.TS2;
Step 5: Enable System_Time data versioning
ALTER TABLE LOAN ADD VERSIONING USE HISTORY TABLE LOAN_HIST;
Examples of Data Changes:
Inserts into the LOAN table within the same commit scope: the new rows share the same “begin time” (the start time of the transaction) and “end time” (the maximum timestamp value, indicating that the versions are current). Inserts produce no history change – the history table is empty at this point.
INSERT INTO LOAN (ACCOUNT_NO, BALANCE) VALUES (1003, 8000);
INSERT INTO LOAN (ACCOUNT_NO, BALANCE) VALUES (1004, 9000);
Updates against the current table: within the same commit scope, the updated rows have the same “begin” and “end” times in the current table. The “old images” from before the update, but not the intermediate changes, are moved to the history table automatically by DB2.
UPDATE LOAN SET BALANCE = 3333 WHERE ACCOUNT_NO = 1001;
UPDATE LOAN SET BALANCE = 4444 WHERE ACCOUNT_NO = 1001;
UPDATE LOAN SET BALANCE = 9999 WHERE ACCOUNT_NO = 1004;
Deletes against the current table: the deleted rows are moved from the current table to the history table automatically by DB2.
DELETE FROM LOAN WHERE ACCOUNT_NO = 1001;
INSERT INTO LOAN (ACCOUNT_NO, BALANCE) VALUES (1005, 5555);
DELETE FROM LOAN WHERE ACCOUNT_NO = 1005;
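After the changes above, the superseded and deleted row images sit in LOAN_HIST, and you can inspect them directly. A sketch using the versioning columns added in Step 2:

```sql
-- Each history row is one old version of a record;
-- SYS_START and SYS_END bound the interval during which
-- that version was the current row.
SELECT ACCOUNT_NO, BALANCE, SYS_START, SYS_END
  FROM LOAN_HIST
 ORDER BY ACCOUNT_NO, SYS_START;
```

For account 1001, for example, you would expect one history row per committed version, ending with the image that was current when the row was deleted.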
Time Travel Snapshot Query:
Let's assume you have an audit that requires you to provide a snapshot of the data as it existed at month end of March 2016. With System-period Temporal tables, you can produce this information by running a single query. A query with "FOR SYSTEM_TIME AS OF <timestamp_value>" retrieves all data that was active at the time of <timestamp_value>. For example:
SELECT * FROM LOAN FOR SYSTEM_TIME AS OF '2016-03-31-23.59.59.999999'
would return any record that existed in the active LOAN table as of '2016-03-31-23.59.59.999999'. Anything deleted before that time would not appear, and anything inserted after that time would not appear. Now suppose you needed to see a snapshot of loans that were active at any point from January 2016 through March 2016. The following query retrieves any data that was active from the beginning of January to the end of March 2016.
SELECT * FROM LOAN FOR SYSTEM_TIME FROM '2016-01-01-00.00.00.000000' TO '2016-03-31-23.59.59.999999'
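Note that FROM ... TO excludes rows that became active exactly at the end timestamp. DB2 also supports an inclusive variant, FOR SYSTEM_TIME BETWEEN ... AND ..., which includes both boundary instants:

```sql
-- Inclusive variant: returns rows whose version period overlaps
-- the closed interval, boundary instants included.
SELECT * FROM LOAN
  FOR SYSTEM_TIME BETWEEN '2016-01-01-00.00.00.000000'
                      AND '2016-03-31-23.59.59.999999'
```

For a month-end audit snapshot the two forms usually return the same rows; the difference only matters for versions that begin or end exactly on a boundary timestamp.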
The Temporal Table support in DB2 for z/OS allows organizations to maintain a complete history of how data has changed over time, and it provides the ability to perform “forensic look-back” analysis of that data. In a crime scene investigation, knowing that a car window was smashed is only the preliminary step. A camera recording that lets investigators go back in time and observe how, when, and by whom the window was smashed can help solve the crime. Similarly, simply maintaining x years’ worth of data may show the final state of a LOAN, but it doesn’t show how the LOAN changed over time to reach that state. In an audit, the ability to quickly look back to any point in time and snapshot the data as it existed can help investigators understand all of the transactions that produced the final state of the record.
Stay tuned for Part 2 of this blog, in which I will discuss how the DB2 Analytics Accelerator can:
- Help reduce the costs associated with storing Temporal data
- Improve the time to analyze the current and temporal data without impacting transactional systems
- Represent an “online archive” with the added benefit that all access is managed by the transactional system, so end users don’t have to create separate connections to the data with separate credentials
About the author
Shantan Kethireddy is an IBM Master Inventor and the NA Solution Architect for analytics solutions on IBM Z products. He is responsible for leading consultative business value validation engagements for big data and analytics. He holds a Masters in Computer Engineering and Electrical Engineering from the University of Iowa and possesses two dozen patents, primarily focused on data-centric technologies.
Thanks to the following people for their valuable contributions:
- James Knisley - IBM Client Technical Specialist
- Ruiping Li - Senior Technical Staff Member in DB2 for z/OS development at IBM Silicon Valley Lab
- Patricia Zakhar - Portfolio Marketing Manager, DB2 Analytics Accelerator and z Analytics
Disclaimer: The comments in this blog are based upon the author’s current knowledge and personal experiences. All comments are the author’s personal view and do not necessarily reflect the positions or opinions of IBM or its affiliates. You should conduct independent tests to verify the validity of any statements made in this blog before basing any decisions upon those statements.