Several days ago, I heard someone complain that a customer didn't have a solid backup strategy, and a database crash turned into a really big deal there. But even if you do have a backup strategy, are you certain you're totally safe?
A very good DBA once told me that, in his opinion, the top priority for a DBA is BACKUP, the second is BACKUP, and the third is still BACKUP.
I highly respect the importance of BACKUP, but I don't want to sing its praises quite that loudly. Instead, when facing customers or leading the DBA practice at a bank or company, the thing I always emphasize is DISCIPLINE, DISCIPLINE and DISCIPLINE. Of course, this doesn't mean I don't take designing our backup strategy seriously.
I'm not just playing with words.
From years of experience helping customers deal with all kinds of DB2-related problems, I believe any organization needs at least one or two really good core DBAs who make the key decisions and research new product features and emerging technology to improve the team's performance. At the same time, you need several other DBAs who get most of the day-to-day work done. As the team matures, DISCIPLINE, or rules, becomes more and more important. Most of the time, it's not that the team lacks the skills to get the job done; it's a violation of the rules that crashes the system. You may not know what impact a little "fix" can introduce. I know a guy who was almost fired over one extra "Ctrl+C", and everybody liked him; he was definitely a good DBA.
Let's get back to the topic of BACKUP/RESTORE.
My basic point is that RESTORE is the last option you want to take when something bad happens. And here I only mean something genuinely bad, not "oh, this morning's ETL loaded the wrong data, let's go back...". My reason is that a restore may take more time than your business is willing to allow, especially for core banking systems, retail transaction systems and the like. Can you imagine the face of your CIO when you have to spend hours recovering the database?
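To make "hours" concrete, here is a back-of-the-envelope sketch of restore time versus a recovery time objective (RTO). It assumes total outage is roughly preparation time plus image restore time plus log roll-forward time; the class name, fields and every number in the example are hypothetical, and the real throughput figures only come from timing your own restore drills.

```python
from dataclasses import dataclass


@dataclass
class RestoreEstimate:
    # All inputs are hypothetical; measure real rates in a restore drill.
    db_size_gb: float
    restore_rate_gb_per_hour: float      # rate of reading the backup image back
    log_volume_gb: float                 # archived logs to roll forward
    rollforward_rate_gb_per_hour: float  # rate of replaying those logs
    prep_hours: float = 0.5              # locating images, TSM/Legato sessions, etc.

    def total_hours(self) -> float:
        restore = self.db_size_gb / self.restore_rate_gb_per_hour
        rollforward = self.log_volume_gb / self.rollforward_rate_gb_per_hour
        return self.prep_hours + restore + rollforward


def meets_rto(est: RestoreEstimate, rto_hours: float) -> bool:
    """True if the estimated outage fits inside the business's RTO."""
    return est.total_hours() <= rto_hours


if __name__ == "__main__":
    est = RestoreEstimate(db_size_gb=2000, restore_rate_gb_per_hour=500,
                          log_volume_gb=100, rollforward_rate_gb_per_hour=50)
    print(f"{est.total_hours():.1f} h")   # 0.5 + 4.0 + 2.0 = 6.5 h
    print(meets_rto(est, rto_hours=1))    # False
```

Even with these made-up but not unreasonable numbers, a 2 TB database lands far outside a one-hour RTO, which is exactly why RESTORE should sit behind other protections.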
The exception may be warehousing systems. They more often run 5x8 than 7x24, so you may have time to do a restore. But just remember one fact: RESTORE is costly, and it is too heavy for many situations.
One thing I have noticed for many years: companies talk about BACKUP strategy and ask about DELTA/INCREMENTAL BACKUP when interviewing new DBAs, but many of them never practice RESTORE. If you don't know how to restore your data, why bother backing up? Especially when more factors are involved, such as Legato/TSM as the backup device or a delta/incremental backup strategy, are you really sure your team can start the RESTORE in the first minute if required? I bet some would keep quiet.
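One thing a restore drill forces you to know is which images an incremental/delta strategy actually needs. In DB2, an incremental backup is cumulative since the last full backup, while a delta backup only holds changes since the last successful backup of any type. The sketch below models just that rule: the latest full, the latest incremental after it (if any), then every delta after that. `BackupImage` and `restore_chain` are hypothetical names, the history is assumed complete and every image intact, and this is not how DB2 itself sequences `RESTORE ... INCREMENTAL`; for a real database, the `db2ckrst` utility reports the required image sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupImage:
    timestamp: str  # YYYYMMDDhhmmss, so string order equals time order
    kind: str       # "full", "incremental" (cumulative) or "delta"


def restore_chain(history: List[BackupImage]) -> List[BackupImage]:
    """Images needed to get back to the most recent backup, oldest first."""
    images = sorted(history, key=lambda b: b.timestamp)

    # 1. The most recent full backup is the base of everything.
    full_positions = [i for i, b in enumerate(images) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup in history: nothing to restore from")
    full_idx = full_positions[-1]
    chain = [images[full_idx]]
    after_full = images[full_idx + 1:]

    # 2. The latest incremental after that full already contains every
    #    earlier change since the full, so older deltas are not needed.
    start = 0
    inc_positions = [i for i, b in enumerate(after_full) if b.kind == "incremental"]
    if inc_positions:
        last_inc = inc_positions[-1]
        chain.append(after_full[last_inc])
        start = last_inc + 1

    # 3. Each remaining delta builds on the backup just before it,
    #    so all of them are required, in time order.
    chain.extend(b for b in after_full[start:] if b.kind == "delta")
    return chain


if __name__ == "__main__":
    history = [
        BackupImage("20120304020000", "delta"),
        BackupImage("20120301020000", "full"),
        BackupImage("20120302020000", "delta"),
        BackupImage("20120303020000", "incremental"),
        BackupImage("20120305020000", "delta"),
    ]
    for img in restore_chain(history):
        print(img.timestamp, img.kind)
```

Here the delta from the 2nd is skipped because the incremental from the 3rd supersedes it. If any image in the chain is missing from Legato/TSM, the restore stops there, which is the kind of surprise a regular drill catches before a real outage does.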
So, we need some firewalls in front of RESTORE.
For DB2 warehouse systems, HACMP or TSA may be the most popular HA setups, and they can bring the database service back within minutes when something bad happens to the server, network, etc. But there is still only one copy of the DATA.
Q/SQL replication of key tables can help and is good enough for a lot of situations.
HADR is not available in a DPF environment.
For OLTP systems, I strongly suggest HA/DR solutions. I believe that nowadays they are a must if you're doing really serious business. At the top is pureScale, or the DB2 ACT feature; then HADR, then Q replication and SQL replication. There are also DR solutions from storage vendors, such as PPRC or SRDF. It takes a very careful balance of the existing techniques, combined into something that really fits your organization and meets your SLA.
AFAIK, HADR currently still doesn't work with pureScale, and an HADR target (standby) database can be readable but cannot be used as a replication source. Much better support for the ACT feature is expected in the near future.