Safeguarded Copy is a new DS8000 function that helps protect against logical data corruption. It has been available since 28 September 2018.
Such logical data corruption can result from inadvertent user error, malicious intent, or even cyberattacks. The cost of a single incident, such as one caused by data-encrypting ransomware, often runs into the millions. Many storage systems offer peer-to-peer replication to protect against physical failures, but such a replication architecture alone is not sufficient to protect against logically destroyed or corrupted data, because it faithfully replicates the corruption as well.
Especially in the financial sector, some regional supervisory authorities demand additional cyber resilience measures to help safeguard against such incidents.
Keeping several copies of older data versions, from which we can restore to a previous point in time that is ideally as recent as possible, is seen as a key technical capability for recovery.
For many years, the DS8000 and other IBM storage systems have offered the FlashCopy function to allow restoring data from a previous point in time.
However, on the DS8000, FlashCopy has some disadvantages, especially when working with many dozens of copy target volumes, for example:
- FlashCopy on the DS8000 is limited to a maximum of 12 targets per source.
- Each FlashCopy target consumes an additional volume address, and with a maximum of around 65,000 volumes, such addresses are a limited resource.
- FlashCopy incurs a performance penalty as the number of targets grows: each additional target copy costs more internal bandwidth to maintain.
- When used with space-efficient thin volumes, the current DS8880 FlashCopy implementation allocates capacity in 16-MiB "small extents". This is less space-efficient than the former "FlashCopy SE" function of earlier DS8000 models, which allocated at track level but was not carried forward because many installations performed slowly as a result of inappropriate design and sizing.
- FlashCopy targets are ordinary volumes that can be mapped to a host, which raises the question of how to secure them, both against their content being overwritten or manipulated and against their deletion.
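A quick back-of-the-envelope calculation shows how fast volume addresses run out. This Python sketch assumes a flat limit of exactly 65,000 addresses (the text says "around") and that every address goes to either a source or a FlashCopy target; `max_sources` is a hypothetical helper, not a DS8000 tool.

```python
# Rough address budget: each FlashCopy target consumes its own volume address.
ADDRESS_LIMIT = 65_000        # assumption: approximates "around 65,000 volumes"
MAX_FLASHCOPY_TARGETS = 12    # per-source FlashCopy limit on DS8000 (from the text)


def max_sources(targets_per_source: int, address_limit: int = ADDRESS_LIMIT) -> int:
    """Largest number of source volumes that fit if each has this many targets."""
    if not 0 <= targets_per_source <= MAX_FLASHCOPY_TARGETS:
        raise ValueError("FlashCopy allows 0..12 targets per source")
    # One address for the source itself plus one per target.
    return address_limit // (1 + targets_per_source)


for n in (1, 4, 12):
    print(f"{n:2d} targets per source -> at most {max_sources(n):,} sources")
```

With the full 12 targets per source, only 5,000 source volumes fit into the address space.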
Hence, the new DS8880 Safeguarded Copy function has been designed with three main goals in mind:
- Securing the Safeguarded Copy data to prevent it from being compromised, either accidentally or deliberately.
- Enabling a previous Point-in-Time to be either restored to the production volumes or made available on another set of volumes while the production environment continues to run.
- Enabling the capturing of many (500 for initial delivery) Point-in-Time images of a production environment with optimised capacity usage and minimised performance impact.
With this new function, it is now possible not only to restore an entire environment, but also to first work through several older copies to determine which is the most recent corruption-free one.
One could even automate regular checks in which an analytics agent inspects earlier point-in-time copies for suspicious changes, allowing early detection of a problem.
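Working through older copies to find the most recent clean one can be sketched as a newest-to-oldest scan. In this Python sketch, the `Backup` record and the caller-supplied `is_clean` check are hypothetical placeholders; in practice the check would involve recovering a backup to a recovery volume and validating it, which is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Backup:
    """Hypothetical record describing one point-in-time backup."""
    backup_id: str
    taken_at: datetime


def most_recent_clean(
    backups: Sequence[Backup],
    is_clean: Callable[[Backup], bool],
) -> Optional[Backup]:
    """Check backups from newest to oldest; return the first that passes the check."""
    for backup in sorted(backups, key=lambda b: b.taken_at, reverse=True):
        # Placeholder: a real check would recover the backup to a recovery
        # volume and run an integrity or analytics test against it.
        if is_clean(backup):
            return backup
    return None  # no clean backup within the retention window


# Usage: hourly backups, with corruption starting at 03:00.
backups = [Backup(f"b{h}", datetime(2018, 10, 1, h)) for h in range(6)]
corrupted_since = datetime(2018, 10, 1, 3)
clean = most_recent_clean(backups, lambda b: b.taken_at < corrupted_since)
print(clean.backup_id)  # b2
```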
Safeguarded Copy uses track-level space allocation granularity. With a track being around 64 KB, this is a factor of 256 finer than small-extent FlashCopy (16 MiB / 64 KiB = 256). It nevertheless avoids an allocation performance penalty, because the copies are stored in a dedicated backup capacity area instead of in many individual target volumes.
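The effect of allocation granularity can be shown with a simplified worst case: small writes that each land in a different 16-MiB extent. This Python sketch assumes that every touched allocation unit is allocated in full, uses exactly 64 KiB per track, and ignores metadata; the write offsets are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

SMALL_EXTENT = 16 * MIB  # small-extent FlashCopy allocation unit (from the text)
TRACK = 64 * KIB         # assumption: exactly 64 KiB per track ("around 64 KB")


def allocated_bytes(changed_offsets, unit):
    """Space allocated when every touched allocation unit is allocated in full."""
    touched_units = {offset // unit for offset in changed_offsets}
    return len(touched_units) * unit


# Hypothetical workload: 100 small writes, each in a different 16 MiB extent.
writes = [i * SMALL_EXTENT for i in range(100)]

print(SMALL_EXTENT // TRACK)                         # 256
print(allocated_bytes(writes, SMALL_EXTENT) // MIB)  # 1600 (MiB)
print(allocated_bytes(writes, TRACK) // KIB)         # 6400 (KiB)
```

For such scattered writes, extent-level allocation consumes 256 times the space of track-level allocation; for writes clustered in a few extents the gap is smaller.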
Nor is there a performance penalty when scaling to hundreds of backups per source volume: even then, the internal bandwidth cost is only that of a single internal FlashCopy.
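The difference in scaling can be illustrated with a deliberately simplified model that only counts copy-on-write operations. As a simplification, it assumes that a first write to a source track must preserve the old data once per FlashCopy target, whereas Safeguarded Copy preserves it once, in the most recent backup only. The function names and the write count are hypothetical; the actual DS8000 internals are more involved.

```python
def flashcopy_copy_ops(first_writes: int, targets: int) -> int:
    """Simplified model: old data is preserved once per FlashCopy target."""
    return first_writes * targets


def safeguarded_copy_ops(first_writes: int, backups: int) -> int:
    """Simplified model: old data is saved once, into the most recent backup,
    regardless of how many older backups are retained."""
    return first_writes if backups > 0 else 0


WRITES = 10_000  # hypothetical number of first writes to distinct tracks

for n in (1, 6, 12):  # FlashCopy tops out at 12 targets per source
    print(f"FlashCopy, {n:2d} targets: {flashcopy_copy_ops(WRITES, n):,} copy operations")
print(f"Safeguarded Copy, 500 backups: {safeguarded_copy_ops(WRITES, 500):,} copy operations")
```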
This backup capacity is also protected in several ways:
- It cannot be deleted while any backups remain inside.
- To restore data from a backup, you must first recover it to a separate, dedicated recovery volume.
- Copy Services Manager (CSM) is mandatory, and all logic for creating the regular backups resides in CSM only, not in the DS CLI or the DS8000 GUI. Hence, no command similar to "rmflash" (which removes a FlashCopy relationship) can be issued against these backups.
- Source volumes that have Safeguarded Copy (SGC) backups are additionally protected for as long as those backups exist.
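The protection rules listed above can be expressed as invariants in a toy model. This Python class is purely illustrative and hypothetical: it is not a DS8000 or CSM API, and it deliberately offers no method to remove an individual backup, mirroring the absence of an "rmflash"-like command.

```python
class ProtectedOperationError(Exception):
    """Raised when an operation would violate a Safeguarded Copy protection rule."""


class SafeguardedVolumeModel:
    """Toy model of the protection rules (hypothetical, not a DS8000 or CSM API)."""

    def __init__(self, name: str):
        self.name = name
        self.backups: list = []  # backup IDs, oldest first; only CSM would add them

    def add_backup(self, backup_id: str) -> None:
        self.backups.append(backup_id)

    def delete_backup_capacity(self) -> None:
        # Rule: backup capacity cannot be deleted while backups remain inside.
        if self.backups:
            raise ProtectedOperationError("backup capacity still holds backups")

    def delete_source(self) -> None:
        # Rule: the source volume stays protected while backups exist.
        if self.backups:
            raise ProtectedOperationError(f"{self.name} has Safeguarded backups")

    def recover(self, backup_id: str, recovery_volume: str) -> str:
        # Rule: a backup is first recovered to a separate recovery volume.
        if backup_id not in self.backups:
            raise KeyError(backup_id)
        if recovery_volume == self.name:
            raise ProtectedOperationError("recover to a separate recovery volume")
        return f"{backup_id} -> {recovery_volume}"


vol = SafeguardedVolumeModel("prod01")
vol.add_backup("bk-0900")
print(vol.recover("bk-0900", "recov01"))  # bk-0900 -> recov01
try:
    vol.delete_source()
except ProtectedOperationError as exc:
    print("refused:", exc)  # refused: prod01 has Safeguarded backups
```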
Regular, automated backups can be taken at intervals as short as 30 minutes, using the new CSM session type "Safeguarded Copy".
Prerequisites are CSM versions 184.108.40.206 or higher, and DS8880 code bundle versions 220.127.116.11 or higher.
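The backup interval and the retention period together determine how many backups must be held at once. This Python sketch relates the 30-minute minimum interval to the 500 point-in-time images stated for the initial delivery, assuming that figure is the relevant upper bound; the helper names are hypothetical.

```python
from datetime import timedelta

MAX_BACKUPS = 500  # point-in-time images for the initial delivery (from the text)


def backups_needed(interval: timedelta, retention: timedelta) -> int:
    """Backups held at steady state when one is taken every `interval`
    and each expires once it reaches `retention`."""
    return retention // interval


def max_retention(interval: timedelta, max_backups: int = MAX_BACKUPS) -> timedelta:
    """Longest retention that stays within the backup-count limit."""
    return interval * max_backups


half_hour = timedelta(minutes=30)
print(timedelta(days=1) // half_hour)                # 48 backups per day
print(backups_needed(half_hour, timedelta(days=7)))  # 336
print(max_retention(half_hour))                      # 10 days, 10:00:00
```

So at the shortest interval, a one-week retention fits within the limit, while anything beyond about ten days would need a longer interval.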
The backup capacity can be specified, and old backups can be set to expire automatically, for example after a week. If the reserved, dedicated backup capacity is not large enough to hold as many backups as required, the Safeguarded Copy function automatically deletes the oldest backups prematurely to make room for newer ones, and at the same time raises an alert.
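The interplay of regular expiry and capacity-driven premature deletion can be sketched as a queue ordered from oldest to newest. This Python model is hypothetical and simplified: it gives each backup a fixed size when it is added, whereas real backups grow as writes arrive, and it does not reflect how CSM or the DS8000 implement the logic.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Backup:
    taken_at: datetime
    size_tracks: int  # simplification: tracks of old data held by this backup


class BackupPool:
    """Toy model of expiry plus capacity-driven eviction (hypothetical)."""

    def __init__(self, capacity_tracks: int, retention: timedelta):
        self.capacity_tracks = capacity_tracks
        self.retention = retention
        self.backups = deque()  # oldest on the left
        self.alerts = []

    def used(self) -> int:
        return sum(b.size_tracks for b in self.backups)

    def add(self, backup: Backup) -> None:
        # 1. Regular expiry: drop backups that have reached the retention period.
        while self.backups and backup.taken_at - self.backups[0].taken_at >= self.retention:
            self.backups.popleft()
        # 2. Capacity pressure: delete the oldest backups prematurely and alert.
        while self.backups and self.used() + backup.size_tracks > self.capacity_tracks:
            evicted = self.backups.popleft()
            self.alerts.append(f"premature deletion of backup from {evicted.taken_at}")
        self.backups.append(backup)


start = datetime(2018, 10, 1)
pool = BackupPool(capacity_tracks=1_000, retention=timedelta(days=7))
for i in range(6):
    pool.add(Backup(start + timedelta(hours=i), size_tracks=300))
print(len(pool.backups), len(pool.alerts))  # 3 3
```

Here the capacity, not the one-week retention, limits the pool to three backups, and each premature deletion leaves an alert behind.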
Compared with FlashCopy, and apart from the performance and security aspects, Safeguarded Copy does not replace FlashCopy, and we believe both technologies will remain relevant in logical corruption protection scenarios. FlashCopy provides an instantly accessible copy of a production volume or dataset, and with multiple FlashCopies, each copy is independent of the others from a data perspective.
Safeguarded Copy could be used to take many frequent copies of a production environment (e.g. hourly copies retained for a number of days), while FlashCopy continues to be used to take a small number of less frequent copies (e.g. weekly copies retained for 1–2 weeks).
Like FlashCopy, Safeguarded Copy can be combined with 2-site, 3-site, or even 4-site PPRC topologies.
Sizing the Safeguarded backup capacity can be tricky, given the potentially hundreds of older point-in-time copies per source volume.
Sizing involves determining the physical extent pool space required for the Safeguarded Copy backups, and specifying a Backup Capacity Multiplier that sets the virtual capacity for all backup volumes.
The required space naturally depends on the frequency of backups and on their retention period. Extra capacity must also be set aside for the recovery volumes, which must be sized correctly if they are thin-provisioned. Beyond that, the required capacity depends mainly on the write activity of the production volumes, in particular on writes that lead to destages. As mentioned in the respective Redpaper chapter (link below), IBM Spectrum Control provides a "Cache-to-disk transfer rate" metric for monitoring track destage rates, and for mainframe clients, SMF record type 74-5 gives more detail on volume cache activity. Alternatively, the sizing can be based on monitoring out-of-sync tracks from either suspended PPRC relationships or FlashCopy no-copy relationships.
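A first-order estimate of the physical backup space can be derived from per-interval counts of changed tracks, for example taken from destage-rate monitoring. This Python sketch is a simplified assumption, not an IBM sizing method: it takes the peak sum over any window of consecutive retained intervals, uses exactly 64 KiB per track, and leaves out the recovery volumes and the Backup Capacity Multiplier. Since a track destaged several times in one interval is counted several times, destage-based input likely errs on the high side. The sample values are hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3
TRACK_BYTES = 64 * KIB  # assumption: exactly 64 KiB per track


def estimate_backup_capacity(tracks_per_interval, intervals_retained,
                             track_bytes=TRACK_BYTES):
    """Peak bytes needed to hold `intervals_retained` consecutive backups,
    given the number of changed tracks observed in each backup interval."""
    if not 0 < intervals_retained <= len(tracks_per_interval):
        raise ValueError("need at least `intervals_retained` samples")
    window = intervals_retained
    # Slide a retention-sized window over the samples and keep the busiest one.
    peak = max(
        sum(tracks_per_interval[i:i + window])
        for i in range(len(tracks_per_interval) - window + 1)
    )
    return peak * track_bytes


# Hypothetical hourly samples: 20,000 changed tracks per hour, 60,000 during batch.
samples = [20_000] * 20 + [60_000] * 4
need = estimate_backup_capacity(samples, intervals_retained=24)
print(f"{need / GIB:.1f} GiB")  # 39.1 GiB
```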
ATS can help not only with the system design but also with such sizing questions if you open a ticket with us (w3) using