IBM Support

Running deduplication concurrently with a random, 4KB overwrite workload on heavily utilized systems can cause file system inconsistencies and a disruption of storage services.

Flashes (Alerts)


Abstract

A new performance optimization of deduplication in Data ONTAP® 8.2 introduced a rare race condition that can cause file system inconsistencies when applications overwrite isolated 4KB blocks of shared data while deduplication is running.

Content

Problem Description

A new performance optimization of deduplication was introduced in Data ONTAP 8.2 to increase the speed at which deduplication runs in certain scenarios. A software issue in the optimization code introduced a race condition that can occur on busy systems. This rare race condition can occur when a new overwrite from a client or host targets the same block that the deduplication process is actively trying to deduplicate, while the deduplication process suspends with incorrect state information for the block. The newly written 4KB block can then either be replaced with stale contents or be mistakenly marked as a free block, leading to a file system inconsistency.

Symptom

This issue can result in a disruption to storage services with the following error messages:
PANIC Illegal container vbn 0 for a loadable vvbn # 
PANIC ../common/wafl/space_cpledger.c:335: Assertion failure fs->blks.used_by_plane0 <= fs->blks.used
PANIC Mismatch found between spacemap fbn = # and sum of active-map and summary-map fbn
PANIC Spacemap of volume ... is not in-sync with snapmap of deleted snapshot in volume ...

These panics are followed by the affected deduplicated volume being marked inconsistent.

Systems encountering this issue might also show the following error messages:
[wafl.raid.incons.userdata:error]: WAFL inconsistent: inconsistent user data block at ... level:0 in public inode ... error:117
[wafl.incons.userdata.vol:error]: WAFL inconsistent: volume ... has an inconsistent user data block.
[callhome.wafl.inconsistent.user.block:ALERT]: Call home for WAFL INCONSISTENT USER BLOCK

These error messages indicate that a 4KB portion of user data has been impacted by this issue. Clients or hosts reading that 4KB portion of data will receive an error when trying to access the data in question, but the rest of the file or LUN may be available for data access.

If you have encountered any of the above error messages, contact your IBM support representative.
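As a rough aid for checking collected console or EMS log text against the messages listed above, the following is a minimal sketch in Python. The function name find_matches and the idea of feeding it log lines from a file are assumptions for illustration; they are not part of Data ONTAP or any IBM tool. The patterns cover only the fixed fragments of the messages quoted in this flash, and a match is a prompt to contact support, not a diagnosis.

```python
import re

# Fixed fragments of the messages quoted in this flash. Variable parts
# (volume names, block numbers) are deliberately left out of the patterns.
PANIC_PATTERNS = [
    r"Illegal container vbn 0 for a loadable vvbn",
    r"Assertion failure fs->blks\.used_by_plane0",
    r"Mismatch found between spacemap fbn",
    r"is not in-sync with snapmap of deleted snapshot",
]
EVENT_PATTERNS = [
    r"wafl\.raid\.incons\.userdata",
    r"wafl\.incons\.userdata\.vol",
    r"callhome\.wafl\.inconsistent\.user\.block",
]


def find_matches(lines):
    """Return (line_number, kind, text) for each line matching a known message.

    Hypothetical helper: 'kind' is "panic" or "event", depending on which
    list of patterns matched.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(re.search(p, line) for p in PANIC_PATTERNS):
            hits.append((number, "panic", line.strip()))
        elif any(re.search(p, line) for p in EVENT_PATTERNS):
            hits.append((number, "event", line.strip()))
    return hits
```

The function takes any iterable of text lines, so it can be given an open log file or a list of lines pasted from a console capture.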

Workaround

This issue can be avoided by disabling deduplication on all volumes in Data ONTAP 8.2RC1, 8.2, 8.2P1, or 8.2P2. Deduplication should not be re-enabled until the system is upgraded to Data ONTAP 8.2P3 or later. The commands to disable deduplication are documented in the Solution section below.

If deduplication is not disabled, reducing the system workload while the deduplication process is running can lower the likelihood of encountering this issue. Changing a 4KB random overwrite workload so that each I/O modifies more than 4KB of data can also lower the likelihood. Because it is difficult to control how client or host applications write data to the storage system, IBM recommends that deduplication be turned off on all systems running Data ONTAP 8.2x until they can be upgraded to Data ONTAP 8.2P3 or later.
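For administrators tracking many systems, the release check described in this section can be sketched as a simple lookup. This is an illustration only: the function name is_affected is hypothetical, the release string is assumed to be already known in the form used in this flash (for example "8.2P1"), and the sketch recognizes only the four releases named explicitly above.

```python
# Releases named in this flash as affected. Anything else returns False,
# which means "not listed here", not "confirmed safe".
AFFECTED_RELEASES = {"8.2RC1", "8.2", "8.2P1", "8.2P2"}


def is_affected(release: str) -> bool:
    """Return True if the release string is one of the listed affected releases."""
    return release.strip().upper() in AFFECTED_RELEASES
```

A True result means deduplication should stay disabled on that system until it is upgraded to Data ONTAP 8.2P3 or later.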

Solution

Disable deduplication on Data ONTAP 8.2x systems and plan to upgrade those systems to Data ONTAP 8.2P3 or later, where a software fix for this issue will be delivered, as soon as is operationally feasible. Do not re-enable deduplication until the system has been upgraded to Data ONTAP 8.2P3 or later. Data ONTAP 8.2P3 or later is available for download from the IBM Support Site.

Note: Disabling deduplication only prevents new data from being deduplicated. Existing space savings from deduplication are not affected, because it is not necessary to run any "SIS undo" commands, and deduplication may be re-enabled once the system has been upgraded to Data ONTAP 8.2P3 or later.

Disabling deduplication can be performed by issuing the following commands for each deduplicated volume hosted on the system:

Data ONTAP 7-Mode:

sis off </vol/volname>
Example: node> sis off /vol/volHomeD

Clustered Data ONTAP:

volume efficiency off -vserver <vservername> -volume <volname>
Example: cluster1::> volume efficiency off -vserver vs1 -volume volArchive
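
Because the command must be issued once per deduplicated volume, it may help to generate the command lines from a list of volumes. The following is a minimal sketch in Python that only builds the command text in the two forms shown above; it does not connect to or run anything on a storage system. The function names and the input lists are hypothetical, and the list of deduplicated volumes is assumed to have been gathered separately.

```python
def disable_commands_7mode(volumes):
    """Build one 'sis off' command line per 7-Mode volume name."""
    return [f"sis off /vol/{name}" for name in volumes]


def disable_commands_clustered(vserver_volumes):
    """Build one 'volume efficiency off' command line per (vserver, volume) pair."""
    return [
        f"volume efficiency off -vserver {vserver} -volume {volume}"
        for vserver, volume in vserver_volumes
    ]


if __name__ == "__main__":
    # Volume and Vserver names reuse the examples from this flash.
    for line in disable_commands_7mode(["volHomeD"]):
        print(line)
    for line in disable_commands_clustered([("vs1", "volArchive")]):
        print(line)
```

The generated lines can be reviewed and then pasted into the appropriate 7-Mode or clustered Data ONTAP session.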


Document Information

Modified date:
25 September 2022

UID

ssg1S1004443