Flashes (Alerts)
Abstract
A performance optimization for deduplication, new in Data ONTAP® 8.2, introduced a rare race condition: file system inconsistencies can occur when applications overwrite isolated 4KB blocks of shared data while deduplication is running.
Content
Problem Description
A new performance optimization of deduplication was introduced in Data ONTAP 8.2 to increase the speed at which deduplication runs in certain scenarios. A software issue in the performance optimization code introduced a race condition that can occur on busy systems. This rare race condition can occur when a client or host overwrites the same block that the deduplication process is actively trying to deduplicate, while the deduplication process suspends with incorrect state information for the block. The newly written 4KB block can then either be replaced with stale contents or be accidentally marked as a free block, leading to a file system inconsistency.
Symptom
This issue can result in a disruption to storage services with the following error messages:
PANIC Illegal container vbn 0 for a loadable vvbn #
PANIC ../common/wafl/space_cpledger.c:335: Assertion failure fs->blks.used_by_plane0 <= fs->blks.used
PANIC Mismatch found between spacemap fbn = # and sum of active-map and summary-map fbn
PANIC Spacemap of volume ... is not in-sync with snapmap of deleted snapshot in volume ...
After any of these panics, the affected deduplicated volume is marked inconsistent.
Systems encountering this issue might also show the following error messages:
[wafl.raid.incons.userdata:error]: WAFL inconsistent: inconsistent user data block at ... level:0 in public inode ... error:117
[wafl.incons.userdata.vol:error]: WAFL inconsistent: volume ... has an inconsistent user data block.
[callhome.wafl.inconsistent.user.block:ALERT]: Call home for WAFL INCONSISTENT USER BLOCK
These error messages indicate that a 4KB portion of user data has been impacted by this issue. Clients or hosts reading that 4KB portion of data will receive an error when trying to access the data in question, but the rest of the file or LUN may be available for data access.
If you have encountered any of the above error messages, contact your IBM support representative.
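As a rough illustration of checking saved console or log output for the messages listed above, the following Python sketch searches text lines for the fixed portions of those messages. The function name, the SIGNATURES list, and the idea of feeding it lines from a saved log are assumptions for illustration only; this is not an IBM or Data ONTAP tool, and any match should still be reviewed with your IBM support representative.

```python
# Fixed text taken from the messages quoted in this flash.
# Elided values ("#", "...") in the flash are not matched.
SIGNATURES = [
    "Illegal container vbn 0 for a loadable vvbn",
    "Assertion failure fs->blks.used_by_plane0",
    "Mismatch found between spacemap fbn",
    "is not in-sync with snapmap of deleted snapshot",
    "wafl.raid.incons.userdata",
    "wafl.incons.userdata.vol",
    "callhome.wafl.inconsistent.user.block",
]


def find_matches(lines):
    """Return (line_number, signature) pairs for lines containing a signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for signature in SIGNATURES:
            if signature in line:
                hits.append((number, signature))
    return hits
```

The function accepts any iterable of strings, so it can be given an open text file or a list of lines copied from a console session.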
Workaround
This issue can be avoided by disabling deduplication on all volumes in Data ONTAP 8.2RC1, 8.2, 8.2P1, or 8.2P2. Do not re-enable deduplication until the system is upgraded to Data ONTAP 8.2P3 or later. The commands to disable deduplication are documented in the Solution section below.
If deduplication is not disabled, reducing the system workload while the deduplication process is running can lower the likelihood of encountering this issue. Changing a 4KB random overwrite workload so that each I/O modifies more than 4KB of data can also reduce the already low chance of encountering it. Because it is difficult to control how client or host applications write data to the storage system, IBM recommends that deduplication be turned off on all systems running Data ONTAP 8.2x until they can be upgraded to Data ONTAP 8.2P3 or later.
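When auditing several systems, the release check described above can be sketched as a small Python helper. The set below contains only the four releases this flash names as affected; the function name and the assumption that the release is available as a plain string such as "8.2P2" are illustrative and not part of any Data ONTAP interface.

```python
# Releases this flash lists as affected; 8.2P3 or later carries the fix.
AFFECTED_RELEASES = {"8.2RC1", "8.2", "8.2P1", "8.2P2"}


def is_affected(release):
    """Return True if the release string is one the flash lists as affected."""
    # Normalize whitespace and case so "8.2rc1" and " 8.2RC1 " compare equal.
    return release.strip().upper() in AFFECTED_RELEASES
```

A release string that is not in the set is simply reported as not listed here; the helper does not attempt to parse or order version numbers.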
Solution
Disable deduplication on Data ONTAP 8.2x systems and plan to upgrade those systems, as soon as is operationally feasible, to Data ONTAP 8.2P3 or later, which delivers a software fix for this issue. Do not re-enable deduplication until the system has been upgraded to Data ONTAP 8.2P3 or later. Data ONTAP 8.2P3 or later is available for download from the IBM Support Site.
Note: Disabling deduplication only prevents new data from being deduplicated. Existing space savings from deduplication are not affected, so it is not necessary to run any “SIS undo” commands. Deduplication may be re-enabled once the system has been upgraded to Data ONTAP 8.2P3 or later.
Disabling deduplication can be performed by issuing the following commands for each deduplicated volume hosted on the system:
Data ONTAP 7-Mode:
sis off </vol/volname>
Example: node> sis off /vol/volHomeD
Clustered Data ONTAP:
volume efficiency off -vserver <vservername> -volume <volname>
Example: cluster1::> volume efficiency off -vserver vs1 -volume volArchive
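Because the command must be issued once per deduplicated volume, it can help to generate the command lines from a list of volumes before pasting them into a session. The following Python sketch only builds the command strings shown above; the function names and the input lists are hypothetical, and how the volume list is obtained and how the commands are run on the storage system are left to the administrator.

```python
def seven_mode_commands(volumes):
    """Build one 'sis off' command per 7-Mode volume name."""
    return [f"sis off /vol/{name}" for name in volumes]


def clustered_commands(vserver_volumes):
    """Build one 'volume efficiency off' command per (vserver, volume) pair."""
    return [
        f"volume efficiency off -vserver {vserver} -volume {volume}"
        for vserver, volume in vserver_volumes
    ]


if __name__ == "__main__":
    # Volume and Vserver names reuse the examples from this flash.
    for command in seven_mode_commands(["volHomeD"]):
        print(command)
    for command in clustered_commands([("vs1", "volArchive")]):
        print(command)
```

Review the generated lines against the actual list of deduplicated volumes before running them.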
Document Information
Modified date:
25 September 2022
UID
ssg1S1004443