IBM Support

Do not provision 100% of the physical flash

Troubleshooting


Problem

When flash arrays are provisioned to 100% of their capacity and eventually fill up, the drives' internal over-provisioned space becomes insufficient to keep up with write operations and garbage collection. The resulting write amplification, combined with too few free pages, can cause a variety of drive problems. In multi-tier pools, Easy Tier promotes hot extents into the flash tier. As a result, many I/O requests hit the SSD tier and spread operations across more regions of the flash drives until, depending on the write workload and the over-provisioning in the drive, the drives run out of free pages to maintain themselves. When flash drives run out of space and lack the overhead to perform garbage collection and read-modify-write processing, drive response times rise, which may cause the system to fail the drive through predictive failure analysis (PFA).

Symptom

  • Poor drive-level performance over time (response times near or above 1 ms/op)
  • Error codes 1370, 1340, 1322, 1680, 1685, and 1686, typically against multiple drives in the same array
  • Excessive drive failures that may result in array failures and offline pools

Cause

Flash drives depend on free pages being available to process new write operations and to complete garbage collection quickly. Without some free space, the combination of internal maintenance operations and host requests can overwork the drive. The software may then proactively fail the drive, or a hard failure may occur when the drive becomes write-protected (0 free space left).

Environment

  • Flash/SSD arrays that are in an Easy Tier (multi-tier) pool
  • Flash/SSD arrays in a single-array pool with used capacity at 85% or higher

Diagnosing The Problem

  1. Find one or more of the symptoms previously listed
  2. Identify the array/pool of the problematic drives
  3. Check the capacity
    1. Using lsmdiskgrp, check whether the customer configuration matches one of the environments listed on this page
    2. On systems with drive-level compression, check the lsdrive output for the various capacity values
      lsdrive 0 | grep capacity
      effective_used_capacity: 16.59TB
      capacity: 20.0TB
      physical_capacity: 8.73TB
      physical_used_capacity: 8.51TB
In this case, 8.51TB/8.73TB means that roughly 97% of the physical capacity is used. It is also helpful to check whether the configuration matches one of the environments listed.
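
The capacity check above can be scripted. The following is a minimal Python sketch, not an IBM tool: it assumes the lsdrive output has already been captured as text, and the helper names parse_capacity_fields and physical_used_percent are hypothetical. It accepts fields written either as "name: value" (as shown on this page) or "name value", and it assumes that units step by 1024, which only matters if the fields are reported in different units.

```python
import re

# Capacity lines copied from the lsdrive example on this page.
SAMPLE = """\
effective_used_capacity: 16.59TB
capacity: 20.0TB
physical_capacity: 8.73TB
physical_used_capacity: 8.51TB
"""

# Assumption: units step by 1024; irrelevant when both fields use the same unit.
TB_PER_UNIT = {"MB": 1 / 1024**2, "GB": 1 / 1024, "TB": 1.0, "PB": 1024.0}

# Matches "field: 8.73TB" as well as "field 8.73TB".
LINE_RE = re.compile(r"^\s*(\w+)\s*:?\s*(\d+(?:\.\d+)?)\s*(MB|GB|TB|PB)\s*$")


def parse_capacity_fields(text):
    """Return {field_name: size_in_TB} for every capacity line in text."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, value, unit = match.groups()
            fields[name] = float(value) * TB_PER_UNIT[unit]
    return fields


def physical_used_percent(fields):
    """Physical used capacity as a percentage of physical capacity."""
    return 100.0 * fields["physical_used_capacity"] / fields["physical_capacity"]


if __name__ == "__main__":
    pct = physical_used_percent(parse_capacity_fields(SAMPLE))
    status = "at or above" if pct >= 85.0 else "below"
    print(f"physical used: {pct:.1f}% ({status} the 85% guideline)")
```

The 85% comparison in the sketch mirrors the used-capacity figure given in the Environment section; substitute 80% if the array is virtualized behind SVC or another virtualization engine.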

Resolving The Problem

Develop a plan to migrate data off the flash array so that used space falls below 85% (80% if the array is virtualized behind SVC or another virtualization engine). For assistance in making this plan, refer to tip ibm10717581.
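
As a rough aid for sizing such a plan, the Python sketch below estimates how much physical capacity must be freed to reach a target percentage. The function name physical_tb_to_free is hypothetical, and the input figures are taken from the single-drive lsdrive example earlier on this page. On systems with drive-level compression, the physical space released by moving a volume depends on how well its data compresses, so treat the result as an estimate of physical space, not of volume size.

```python
def physical_tb_to_free(physical_capacity_tb, physical_used_tb, target_percent=85.0):
    """Physical TB that must be freed to bring usage down to target_percent."""
    if physical_capacity_tb <= 0:
        raise ValueError("physical_capacity_tb must be positive")
    target_used_tb = physical_capacity_tb * target_percent / 100.0
    # Already at or under the target: nothing needs to move.
    return max(0.0, physical_used_tb - target_used_tb)


if __name__ == "__main__":
    # Values from the lsdrive example: 8.73TB physical capacity, 8.51TB used.
    for target in (85.0, 80.0):  # 80% applies when virtualized behind SVC
        need = physical_tb_to_free(8.73, 8.51, target)
        print(f"target {target:.0f}%: free about {need:.2f}TB of physical capacity")
```

Because the goal is to stay below the threshold, not to sit exactly on it, plan to free somewhat more than the computed amount.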

Document Location

Worldwide


Document Information

Modified date:
28 March 2023

UID

ibm10874814