Troubleshooting medium errors and bad blocks
A storage system returns a medium error response to a host when it cannot read a block successfully. The following paragraphs describe how the system handles such errors on host reads.
The system allocates volumes from the extents that are on the managed disks (MDisks). An MDisk can be a volume on an external storage controller or a RAID array that is created from internal drives. In either case, depending on the RAID level, the data is normally protected against a read error on a single drive. However, a read request can still return a medium error if multiple drives have errors, or if drives are rebuilding or are offline for other reasons.
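The single-drive protection described above can be illustrated with a simplified single-parity (RAID 5-style) stripe, in which the XOR of the surviving members rebuilds one missing member. This is a minimal Python sketch under that assumption, not the system's RAID implementation; the names `xor_blocks`, `read_member`, and `MediumError` are hypothetical. A second unreadable member in the same stripe cannot be rebuilt, which is the situation that produces a medium error.

```python
from functools import reduce


class MediumError(Exception):
    """Raised when a block can be neither read nor reconstructed."""


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_member(stripe, failed, index):
    """Read one member of a hypothetical single-parity stripe.

    stripe: list of byte strings; the last entry is the parity block.
    failed: set of member indexes whose drives return read errors.
    """
    if index not in failed:
        return stripe[index]
    if failed - {index}:
        # Another member is also unreadable: single parity cannot rebuild it.
        raise MediumError(f"member {index} cannot be reconstructed")
    # Exactly one member is missing, so the XOR of all survivors rebuilds it.
    survivors = [blk for i, blk in enumerate(stripe) if i != index]
    return xor_blocks(survivors)


data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
stripe = data + [xor_blocks(data)]
print(read_member(stripe, {1}, 1) == data[1])  # True
```

Failing members 0 and 1 together and then reading member 1 raises `MediumError`, which mirrors the multiple-drive case in the text.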
The system provides migration and replication facilities to move or copy a volume from one underlying set of physical storage to another. In all these cases, if a logical block address cannot be read on the original volume, the migrated or replicated volume returns a medium error to the host when that address is read. To do this, the system maintains bad block tables that record the logical block addresses that cannot be read. These tables are associated with the MDisks that provide storage for the volumes.
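The way an unreadable source block follows a volume through migration can be sketched as follows. This is a hypothetical Python model that keeps the bad block records on the volume object for brevity, whereas the system associates them with the MDisks; `Volume`, `migrate`, and `MediumError` are illustrative names, not part of any product interface.

```python
class MediumError(Exception):
    """Returned to the host when a logical block address cannot be read."""


class Volume:
    """Hypothetical model: LBA -> data, plus a set standing in for the bad block table."""

    def __init__(self, blocks, bad_blocks=()):
        self.blocks = dict(blocks)
        self.bad_blocks = set(bad_blocks)

    def read(self, lba):
        if lba in self.bad_blocks:
            raise MediumError(f"medium error at LBA {lba}")
        return self.blocks[lba]


def migrate(source, lbas):
    """Copy each LBA from source; record unreadable ones as bad blocks on the target."""
    target = Volume({})
    for lba in lbas:
        try:
            target.blocks[lba] = source.read(lba)
        except MediumError:
            # The data is lost, so the copy records the LBA as bad instead of
            # returning zeros or stale data to the host later.
            target.bad_blocks.add(lba)
    return target


src = Volume({0: b"a", 1: b"b", 2: b"c"}, bad_blocks={1})
dst = migrate(src, range(3))
print(dst.read(0))  # b'a'
```

Reading LBA 1 on `dst` raises `MediumError`, just as it does on `src`, so the host sees the same result before and after the migration.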
The bad block tables can fill up, either for an individual MDisk or for the system as a whole. If a table fills up, the migration or replication that was creating the bad block fails, because the system cannot create an exact image of the source volume.
This table lists the bad block error codes. The recommended actions for these alerts guide you in correcting the situation.
Error code | Description |
---|---|
1840 | The managed disk has bad blocks. On an external controller, this error can only be a copied medium error. |
1226 | The system fails to create a bad block because the MDisk already has the maximum number of allowed bad blocks. |
1225 | The system fails to create a bad block because the system already has the maximum number of allowed bad blocks. |
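Error codes 1226 and 1225 correspond to the two table limits described earlier. The sketch below models a per-MDisk limit and a system-wide limit; the limit values, the order in which the limits are checked, and all class and method names are assumptions for illustration only, not the system's actual logic.

```python
from collections import defaultdict

# Error codes from the table; the limits passed to BadBlockTables are hypothetical.
MDISK_LIMIT_ERROR = 1226
SYSTEM_LIMIT_ERROR = 1225


class BadBlockTableFull(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class BadBlockTables:
    """Hypothetical per-MDisk bad block records with a system-wide cap."""

    def __init__(self, per_mdisk_limit, system_limit):
        self.per_mdisk_limit = per_mdisk_limit
        self.system_limit = system_limit
        self.records = defaultdict(set)  # MDisk name -> set of bad LBAs

    def total(self):
        return sum(len(lbas) for lbas in self.records.values())

    def create(self, mdisk, lba):
        if lba in self.records[mdisk]:
            return  # already recorded, so no new table entry is needed
        if len(self.records[mdisk]) >= self.per_mdisk_limit:
            raise BadBlockTableFull(MDISK_LIMIT_ERROR, f"{mdisk} is at its bad block limit")
        if self.total() >= self.system_limit:
            raise BadBlockTableFull(SYSTEM_LIMIT_ERROR, "system is at its bad block limit")
        self.records[mdisk].add(lba)
```

A migration or replication that receives `BadBlockTableFull` would stop at that point, because the copy could no longer be an exact image of the source.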
Clear bad blocks by deallocating the volume disk extent, by deleting the volume, or by issuing write I/O to the block. It is good practice to correct bad blocks as soon as they are detected, because this prevents them from being propagated when the volume is replicated or migrated. However, a bad block might be in a part of the volume that the application does not use. For example, it might be in a part of a database that is not yet initialized. Such bad blocks are corrected when the application writes data to those areas. Until then, the bad block records continue to consume the available bad block table space.
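The ways of clearing bad blocks can be modeled as operations that remove bad block records. In this hypothetical Python sketch, a volume tracks bad logical block addresses per extent; writing to an address clears its record, and deallocating an extent removes all of that extent's records. Deleting a volume would correspond to deallocating every extent. The class and method names are illustrative only.

```python
class MediumError(Exception):
    """Raised when a host reads an LBA that has a bad block record."""


class Volume:
    """Hypothetical model: bad LBAs are tracked per extent number."""

    def __init__(self):
        self.data = {}  # (extent, lba) -> bytes
        self.bad = {}   # extent -> set of bad LBAs in that extent

    def mark_bad(self, extent, lba):
        self.bad.setdefault(extent, set()).add(lba)

    def read(self, extent, lba):
        if lba in self.bad.get(extent, set()):
            raise MediumError(f"extent {extent}, LBA {lba}")
        return self.data.get((extent, lba), b"\x00")

    def write(self, extent, lba, payload):
        # New data replaces the lost data, so the bad block record is cleared.
        self.data[(extent, lba)] = payload
        self.bad.get(extent, set()).discard(lba)

    def deallocate(self, extent):
        # Freeing the extent drops both its data and its bad block records.
        self.bad.pop(extent, None)
        self.data = {k: v for k, v in self.data.items() if k[0] != extent}

    def bad_block_count(self):
        return sum(len(lbas) for lbas in self.bad.values())


v = Volume()
v.mark_bad(0, 5)
v.mark_bad(1, 7)
v.mark_bad(1, 8)
v.write(0, 5, b"x")
print(v.bad_block_count())  # 2
```

Until the write or the deallocation happens, each record still counts toward `bad_block_count()`, which matches the point that unused bad blocks keep consuming table space.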