Question & Answer
I have a controller that does its own data reduction while virtualized under Spectrum Virtualize and it has run out of physical space. What can be done to reclaim space in order to prevent it from reaching 100% full and taking mdisks offline?
A Space Efficient controller virtualized under Spectrum Virtualize is out of physical space and Spectrum Virtualize does not have any indication the controller is, in fact, out of space. The out of space condition can happen for various reasons including:
- Poorer than expected capacity savings on the virtualized controller
- Under-sizing the capacity need for the solution
- The virtualized controller is presenting thin-provisioned volumes to the Spectrum Virtualize cluster, as well as other hosts, resulting in space being consumed faster than expected
The ultimate objective is to move data off one or more of the affected mdisks onto alternate storage or to use SCSI Unmap to free space on the back-end storage. The best way to move this data varies, based on the system configuration and circumstances surrounding the event. If you are unsure about the best plan, or are not comfortable with the contents of this page, it is highly recommended to contact IBM support before taking any action on the system. During the space reclamation action plans, hosts accessing the out of space pool must be powered off.
Analyzing the situation
This stage of the recovery is meant to gather the pertinent details of the state of the system in order to form the recovery strategy. The following information needs to be known:
- Does the Spectrum Virtualize system support SCSI Unmap?
- Are there any active migration tasks?
- Are the affected volumes protected by replication, mirroring, or HyperSwap?
- Is alternate storage available to migrate data?
- Is the affected pool a Data Reduction Pool? If yes, what is the reclaimable capacity?
If there are active migration tasks, contact the IBM Support Center for assistance in devising a plan to either delete the migrating volumes, abort the migration, or safely complete the migration.
If the affected data set is protected by volume mirroring, HyperSwap, or an effective disaster recovery site, it might be more expedient to fail over and resynchronize.
Reclaiming space in a standard pool
The typical action plan is to free up space in the out-of-space array by removing mdisks from the pool, deleting them from the controller itself, and relying on the controller to reclaim that capacity. This requires storage from a controller or array that is not out of space in the same pool, so that data can be moved off the out-of-space system and mdisks can be safely deleted from the pool without data loss. The general procedure is as follows:
- Identify alternate storage in the same pool as the out of space array and ensure it is sufficient to vacate at least one mdisk from the out of space array.
- (a.) Capacity can be queried by using the lsfreeextents command. This command returns the number of free extents on an mdisk.
- (b.) The capacity of the mdisk divided by the extent size is the total number of mdisk extents.
- (c.) The capacity to be migrated is the total mdisk extents (b) less the free extents (a) from one of the mdisks that is out of space.
- The target of the migration needs to have sufficient free extents to hold the capacity to be migrated (c.). If needed, multiple mdisks might be used to do this. The migration target must be on a storage array that is not out of space.
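The arithmetic in steps (a) through (c) can be sketched as a small shell helper. The capacity figures below are illustrative placeholders, not values from a real system; on a real cluster, capacity and extent size come from lsmdisk and lsmdiskgrp, and the free extent count from lsfreeextents.

```shell
# Sketch of steps (a)-(c): compute the number of extents that must be
# migrated off an out-of-space mdisk. All inputs are illustrative.
extents_to_migrate() {
    local capacity_gb=$1     # (b) total mdisk capacity
    local extent_size_gb=$2  # (b) pool extent size
    local free_extents=$3    # (a) output of lsfreeextents
    local total_extents=$((capacity_gb / extent_size_gb))
    echo $((total_extents - free_extents))  # (c) extents holding data
}

# Example: a 2048 GB mdisk with 1 GB extents and 300 free extents
extents_to_migrate 2048 1 300   # prints 1748
```

The result is the minimum number of free extents the migration target must provide.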
- Once the migration source and target are identified, disable Easy Tier in the pool where the migration is to take place; manual extent migrations cannot be performed while Easy Tier is active. Note the content in <> is meant to be replaced with actual values from the system.
chmdiskgrp -easytier measure <pool id>
- As a precaution, power off all hosts that access storage pools that have mdisks from the out of space array. This step prevents any host writes from potentially compromising the recovery plan.
- Follow instructions from the product support team of the out of space array to unlock the array to enable hosts to mount it in read/write mode (the host in this case is the Spectrum Virtualize system managing the array). On sufficiently recent code levels, it is also possible to mount the mdisk in a write protected state using an IBM Support Only CLI command.
- Once the array is unlocked, execute the includemdisk command against the mdisk that has been identified as the migration source to bring it online.
- Once the source mdisk is online and the target mdisk is identified, use the migrateexts command to move data from the source to the target.
- To obtain all the needed command arguments, run the command lsmdiskextent against the source mdisk.
- Once this information is known use migrateexts to perform the migration
- This whole process can be scripted as follows:
lsmdiskextent -nohdr -delim " " <source mdisk> | while read vdisk count copy; do
    migrateexts -source <source mdisk> -target <target mdisk> -vdisk $vdisk -exts $count -copy $copy
    echo "Moving vdisk $vdisk"
    sleep 60
done
Note that only 32 migration tasks can be queued at a time, so the script might receive errors when it tries to queue additional tasks. If that happens, interrupt the script and rerun it once queue slots become available. As with the previous sample, the values in <> are meant to be replaced with actual values from the system.
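To avoid hitting the 32-task queue limit, the script can wait for a free slot before submitting each migration. This is a hedged sketch: the command that reports the active migration count is passed in as a parameter so the helper can be exercised outside the cluster CLI, and on the cluster it is assumed, for illustration, to be something like "lsmigrate -nohdr | wc -l" (check the exact output format of lsmigrate on your code level).

```shell
# Hedged sketch: block until fewer than 32 migration tasks are queued.
# count_cmd is whatever command reports the number of active migration
# tasks; the exact command depends on the code level.
wait_for_queue_slot() {
    local count_cmd=$1
    while [ "$(eval "$count_cmd")" -ge 32 ]; do
        sleep 60   # wait for running migrations to drain
    done
}
```

Calling wait_for_queue_slot before each migrateexts in the loop above keeps the script under the queue limit.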
- Once the migrations are complete, validate the migration source is empty by using the lsmdiskextent command. If the mdisk is empty, this command returns nothing.
- Once it is validated that the migration source mdisk is empty, remove it from the pool using the rmmdisk command or using the GUI.
- Once the mdisk is out of the pool in an unmanaged state, it is safe to delete the backing LUN from the out of space controller and allow the backing controller to perform any needed procedures to reclaim the space.
This procedure might need to be repeated multiple times depending on how much capacity needs to be reclaimed by the out of space controller. Consult the support team of the out of space array.
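The validation step above (waiting for lsmdiskextent against the source mdisk to return nothing) can be sketched as a polling helper. The query command is passed in as a parameter so the helper can be exercised outside the cluster CLI; on the system it would be "lsmdiskextent <source mdisk>".

```shell
# Hedged sketch: poll until the migration source mdisk reports no
# remaining extents. An empty query result means the mdisk is empty
# and safe to remove from the pool.
wait_until_empty() {
    local query_cmd=$1
    while [ -n "$(eval "$query_cmd")" ]; do
        sleep 60   # migrations still draining extents off the mdisk
    done
}
```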
Reclaiming space in a data reduction pool
Due to the basic organization of a data reduction pool, extent level migrations are generally not possible. However, data reduction pools make full use of the SCSI Unmap functionality including a garbage collection process that can be used to reclaim space. Because of this capability, there are a few different methods that can be used to reclaim space. If the out of space storage is in a data reduction pool, it is recommended to contact IBM Support.
Strategies to address the situation generally include:
- Create fully allocated, sequential volumes to consume the capacity of the mdisks in the pool.
- Fully allocated volumes in a Data Reduction Pool reserve extents in the data volume but do not write to the mdisks. This causes Garbage Collection to increase its priority, reclaim garbage extents, and send unmaps to the back-end, reducing the reclaimable capacity.
- Use host trim commands to get an accurate accounting of the free space of the volumes in the pool and to accurately report the reclaimable space.
- Consider migrating volumes to another pool using volume mirroring to move data out of the affected pool.
- If alternate storage can be found to replace an mdisk, it is also possible to create fully allocated volumes to reserve all the mdisk extents on the out of space system. Once this step is done, it is possible to use rmmdisk -force to force an mdisk to migrate all of its data onto other available storage in the pool and eventually remove one or more mdisks from the pool.
- The rmmdisk command does not allow you to specify the target mdisks for the migration tasks it creates. Because of this limitation, the step to create fully allocated, sequential, volumes to reserve all of the extents on the out of space controller is critical to success.
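The host trim strategy above can be sketched for Linux hosts using fstrim. The mount points below are hypothetical, and the command is only echoed as a dry run; removing the echo issues the actual trim on the host (not on the Spectrum Virtualize CLI).

```shell
# Dry-run sketch: print the fstrim command for each (hypothetical) mount
# point backed by a volume in the affected pool. Remove "echo" to run it
# for real on the host; -v reports how many bytes were trimmed.
for fs in /mnt/vol1 /mnt/vol2; do
    echo "fstrim -v $fs"
done
```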
Note that all of these strategies require unlocking the write protected array. As in a standard pool, it is recommended to power off all hosts accessing the pool to prevent host writes from impacting the success of the plan.
02 April 2020