Troubleshooting
Problem
When running operations that require tape drives inside a tape library, drives are sometimes unavailable even though it appears that drives should be available.
Symptom
Messages CPF415E and BRM1033 are reported.
Environment
Fibre-attached media library shared by multiple IBM i hosts
Resolving The Problem
There are multiple reasons that a tape drive may not be available in a media library when it appears that one should be, but they all come down to reservations on tape resources.
IBM i does not have a central system that manages tape resources within a media library, so how are tape drive usage and sharing managed?
- When an operation that needs a tape resource runs, the tape code attempts to reserve a tape drive. If it cannot get a reservation on its first choice, it tries to reserve a different drive, and repeats this until it either gets a reservation or all drive resources have been tried. When an available drive is found, a 'reservation' is put on the tape drive resource. This reservation exists on the tape drive itself and is assigned to a specific adapter port, which is identified by its World Wide Port Name (WWPN). The reservation persists until the host releases it. The release can only come from the adapter port whose WWPN matches the WWPN that initiated the reservation. If anything stops the host from releasing the reservation, the drive remains reserved and can only be used by that specific adapter port.
- Once a drive is reserved and a tape needs to be mounted, the tape code checks whether the requested cartridge is already mounted in another drive. If it is, the tape code attempts to reserve the drive that the tape is already mounted in. If it gets that reservation, the job will have caused two drives to be reserved for a short period of time. The first drive is released shortly afterward. However, if multiple jobs that need drives start at the same time, each job could reserve multiple drives for short periods, and some job may not be able to get a drive. Operations that require more than one tape resource, such as DUPMEDBRM or DUPTAP, can magnify this issue. Slightly staggering job start times can reduce this possibility.
- Different types of errors (user, job, device) can leave a reservation hanging on a drive. Once this happens, only the system using the adapter port with the matching WWPN can use the tape drive or release the reservation.
- Although it may not be common, changes to the fabric can make it impossible for a system to release a reservation. For example:
> A job runs and TAP01 is reserved
> The job ends abnormally and does not release the reservation
> The Fibre cable is moved to a different adapter port
> At this point, none of the host systems can release this reservation
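The reservation behavior described in the list above can be sketched as a small model. This is only an illustration under stated assumptions, not IBM i tape code: the `TapeDrive` class, the `reserve_any` helper, and the WWPN values are all hypothetical, and the model assumes that a port which already holds a reservation can reserve the same drive again.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TapeDrive:
    """Toy model of a drive; the reservation is stored on the drive itself."""
    name: str
    reserved_by: Optional[str] = None  # WWPN of the port holding the reservation

    def reserve(self, wwpn: str) -> bool:
        # Succeeds if the drive is free or already reserved by the same port.
        if self.reserved_by in (None, wwpn):
            self.reserved_by = wwpn
            return True
        return False

    def release(self, wwpn: str) -> bool:
        # Only the port whose WWPN placed the reservation can release it.
        if self.reserved_by == wwpn:
            self.reserved_by = None
            return True
        return False


def reserve_any(drives: List[TapeDrive], wwpn: str) -> Optional[TapeDrive]:
    """Try each drive in turn until one is reserved or all have been tried."""
    for drive in drives:
        if drive.reserve(wwpn):
            return drive
    return None


# Hypothetical WWPNs and drive names, for illustration only.
PORT_A = "10:00:00:00:c9:00:00:01"      # host A, original adapter port
PORT_A_NEW = "10:00:00:00:c9:00:00:0a"  # host A after the cable is moved
PORT_B = "10:00:00:00:c9:00:00:02"      # host B
PORT_C = "10:00:00:00:c9:00:00:03"      # host C

drives = [TapeDrive("TAP01"), TapeDrive("TAP02")]

first = reserve_any(drives, PORT_A)   # host A reserves TAP01
second = reserve_any(drives, PORT_B)  # host B skips TAP01 and reserves TAP02
third = reserve_any(drives, PORT_C)   # host C finds no free drive

# Host A's job ends abnormally without releasing, then the cable is moved
# to a different adapter port, so host A now presents a different WWPN.
released_after_move = drives[0].release(PORT_A_NEW)

print(first.name, second.name, third, released_after_move, drives[0].reserved_by)
```

In this model TAP01 stays reserved by the original port after the cable move, because no host can present the WWPN that placed the reservation.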
Some examples of situations where a drive reservation may be left 'hanging' on a drive resource:
- A device error occurred during a tape operation and the drive went into a failed state, so the tape code cannot release the reservation. (When this happens, use option 6=Deallocate resource and then option 5=Allocate unprotected to recover. DO NOT use option 3=Reset resource from WRKMLBSTS to try to recover, as that will NOT release the reservation.)
- The job was ended or cancelled by a user
- The system was IPL'ed *IMMED while the drive was reserved
- ETC...
The following steps can help narrow down which system may have had a problem or left a reservation on the drive, and release a hanging reservation:
VRYCFG CFGOBJ(Media_Library_DEVD) CFGTYPE(*MLBRSC) STATUS(*ALLOCATE) RSRCNAME(Tape_Drive_Resource_Name)
If the drive can be allocated, one of the following is true:
- No system currently has a reservation on the drive
- The system running the command owns the adapter with the WWPN that has the reservation on the drive
VRYCFG CFGOBJ(Media_Library_DEVD) CFGTYPE(*MLBRSC) STATUS(*UNPROTECTED) RSRCNAME(Tape_Drive_Resource_Name)
The command above releases the reservation on any drive that the system was able to allocate and puts the drive back in UNPROTECTED allocation.
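When several drive resources need to be checked, the two commands above can be generated for each resource. The following is a minimal sketch that only builds the command text to be run on an IBM i host; it does not execute anything. The helper names `vrycfg` and `recovery_commands`, the device description `TAPMLB01`, and the resource names `TAP01` and `TAP02` are hypothetical placeholders, and the helper deliberately accepts only the two STATUS values shown in this document.

```python
from typing import List

# STATUS values used in the steps above; other values are intentionally rejected.
ALLOWED_STATUS = ("*ALLOCATE", "*UNPROTECTED")


def vrycfg(media_library: str, resource: str, status: str) -> str:
    """Format a VRYCFG command for one drive resource in a media library."""
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unsupported STATUS value: {status}")
    return (
        f"VRYCFG CFGOBJ({media_library}) CFGTYPE(*MLBRSC) "
        f"STATUS({status}) RSRCNAME({resource})"
    )


def recovery_commands(media_library: str, resources: List[str]) -> List[str]:
    """For each resource, allocate it first, then return it to unprotected."""
    commands = []
    for resource in resources:
        commands.append(vrycfg(media_library, resource, "*ALLOCATE"))
        commands.append(vrycfg(media_library, resource, "*UNPROTECTED"))
    return commands


# Hypothetical device description and resource names.
for command in recovery_commands("TAPMLB01", ["TAP01", "TAP02"]):
    print(command)
```

If the *ALLOCATE command fails for a resource on one host, the same pair of commands can be tried from the other hosts that share the library to find the one whose adapter port holds the reservation.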
Product: IBM i. Platform: IBM i. Version: 7.1.0. Line of Business: Power.
Document Information
Modified date:
22 April 2020
UID
nas8N1020679