PH62082: WHEN USING NVME STORAGE FOR TEMPSPACE1, AN UPGRADE OR RESET OF THE ACCELERATOR COULD LEAD TO AN OUTAGE

APAR status

Closed as documentation error.

Error description

When the Accelerator is deployed on IBM Z in a multi-node
cluster and uses NVMe (Non-Volatile Memory Express) storage
for TEMPSPACE1, any of the following actions
- a Reset (with or without wipe),
- a Shutdown/Deactivate/Activate of the LPARs,
- or an Accelerator upgrade
could lead to an outage in which the head node and the data
nodes are down.
The issue can be accompanied by the following error messages,
some of which appear only in the internal logs of the database
engine:
- AQTST030E
- "SQL0290N Table space access is not allowed. SQLSTATE=55039"
- "ADM6047W The table space "TEMPSPACE1" (ID "1") is in the
DROP_PENDING state. The table space will be kept OFFLINE. The
table space state is "0x0000C000". This table space is
unusable and should be dropped."
- "SQL1034C The database was damaged" - "WARNING: Failed to
activate BLUDB database on some or all of the members ..."
Manual intervention by IBM support is required to re-activate
the Accelerator (see the Local fix section).
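
To check quickly whether a log shows the symptoms listed above, the message IDs can be counted with a small shell helper. This is a minimal sketch, not an IBM-provided tool: the function name scan_ph62082_symptoms is hypothetical, and the log file to scan is an assumption supplied by the caller, because this APAR does not name a log location.

```shell
#!/bin/sh
# Sketch: count occurrences of the PH62082 symptom message IDs in a log file.
# The log file path is supplied by the caller; this APAR does not name one.
scan_ph62082_symptoms() {
    logfile="$1"
    for id in AQTST030E SQL0290N ADM6047W SQL1034C; do
        # grep -c prints the number of matching lines (0 if none);
        # "|| true" keeps the helper usable under "set -e".
        count=$(grep -c -- "$id" "$logfile" || true)
        printf '%s %s\n' "$id" "$count"
    done
}
```

A non-zero count for any of the IDs only indicates that the message occurred; it does not by itself confirm that this APAR is the cause.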

Please note:
Although one of these messages could indicate that the database
is damaged, the integrity of the data kept in
accelerator-shadow tables or accelerator-only tables is not
affected.
The issue occurs during the startup of the Accelerator: the
Db2 Warehouse (DB2wh) engine tries to access the pool for
temporary data defined on NVMe storage, but the temporary table
space on one or more nodes (LPARs) is not yet available for
processing.

A fix for the subject issue will be delivered with Accelerator
maintenance level 7.5.12.3.

Additional keywords:
TS016417509 TS016545740 TS017486846
NVME AQTST030E SQL0290N ADM6047W
SQL1034C TEMPSPACE1 DATABASE DAMAGED DT390652
GH/Everest/customer-cases/issues/716

Local fix

Run an online session with the IBM technical support team.

As the db2inst1 user:

-- Check the table space ID for TEMPSPACE1:
db2pd -db bludb -tablespace all
-- Check whether the table space status contains only zeroes and
-- not 0x0000C000 (which indicates DROP PENDING):
db2pd -db bludb -tablespace 1 -member all | grep -A1 -i "Status"
-- Check the activation state (it does not show EXPLICIT):
db2 "select member,db_conn_time,db_activation_state from table(mon_get_database(-2)) order by member"
-- Restart with drop pending. The key here is to use "db2_all" to
-- apply the command on all nodes:
db2_all 'db2 "restart database bludb drop pending tablespaces (TEMPSPACE1)"'

As root (inside docker), run the script:
/head/vidaa_scripts/restart_db2_with_broken_tempspace1.sh

As db2inst1:
db2 activate db bludb

Via the Admin GUI:
Perform a reset without wipe to re-install docker. The
DROP/CREATE of TEMPSPACE1 will now succeed, and TEMPSPACE1 will
be back on transient storage.
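
The table space status check in the steps above can be sketched as a small shell helper that classifies a state value. This is an illustration only: the function name classify_ts_state is hypothetical, and the bit meanings (0x4000 for offline and not accessible, 0x8000 for drop pending) are assumed from the standard Db2 table space state definitions, which match the 0x0000C000 value and the DROP_PENDING/OFFLINE wording in ADM6047W.

```shell
#!/bin/sh
# Sketch: classify a Db2 table space state value as reported by db2pd.
# Assumed state bits: 0x4000 = offline and not accessible,
# 0x8000 = drop pending; 0x0000C000 is the combination of both.
classify_ts_state() {
    state=$(( $1 ))          # shell arithmetic accepts 0x-prefixed hex
    if [ "$state" -eq 0 ]; then
        echo "NORMAL"
    elif [ $(( state & 0x8000 )) -ne 0 ]; then
        echo "DROP_PENDING"
    elif [ $(( state & 0x4000 )) -ne 0 ]; then
        echo "OFFLINE"
    else
        echo "OTHER"
    fi
}
```

For example, `classify_ts_state 0x0000C000` reports DROP_PENDING, which is the state described in this APAR, while a value consisting of zeroes only reports NORMAL.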

Problem summary

Problem Summary:
See APAR Error description.

Users Affected:
Customers running the Accelerator deployed on IBM Z in a
multi-node cluster that uses NVMe (Non-Volatile Memory Express)
storage for TEMPSPACE1.

Problem Scenario:
See APAR Error description.

Problem Symptoms:
See APAR Error description.

Problem conclusion

Conclusion:
The issue has been fixed with Accelerator maintenance level
7.5.12.3.

Upgrade your Accelerator environments accordingly.
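
Whether a given maintenance level already contains the fix can be checked by comparing it with 7.5.12.3. The following is a minimal sketch under stated assumptions: the function name has_ph62082_fix is hypothetical, the installed level is passed in by the caller (how it is obtained is not covered by this APAR), and `sort -V` (version sort, as provided by GNU coreutils) is assumed to be available.

```shell
#!/bin/sh
# Sketch: succeed if the given maintenance level is at or above 7.5.12.3.
# Relies on "sort -V" (version sort), an assumption about the host.
FIXED_LEVEL="7.5.12.3"
has_ph62082_fix() {
    # The lower of the two levels sorts first; the fix is present
    # when that lower level is the fixed level itself.
    lowest=$(printf '%s\n%s\n' "$FIXED_LEVEL" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_LEVEL" ]
}
```

Usage: `has_ph62082_fix 7.5.12.2 || echo "upgrade required"` prints the message because 7.5.12.2 is below the fixed level.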

Temporary fix

Comments

APAR Information

APAR number: PH62082
Reported component name: ANYTCS ACCLTR Z
Reported component ID: 5697DA700
Reported release: 750
Status: CLOSED DOC
PE: NoPE
HIPER: NoHIPER
Special Attention: NoSpecatt / Xsystem
Submitted date: 2024-06-27
Closed date: 2024-07-21
Last modified date: 2024-10-03

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

Business Unit: Systems - zSystems software (BU011)
Product: SG19M
Platform: z Systems (PF054)
Version: 750

Document Information

Modified date:
03 October 2024
