Troubleshooting
Problem
Troubleshooting failures that occur while running nzredrexpand.
Symptom
The nzredrexpand process can fail with any of the following issues:
- Multiple open sessions
  [2025-03-07 19:52:06.193850 UTC] RetCode = 1
  [2025-03-07 19:52:04.062911 CET] Status: disabling database access...
  [2025-03-07 19:52:04.114514 CET] Status: database access is already disabled
  [2025-03-07 19:52:04.190755 CET] Status: Error: found open sessions, all sessions must be closed before launching nzredr
  [2025-03-07 19:52:04.191071 CET] Status: Fatal: found 2 open sessions
  [2025-03-07 19:52:06.193992 UTC] Current exception State: RedrFailed
  [2025-03-07 19:52:06.194073 UTC] Updating Current Status: RedrFailed
  [2025-03-07 19:52:06.194456 UTC] State file: 202773:RedrFailed:8:12:192:288:False:True
  Verify the active sessions, make sure that all of them are closed, and then proceed with the resume operation.
- No such table exists. Proceed with the resume operation.
  [2025-03-08 01:47:37.695885 CET] Status: == Redistribute table 2368 of 15747 1719470 EAG_DWH_CORE_TEST.ADMIN.NZ_MAT_META_CW_KRM_COVAR_ALL_V2_RMX_2 27.00 Mb 9 extents
  [2025-03-08 01:47:38.165917 CET] Status: Fatal: 2147483646 : No such table exists
  [2025-03-08 01:47:45.199776 UTC] Current exception State: RedrFailed
  [2025-03-08 01:47:45.199900 UTC] Updating Current Status: RedrFailed
  [2025-03-08 01:47:45.200264 UTC] State file: 268567:RedrFailed:8:12:192:288:False:True
- Illegal column type. Proceed with the resume operation.
  [2025-03-08 08:21:13.830181 CET] Status: == Redistribute table 1528 of 13372 15291028 SWM_DWH_PRE_CORE_TEST.ADMIN.NSHD_ISU_B_DRUCKBELEG 1275.00 Mb 425 extents
  [2025-03-08 08:21:14.400096 CET] Status: Fatal: 0 : illegal column type in computeFieldSizes
  [2025-03-08 08:21:21.438866 UTC] Current exception State: RedrFailed
  [2025-03-08 08:21:21.439043 UTC] Updating Current Status: RedrFailed
  [2025-03-08 08:21:21.439556 UTC] State file: 778453:RedrFailed:8:12:192:288:False:True
- Transaction rolled back due to restart or failover
  [2025-03-07 18:27:56.689515 CET] Status: Table 1411712 migrated 80086 of 80086 extents 100.0% Time elapsed 276.7s remaining 0.0s | Total 279250 of 8531066 extents 3.3% Time elapsed 1125.6s remaining 33260.9s
  [2025-03-07 18:36:30.165345 CET] Status: Warning: transaction rollback: Transaction rolled back due to restart or failover
  [2025-03-07 18:54:59.750057 CET] Status: Fatal: System state (Pausing Now) invalid for request
  [2025-03-07 18:55:06.776096 UTC] Current exception State: RedrFailed
  [2025-03-07 18:55:06.776194 UTC] Updating Current Status: RedrFailed
  [2025-03-07 18:55:06.776587 UTC] State file: 46136:RedrFailed:8:12:192:288:False:True
Resolving the problem
- Check the sysmgr logs for the root cause of the restart.
- Execute the following command to collect the core backtraces:
  apdiag collect --components ips/analyze_cores
- Examine the nz_analyzecore_report.out file.
- If SPU cores are present, verify whether the backtrace matches one of the backtraces listed below:
  Keywords to search: DsidToRecordStore, UnpackFpgaPacket.

  Backtrace containing DsidToRecordStore:
  (gdb)
  #0 0x00007feeaaa8658b in raise () from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libpthread.so.0
  #1 0x0000000000704ece in _raise_dfl (sig=sig@entry=11) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1672
  #2 0x0000000000705039 in _crash_handler (sig=11, si=<optimized out>, ignore=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1780
  #3 <signal handler called>
  #4 0x0000000000800aa9 in CTable::DsidToRecordStore (this=0x5e326410, dsId=133) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/sys/tablesliceinfo.cpp:298
  #5 0x000000000080a82c in TDownloadTableNode::spuDistWrapup (this=0x59eff4a8) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/xp/xpdownload.cpp:2267
  #6 0x00007feeab6feeb2 in GenPlan(CPlan*, char*, char*, bool) () from /tmp/Eni2eWQRBd/10932_1_133.o
  #7 0x000000005aebd118 in ?? ()
  #8 0x0000000059eff950 in ?? ()
  #9 0x0000000000614178 in _do_gened_code_final_call (plan=0x100000000) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2102
  #10 _handle_scan_io_complete (dj=<optimized out>, plan=0x100000000) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2159
  #11 spueventEmuJobSink::processJob (this=this@entry=0x7ffdd9db3190, job=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:1917
  #12 0x00000000007cbec0 in filterPassthrough (sink=..., source=...) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/comp/emu/emuJobFilter.cpp:66
  #13 emuJobFilter::filter (source=..., sink=..., passthrough=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/comp/emu/emuJobFilter.cpp:47
  #14 0x00000000006131f2 in SpuProcessPlanEvents (plan=0x59efd660) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2044
  #15 0x00000000007054aa in _invoke_entry (arg=<optimized out>, entry=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:788
  #16 _childproc (time_slice=10, priority=10, jobtask=0x59f71d18) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1134
  #17 __create_task (jobtask=0x59f71d18, priority=priority@entry=10, time_slice=time_slice@entry=10) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1199
  #18 0x0000000000706a13 in _create_tasks (priority=10, time_slice=10, req=0x5a543dd8) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1233
  #19 _create_job_tasks (req=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1582
  #20 nzprocmgr_main () at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1925
  #21 0x000000000060c2b0 in NzSpuMain () at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1081
  #22 0x00000000005ff195 in main (argc=<optimized out>, argv=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1090
  (gdb)

  Backtrace containing UnpackFpgaPacket:
  (gdb)
  #0 0x00007f6877c5e58b in raise () from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libpthread.so.0
  #1 0x0000000000704ece in _raise_dfl (sig=sig@entry=6) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1672
  #2 0x0000000000705039 in _crash_handler (sig=6, si=<optimized out>, ignore=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1780
  #3 <signal handler called>
  #4 0x00007f6877099267 in raise () from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libc.so.6
  #5 0x00007f687709a958 in abort () from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libc.so.6
  #6 0x00000000008636d6 in CError_abort (err=...) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/sys/error.cpp:511
  #7 0x0000000000863c92 in CError_AssertFailed (line=line@entry=2851, src=src@entry=0xbe57f0 "/gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/fpgawrap.cpp") at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/sys/error.cpp:576
  #8 0x000000000063863d in CFpgaWrap::UnpackFpgaPacket (this=0x5b33e000, pPlan=pPlan@entry=0x5b3398e0, pPacket=0x5b33ada0, XIDMode=<optimized out>, totalXIDs=<optimized out>, invisibleXIDs=<optimized out>, xIds=0x5b33adf8) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/fpgawrap.cpp:2851
  #9 0x0000000000613390 in _init_scan (plan=0x5b3398e0) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2313
  #10 _launch_plan_scan (cl=0x5b33b898, event_queue=0x34cda78, plan=0x5b3398e0) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2410
  #11 SpuProcessPlanEvents (plan=0x5b3398e0) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2030
  #12 0x00000000007054aa in _invoke_entry (arg=<optimized out>, entry=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:788
  #13 _childproc (time_slice=10, priority=10, jobtask=0x53ec8818) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1134
  #14 __create_task (jobtask=0x53ec8818, priority=priority@entry=10, time_slice=time_slice@entry=10) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1199
  #15 0x0000000000706a13 in _create_tasks (priority=10, time_slice=10, req=0x5ab319d8) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1233
  #16 _create_job_tasks (req=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1582
  #17 nzprocmgr_main () at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1925
  #18 0x000000000060c2b0 in NzSpuMain () at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1081
  #19 0x00000000005ff195 in main (argc=<optimized out>, argv=<optimized out>) at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1090
  (gdb)

- If the backtrace matches one of the backtraces above, follow these steps:
  - Run nzhw -issues and verify that no hardware issues are reported.
  - Run nzstate and verify that the system is online.
  - Proceed with the resume operation if no issues are found.
- If the backtrace is different, run the following commands to collect the logs:
  apdiag collect --components platform_manager/ ips/ --minus-components ips/spu_all_logs
  apdiag collect --components platform_manager/ ips/ --spus <spus which restarted>
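The triage described in this document can be sketched as a small script. This is a minimal illustration only, not an IBM tool: the function names, the matched substrings, and the action strings are assumptions derived from the log excerpts and keywords quoted in this document, and the script works on text that has already been read from the nzredrexpand log or from nz_analyzecore_report.out.

```python
import re

# Message fragments taken from the log excerpts in this document, paired with
# the action the document gives for each symptom (wording is illustrative).
SYMPTOM_ACTIONS = [
    ("open sessions", "Verify the active sessions, then resume."),
    ("no such table exists", "Proceed with the resume operation."),
    ("illegal column type", "Proceed with the resume operation."),
    ("restart or failover", "Check the sysmgr logs and collect core backtraces."),
]

# Keywords the document says to search for in nz_analyzecore_report.out.
BACKTRACE_KEYWORDS = ("DsidToRecordStore", "UnpackFpgaPacket")

# Matches the "Status: Fatal|Error|Warning: <message>" lines seen in the excerpts.
STATUS_RE = re.compile(r"Status: (?:Fatal|Error|Warning): (.*)")


def classify_log(log_text):
    """Return the action for the first recognised symptom, or None."""
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match is None:
            continue
        message = match.group(1).lower()
        for needle, action in SYMPTOM_ACTIONS:
            if needle in message:
                return action
    return None


def known_backtrace_keywords(report_text):
    """Return the documented keywords that appear in the core report text."""
    return [kw for kw in BACKTRACE_KEYWORDS if kw in report_text]


if __name__ == "__main__":
    sample_log = (
        "[2025-03-08 01:47:38.165917 CET] Status: Fatal: 2147483646 : "
        "No such table exists"
    )
    print(classify_log(sample_log))
    sample_frame = "#4 0x0000000000800aa9 in CTable::DsidToRecordStore (this=0x5e326410, dsId=133)"
    print(known_backtrace_keywords(sample_frame))
```

An empty list from known_backtrace_keywords corresponds to the "backtrace is different" branch, in which case the additional apdiag collections apply. A keyword hit only narrows the search; the full backtrace should still be compared against the ones listed above.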
Document Location
Worldwide
Document Information
Modified date: 21 March 2025
UID: ibm17228723