IBM Support

Troubleshooting while running nzredrexpand

Troubleshooting


Problem

nzredrexpand can stop with the state RedrFailed. This document lists failures seen while running it and how to proceed in each case.

 
 

Symptom

nzredrexpand can fail with any of the following errors. In each case the log ends with the state RedrFailed:

  1. Multiple open sessions

    [2025-03-07 19:52:06.193850 UTC]  RetCode = 1 
    [2025-03-07 19:52:04.062911 CET] Status: disabling database access...
    [2025-03-07 19:52:04.114514 CET] Status: database access is already disabled
    [2025-03-07 19:52:04.190755 CET] Status: Error: found open sessions, all sessions must be closed before launching nzredr
    [2025-03-07 19:52:04.191071 CET] Status: Fatal: found 2 open sessions
    
    [2025-03-07 19:52:06.193992 UTC]  Current exception State: RedrFailed
    [2025-03-07 19:52:06.194073 UTC]  Updating Current Status: RedrFailed
    [2025-03-07 19:52:06.194456 UTC]  State file: 202773:RedrFailed:8:12:192:288:False:True

    Check for active sessions, close them, and then proceed with the resume operation.

  2. No such table exists. Proceed with the resume operation.

    [2025-03-08 01:47:37.695885 CET] Status: == Redistribute table 2368 of 15747 1719470 EAG_DWH_CORE_TEST.ADMIN.NZ_MAT_META_CW_KRM_COVAR_ALL_V2_RMX_2 27.00 Mb 9 extents
    [2025-03-08 01:47:38.165917 CET] Status: Fatal: 2147483646 : No such table exists
    
    [2025-03-08 01:47:45.199776 UTC]  Current exception State: RedrFailed
    [2025-03-08 01:47:45.199900 UTC]  Updating Current Status: RedrFailed
    [2025-03-08 01:47:45.200264 UTC]  State file: 268567:RedrFailed:8:12:192:288:False:True

  3. Illegal column type. Proceed with the resume operation.

    [2025-03-08 08:21:13.830181 CET] Status: == Redistribute table 1528 of 13372 15291028 SWM_DWH_PRE_CORE_TEST.ADMIN.NSHD_ISU_B_DRUCKBELEG 1275.00 Mb 425 extents
    [2025-03-08 08:21:14.400096 CET] Status: Fatal: 0 : illegal column type in computeFieldSizes
    
    [2025-03-08 08:21:21.438866 UTC]  Current exception State: RedrFailed
    [2025-03-08 08:21:21.439043 UTC]  Updating Current Status: RedrFailed
    [2025-03-08 08:21:21.439556 UTC]  State file: 778453:RedrFailed:8:12:192:288:False:True

  4. Transaction rolled back due to restart or failover

    [2025-03-07 18:27:56.689515 CET] Status:    Table 1411712 migrated 80086 of 80086 extents 100.0% Time elapsed 276.7s remaining 0.0s | Total 279250 of 8531066 extents 3.3% Time elapsed 1125.6s remaining 33260.9s
    [2025-03-07 18:36:30.165345 CET] Status: Warning: transaction rollback: Transaction rolled back due to restart or failover
    [2025-03-07 18:54:59.750057 CET] Status: Fatal: System state (Pausing Now) invalid for request
    
    [2025-03-07 18:55:06.776096 UTC]  Current exception State: RedrFailed
    [2025-03-07 18:55:06.776194 UTC]  Updating Current Status: RedrFailed
    [2025-03-07 18:55:06.776587 UTC]  State file: 46136:RedrFailed:8:12:192:288:False:True
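
The four failures above can be told apart by their "Status: Fatal:" line. The following is a minimal Python sketch that sorts such lines into the four symptoms. The substrings come from the log excerpts in this document. The labels and the function name are hypothetical, and mapping "invalid for request" to the restart or failover case is an assumption based only on the sample shown in item 4.

```python
import re

# Substrings taken from the "Fatal:" lines quoted in this document.
# The labels on the right are illustrative only, not product terminology.
FATAL_PATTERNS = [
    ("open sessions", "open_sessions"),
    ("No such table exists", "missing_table"),
    ("illegal column type", "illegal_column_type"),
    # Assumption: in the sample, this message followed the rollback warning.
    ("invalid for request", "restart_or_failover"),
]

FATAL_RE = re.compile(r"Status: Fatal: (?P<msg>.*)$")


def classify_fatal_lines(log_text):
    """Return a list of (label, message) pairs, one per 'Status: Fatal:' line."""
    results = []
    for line in log_text.splitlines():
        match = FATAL_RE.search(line)
        if not match:
            continue
        msg = match.group("msg").strip()
        label = "unknown"
        for needle, name in FATAL_PATTERNS:
            if needle in msg:
                label = name
                break
        results.append((label, msg))
    return results
```

Lines without a "Status: Fatal:" marker, such as the rollback warning in item 4, are skipped. Any fatal message that does not match a known substring is reported as "unknown".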

    Resolving the problem

    1. Check the sysmgr logs for the root cause of the restart.
    2. Run the following command to collect the core backtraces:
      apdiag collect --components ips/analyze_cores
    3. Examine the nz_analyzecore_report.out file.

    4. If SPU cores are present, check whether the backtrace matches either of the two listed below.
      Keywords to search for: DsidToRecordStore, UnpackFpgaPacket.

      (gdb) #0  0x00007feeaaa8658b in raise ()
         from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libpthread.so.0
      #1  0x0000000000704ece in _raise_dfl (sig=sig@entry=11)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1672
      #2  0x0000000000705039 in _crash_handler (sig=11, si=<optimized out>, 
          ignore=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1780
      #3  <signal handler called>
      #4  0x0000000000800aa9 in CTable::DsidToRecordStore (this=0x5e326410, 
          dsId=133)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/sys/tablesliceinfo.cpp:298
      #5  0x000000000080a82c in TDownloadTableNode::spuDistWrapup (this=0x59eff4a8)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/xp/xpdownload.cpp:2267
      #6  0x00007feeab6feeb2 in GenPlan(CPlan*, char*, char*, bool) ()
         from /tmp/Eni2eWQRBd/10932_1_133.o
      #7  0x000000005aebd118 in ?? ()
      #8  0x0000000059eff950 in ?? ()
      #9  0x0000000000614178 in _do_gened_code_final_call (plan=0x100000000)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2102
      #10 _handle_scan_io_complete (dj=<optimized out>, plan=0x100000000)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2159
      #11 spueventEmuJobSink::processJob (this=this@entry=0x7ffdd9db3190, 
          job=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:1917
      #12 0x00000000007cbec0 in filterPassthrough (sink=..., source=...)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/comp/emu/emuJobFilter.cpp:66
      #13 emuJobFilter::filter (source=..., sink=..., passthrough=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/comp/emu/emuJobFilter.cpp:47
      #14 0x00000000006131f2 in SpuProcessPlanEvents (plan=0x59efd660)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2044
      #15 0x00000000007054aa in _invoke_entry (arg=<optimized out>, 
          entry=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:788
      #16 _childproc (time_slice=10, priority=10, jobtask=0x59f71d18)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1134
      #17 __create_task (jobtask=0x59f71d18, priority=priority@entry=10, 
          time_slice=time_slice@entry=10)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1199
      #18 0x0000000000706a13 in _create_tasks (priority=10, time_slice=10, 
          req=0x5a543dd8)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1233
      #19 _create_job_tasks (req=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1582
      #20 nzprocmgr_main ()
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1925
      #21 0x000000000060c2b0 in NzSpuMain ()
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1081
      #22 0x00000000005ff195 in main (argc=<optimized out>, argv=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1090
      (gdb) 

      (gdb) #0  0x00007f6877c5e58b in raise ()
         from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libpthread.so.0
      #1  0x0000000000704ece in _raise_dfl (sig=sig@entry=6)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1672
      #2  0x0000000000705039 in _crash_handler (sig=6, si=<optimized out>, 
          ignore=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1780
      #3  <signal handler called>
      #4  0x00007f6877099267 in raise ()
         from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libc.so.6
      #5  0x00007f687709a958 in abort ()
         from /nz/kit/sys/cc/mcpnps/sysroot/lib64/libc.so.6
      #6  0x00000000008636d6 in CError_abort (err=...)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/sys/error.cpp:511
      #7  0x0000000000863c92 in CError_AssertFailed (line=line@entry=2851, 
          src=src@entry=0xbe57f0 "/gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/fpgawrap.cpp")
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/sys/error.cpp:576
      #8  0x000000000063863d in CFpgaWrap::UnpackFpgaPacket (this=0x5b33e000, 
          pPlan=pPlan@entry=0x5b3398e0, pPacket=0x5b33ada0, 
          XIDMode=<optimized out>, totalXIDs=<optimized out>, 
          invisibleXIDs=<optimized out>, xIds=0x5b33adf8)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/fpgawrap.cpp:2851
      #9  0x0000000000613390 in _init_scan (plan=0x5b3398e0)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2313
      #10 _launch_plan_scan (cl=0x5b33b898, event_queue=0x34cda78, plan=0x5b3398e0)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2410
      #11 SpuProcessPlanEvents (plan=0x5b3398e0)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spuevent.cpp:2030
      #12 0x00000000007054aa in _invoke_entry (arg=<optimized out>, 
          entry=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:788
      #13 _childproc (time_slice=10, priority=10, jobtask=0x53ec8818)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1134
      #14 __create_task (jobtask=0x53ec8818, priority=priority@entry=10, 
          time_slice=time_slice@entry=10)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1199
      #15 0x0000000000706a13 in _create_tasks (priority=10, time_slice=10, 
          req=0x5ab319d8)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1233
      #16 _create_job_tasks (req=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1582
      #17 nzprocmgr_main ()
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spujobtask.cpp:1925
      #18 0x000000000060c2b0 in NzSpuMain ()
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1081
      #19 0x00000000005ff195 in main (argc=<optimized out>, argv=<optimized out>)
          at /gpfs_production/builds/Voldemorts/release-11.2.1.12/240820-150/main/src/nde/spu/spumain.cpp:1090
      (gdb) 
    5. If the backtrace matches one of those listed above, follow these steps:

      1. Check the output of nzhw -issues for reported hardware issues.
      2. Check the nzstate output. The system should be online.
    6. Proceed with the resume operation if no issues are found.
    7. If the backtrace is different, run the following commands to collect the logs:

      apdiag collect --components platform_manager/ ips/ --minus-components ips/spu_all_logs
      apdiag collect --components platform_manager/ ips/ --spus <spus which restarted>
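
The keyword check in steps 4 to 7 can be sketched in Python as follows. This is a minimal illustration, assuming the contents of nz_analyzecore_report.out have already been read into a string and that the report contains at least one SPU core backtrace. The function names are hypothetical and not part of any Netezza tooling.

```python
# Keywords named in step 4 of this document.
KNOWN_KEYWORDS = ("DsidToRecordStore", "UnpackFpgaPacket")


def matched_keywords(report_text):
    """Return the known keywords that appear anywhere in the report text."""
    return [kw for kw in KNOWN_KEYWORDS if kw in report_text]


def suggested_step(report_text):
    """Map the keyword check onto steps 5 to 7 (assumes a backtrace is present)."""
    if matched_keywords(report_text):
        # Steps 5 and 6: known backtrace.
        return "check nzhw -issues and nzstate, then resume if no issues are found"
    # Step 7: the backtrace is different.
    return "collect logs with apdiag"
```

A plain substring test is used because the keywords appear in the backtraces as part of qualified names such as CTable::DsidToRecordStore and CFpgaWrap::UnpackFpgaPacket.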

Document Location

Worldwide


Document Information

Modified date:
21 March 2025

UID

ibm17228723