  • Latest Post - ‏2014-02-11T02:02:41Z by CurrentResident
CurrentResident
2 Posts

9133-55a reboot issue: nvram corrupted? Suggestions?

2014-02-08T02:07:56Z

Hello,

I've been using a P 720 at work and have been so happy with it that I bought a used 550 (9133-55a) to play with at home. I installed Fedora 20 on it, and it worked great for the first couple of boots, though there were some interesting SCSI controller reset/error messages during boot. Now, immediately after the huge STARTING SOFTWARE message appears, this happens:

PFW: Unable to send error log!
PFW: Unable to send error log!

Base of Data Stack corrupted!!! Restored deadbeef.
Call History
------------
exit  - c3a128
collect-free  - c60458
(poplocals)  - c3a758
shrink-variable-partitions  - c611e0
(poplocals)  - c3a758
purge-net-parameter-settings  - c613cc
(poplocals)  - c3a758
add2part  - c6157c
(poplocals)  - c3a758
maybe-expand-common  - c6187c
(poplocals)  - c3a758
go  - c7e734
boot  - c7ec80
evaluate  - c4a638
invalid pointer - 19c6b01
invalid pointer - 2f
invalid pointer - 2f
catch  - c38fe8
attempt-sequence  - d36864
(poplocals)  - c3a758
PFW: Unable to send error log!
Continue with execution (c), Abort execution (a), or Pseudo-prompt (p) ?:

I went ahead and followed the advice given here:

https://www.ibm.com/developerworks/community/forums/html/topic?id=15102c02-8c24-4b46-bc3f-02d0967cef15

That seems to work for one boot. However, as soon as I power down or reboot, I get the "Base of Data Stack corrupted!!!" message again. Sure, I could wipe nvram again, but I'd really rather not have to do so after every reboot. In case it matters, this system is an unmanaged, two-way P5+ with firmware SF240_371.

UPDATE:  I was finally able to obtain SF240_418 as well as updated SCSI controller microcode.  I applied them, but neither fixed the issue.

In any case, please let me know if you have any guidance/suggestions/thoughts.

Thank you!

Jim

Updated on 2014-02-10T02:47:58Z by CurrentResident
  • willschm
    47 Posts

    Re: 9133-55a reboot issue: nvram corrupted? Suggestions?

    ‏2014-02-10T16:55:40Z  

    I went ahead and followed the advice given here:

    https://www.ibm.com/developerworks/community/forums/html/topic?id=15102c02-8c24-4b46-bc3f-02d0967cef15

    That's probably the "wipe-nvram" approach that you took? From what you describe, I'd guess that the system re-adjusts its real-base during the first boot attempt to a value that just doesn't work on subsequent attempts.

    I'd try manually setting real-base to a few different values to see if you can find one that works consistently. You should be able to see the current settings via "printenv" at the OF prompt (though I expect the value is getting updated during the first boot), or from the OS via "nvram --print-config | grep base".
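As a rough illustration of the OS-side check, here is a minimal Python sketch that picks the "base" settings out of `nvram --print-config` output. It assumes the output consists of `name=value` lines, and the sample text is hypothetical, not captured from a real machine. The function names are made up for this example. Unlike `grep base`, it matches only on the variable name, not on the value.

```python
import subprocess


def find_base_settings(config_text):
    """Return {name: value} for every name=value line whose name contains 'base'."""
    settings = {}
    for line in config_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and "base" in name:
            settings[name] = value
    return settings


def read_nvram_config():
    """Run the command quoted in the post (needs the nvram utility, usually as root)."""
    result = subprocess.run(
        ["nvram", "--print-config"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample text, assumed to resemble the real output format.
sample = '"common" Partition\n--------------------------\nreal-base=c00000\nboot-device=disk\n'
print(find_base_settings(sample))
```

On the actual system, `find_base_settings(read_nvram_config())` would replace the sample text.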

    Per some old-ish notes, I've tried values across the spectrum (0x4000, 0xc00000, 0x1000000, 0x4000000, 0x6000000), though I left myself no hints regarding the success of those values, so your mileage may vary. :-)
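To make the spread of those candidates easier to see, this small Python sketch prints each one as bare hex (the form used later in the thread, without the 0x prefix) together with its size. It is only arithmetic on the values listed above and says nothing about which of them works.

```python
# Candidate real-base values quoted in the post above.
CANDIDATES = [0x4000, 0xC00000, 0x1000000, 0x4000000, 0x6000000]


def describe(addr):
    """Format an address as bare hex plus a human-readable size."""
    kib = addr / 1024
    size = f"{kib:g} KiB" if kib < 1024 else f"{kib / 1024:g} MiB"
    return f"{addr:x} ({size})"


for addr in CANDIDATES:
    print(describe(addr))
```

The values range from 16 KiB up to 96 MiB, with c00000 sitting at 12 MiB.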

  • CurrentResident
    2 Posts

    Re: 9133-55a reboot issue: nvram corrupted? Suggestions?

    ‏2014-02-11T02:02:41Z  

    Yep, I was referring to the wipe-nvram workaround.

    Thanks for the advice. It turns out that real-base had been reset to c00000, so I fiddled with it, trying c00000, 2000000, and 6000000. All of them yielded the same results, except of course that the dumped call stack addresses were relative to the real-base.
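The "relative to the real-base" observation can be sketched in Python as below. This assumes the call history in the first post was captured with real-base c00000 (its addresses all begin with c, which is consistent with that but not confirmed) and that the firmware image simply shifts by the difference between the two bases. The helper name `rebase` is made up for illustration.

```python
OBSERVED_BASE = 0xC00000  # real-base reported in this reply (assumed to apply to the trace)

# A few entries copied from the call history in the first post.
CALL_HISTORY = {"exit": 0xC3A128, "go": 0xC7E734, "boot": 0xC7EC80}


def rebase(addr, old_base, new_base):
    """Shift an address by the difference between two real-base values."""
    return addr - old_base + new_base


for name, addr in CALL_HISTORY.items():
    offset = addr - OBSERVED_BASE
    moved = rebase(addr, OBSERVED_BASE, 0x2000000)
    print(f"{name}: offset {offset:x}, expected at {moved:x} with real-base 2000000")
```

If the offsets in a trace taken at another real-base match these, the same firmware words are failing regardless of where the image is loaded.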

    But I found another workaround! I can get it to boot successfully every time with real-base c00000, 2000000, or 6000000 if I boot through SMS. I select Boot Options -> Select Install/Boot Device -> List all Devices -> (hard drive) -> Normal Boot Mode -> Yes exit SMS. Simply entering SMS and then exiting does not work; I have to go the long way. It seems preferable to wiping nvram every time, but it's still less than ideal. Why would it matter that I manually boot through SMS? WEIRD!

    For kicks I cleared everything except the hard drive from the boot order list, suspecting that the problem might have something to do with processing the boot list, but that didn't work.