Anthony's Blog: Using System Storage - An Aussie Storage Blog


208 day reboot bug

anthonyv | Tags: command-line, kernel, svc, interface, v7000, firmware, uptime, unified, linux, storwize, ibm | 8 Comments | 34,849 Views

It is ironic that only days after I wrote that 497 is the IT number of the beast, I learn that Linux has another unfortunate number:  208.

The reason for this is a defect in the internal Linux kernel used in recent firmware levels of SVC, Storwize V7000 and Storwize V7000 Unified nodes. This defect will cause each node to reboot after 208 days of uptime. The issue exists in unfixed versions of the 6.2 and 6.3 levels of firmware, so a large number of users are going to need to take action (except those who are still on a 4.x, 5.x, 6.0 or 6.1 release). If you have done a code update after June 2011, then you are probably affected. This means that if you are an IBM client, you need to read this alert now and determine how far you are into that 208 day period. If you are an IBMer or an IBM Business Partner, you need to make sure your clients are aware of this issue, though hopefully they have signed up for IBM My Notifications and have already been notified by e-mail.

In short, what needs to happen is that you must:

  1. Determine your current firmware level (a quick command-line sketch for this step follows the list).
  2. Check the table in the alert to determine if you are affected at all, and if so, how far you are potentially into the 208 day period.
  3. Use the Software Upgrade Test Utility to confirm your actual uptime.
  4. Prior to the 208 day period finishing, either reboot your nodes (one at a time, with a decent interval between them) or install a fixed level of software (as detailed in the alert).
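
If you prefer a terminal to the GUI for that first step, something like the following will pull the running code level straight from the cluster CLI. This is only a minimal sketch, not part of the original post: the address and user name are placeholders for your own environment, and it assumes the lssystem output includes a code_level line (on some levels the command needs the svcinfo prefix).

```python
import subprocess

def get_code_level(cluster_addr, user="admin"):
    """Return the firmware level reported by the cluster CLI (sketch only)."""
    # Run lssystem over SSH; adjust user/host for your environment.
    # On some 6.x systems the command is invoked as "svcinfo lssystem".
    output = subprocess.check_output(
        ["ssh", f"{user}@{cluster_addr}", "lssystem"], text=True)
    for line in output.splitlines():
        if line.startswith("code_level"):
            # Example line: "code_level 6.3.0.1 (build 54.0.1109090000)"
            return line.split()[1]
    raise RuntimeError("code_level not found in lssystem output")

if __name__ == "__main__":
    print(get_code_level("cluster.example.com"))   # e.g. "6.3.0.1"
```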

To give you an example of the process, my lab machine is on software version 6.3.0.1 which you can see in the screen capture below.  So when I check the table in the alert, I see that version 6.3.0.1 was made available on January 24, 2012, which means the 208 day period cannot possibly end before August 19, 2012.

Version Number   Release Date        Earliest possible date that a system running this release could hit the 208 day reboot

SAN Volume Controller and Storwize V7000 Version 6.3
6.3.0.0          30 November 2011    25 June 2012
6.3.0.1          24 January 2012     19 August 2012
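
Those dates are simply the release date plus 208 days. As a quick sanity check, here is a small sketch using only the Python standard library, with the release dates taken from the table above:

```python
from datetime import date, timedelta

REBOOT_UPTIME = timedelta(days=208)

# Release dates from the table above
releases = {
    "6.3.0.0": date(2011, 11, 30),
    "6.3.0.1": date(2012, 1, 24),
}

for level, released in releases.items():
    earliest = released + REBOOT_UPTIME
    print(f"{level}: earliest possible 208 day reboot is {earliest:%d %B %Y}")

# 6.3.0.0: earliest possible 208 day reboot is 25 June 2012
# 6.3.0.1: earliest possible 208 day reboot is 19 August 2012
```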

Regardless, I need to know the uptime of my nodes, so I download the Software Upgrade Test Utility (if you have an older copy, note that you need at least version 7.9) and run it using the Upgrade Wizard (NOTE! We are NOT updating anything here, just checking):

I launch the Upgrade Wizard, use it to upload the tool and follow the prompts to run it, so that I can see the tool's output. The output in this example shows that the uptime of each node is 56 days, so I have a maximum of 152 days remaining before I have to take any action. At this point I select Cancel. You can run this tool as often as you like to keep checking uptime.
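
That "152 days remaining" figure is just 208 minus the reported uptime. Here is the same arithmetic as a tiny sketch; the 56 day uptime is the example value from this post, so substitute whatever the test utility reports for your own nodes:

```python
from datetime import date, timedelta

UPTIME_LIMIT = 208       # days of uptime before the defect triggers a reboot
uptime_days = 56         # value reported by the Software Upgrade Test Utility

days_remaining = UPTIME_LIMIT - uptime_days
act_by = date.today() + timedelta(days=days_remaining)

print(f"Days remaining: {days_remaining}")             # 152 in this example
print(f"Reboot nodes or upgrade no later than {act_by}")
```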

Note that if you are on 6.1 or 6.2 code you may see a timeout error when running the tool, especially the first time. If you do see an error, please follow the instructions in the section titled "When running the upgrade test utility v7.5 or later on Storwize V7000 v6.1 or v6.2" at the Test Utility download site.

As per the Alert:

  • If you are running a 6.0 or 6.1 level of firmware, you are not affected.
  • If you are running a 6.2 level of firmware, the fix level is v6.2.0.5 which is available here for Storwize V7000 and here for SVC.
  • If you are running a 6.3 level of firmware, the fix level is v6.3.0.2 which is available here for Storwize V7000 and here for SVC.
  • If you are using a Storwize V7000 Unified, the fix level is v1.3.0.5 which is available here.

You should keep checking the alert to find out any new details as they come to hand.  If you are curious about Linux and 208 day bugs,  try this Google search.

*** Updated April 4, 2012 with links to fix levels *** 

If you have any questions or need help, please reach out to your IBM support team or leave me a comment or a tweet.

*** April 10: The IBM Web Alert has been updated with new information on what to do if your uptime has actually gone past 208 days without a reboot. In short, you still need to take action. Please read the updated alert and follow the instructions given there. ***