Topic
  • 5 replies
  • Latest Post - 2013-10-15T18:39:35Z by PaulMacAlpine
PTollis
7 Posts

Instances appear as "active" but not reachable

2013-10-11T09:20:01Z

Hello.

In the last couple of weeks, I've noticed strange behaviour on both Windows instances I've created (IDs 467474 and 467510, in the Ehningen data center). The instances were shown in "Active" status in the Control Panel, but they could not be reached with Remote Desktop Connection, and the "ping" command did not reach them either. The only way to recover them was to force a reboot from the Control Panel itself; after the reboot, connectivity was restored. This happened twice on instance 467474 and once on 467510 (all three cases on different days).

The attached screenshot shows one of the three occurrences described above.

Is this a known issue? Could you please check and provide a solution?

Many thanks in advance
Best regards


  • NachoDB
    1 Post

    Re: Instances appear as "active" but not reachable

    2013-10-11T11:30:50Z

    Hi PTollis,

    To help me better understand your issue, could you please give me more details?

    In relation to the machines that you cannot reach:

    - Had you been using the machine before, or did this happen right after creating it?

    - Have you updated the OS or installed any network software?

    You can try enabling ping responses on Windows Server 2003/2008. Here is how to enable them:

    1) Open the Windows Firewall console (Windows Firewall with Advanced Security on Windows Server 2008).

    2) Click Inbound Rules.

    3) Find "File and Printer Sharing (Echo Request - ICMPv4-In)" and click Enable Rule. (If you need IPv6, enable the corresponding ICMPv6-In rule as well.)

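    Or from the command line:

    netsh firewall set icmpsetting 8

    (On Windows Server 2008 the "netsh firewall" context is deprecated; the equivalent there is: netsh advfirewall firewall add rule name="Allow ICMPv4 echo request" protocol=icmpv4:8,any dir=in action=allow, where the rule name is arbitrary.)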

    Enabling ping responses will help you determine whether the machine itself is still reachable when the service appears to be down.
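    Ping only shows whether the host answers ICMP. To tell "the whole machine is down" apart from "only Remote Desktop is not answering", one option is to also try a TCP connection to the RDP port. A minimal Python sketch, assuming Remote Desktop listens on its default port 3389; the function name is illustrative and not part of any SCE tool:

```python
import socket

RDP_PORT = 3389  # default Remote Desktop port (assumption: not changed on the instance)


def tcp_port_open(host: str, port: int = RDP_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, network unreachable, or name resolution failed
        return False
```

    For example, tcp_port_open("203.0.113.10") (a placeholder address) returning False while ping succeeds would suggest the OS is up but RDP or its firewall rule is the problem; if both fail, the machine as a whole is unreachable.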

    Regards

  • PTollis
    7 Posts

    Re: Instances appear as "active" but not reachable

    2013-10-14T06:20:01Z
    In reply to NachoDB (2013-10-11T11:30:50Z)

    Hi Nacho and thanks for your reply.

    Both instances were created on Sept. 26 and have been in use since then. The issue appears randomly: instance 467474 was recovered with a reboot from the SCE Control Panel on Oct. 07 at around 11:47 and on Oct. 09 at around 10:57, while the first occurrence on instance 467510 was recovered with a reboot from the SCE Control Panel on Oct. 10 at around 09:40 (all times CET). A second occurrence of the issue on instance 467510 is still ongoing.

    The OS has been updated on both instances by applying Microsoft updates, and we also installed a free antivirus program and some application software (e.g. MS SQL Server), but the issue was present even before we installed them.

    An affected instance can be reached again only by rebooting it from the SCE Control Panel, and after some (random) amount of time the issue can occur again.

    While the issue is occurring, neither RDC nor ping works; when it is not, both RDC and ping work fine (so there is no need to enable ping on the instances, since it already works when the issue is not present).
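    Since the outages start at random times and the recovery times above are approximate, it could help to record exactly when an instance stops and starts answering, so the windows can be matched against the Event Viewer and reported to support. A minimal Python sketch, assuming some zero-argument probe that returns True when the instance answers (for example a TCP connection attempt to port 3389); the function names and the polling interval are illustrative only:

```python
import time
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple

# (start, end) of one outage; end is None if the host was still down at the last sample
Window = Tuple[datetime, Optional[datetime]]


def outage_windows(samples: List[Tuple[datetime, bool]]) -> List[Window]:
    """Turn time-ordered (timestamp, reachable) samples into outage windows."""
    windows: List[Window] = []
    start: Optional[datetime] = None
    for ts, up in samples:
        if not up and start is None:
            start = ts  # first failed sample opens a window
        elif up and start is not None:
            windows.append((start, ts))  # first successful sample closes it
            start = None
    if start is not None:
        windows.append((start, None))  # still unreachable at the end
    return windows


def monitor(probe: Callable[[], bool], interval_s: float, rounds: int) -> List[Window]:
    """Call `probe` every interval_s seconds, `rounds` times, and summarize outages.

    Timestamps are recorded in UTC, so convert them before comparing with CET times.
    """
    samples: List[Tuple[datetime, bool]] = []
    for _ in range(rounds):
        samples.append((datetime.now(timezone.utc), probe()))
        time.sleep(interval_s)
    return outage_windows(samples)
```

    For example, monitor(check, interval_s=60, rounds=1440) would sample once a minute for about a day, where check is a hypothetical probe function. Each window boundary is only accurate to within one polling interval.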

    Best regards

    Updated on 2013-10-14T13:56:52Z by PTollis
  • PaulMacAlpine
    12 Posts

    Re: Instances appear as "active" but not reachable

    2013-10-14T18:55:06Z
    In reply to PTollis (2013-10-14T06:20:01Z)

    Hello,

    I have seen this issue occasionally. Most likely, the system restarted unexpectedly (probably because of a Blue Screen of Death). Because the restart was unexpected, Windows then boots into its recovery menu (safe mode options) instead of starting normally, and you cannot connect to the system with Remote Desktop.

    As far as I can tell, the Blue Screen of Death (or whatever causes the restart) happens only on specific systems. At least in my case, it seemed to be an underlying hardware problem. If that is the case, you can partially mitigate the problem by configuring the instance not to restart into safe mode after an unexpected restart. See the FAQ (https://www.ibm.com/developerworks/community/forums/html/topic?id=ca16a1f4-ead8-4a5b-a721-c9acfc5464b4), Q16E:

    By default, Windows Server 2008 is configured to reboot into recovery mode after an unexpected reboot.
    This feature has been disabled in the base OS images starting with the May 15 deployment. If you provisioned your instance before this date, you can use the following workaround:

    • Open a Command Prompt window with administrative rights (Start -> All Programs -> Accessories -> right-click "Command Prompt" -> Run as administrator).
    • Run: bcdedit /set {default} bootstatuspolicy ignoreallfailures
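    To check whether an instance already has this setting (for example, one built from an older image), you can list the default boot entry with bcdedit /enum {default} and look for the bootstatuspolicy line. A rough Python sketch, assuming bcdedit prints one "name value" pair per line as it does on English-language systems, and that a missing line means the default (display all failures) behaviour applies; the helper names are hypothetical:

```python
import subprocess
from typing import Optional


def boot_status_policy(bcdedit_output: str) -> Optional[str]:
    """Return the bootstatuspolicy value from `bcdedit /enum` text, or None if absent.

    Assumes one "name   value" pair per line; header and separator lines are skipped
    because they do not start with the element name.
    """
    for line in bcdedit_output.splitlines():
        parts = line.split(None, 1)  # element name, then the rest of the line
        if len(parts) == 2 and parts[0].lower() == "bootstatuspolicy":
            return parts[1].strip()
    return None


def read_default_entry() -> str:
    """Run bcdedit (Windows only; typically needs an elevated prompt) and return its text."""
    result = subprocess.run(
        ["bcdedit", "/enum", "{default}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

    After the workaround has been applied, boot_status_policy(read_default_entry()) should compare equal, ignoring case, to "IgnoreAllFailures".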

    I thought I had already made this change in the ICM 5.1.1 image, but it has been a long time and I might be mistaken. The ICM 5.2 image was built from the more recent base OS image, so it should definitely include this workaround.

    Of course, the above is at best a workaround. It will not fix the underlying Blue Screen of Death that causes the unexpected restart in the first place. To solve that problem, you will need to post to the IBM SmartCloud Enterprise Support Community. They fixed the BSOD/restart issue before, but this might be a different issue with similar symptoms.

    Good luck!

    -Paul

  • PTollis
    7 Posts

    Re: Instances appear as "active" but not reachable

    2013-10-15T07:12:25Z

    Hi Paul

    Thanks for your reply. Actually, I didn't find any evidence of spontaneous/unexpected restarts in the Event Viewer on either instance (and would a restart also explain the ping failures?). Also, the instances were created very recently (Sept. 26), so they should already include the workaround. In any case, I have run the suggested command on both instances; let's see what happens.
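    One way to look for unexpected restarts without scrolling through the Event Viewer is to query the System log for the event IDs Windows typically writes after an unclean shutdown: 6008 (EventLog, "the previous system shutdown ... was unexpected") and, on Windows Vista/Server 2008 and later, 41 (Kernel-Power). A Python sketch that wraps the built-in wevtutil tool; the choice of IDs is an assumption worth checking against your own logs, and the helper names are hypothetical:

```python
import subprocess
from typing import List, Sequence

# Event IDs assumed to indicate an unclean restart (verify on your system):
#   6008 - EventLog: "The previous system shutdown ... was unexpected."
#   41   - Microsoft-Windows-Kernel-Power: rebooted without cleanly shutting down
UNEXPECTED_RESTART_IDS = (6008, 41)


def build_query(event_ids: Sequence[int], count: int = 10) -> List[str]:
    """Build a wevtutil command that prints the newest matching System events."""
    id_filter = " or ".join(f"EventID={i}" for i in event_ids)
    xpath = f"*[System[({id_filter})]]"
    # qe = query events; /c = max count; /rd:true = newest first; /f:text = readable output
    return ["wevtutil", "qe", "System", f"/q:{xpath}", f"/c:{count}", "/rd:true", "/f:text"]


def recent_unexpected_restarts(count: int = 10) -> str:
    """Run the query (Windows only) and return wevtutil's text output."""
    cmd = build_query(UNEXPECTED_RESTART_IDS, count)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

    An empty result would be consistent with no unclean restart having been logged. A forced reboot from the Control Panel may itself be recorded as an unexpected shutdown, so compare any event times with the reboot times reported earlier in the thread.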

    Unfortunately, I could not open the FAQ link: I get a "permission denied" message. Is there any way to get access?

    Best regards.

    Updated on 2013-10-15T07:12:52Z by PTollis
  • PaulMacAlpine
    12 Posts

    Re: Instances appear as "active" but not reachable

    2013-10-15T18:39:35Z
    In reply to PTollis (2013-10-15T07:12:25Z)

    Hello,

    I'm mostly just guessing here, as your issue seemed similar to one I had before. As I mentioned in my previous post, I don't know whether the workaround is in the ICM 5.1.1 image, as that image was created before the fix was introduced. The important factor is when the image was created, not the instance. So even though you created the instance last month, the image itself was created over a year ago. I thought I had added the workaround myself, but it is possible I did not add it to the ICM 5.1.1 image. The workaround will definitely be in the ICM 5.2 image, since that image was created within the last couple of months.

    If you have an IBM SmartCloud Enterprise account, you should be able to access the IBM SCE support forums. I'm not sure exactly how access is granted, as I think I was added to that community as part of the onboarding process. You can try searching developerWorks for the IBM SmartCloud Enterprise Support Community and then requesting access to it. In any case, I quoted the relevant part of the FAQ in my previous post.

    Ultimately, there isn't much I can do about this issue, as it is an SCE infrastructure problem and not an ICM image problem. You will get a lot more help from the SCE support forums. To get there, log in to the SCE portal, go to the Support tab, then click the support community link in the right-hand column (near the bottom of the screen).

    -Paul