Topic
  • 17 replies
  • Latest Post - ‏2012-04-18T11:25:37Z by bjyangf
vhavard
vhavard
21 Posts

Pinned topic Installer never finishes

2012-01-06T06:39:13Z
I have tried a number of times (unsuccessfully) to get SCP installed and working so I can test it out and learn how to install, configure, and use it.

In my latest attempt, the installer never finishes.

After the PXE server is installed, it tells me to start up some compute and storage nodes. I have now started up six storage nodes and three compute nodes with various amounts of disk space and memory, and the GUI polls occasionally but never gets past the message: Detecting power on nodes: The detected Storage Node (192.168.90.116) is powered on.

Occasionally, it will roll through all the other storage nodes, but so quickly I can't tell if any of them have any errors.

After the first couple of storage nodes were up there was a message that there was 0GB of RAM available and I needed to start up more storage nodes. At that point, I only had 2 small storage nodes installed, so I installed several others with between 8GB and 24GB of RAM each and started up several compute nodes as well. I have now run out of hardware (10 physical servers) and it still will not complete the install.

Are there specs on how much resource is required for each storage and compute node? Also, is there some kind of log or means of seeing what is still lacking so I can figure out what I need to do to make it complete?

  • Victor
  • YanHuang
    YanHuang
    7 Posts

    Re: Installer never finishes

    ‏2012-01-06T06:45:42Z  
    Hi Victor,
    I think you selected the Minimal installation. For this installation type, the GUI installer deploys the kernel services VMs only to storage nodes, which requires the storage nodes' CPUs to support VT. Otherwise the installer treats the storage nodes as having 0 memory available for deploying the kernel services VMs.
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-01-06T06:52:44Z  
    Thanks for the information. Do the compute nodes also require VT CPUs?

    I chose the custom install because I wanted to test using VMware hypervisors as well. I suppose just to get it up and running I'll try the minimal install first.

    Is there any way to find what my installation is lacking to complete the install? I have 2 of my storage nodes on HS21 blades, which have VT chips, each with 8GB of RAM and 250GB hard drives.

    I was unable to install the compute node on an HS22 blade or an HS20 blade. Both errored out, and I was unable to figure out why because the output scrolled off the screen before I could read the first error.

    • Victor
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-01-06T06:54:38Z  
    One other question: is there any way to change the install type, or do I have to reinstall the OS from scratch and reinstall SCP?
  • YanHuang
    YanHuang
    7 Posts

    Re: Installer never finishes

    ‏2012-01-06T07:03:07Z  
    Yes, compute nodes do need VT support.
    If you use the GUI installer, you need to reinstall them. By the way, why not just boot up two VT-capable boxes as storage nodes to continue your installation?
  • Thibaud
    Thibaud
    6 Posts

    Re: Installer never finishes

    ‏2012-01-06T08:27:14Z  
    Hi Victor,

    I guess we have the same problem. You need machines that support VT, and they also have to be physical machines. In my case, I have virtual machines on a VT-compatible ESXi 5 host (so my VMs also support VT), and I am stuck with that "not enough memory" problem.
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-01-06T08:54:55Z  
    At the moment, I have 3 physical storage nodes running with VT chips and 2 compute nodes running with VT chips, and the installer is still not completing.

    The first few storage and compute nodes I brought up were not VT chips, though, so I'm wondering if the installer is confused by them.

    I'm wondering if I don't just need to do a complete re-install only using blades which have a VT chip.

    I agree, though, that having an installation which could be completely virtual is very important for the ability to demo it without having to have a lot of hardware somewhere that you need internet and IBM VPN access to use.

    I have been able to get the storage nodes to install into a VMware Workstation instance, but have never been able to get a compute node to install into one. Of course, I can't get the compute node to install onto a big HS22 blade either.

    I have been trying to get this thing installed for a month off and on and have yet to get a clean install. I may be expecting too much in trying to get the VMware hypervisor support. I'd like to get it to install correctly just once so I can see that it works.

    I may bring it back to just a minimal install for those purposes and worry about a more complex install later. Problem is, if I have to go on a customer site and install a more complex environment for a POC I need to know that it will work and how to make it work.
  • ValoryBatchellor
    ValoryBatchellor
    22 Posts

    Re: Installer never finishes

    ‏2012-01-06T10:40:18Z  
    Hi Victor

    First of all, let me reassure you that the install does work. I installed the 1.1 Beta about 6 months ago and had similar experiences to yours: there were a bunch of things to do with my hardware and networking that I had to overcome before I got it going. And like you, I was trying to do it 'in my spare time'. But in the end it did work.

    My first question is, are you seeing the PXE boot starting on your storage nodes? It's an important question because if the PXE client on your storage node is not finding the correct PXE server, you probably have networking issues (that one took up several weeks of my time). It's best to do this on an isolated network. I thought I had one, but it turned out the network admin had left some 'trunking' definitions in place, which meant I wasn't as isolated as I thought.

    Second question: have you checked the BIOS on your storage nodes to make sure VT is actually enabled? It isn't just a matter of having the right chips. BIOSes vary widely, but you are looking for a setting, probably under something like CPU options, such as 'VT enable' or, on Intel, perhaps 'Intel virtualisation enabled'. Without these settings, the firstbox server could be told that you're not VT enabled.

    I haven't tried to uninstall/reinstall the 1.2 code yet, but in 1.1 (where we didn't have a GUI option, we ran an install script) you could fool firstbox into doing a brand new run by deleting all content in /var/lib/tftpboot. It might be worth a try.
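    As a first check of the VT question above from a booted Linux system, the CPU flags can be inspected. This is only a rough sketch: it looks for the `vmx` (Intel VT-x) and `svm` (AMD-V) flags in the text of /proc/cpuinfo. The function name is made up for illustration, and depending on the kernel the flag may still be listed when VT is switched off in the BIOS, so the BIOS setting itself still needs to be confirmed.

```python
def cpu_virtualization_flags(cpuinfo_text):
    """Return the hardware-virtualization flags found in /proc/cpuinfo text.

    'vmx' is Intel VT-x and 'svm' is AMD-V. An empty set means neither
    flag was reported by any CPU listed in the text.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has one "flags : ..." line on x86 Linux.
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            found.update(flag for flag in flags if flag in ("vmx", "svm"))
    return found


# Small embedded sample so the sketch runs anywhere; on a real node you
# would pass open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse2\n"
print(sorted(cpu_virtualization_flags(sample)))
```

    An empty result on a node that is supposed to host VMs would point at the CPU model rather than at the network.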
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-01-06T17:41:01Z  
    It does seem to be a network issue. The environment is completely isolated through a separate VLAN. I was working with Yan Huang last night and it seems communication between the various modules is very slow. I started looking at the network interfaces, and the storage nodes are installing a bridge and setting the IP address on the bridge (br0) rather than on the NIC (eth0). There are multiple NICs in the machines, and I'm wondering if it is getting confused and trying to send traffic out the wrong NIC.

    Any idea why it's installing a bridge on the storage node?

    I'm going to wipe it all out, greatly simplify the environment, and try again.
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-01-09T20:32:11Z  
    I finally got this resolved.

    It seems that the storage nodes were all installing correctly, but there was a problem communicating between the install server and the compute node. Each of these blades has 2 NICs and after some more troubleshooting I found that the compute node was coming up and attaching to eth1 rather than eth0. That was resulting in it getting an IP address on the wrong subnet and there was no route between them. When I partitioned that port on the switch and forced it to use eth0 the install finished.

    So my new question is, why does the install work perfectly for the storage nodes using eth0 as the primary interface, while the compute nodes use eth1? I suppose this would not have caused a problem if the second interface hadn't received a response to its DHCP request.

    I suppose the moral of this story is: make sure your network is completely isolated on all NICs prior to doing an install. Makes me wonder, though, how practical this will be in a real-world environment.

    • Victor
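    The wrong-subnet symptom described above can be spotted quickly once the node addresses are known. The following is a minimal sketch using Python's standard `ipaddress` module. The function name is illustrative, the /24 prefix is an assumption inferred from the 192.168.90.116 address quoted earlier in the thread, and 10.1.2.50 is a made-up example of an address handed out by some other DHCP server.

```python
import ipaddress


def nodes_outside_subnet(install_subnet, node_addresses):
    """Return the node addresses that do not fall inside install_subnet."""
    network = ipaddress.ip_network(install_subnet, strict=False)
    return [addr for addr in node_addresses
            if ipaddress.ip_address(addr) not in network]


# Illustrative values only (see the assumptions stated above).
suspects = nodes_outside_subnet("192.168.90.0/24",
                                ["192.168.90.116", "10.1.2.50"])
print(suspects)
```

    Any address this reports is a candidate for a node that came up on the wrong NIC.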
  • bjyangf
    bjyangf
    19 Posts

    Re: Installer never finishes

    ‏2012-01-10T06:26:59Z  
    1. For the eth0 issue, neither storage nodes nor compute nodes have to bind to "eth0" specifically. In a typical isolated environment, the installer decides at run time which NIC physically connects to the PXE server and uses that device accordingly. So you may need to check the VLAN settings on the switch to see if eth0 actually "connects" to the PXE server.

    2. For the br0 issue, it actually is correct if br0 has an IP address instead of eth0 because br0 will further be used for kernel service VMs to communicate to the outside world. And if you check the network configurations eth0 is actually "bridged" to br0 so all network traffic of the host OS can still be transferred normally.
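    To see which NIC is actually bridged to br0 on a node, the bridge's port list can be read from sysfs. This is a small sketch, assuming a Linux host where bridges expose their ports under /sys/class/net/<bridge>/brif; the function name and the `sysfs_root` parameter (there only so the function can be pointed at a test directory) are illustrative.

```python
import os


def bridge_members(bridge="br0", sysfs_root="/sys/class/net"):
    """List the interfaces attached to a Linux bridge, according to sysfs.

    Returns an empty list if the bridge (or its brif directory) is absent.
    """
    brif_dir = os.path.join(sysfs_root, bridge, "brif")
    if not os.path.isdir(brif_dir):
        return []
    # Each entry in brif/ is named after one port of the bridge.
    return sorted(os.listdir(brif_dir))


print(bridge_members("br0"))
```

    If this prints an interface other than the one cabled to the PXE server's network, the bridge has been built on the wrong NIC.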
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-01-10T18:02:34Z  
    What you describe is not what I'm seeing.

    The PXE server only has one NIC enabled, and that NIC is attached to the same subnet as the compute server. The compute server has 2 NICs. I could disable one in the BIOS, but I assumed the compute node would work like the storage node and respond on the NIC where the PXE server responded. It does not. When the compute node is installed, it brings up both NICs and the PXE server responds to eth0. After the installation is complete and the system has been rebooted, eth1 gets an IP address from the lab DHCP server, and the node attempts to communicate with the install server over that interface rather than the one the PXE server responded on.

    br0 then binds with eth1 and not eth0 and gets an inappropriate IP address keeping it from being able to communicate with the install server. This results in the install server never completing because it cannot communicate with the compute server. Again, when I partition off the eth1 NIC, everything can communicate and the install completes.

    When I partition the port (on the switch) which contains eth1 everything works perfectly.

    It should be noted that the storage nodes do operate as I would expect and as you have described, but the compute nodes do not. This is easily repeatable.
  • bjyangf
    bjyangf
    19 Posts

    Re: Installer never finishes

    ‏2012-01-11T06:18:35Z  
    First let me get things clear. On the compute node you were talking about, eth0 connected to the cloud environment and eth1 connected to a lab DHCP server, which is a different node than the PXE server, right? If so, this doesn't meet the prerequisite of an isolated network and the installer isn't guaranteed to work properly. You can partition off eth1 during the installation process of compute node and re-enable it if you need the compute node to communicate with the outside world directly.
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-01-11T19:32:54Z  
    How realistic is an isolated network in your average production data center? I would argue, not very. As an academic exercise it's fine, but in the real world this environment must be able to coexist with other systems in the data center.

    I understand that the prerequisites for this release are for an isolated environment. Once I figured out what was going on, I found the problem and corrected it. If, however, the storage nodes are working as desired and are using the PXE-responding interface as their primary interface and the compute nodes are not, shouldn't it be easy enough to have the compute nodes use the PXE-responding interface as well? This is as easy as adding a single line (ipappend 2) to the pxelinux.cfg/default file for the compute node entry.

    In the lab you can specify any requirements you like as prerequisites, however, in the real world, you have to be able to handle whatever environment the average customer is willing to provide. The solution should default to being more robust.
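    For context on the `ipappend 2` suggestion above: with that option PXELINUX appends a `BOOTIF=` argument to the kernel command line, carrying the hardware type and the MAC address of the NIC that PXE-booted. Whether the SCP compute node image acts on that argument is not something this thread establishes. The sketch below only shows how a boot script could recover the MAC from the command line; the function name and the sample MAC are made up.

```python
import re

# BOOTIF=<hwtype>-<mac>, dash-separated, e.g. BOOTIF=01-00-1a-64-aa-bb-cc
# (01 is the Ethernet hardware type).
_BOOTIF_RE = re.compile(
    r"\bBOOTIF=[0-9a-fA-F]{2}-((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\b"
)


def bootif_mac(cmdline):
    """Return the PXE boot NIC's MAC (colon-separated, lowercase) or None."""
    match = _BOOTIF_RE.search(cmdline)
    if match is None:
        return None
    return match.group(1).replace("-", ":").lower()


# Hypothetical command line; on a running node the text would come from
# /proc/cmdline.
print(bootif_mac("ro root=/dev/sda1 BOOTIF=01-00-1A-64-AA-BB-CC quiet"))
```

    The resulting MAC can then be compared with /sys/class/net/<interface>/address for each interface to pick the one that PXE-booted.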
  • SystemAdmin
    SystemAdmin
    92 Posts

    Re: Installer never finishes

    ‏2012-03-27T16:46:07Z  
    Hi. I am also trying to install SCP, but I am getting the same problem: the installation never finishes. I am installing the components on 4 laptops, since I do not have any other resources. Here is the configuration of my environment:

    • PXE Server : Lenovo T61 - 2GB RAM, Intel Core2Duo 7300 2GHz CPU, 120 GB Disk (RHEL 6.1 Server x86_64)
    • Storage Node 1 : Lenovo T61 - 4GB RAM, Intel Core2Duo 7300 2GHz CPU, 120 GB Disk
    • Storage Node 2 : Lenovo T61 - 3GB RAM, Intel Core2Duo 7300 2GHz CPU, 120 GB Disk
    • Compute Node : Lenovo T410 - 6GB RAM, Intel Core i5 dual-core CPU, 320 GB Disk

    I install the PXE server with the minimal deployment option, and the installation program asks for storage nodes and compute nodes. I power on the storage nodes and both of them get installed successfully. At this point the installation program shows "The detected Storage Node (192.168.19.115) is powered on" and "The detected Storage Node (192.168.19.116) is powered on". Then I power on the compute node. It also gets installed, and I can log in to it as root. However, the installation program never detects the compute node and waits on the same screen, and there is no entry for the compute node in the /var/lib/tftpboot/hosts file.

    Note : On each laptop I selected "Enabled" for the Virtualization Technology option in Boot Menu - Config - CPU.

    Do you have any suggestions?

    Best Regards
  • vhavard
    vhavard
    21 Posts

    Re: Installer never finishes

    ‏2012-03-27T17:13:43Z  
    Most of the issues I've had with the compute node have been network related. I have had to disable all but one NIC in the BIOS, or it will invariably try to communicate through the wrong one.

    To get it isolated on the network I have also had to use VLANs, but for some reason, after all the nodes have booted up, I have to remove all VLANs to get the compute node to talk to the install machine.

    It seems the installer is not very intelligent and expects the hardware and network to be configured in a certain way in order to complete the install.

    This product is in desperate need of some kind of a dashboard or active checklist on the installer which tells you what it's currently doing and what it's waiting on. You have to scour the hard drive for log files because there are different log files being used at different points in the installation.

    Some of the processes just take a long time. The final bit of bringing up the various VMs can take hours, but you get no feedback on whether it's working or something is hung. I found the communication failure by tailing a certain logfile, and when I removed the VLAN it started working. If I hadn't found the right logfile to see that the ssh was failing, I wouldn't have known what to do.

    During the initial installation of the install server there is a details button that is helpful because it shows you what's going on and you can see any errors. This does not exist in every phase of the process, though.
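    Since the thread does not name the log files involved, one generic way to narrow the search is to list the files that changed most recently, on the assumption that whichever phase is currently running is writing to one of them. This is only a sketch: the function name, the `.log` suffix filter, and the directory passed in are assumptions, not documented SCP locations.

```python
import os


def recently_modified_files(root, limit=10, suffix=".log"):
    """Return up to `limit` paths under `root` ending in `suffix`, newest first."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                try:
                    candidates.append((os.path.getmtime(path), path))
                except OSError:
                    continue  # file vanished or is unreadable; skip it
    # Sort by modification time, most recent first.
    candidates.sort(reverse=True)
    return [path for _mtime, path in candidates[:limit]]
```

    Calling `recently_modified_files("/var/log")` or `recently_modified_files("/tmp")` on the install server while the installer appears stalled gives a short list of files worth tailing.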
  • rossdavibm
    rossdavibm
    25 Posts

    Re: Installer never finishes

    ‏2012-04-13T20:31:52Z  
    I read through this thread today, and sense the frustration from those having installation issues. I also concur with the post that this product has installed successfully since the 1.1 version for me numerous times (I have installed/reinstalled ISCP no less than 50 times since last August, repeatedly on three different sets of hardware, as we were developing training, capturing the installation in video form, testing new drops, and so on). The big things are network-related, BIOS related (virtualization chip set), and things of that sort as already mentioned in this thread.

    I encountered a new one with 2.1, however, with a solution to share. Perhaps this belongs in another thread, but I had the same symptoms: The installer (GUI) would not finish. It would reach the point it was detecting power-on nodes, but in my situation, the result was that EVERY node was reported to the installer as a compute node regardless of whether it was storage or compute.

    In digging in scripts related to the installer, I noted that at least one script (deploy.sh) refers to a file: /var/lib/tftpboot/hosts, in which it looks for the string 'storage'. Now, how and when this file is created or updated I have not had time to discover; suffice to say that it should be getting updated somewhere. In examining this file, I found that I had NO entries in there for my three storage nodes. Let me add that I am using a modified dhcpd.static file (among others) to name my compute nodes accordingly (with the revisions to startup.sh & elsewhere documented for defining your own names for these nodes).

    Anyway, my solution/workaround for now is to add entries to /var/lib/tftpboot/hosts for my storage nodes and restart the ruby processes on each. The installer GUI went from reporting "compute node at xxx.15" to "storage node at xxx.15" after about 10 seconds. Once the three compute nodes were found properly, the installation proceeded.

    Question: Why is the script looking at /var/lib/tftpboot/hosts? Why not /etc/hosts? It seems redundant to be looking at files with identical or nearly identical content, when one would suffice.
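    The check described above (does /var/lib/tftpboot/hosts contain entries for the storage nodes?) can be sketched as follows. The thread does not show the file's exact format, so this assumes /etc/hosts-style lines ("IP name") and, following the remark that deploy.sh looks for the string 'storage', that storage node lines contain that string. The function names, node names, and IP addresses in the sample are invented for illustration.

```python
def storage_entries(hosts_text):
    """Return (ip, name) pairs from hosts-style text whose line mentions 'storage'."""
    entries = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or "storage" not in line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def missing_storage_ips(hosts_text, expected_ips):
    """Return the expected storage node IPs that have no 'storage' entry."""
    present = {ip for ip, _name in storage_entries(hosts_text)}
    return [ip for ip in expected_ips if ip not in present]


# Hypothetical file content and addresses.
sample = "192.168.19.115 storage-1\n192.168.19.200 compute-1\n"
print(missing_storage_ips(sample, ["192.168.19.115", "192.168.19.116"]))
```

    Any IP this reports would be a node the installer is likely to misclassify in the situation described.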
  • bjyangf
    bjyangf
    19 Posts

    Re: Installer never finishes

    ‏2012-04-18T11:25:37Z  
    Compute nodes' IP addresses will not be recorded in /var/lib/tftpboot/hosts. But they will be put in /etc/dhcp/dhcpd.static of the PXE server. So you can check that file to see if there is any relevant entry.

    And I may need the compute node's boot log to better understand what was going wrong. You can find it at /tmp/startup-<date and time>.log of the compute node.

    Or you can simply reboot the compute node to see if the problem can be solved :-)
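    Checking /etc/dhcp/dhcpd.static for the compute node could look roughly like this. The sketch assumes the file uses ordinary ISC dhcpd host declarations (`hardware ethernet ...;` and `fixed-address ...;` inside a `host name { ... }` block), which the thread does not confirm. The function name, host name, MAC, and IP in the sample are invented.

```python
import re

_HOST_BLOCK_RE = re.compile(r"host\s+\S+\s*\{(.*?)\}", re.DOTALL)
_MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]{17})\s*;")
_IP_RE = re.compile(r"fixed-address\s+([^;\s]+)\s*;")


def static_leases(dhcpd_text):
    """Map MAC address (lowercase) to fixed address for each host block."""
    leases = {}
    for block in _HOST_BLOCK_RE.findall(dhcpd_text):
        mac = _MAC_RE.search(block)
        ip = _IP_RE.search(block)
        if mac and ip:
            leases[mac.group(1).lower()] = ip.group(1)
    return leases


# Hypothetical file content.
sample = """
host compute-1 {
    hardware ethernet 00:1A:64:AA:BB:CC;
    fixed-address 192.168.19.120;
}
"""
print(static_leases(sample))
```

    If the compute node's MAC address is not among the keys, the PXE server never recorded it, which would narrow the problem to the network path between the two machines.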