IBM BigInsights

Running the installation package

To install the IBM® Open Platform with Apache Hadoop software, download the repository definition, install and start the Ambari server, and complete the installation wizard steps.

Before you begin

Obtain the appropriate IBM Open Platform with Apache Hadoop software package as described in Obtaining software for the IBM Open Platform with Apache Hadoop. Begin your installation with a clean cluster of three nodes with the operating system installed. The IBM Open Platform with Apache Hadoop installation includes OpenJDK 1.7.0 and 1.8; OpenJDK 1.8 is preferred and is the default. During installation, you can either install the provided version or ensure that Java™ 7 is already installed on all nodes in the cluster. For overview information about the Ambari server, see Installing IBM Open Platform with Apache Hadoop.

If you install IBM Open Platform with Apache Hadoop as the non-root user, preface the instructions with sudo, where the instruction would normally require the root user.

UIDs and GIDs must be consistent across all nodes. If you use local service IDs for IBM Open Platform with Apache Hadoop services, ensure that the UIDs and GIDs are consistent across the cluster by creating them manually. For more information about what users and groups to create, see Users and groups for IBM Open Platform with Apache Hadoop.
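For example, the following minimal sketch creates a service group and user with explicit IDs so that they match on every node; the names and numeric IDs are illustrative assumptions only, and the linked topic lists the actual users and groups to create:

groupadd -g 1001 hadoop          # same GID on every node
useradd -u 1002 -g hadoop hdfs   # same UID on every node

Run the identical commands, with the identical IDs, on each node in the cluster.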

Note: If you use the SUSE Linux Enterprise Server (SLES) operating system, make sure that you install the ZYpp package for installing and maintaining software packages.

A few steps relate to the IBM BigInsights Enterprise Management module, and particularly to Spectrum Scale. For more information about the contents of the Enterprise Management module, see Installing the Enterprise Management module for IBM Open Platform with Apache Hadoop.

Procedure

  1. Log in to your Linux cluster as root, or as a user with root privileges.
  2. Ensure that the nc package is installed on all nodes:
    RHEL
    yum install -y nc
    SUSE
    zypper install -y nc
    If you installed the Basic Server option on your server, the nc package might not be installed, which can cause failures on the data nodes of the IBM Open Platform with Apache Hadoop cluster.
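    If you are not sure whether the package is already present on a RHEL node, a quick check such as the following can be used (a sketch; it installs nc only if the rpm query finds it missing):
    rpm -q nc || yum install -y nc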
  3. Locate the RPM file, such as iop-4.1.0.0-1.el6.x86_64.rpm, that you downloaded from the download site:
    Tip: The following is the list of current IBM Open Platform with Apache Hadoop RPMs:
    Red Hat Enterprise Linux on x86-64
    • iop-4.1.0.0-1.el6.x86_64.rpm
    • iop-4.1.0.0-1.el7.x86_64.rpm
    Red Hat Enterprise Linux on Power®
    • iop-4.1.0.0-1.el7.ppc64le.rpm
    SUSE Linux Enterprise Server
    • iop-4.1.0.0-1.sles.x86_64.rpm
    Run the following command to install the ambari.repo file into /etc/yum.repos.d on RHEL operating systems or /etc/zypp/repos.d on SLES operating systems:
    RHEL
    yum install iop-4.1.0.0-1.<version>.<platform>.rpm
    SUSE
    zypper install iop-4.1.0.0-1.<version>.<platform>.rpm
    If you are using a mirror repository, complete the following steps:
    1. Edit the /etc/yum.repos.d/ambari.repo file or the /etc/zypp/repos.d/ambari.repo file. Replace the value of baseurl with your mirror URL. The original baseurl might look like one of the following:
      RHEL 6
      • Ambari: https://ibm-open-platform.ibm.com/repos/Ambari/rhel/6/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1211.el6.x86_64.tar.gz
      • IOP: https://ibm-open-platform.ibm.com/repos/IOP/rhel/6/x86_64/4.1.x/IOP-4.1-Spark-1.5.1-20151210_1028.el6.x86_64.tar.gz
      • IOP-Utils: https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/6/x86_64/1.1/
      RHEL 7
      • Ambari: https://ibm-open-platform.ibm.com/repos/Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1212.el7.x86_64.tar.gz
      • IOP: https://ibm-open-platform.ibm.com/repos/IOP/rhel/7/x86_64/4.1.x/Updates/4.1.0.0_Spark-1.5.1/IOP-4.1-Spark-1.5.1-20151209_2001.el7.x86_64.tar.gz
      • IOP-UTILS: https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1/
      SUSE
      • Ambari: https://ibm-open-platform.ibm.com/repos/Ambari/sles/11/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1528.sles11.x86_64.tar.gz
      • IOP: https://ibm-open-platform.ibm.com/repos/IOP/sles/11/x86_64/4.1.x/Updates/4.1.0.0_Spark-1.5.1/IOP-4.1-Spark-1.5.1-20160105_1528.sles11.x86_64.tar.gz
      • IOP-UTILS: https://ibm-open-platform.ibm.com/repos/IOP-UTILS/sles/11/x86_64/1.1/
      For example:
      baseurl=http://web_server/repos/Ambari/rhel/6/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/...
      Remember, the Linux version number and the platform might be different.
    2. Perform one of the following two actions:
      • Disable gpgcheck in the ambari.repo file. To disable signature validation, change gpgcheck=1 to gpgcheck=0.
      • Keep gpgcheck enabled and change the public key file location to the mirror Ambari repository. Replace gpgkey=http://ibm-open-platform.ibm.com/repos/Ambari/rhel/6/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/.../BI-GPG-KEY.public with the mirror Ambari repository location. For example:
        gpgkey=http://web_server/repos/Ambari/rhel/6/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/.../BI-GPG-KEY.public
      Remember, the Linux version number and the platform might be different.
      Note:
      • The IBM hosted repository uses HTTPS. If your local mirror is not configured for HTTPS, use http:// instead of https://.
      • If you are installing on an operating system other than RHEL6, the paths will be slightly different. Modify as appropriate.
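      If you prefer to script these edits, a minimal sketch such as the following rewrites the repository file in place; web_server is a placeholder for your mirror host, and the paths follow the RHEL 6 example above:
      # Sketch only: point ambari.repo at a local mirror and disable signature validation.
      sed -i 's|https://ibm-open-platform.ibm.com|http://web_server|g' /etc/yum.repos.d/ambari.repo
      sed -i 's|gpgcheck=1|gpgcheck=0|' /etc/yum.repos.d/ambari.repo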
  4. Clean the yum or zypper cache on each node so that your local package manager sees the correct packages from the remote repository:
    yum clean all
    or
    zypper clean -a
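    Because the cache must be cleaned on every node, you can run the command across the cluster from the management node. The following sketch assumes password-less SSH and a hosts.txt file (a hypothetical helper) listing one FQDN per line; RHEL is shown:
    for host in $(cat hosts.txt); do
        ssh "$host" "yum clean all"
    done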
  5. Install the Ambari server on the intended management node, using the following command:
    yum install ambari-server
    or
    zypper install ambari-server
    Accept the install defaults.
    For IBM BigInsights Enterprise Management module
    Spectrum Scale
    If you will be installing the IBM BigInsights Enterprise Management module to use Spectrum Scale, install the Ambari-gpfs integration package by running the following command:
    ./gpfs.ambari-4.1-0.noarch.bin
  6. If you are using a mirror repository, after you install the Ambari server, update the following file with the mirror repository URLs.
    /var/lib/ambari-server/resources/stacks/BigInsights/4.1/repos/repoinfo.xml
    In the file, change the information from the Original content to the Modified content. Adjust the paths according to your Linux version and platform:
    Original content:
    <os type="redhat6">
      <repo>
        <baseurl>https://ibm-open-platform.ibm.com/repos/Ambari/rhel/6/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/.../</baseurl>
        <repoid>IOP-4.1</repoid>
        <reponame>IOP</reponame>
      </repo>
      <repo>
        <baseurl>http://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/6/x86_64/1.0</baseurl>
        <repoid>IOP-UTILS-1.0</repoid>
        <reponame>IOP-UTILS</reponame>
      </repo>
    </os>
    Modified content:
    <os type="redhat6">
      <repo>
        <baseurl>http://<web.server>/repos/IOP/rhel/6/x86_64/4.1</baseurl>
        <repoid>IOP-4.1</repoid>
        <reponame>IOP</reponame>
      </repo>
      <repo>
        <baseurl>http://<web.server>/repos/IOP-UTILS/rhel/6/x86_64/1.0</baseurl>
        <repoid>IOP-UTILS-1.0</repoid>
        <reponame>IOP-UTILS</reponame>
      </repo>
    </os>
    Tip:
    If you use a local repository and must modify the repository URL, there are three ways to change it:
    1. Change the repoinfo.xml file manually.
    2. During the cluster installation, change the repository URLs in the Ambari web interface at the Select Stack step.
    3. After you complete an installation, use the Ambari web tool:
      1. From the Ambari web dashboard, in the menu bar, click admin > Manage Ambari.
        Managing the repositories from the Ambari dashboard.
      2. From the Clusters panel, click Versions > <stack name>.
      3. Change the URL as needed, and click Save.
      4. Restart the Ambari server.
    There is no need to restart the Ambari server for the second option.
    Edit the /etc/ambari-server/conf/ambari.properties file, and change the JDK download URLs from the original content to the modified content.
    Original content:
    openjdk1.8.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/6/x86_64/1.1/openjdk/jdk-1.8.0.tar.gz
    jdk1.7.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/6/x86_64/2.1.x/GA/2.1/openjdk/jdk-1.7.0.tar.gz
    Modified content:
    openjdk1.8.url=http://<web.server>/repos/IOP-UTILS/rhel/6/x86_64/1.1/openjdk/jdk-1.8.0.tar.gz
    jdk1.7.url=http://<web.server>/repos/IOP-UTILS/rhel/6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz
    For IBM BigInsights Enterprise Management module
    Spectrum Scale
    If you will be using Spectrum Scale, update the file /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml to add the GPFS RPM repository:
    <repo>
      <baseurl>http://<web.server>/repos/GPFS</baseurl>
      <repoid>GPFS-4.1.1</repoid>
      <reponame>GPFS</reponame>
    </repo>
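    After you edit repoinfo.xml by hand, a quick well-formedness check can catch XML typos before Ambari reads the file (this assumes the xmllint tool from libxml2 is available):
    xmllint --noout /var/lib/ambari-server/resources/stacks/BigInsights/4.1/repos/repoinfo.xml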
  7. Set up the Ambari server:
    1. Run the following command:
      >sudo ambari-server setup
    A Java JDK is installed as part of the Ambari server setup. However, the Ambari server setup also allows you to reuse an existing JDK. The command is:
    >sudo ambari-server setup -j /full/path/to/JDK
    The JDK path set by the -j parameter must be the same on each node in the cluster. For a list of the currently supported JDK versions, see Upgrading the Java (JDK) version.
    Tip: You might need to reboot your nodes if you see a message that SELinux is still enabled.
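    Before you run setup with -j, you can confirm that the JDK exists at the same path on every node. This sketch reuses the hypothetical hosts.txt file from step 4 and the placeholder JDK path:
    for host in $(cat hosts.txt); do
        ssh "$host" "test -x /full/path/to/JDK/bin/java" && echo "$host: OK" || echo "$host: JDK missing"
    done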
  8. Start the Ambari server, using the following command:
    >sudo ambari-server start
  9. If the Ambari server was installed on your node previously, the node might contain old cluster information. Reset the Ambari server to clean up its cluster information in the database, using the following commands:
    >sudo ambari-server stop
    
    >sudo ambari-server reset
    
    >sudo ambari-server start
  10. Access the Ambari web user interface from a web browser by using the server name (the fully qualified domain name) on which you installed the software, and port 8080. For example, enter abc.com:8080.
    Note: You can use any available port that allows you to connect to the Ambari server; in some networks, port 8080 is already in use. To use another port, complete the following steps:
    1. Edit the ambari.properties file:
      vi /etc/ambari-server/conf/ambari.properties   
    2. Add a line in the file to select another port:
      client.api.port=8081
    3. Save the file and restart the Ambari server:
      >sudo ambari-server restart
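    To confirm that the server answers on the new port, a quick check such as the following can be run on the Ambari node (a sketch; 8081 is the example port from above):
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8081
    # An HTTP status code such as 200 indicates that Ambari is listening on the new port.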
  11. Log in to the Ambari server (http://<server-name>:8080) with the default username and password: admin/admin.

    The default username and password are required only for the first login. You can configure users and groups after the first login to the Ambari web interface.

    The port 8080 is the default. If you changed the default port in the previous step, use the modified port number.
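    As an optional sanity check from the command line, the Ambari REST API answers with the same credentials; server-name is a placeholder for your FQDN:

    curl -u admin:admin http://server-name:8080/api/v1/clusters
    # Before any cluster is created, the response contains an empty items list.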

  12. On the Welcome page, click Launch Install Wizard.
  13. On the Get Started page, enter a name for the cluster you want to create. The name cannot contain blank spaces or special characters. Click Next.
  14. On the Select Stack page, click the Stack version you want to install:
    • IBM Open Platform with Apache Hadoop: The administrator selects BigInsights 4.1 as the stack.
    • IBM BigInsights Enterprise Management module, Spectrum Scale: The administrator selects BigInsights 4.1 SpectrumScale as the stack.
    Note: Remember that you can change the repo URL if you use a mirror site.

    Click Next.

  15. On the Install Options page, complete the following two steps:
    1. In Target Hosts, add the list of hosts that the Ambari server will manage and on which the IBM Open Platform with Apache Hadoop software will be deployed. Specify one node per line, as in the following example:
      host1.company.com
      host2.company.com
      host3.company.com
      host4.company.com

      The host name must be the fully qualified domain name (FQDN).

    2. In Host Registration Information, select one of the two options:
      • Provide your SSH Private Key to automatically register hosts

        Click SSH Private Key. If the root user installed the Ambari server, the private key file is /root/.ssh/id_rsa. If you installed as a non-root user, the default private key is in the .ssh directory of that user's home directory. You should have retained a copy of the SSH private key (.ssh/id_rsa) in your local directory when you set up password-less SSH (see Preparing your environment; a minimal sketch also follows this step). Either click Choose File to upload the private key file, or copy and paste its contents into the text box. Then click the Register and Confirm button.

      • Perform manual registration on hosts and do not use SSH

        You can choose this option when the ambari-agents are manually installed and running on all nodes. In this case, no password-less SSH setup is required. For more information, see https://ambari.apache.org/1.2.0/installing-hadoop-using-ambari/content/ambari-chap6-1.html.
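    If you still need to set up password-less SSH for the first option, the following minimal sketch can help; the host names repeat the example above, and Preparing your environment remains the authoritative procedure:

    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # generate the key pair as the installing user
    for host in host1.company.com host2.company.com host3.company.com host4.company.com; do
        ssh-copy-id "$host"                     # copy the public key to each target host
    done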

  16. On the Confirm Hosts page, check that the correct hosts for your cluster have been located.
    If hosts were selected in error, remove them one by one by following these steps:
    1. Click the box next to the server to be removed.
    2. Click Remove in the Action column.

    If warnings are found during the check process, click Click here to see the warnings to review what caused them. The Host Checks page identifies any issues with the hosts. For example, a host might have Transparent Huge Pages or firewall issues. For information on how to address these issues, see Preparing your environment.

    After you resolve the issues, click Rerun Checks on the Host Checks page. When you have confirmed the hosts, click Next.

  17. On the Choose Services page, select the services you want to install.
    • IBM Open Platform with Apache Hadoop: You must select to install HDFS. Ambari shows a confirmation message to install the required service dependencies. For example, when selecting only Oozie, the Ambari web interface shows messages for accepting YARN/MR2, HDFS, and ZooKeeper installations.

    • IBM BigInsights Enterprise Management module, Spectrum Scale: You must select the Spectrum Scale service. This service is available only if you ran the Spectrum Scale Ambari integration package in step 5. IBM Open Platform with Apache Hadoop and Spectrum Scale will be installed together.

    Click Next.

  18. On the Assign Masters page, assign the master nodes to hosts in your cluster for the services you selected. Refer to the right column for the default service assignments by host. You can accept the current default assignments. To assign a new host to run services, click the dropdown list next to the master node in the left column and select a new host. Click Next.

    To see a suggested layout of services, see Suggested services layout for IBM Open Platform with Apache Hadoop.

  19. On the Assign Slaves and Clients page, assign the slave and client components to hosts in your cluster. You can accept the default assignments, click all or none to set the host assignments in bulk, or select one or more components next to a selected host (that is, DataNode, NodeManager, RegionServer, Flume, Client). Click Next.
    Tip: If you anticipate adding the Big SQL service at some later time, you must include all clients on all the anticipated Big SQL worker nodes. Big SQL specifically needs the HDFS, Hive, HBase, Sqoop, HCat, and Oozie clients.
  20. On the Customize Services page, select configuration settings for the services selected. Default values are filled in automatically when available and they are the recommended values. The installation wizard prompts you for required fields (such as password entries) by displaying a number in a circle next to an installed service. Click the number and enter the requested information in the field outlined in red. Make sure that the service port that is set is not already used by another component.
    Important: For example, the Knox gateway port is set to 8443 by default. If the Ambari server is set up with HTTPS and the SSL port is also 8443, you must change the Knox gateway port to another value.

    Warning: Thoroughly review all of the directory settings for data and logs in the services configuration to make sure that the proper disks are used. This is especially important for Hadoop data directory settings, such as the namenode and datanode directories.

    Note: If you are working in an LDAP environment where users are set up centrally by the LDAP administrator and therefore already exist, selecting the defaults can cause the installation to fail. Open the Misc tab, and select the check box to ignore user modification errors.
    For IBM BigInsights Enterprise Management module
    Spectrum Scale
    If you will be using Spectrum Scale, the settings for core-site, hdfs-site and hadoop-env are located on the Spectrum Scale tab.
  21. When you have completed the configuration of the services, click Next.
  22. On the Review page, verify that your settings are correct. Click Deploy.
  23. The Install, Start, and Test page shows the progress of the installation. The progress bar at the top of the page gives the overall status, while the main section of the page gives the status for each host. To display the logs for a specific task, click the task. Click the link in the Message column to find out which tasks have been completed for a specific host, or to see any warnings that were encountered. When the Ambari web interface displays the install status and the Next button becomes available, click Next to proceed.
  24. On the Summary page, review the accomplished tasks. Click Complete to go to the IBM Open Platform with Apache Hadoop dashboard.

What to do next

Note about upgrading RHEL
BigInsights does not support upgrading the RHEL operating system in place. If you want to upgrade from RHEL version 6.x to version 7.x, you must reinstall BigInsights by using the RHEL 7.x options described in this topic.
MySQL/MariaDB
If you plan to use MySQL or MariaDB for the Hive Metastore, ensure that the database service starts on reboot. Run the following commands on the node that will host the Hive Metastore:
Note: Install MySQL on the RHEL 6.x operating system. Install MariaDB on the RHEL 7.x operating system.
RHEL 6.x
chkconfig mysqld on
service mysqld start
RHEL 7.x
systemctl start mariadb.service
systemctl enable mariadb.service
postgresql
Ensure that the postgresql service, which is used by Ambari, starts automatically on reboot. Run the following commands on the node that hosts the Ambari database:
RHEL 6.x
chkconfig postgresql on
service postgresql start
RHEL 7.x
systemctl start postgresql.service
systemctl enable postgresql.service
HDFS caching
HDFS caching is supported in the IBM Open Platform with Apache Hadoop. To make sure that it can be started successfully, you must change two configuration settings:
  1. From the Linux command line where the Ambari server is running, navigate to the /etc/security/ directory, and add the following line to the limits.conf file:
    hdfs - memlock 200000
    This value must be equal to or greater than the value that you set for the dfs.datanode.max.locked.memory property. Also consider the memory available on the server when you set the memlock value.
  2. Open the Ambari web interface, and select the HDFS service.
  3. Stop the HDFS service.
  4. Click the Configs tab.
  5. Expand the Advanced hdfs-site section and locate the following property to add the value:
    dfs.datanode.max.locked.memory       200000
    Restriction: To make sure that data is cached, the value of this property must be greater than the virtual memory page size (typically 4096 bytes; check with getconf PAGE_SIZE).
  6. Restart the HDFS service.
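To verify that the new limit is in effect for the hdfs user, and to check the page size that the restriction refers to, commands such as the following can be run on a datanode (a sketch; the values match the examples above):
getconf PAGE_SIZE          # virtual memory page size, typically 4096 bytes
su - hdfs -c 'ulimit -l'   # should report at least the memlock value set in limits.conf (200000)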
For more information about HDFS caching, see HDFS caching.
Troubleshooting
Be sure to check the available troubleshooting information if you have problems.

