ESS 5000 Common installation instructions
This document uses the example host names prt1 and prt2 for protocol nodes. You might have more than two protocol nodes.
- Log in to the EMS node by using the management IP address (set up by SSR by using the provided
worksheet). The default password is ibmesscluster.
- Set up a campus or a public connection (interface enP1p8s0f2). Connect an Ethernet cable
from C11-T3 on the EMS node to your lab network. This connection serves as a way to access the GUI or
the ESA agent (call home) from outside of the management network. The container creates a bridge to
the management network, so having a campus connection is highly advised.
Note: Setting up a campus or public connection is recommended but not mandatory. If you do not set one up, you temporarily lose your connection when the container bridge is created in a later step.
Use this method only for configuring the campus network, not for any other network on the EMS node. After SSR sets up the T1, T2, and T4 connections, do not modify them; this includes renaming the interfaces, setting IP addresses, or any other interaction with those interfaces. If changing T1 or T2 is mandatory after SSR is finished, use the SSR method only.
You can use the nmtui command to set the IP address of the campus interface. For more information, see Configuring IP networking with nmtui.
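As an alternative to nmtui, nmcli can set the address non-interactively. The connection name, example addresses (TEST-NET-3 range), and gateway below are illustrative assumptions, not values from this procedure; the sketch only prints the commands so they can be reviewed before being run on the EMS node:

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not run) the nmcli commands that would set a
# static IP on the campus interface. Interface name, address, and gateway
# are examples only; substitute your own values, then run the printed
# commands by hand once you have reviewed them.
campus_nmcli_cmds() {
    local iface="$1" addr="$2" gw="$3"
    printf '%s\n' \
        "nmcli con mod ${iface} ipv4.method manual ipv4.addresses ${addr} ipv4.gateway ${gw}" \
        "nmcli con up ${iface}"
}

campus_nmcli_cmds enP1p8s0f2 203.0.113.20/24 203.0.113.1
```

Printing first and applying second keeps an accidental typo from disturbing the management (T1) or FSP (T2) interfaces, which must not be touched.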
- Complete the /etc/hosts file on the EMS node. This file must contain the low-speed (management) and high-speed (cluster) IP addresses, FQDNs, and short names. Each high-speed name must be the corresponding low-speed name with a suffix appended (for example, essio1-hs (high-speed name) for essio1 (low-speed name)). This file must also contain the container host name and IP address.
127.0.0.1 localhost localhost.localdomain.local localhost4 localhost4.localdomain4

## Management IPs 192.168.45.0/24
192.168.45.20 ems1.localdomain.local ems1
192.168.45.21 essio1.localdomain.local essio1
192.168.45.22 essio2.localdomain.local essio2
192.168.45.23 prt1.localdomain.local prt1
192.168.45.24 prt2.localdomain.local prt2

## High-speed IPs 10.0.11.0/24
10.0.11.1 ems1-hs.localdomain.local ems1-hs
10.0.11.2 essio1-hs.localdomain.local essio1-hs
10.0.11.3 essio2-hs.localdomain.local essio2-hs
10.0.11.4 prt1-hs.localdomain.local prt1-hs
10.0.11.5 prt2-hs.localdomain.local prt2-hs

## Container info 192.168.45.0/24
192.168.45.80 cems0.localdomain.local cems0

## Protocol CES IPs
10.0.11.100 prt_ces1.localdomain.local prt_ces1
10.0.11.101 prt_ces1.localdomain.local prt_ces1
10.0.11.102 prt_ces2.localdomain.local prt_ces2
10.0.11.103 prt_ces2.localdomain.local prt_ces2
Note: localdomain.local is just an example and cannot be used for deployment. You must change it to a valid fully qualified domain name (FQDN) during the /etc/hosts setup. The domain must be the same for each network subnet that is defined. Also, ensure that you set the domain on the EMS node (hostnamectl set-hostname NAME). NAME must be the FQDN of the management interface (T1) of the EMS node. If you need to set other names for the campus or other interfaces, those names must be aliases, not the main host name returned by the hostnamectl command.
You can set up the EMS FQDN manually or wait until prompted when the ESS deployment binary is started. At that time, the script confirms the FQDN and gives you a chance to make changes.
- If you are planning to set up an ESS 3000 system with the ESS 5000 EMS node, add the ESS 3000 host names to /etc/hosts by using the same structure (low-speed (management) and high-speed (cluster) IP addresses, FQDNs, and short names).
- Do not use any special characters, underscores, or dashes in the host names, other than the high-speed suffix (for example, -hs). Doing so might cause issues with the deployment procedure.
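The naming rules above lend themselves to a mechanical check. This sketch (a hypothetical helper with example file path and host names, not part of the ESS tooling) verifies that every -hs short name in a hosts file has a matching low-speed short name:

```shell
#!/usr/bin/env bash
# Sketch: verify that every *-hs short name in a hosts file has a matching
# low-speed short name (the same name without the -hs suffix).
# File path and host names below are examples only.
check_hs_suffix() {
    local hosts_file="$1"
    # Collect short names: last field of each non-comment, non-empty line.
    local names
    names=$(awk '!/^#/ && NF {print $NF}' "$hosts_file")
    local hs base ok=0
    for hs in $names; do
        case "$hs" in
        *-hs)
            base="${hs%-hs}"
            if ! printf '%s\n' "$names" | grep -qx "$base"; then
                echo "missing low-speed entry for: $hs"
                ok=1
            fi
            ;;
        esac
    done
    return $ok
}

# Example hosts fragment following the structure shown above
cat > /tmp/hosts.example <<'EOF'
192.168.45.20 ems1.example.com ems1
192.168.45.21 essio1.example.com essio1
10.0.11.1     ems1-hs.example.com ems1-hs
10.0.11.2     essio1-hs.example.com essio1-hs
EOF
check_hs_suffix /tmp/hosts.example && echo "naming OK"
```

Running a check like this before deployment catches a mistyped suffix early, when it is still a one-line fix.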
- Clean up the old containers and images.
Note: Typically, this is applicable only for upgrades.
- List the containers.
podman ps -a
- Stop and remove the containers.
podman stop ContainerName
podman rm ContainerName -f
- List the images.
podman images
- Remove the images.
podman image rm ImageID -f
- [Recommended] Remove container bridges as follows.
- List the currently configured bridges.
nmcli c
- Clean up any existing bridges before the new container is set up. The bridge names must be mgmt_bridge and fsp_bridge.
nmcli c del BridgeName
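The cleanup steps above can be collected into one dry-run script. Because stopping containers and deleting bridges is destructive, this sketch (a hypothetical helper with example names) only prints the podman and nmcli commands it would run; remove the echo statements to execute them:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the container/bridge cleanup sequence. It prints each
# command instead of executing it; drop the leading 'echo' to apply for
# real. Container, image, and bridge names below are examples only.
cleanup_cmds() {
    # $1: container names, $2: image IDs, $3: bridge names (space-separated)
    local x
    for x in $1; do
        echo "podman stop $x"
        echo "podman rm $x -f"
    done
    for x in $2; do
        echo "podman image rm $x -f"
    done
    for x in $3; do
        echo "nmcli c del $x"
    done
}

cleanup_cmds "cems0" "abc123def" "mgmt_bridge fsp_bridge"
```

In practice you would fill in the three lists from the output of podman ps -a, podman images, and nmcli c.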
- Extract the installation package.
Note: Ensure that you check the version that is installed from manufacturing (SSR worksheet). If a newer version is available on Fix Central, replace the existing image in /home/deploy with the new image, and remove the old tgz file before doing this step.
cd /home/deploy
tar zxvf ess5000_6.0.1.2_1204-02_dme.tgz
ess5000_6.0.1.2_1204-02_dme.sh
ess5000_6.0.1.2_1204-02_dme.sh.sha256
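Because the package ships with a .sha256 companion file, the installer can be verified before it is run. This sketch assumes the standard sha256sum "HASH  FILENAME" format (the actual file layout may differ) and demonstrates on a stand-in file rather than the real installer:

```shell
#!/usr/bin/env bash
# Sketch: verify an installer against its .sha256 companion file before
# running it. Assumes the standard "HASH  FILENAME" sha256sum format;
# the file names below are stand-ins, not the real ESS package.
verify_pkg() {
    local pkg="$1"
    ( cd "$(dirname "$pkg")" && sha256sum -c "$(basename "$pkg").sha256" )
}

# Demonstrate with a stand-in file and a generated checksum
echo "demo payload" > /tmp/demo.sh
sha256sum /tmp/demo.sh | sed 's#/tmp/##' > /tmp/demo.sh.sha256
verify_pkg /tmp/demo.sh && echo "checksum OK"
```

A failed check here is far cheaper than discovering a corrupted download halfway through deployment.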
- Accept the license and install the accepted image.
./ess5000_6.0.1.2_1204-02_dme.sh --text-only --start-container
Note:
- The --install-image flag will be deprecated soon. Stop and remove any existing container.
- The --text-only flag is used to extract the contents of the tgz file after accepting the license agreement. Immediately afterward, the --start-container flag is used to do the following steps automatically.
- Run essmkyml, which prompts the user to:
- Confirm the EMS FQDN or change it.
- Provide the container short name.
- Provide a free IP address on the FSP subnet for the container FSP connection.
Press 1 to accept the license after reading the agreement.
Example of contents of the extracted installation package:
ess5000_6.0.1.2_1204-02_dme.dir/
ess5000_6.0.1.2_1204-02_dme.dir/ess5000_6.0.1.2_1204-02_dme.tar
ess5000_6.0.1.2_1204-02_dme.dir/ess5000_6.0.1.2_1204-02_dme_binaries.iso
ess5000_6.0.1.2_1204-02_dme.dir/rhel-8.1-server-ppc64le.iso
ess5000_6.0.1.2_1204-02_dme.dir/podman_rh8.tgz
ess5000_6.0.1.2_1204-02_dme.dir/essmgr
ess5000_6.0.1.2_1204-02_dme.dir/essmgr.yml
ess5000_6.0.1.2_1204-02_dme.dir/Release_note.ess5000_6.0.1.2_1204-02_dme.txt
ess5000_6.0.1.2_1204-02_dme.dir/classes/
ess5000_6.0.1.2_1204-02_dme.dir/classes/essmgr_yml.py
ess5000_6.0.1.2_1204-02_dme.dir/classes/__init__.py
ess5000_6.0.1.2_1204-02_dme.dir/essmkyml
For this step, you must provide these inputs:
- Container name (must be in /etc/hosts or be resolvable by using DNS)
- Container FSP IP address (must be on the same network block that is set on C11-T2)
- Confirmation of the EMS FQDN (must match what is set for the management IP in /etc/hosts). If this value needs to be changed or set, essmkyml helps with that task.
- The EMS host name must be on the management network (also called xCAT). Other networks can be aliases (A) or canonical names (CNAME) in DNS or in the /etc/hosts file.
Is the current EMS FQDN c145f05zems06.gpfs.net correct (y/n):
- Remember not to add the DNS domain localdomain to the input:
Please type the desired and resolvable short hostname [ess5k-cems0]: cems0
- Remember that the IP address must belong to the 10.0.0.x/24 network block (assuming that the recommended FSP network was used):
Please type the FSP IP of the container [10.0.0.5]: 10.0.0.80
Note: The values in brackets ([ ]) are just examples or the last entered values.
If all of the checks pass, the essmgr.yml file is written and you can proceed to bridge creation, if applicable, and to running the container.
Note: The original essmgr.yml file and detailed logs of the checks that are performed are stored in the ./logs directory.
At this point, if all checks are successful, the image is loaded and the container is started. Example:
ESS 5000 CONTAINER
root@cems0:/ #
- Run the config load command.
essrun -N essio1,essio2,ems1 config load -p ibmesscluster
Note:
- Use the low-speed management host names. Specify the root password with -p.
- The password (-p) is the root password of the node. By default, it is ibmesscluster. Consider changing the root password after deployment is complete.
- This command attempts to connect to each node's FSP interface through IPMI by using the default password (the serial number). If the password has changed, you are prompted to enter the new password. To determine the serial number, do the following:
  - Log in to the node by using the management IP address.
  - Issue this command: cat /proc/device-tree/system-id
- After this command is run, you can use -G for future essrun steps (for example, -G ess_ppc64le).
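The system-id file is NUL-terminated, so stripping the trailing NUL gives a clean serial string. This sketch demonstrates on a stand-in file, since /proc/device-tree exists only on Power systems, and the serial value shown is a made-up example:

```shell
#!/usr/bin/env bash
# Sketch: read a device-tree style serial number, stripping the trailing
# NUL byte. On a real Power EMS node you would read
# /proc/device-tree/system-id; here a stand-in file with a fabricated
# serial is used instead.
read_serial() {
    tr -d '\0' < "$1"
}

printf 'IBM,7894ABC\0' > /tmp/system-id.example
read_serial /tmp/system-id.example
```

On the real node, `read_serial /proc/device-tree/system-id` would print the serial without the stray NUL that otherwise garbles copy-and-paste into the IPMI password prompt.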
Instructions if the latest ESS 5000 package version is the same as the one from manufacturing
If the ESS version on the system is already at the latest version shipped from manufacturing, proceed directly to the network bond creation step. If the version is different, use the instructions in the next section.
- Create network bonds.
essrun -G ess_ppc64le network --suffix=-hs
essrun -N ems1 network --suffix=-hs
- Run the network test.
This test uses nsdperf to determine whether the newly created network bonds are healthy. SSH from the container to an I/O node or the EMS node. The following command performs the test (with an optional RDMA test afterward if InfiniBand is used). Ensure that there are no errors in the output indicating that dropped packets have exceeded thresholds. When the test is completed, type exit to return to the container.
ssh essio1
ESSENV=TEST essnettest -N essio1,essio2 --suffix=-hs
- Create the cluster.
essrun -G ess_ppc64le cluster --suffix=-hs
- Add the EMS node to the cluster.
essrun -N essio1 cluster --add-ems ems1 --suffix=-hs
- Create the file system.
essrun -G ess_ppc64le filesystem --suffix=-hs
Note:
- By default, this command attempts to use all the available space. If you need to create multiple file systems or a CES shared root file system for protocol nodes, consider using less space. For example:
essrun -G ess_ppc64le filesystem --suffix=-hs --size 80%
- This step creates combined metadata + data vdisk sets by using a default RAID code and block size. You can use additional flags to customize or use the mmvdisk command directly for advanced configurations.
Instructions if the latest ESS 5000 package version is not the same as the one from manufacturing
- Update the EMS node.
Important: [Online update only] Ensure that all ESS 5000 nodes are active by first running this command from one of the cluster nodes: mmgetstate -N ess5k_ppc64le. If any nodes are not active, quit the upgrade procedure and resolve this issue before proceeding with the upgrade.
essrun -N ems1 update --offline
Please enter 'accept' indicating that you want to update the following list of nodes: ems1
>>> accept
Note: If the kernel is changed, you are prompted to leave the container, reboot the EMS node, restart the container, and run this command again. For example, navigate back to the ESS 6.0.1.2 extracted directory and run the following commands:
essrun -N ems1 update --offline
exit
systemctl reboot
./essmgr -r
essrun -N ems1 update --offline
- Update the I/O nodes.
essrun -G ess_ppc64le update --offline
- Create network bonds.
essrun -G ess_ppc64le network --suffix=-hs
essrun -N ems1 network --suffix=-hs
- Run the network test.
This test uses nsdperf to determine whether the newly created network bonds are healthy. SSH from the container to an I/O node or the EMS node. The following command performs the test (with an optional RDMA test afterward if InfiniBand is used). Ensure that there are no errors in the output indicating that dropped packets have exceeded thresholds. When the test is completed, type exit to return to the container.
ssh essio1
ESSENV=TEST essnettest -N essio1,essio2 --suffix=-hs
- Create the cluster.
essrun -G ess_ppc64le cluster --suffix=-hs
- Add the EMS node to the cluster.
essrun -N essio1 cluster --add-ems ems1 --suffix=-hs
- Create the file system.
essrun -G ess_ppc64le filesystem --suffix=-hs
Note:
- By default, this command attempts to use all the available space. If you need to create multiple file systems or a CES shared root file system for protocol nodes, consider using less space. For example:
essrun -G ess_ppc64le filesystem --suffix=-hs --size 80%
- This step creates combined metadata + data vdisk sets by using a default RAID code and block size. You can use additional flags to customize or use the mmvdisk command directly for advanced configurations.
Final setup instructions
- From the EMS node (outside of the container), configure and start the performance monitoring collector.
mmperfmon config generate --collectors ems1-hs
- From the EMS node (outside of the container), configure and start the performance monitoring sensors.
mmchnode --perfmon -N ems1-hs,essio1-hs,essio2-hs
- Capacity and fileset quota monitoring is not enabled in the GUI by default. You must correctly
update the values and restrict collection to the EMS node only.
- To modify the GPFS Disk Capacity collection interval, run the following command.
mmperfmon config update GPFSDiskCap.restrict=EMSNodeName GPFSDiskCap.period=PeriodInSeconds
The recommended period is 86400 seconds so that the collection is done once per day.
- To restrict GPFS Fileset Quota collection to the management server node only, run the following command.
mmperfmon config update GPFSFilesetQuota.period=600 GPFSFilesetQuota.restrict=EMSNodeName
Here, EMSNodeName must be the name shown in the mmlscluster output.
Note: To enable quota monitoring, file system quota checking must be enabled. Refer to the mmchfs -Q and mmcheckquota commands in IBM Spectrum Scale: Command and Programming Reference.
- Verify that the values are set correctly in the performance monitoring configuration by running the mmperfmon config show command on the EMS node. Ensure that GPFSDiskCap.period is properly set, and that GPFSFilesetQuota and GPFSDiskCap are both restricted to the EMS node only.
Note: If you are moving from manual configuration to auto configuration, all sensors are set to default values. Make the necessary changes by using the mmperfmon command to customize your environment accordingly. For information about how to configure various sensors by using mmperfmon, see Manually installing IBM Spectrum Scale GUI.
- Start the performance collector on the EMS node.
systemctl start pmcollector
- Start the GUI.
systemctl start gpfsgui
- Create the GUI admin user.
/usr/lpp/mmfs/gui/cli/mkuser UserName -g SecurityAdmin
- In a web browser, enter the public or campus IP address by using https, and walk through the System Setup wizard instructions.
- Log in to each node and run the following command.
essinstallcheck -N localhost
Doing this step verifies that all software and cluster versions are up-to-date.
- From the EMS node, outside of the container, run the following final health check commands to verify your system health.
gnrhealthcheck
mmhealth node show -a
- Set the time zone and set up Chrony.
Before getting started, ensure that Chrony and the time zone are set correctly on the EMS and I/O nodes. Refer to How to set up chronyd (time server) to perform these tasks before proceeding.
- Set up call home. For more information, see Drive call home. The supported call home configurations are:
- Software call home
- Node call home (including for protocol nodes)
- Drive call home
- Refer to Client node tuning recommendations.