Topic
10 replies Latest Post - ‏2009-04-30T10:51:45Z by SystemAdmin
SystemAdmin
SystemAdmin
476 Posts
ACCEPTED ANSWER

Pinned topic CSM FAQs and Tips

‏2006-07-13T19:15:18Z |
This thread will contain Frequently Asked Questions and tips for CSM versions newer than 1.5. Some of the items may link to the IBM support site, other forum postings, or other sites.

This FAQ covers CSM versions 1.5 through 1.7

Click on the FAQ categories below:
-General
-Installation
-Migration, Coexistence and Interoperability
-Security
-Hardware/Remote Control
-Monitoring
-Node Commands
-RMC
-Configuration File Manager
-Software Maintenance

If you have a question that you'd like to add to this FAQ, please post a new topic to this forum.

Edited by: sunjing on Apr 30, 2009 3:32 AM
Updated on 2009-04-30T10:51:45Z by SystemAdmin
  • SystemAdmin

    CSM General FAQ

    ‏2009-05-05T02:46:13Z  in response to SystemAdmin
    This is the General CSM FAQ

    FAQ Index

    Q: Where can I find the documentation for CSM?
    A: You can access the documentation from here.

    Q: Are there any resources beyond the standard documentation to help me learn and use CSM?
    A: Other resources can be found here.

    Q: Which HMC levels are supported for which CSM versions?
    A: Go to the POWER5 code matrix website for information about the latest release levels and code combinations.

    Q: What hardware is supported by which Linux OS distributions?
    A: Please see the Planning for cluster hardware sections of the CSM Planning and Installation guide. You may need to page down in this chapter to locate the OS and Hardware tables.

    Q: Can I use a logical partition (LPAR) on a pSeries server as my management server?
    A: Yes. You can use an LPAR as a CSM management server, but you should understand the limitations and considerations before deciding whether it is appropriate for your CSM cluster. These considerations are discussed in the Planning chapter of the CSM Planning and Installation Guide, which can be found here. They include scenarios such as:
    • The LPAR can be inadvertently brought down by someone on the HMC running commands either directly to the LPAR or to the System p server.
    • If the firmware on the System p server needs to be upgraded, the entire System p server may need to be brought down.
    • If the LPAR belongs to a System p server that is down for a hardware or power failure, you will lose access to your management server.
    • An LPAR may not have an attached display, or may not contain media devices such as CD, tape or diskette drives, which may impact your management or backup strategies.

    Q: When I issue the command "dsh -vn , I get the following:
    "dsh: 2617-025 Hostfd is not responding. No command will be issued to this host"
    even though the host is responding to pings.

    A: dsh uses the CSM "Status" attribute to determine node availability when CSM is installed. Run "lsnode -a Status" to check the CSM node status attribute. To force dsh to use ping to check node status when using -v, add --forcePing to the command line or set the DSH_PING_VERIFY environment variable.

    Q: How can I ask a more specific question about CSM?
    A: CSM maintains a forum on IBM developerWorks. Click here to go to the forum. For actual defects in the software, contact IBM support at 1-800-IBM-SERV.

    Q: How can I suggest a question that should be added to this FAQ?
    A: We welcome suggestions on how this FAQ can be more useful. Please post all suggestions to the forum. The CSM Development community monitors this forum and will respond appropriately. Thanks.

    Q: Can I combine different operating systems and versions in the same CSM cluster?
    A: Yes, starting with CSM 1.4.1.10, CSM supports full interoperability between different OS versions and CSM versions.

    Q: How can I check to see if any updates or fixes are available for CSM?
    A: Go here for the CSM Service website. This site contains software updates, the FAQs, documentation errata for CSM, and other related products and documentation.


    Edited by: sunjing on Apr 30, 2009 3:05 AM

    Edited by: sunjing_admin on May 4, 2009 10:44 PM
  • SystemAdmin

    CSM Installation FAQ

    ‏2009-05-06T02:40:38Z  in response to SystemAdmin

    This is the CSM Installation FAQ

    FAQ Index

    Q: On a CSM for AIX management server why does the updatenode command fail after I have run the csmsetupnim command?
    A: The csmsetupnim command will create and allocate a NIM script resource.
    As part of the allocation the directory which contains the script will
    be NFS exported. The directory will not be unexported until the operating
    system installation is complete. During this time if you try to run
    updatenode on another node it may fail because it also exports the same
    directory and there may be a conflict. This is a restriction that will
    be removed in a future release.

    Q: When I use the CSM SMIT support why do I see a mixture of languages in the panels?
    A: The CSM messages in the SMIT command panels are only translated for major AIX releases (AIX 5.2, AIX 5.3, etc.). In between these releases, CSM adds messages in English only. If the language you have set on your system is not English, you may see a mixture of your language and English in some of the panels. The complete set of message catalogs, for all supported languages, will be provided in the next major AIX release.

    Q: Some required packages of CSM have already been installed on the Management Server. Why does installms fail and report that they do not exist?
    A: Even though the required packages have been installed on the Management Server, CSM still needs all required packages in RPM format. Be sure to follow the steps in the Planning and Installation Guide to obtain all RPMs and install them in the appropriate directory.

    Q: What should I do if I see this error when running installms:

    sh: -c: line 1: syntax error near unexpected token '('
    sh: -c: line 1: 'chmod 0440 /tftpboot/ES4524C_17(V1.1.0.2).bix 2>&1'
    installms: Exit code 2 from command: chmod 0440 /tftpboot/ES4524C_17(V1.1.0.2).bix 2>&1

    A: This happens when SMC firmware updates are placed in the /tftpboot
    directory to update the SMC 8624T switches. The filenames have parentheses
    in them, which causes problems in installms. To fix this, just rename the
    SMC files and remove the parentheses. Then rerun installms.
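
    As a minimal sketch of the rename step described above (not part of CSM), the following Python helper only computes new names with the parentheses stripped. The function names are hypothetical, and performing the actual rename in /tftpboot (for example with os.rename) is left to the administrator.

```python
import re


def strip_parentheses(filename: str) -> str:
    """Return the file name with every '(' and ')' removed."""
    return re.sub(r"[()]", "", filename)


def plan_renames(filenames):
    """Map each name that contains parentheses to its cleaned-up name.

    Names without parentheses are left out of the plan.
    """
    plan = {}
    for name in filenames:
        cleaned = strip_parentheses(name)
        if cleaned != name:
            plan[name] = cleaned
    return plan


if __name__ == "__main__":
    # The firmware file name is taken from the error message in the question.
    for old, new in plan_renames(["ES4524C_17(V1.1.0.2).bix", "pxelinux.0"]).items():
        print(f"{old} -> {new}")
```

    Review the planned names before renaming anything, then rerun installms as described above.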

    Q: My application requires a different version of Java than the one provided by CSM. Can both versions of Java coexist on the same machine?
    A: Yes. In general, each version of Java is installed into a different directory,
    so different versions of Java will not conflict with each other. Also, many
    versions of Java have different RPM names than the version CSM provides
    (IBMJava2-JRE). In these cases, there is no coexistence problem. However,
    in some cases, the Linux distribution provides a version of Java that has the
    same RPM name as the one provided with CSM (IBMJava2-JRE). You might see this
    error when running installms:

    The IBMJava2-JRE packaged by Red Hat is already installed. This version of
    IBMJava2-JRE is not supported. Please uninstall this rpm and run
    installms -x.

    If you see this error, you need to uninstall your version of IBMJava2-JRE,
    rerun installms, and then reinstall your version of IBMJava2-JRE after CSM
    is installed.

    Q: The installms command displayed errors and stopped. Can I just rerun it with a partially installed system?
    A: Yes, after you have corrected the original error, you can run installms again (with the same flags used in the previous run) and it will only install what was not already successfully installed.

    Q: What is the quickest way to install a new management server if I already have a management server set up in a cluster and I want to install another one?
    A: The easiest way to install a new management server is to use the updateisvr command, which makes use of the hierarchical capabilities of CSM.
    This example uses two nodes: the Executive Management Server (EMS) and the Leaf Management Server (LMS). The EMS is simply a management server of leaf management servers.
    First, install the LMS from the EMS as you would a regular node in the cluster. In this example the LMS is b5 and the EMS is c5bn14.
    From the EMS, run:
    • chnode -n b5 InstallServer=b5:/csminstall # changes b5 into an install server
    • updateisvr -n b5 # copies the contents of /csminstall on the EMS to the LMS
    • Next, run installms -x on the LMS, either via dsh from the EMS or by using ssh to log in to the LMS and running it there.
    Your new management server should now be up and running!

    Q: Do I need to update any package on my Red Hat EL 3 (GA) cluster before installing CSM?
    A: The glibc RPMs need to be updated to version 2.3.2-95.6, which is available with Red Hat EL 3 QU1. The glibc RPMs are:
    • glibc-2.3.2-95.6.i386.rpm
    • glibc-devel-2.3.2-95.6.i386.rpm
    • glibc-utils-2.3.2-95.6.i386.rpm
    • glibc-2.3.2-95.6.i686.rpm
    • glibc-headers-2.3.2-95.6.i386.rpm
    • glibc-common-2.3.2-95.6.i386.rpm
    • glibc-profile-2.3.2-95.6.i386.rpm

    • The management server needs to be updated before running installms.
    • If the OS is already installed on the node, update glibc on the node before bringing the node into the cluster, i.e. before issuing the updatenode command for that node.
    • If the OS is to be installed on the node, copy the glibc RPMs to the /csminstall/csm/scripts/data directory before running installnode.

    Q: When installing a node why does the CSM script (csmfirstboot) hang when doing a mount of /csminstall from the management server? This only seems to happen on nodes that have additional (secondary) adapters configured.
    A: When the installation adapter and an additional (secondary) adapter are on the same subnet, then problems may occur when both adapters are configured but one does not have a connection. For example if the secondary adapter is configured, but unplugged, then the mount command may fail. This is because packets alternate between adapters on the same subnet and some are sent out of the adapter with a bad connection to the network.

    To fix the problem either connect the secondary adapter or unconfigure it.
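
    To check whether two adapters fall into the situation described above, one option is to compare their networks. The following is a small sketch using Python's standard ipaddress module; the function name and the example addresses are illustrative assumptions, not CSM tooling.

```python
import ipaddress


def on_same_subnet(first: str, second: str) -> bool:
    """Return True when two 'address/prefix' strings belong to the same network."""
    a = ipaddress.ip_interface(first)
    b = ipaddress.ip_interface(second)
    return a.network == b.network


if __name__ == "__main__":
    # Documentation-range addresses used purely as examples.
    print(on_same_subnet("192.0.2.10/24", "192.0.2.20/24"))     # install and secondary adapter share a subnet
    print(on_same_subnet("192.0.2.10/24", "198.51.100.20/24"))  # adapters are on different subnets
```

    If the function reports that the install adapter and a secondary adapter share a subnet, make sure the secondary adapter is either connected or unconfigured before installing the node.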

    Q: Do I need to reboot the management server after running installms?
    A: No.

    Q: What version of the Broadcom Driver should I be using on my x445?
    A: If there are any x445 nodes in your cluster, and you want CSM to automatically obtain their MAC addresses using getadapters or csmsetupks, the management server must be running the 2.4.9-e.24smp or 2.4.9-e.24summit kernel.

    Q: When I clone a node, "updatenode" fails. Why?
    A: Running updatenode on a cloned CSM node fails because the clone has the same RMC resource ID as the node it was cloned from, so two nodes share one ID.
    To give the cloned node a new ID, run /usr/sbin/rsct/install/bin/recfgct on the cloned node.
    Then run updatenode from the management server as usual for that node and it will install.

    Q: Why, when I reinstall my blades, does the BIOS come up, report that there has been a change, and then ask me to go into setup to configure the BIOS?
    A: There is a "Stop On POST Errors" option in the BIOS setup that is set by default by manufacturing. This was required for their testing to assure the machines were running properly. You can go to the BIOS and disable this option. It can be found in the "Start Options" submenu off of the main menu in the BIOS setup. In many cases, the "errors" that cause the BIOS to stop and go to the configuration menu occur because some setting in the BIOS was changed. Most of these are harmless due to the remote nature of the blades.

    Q: I installed CSM but when I tried to do "csmconfig -L" the default license wasn't there!
    And when I tried "lppchk -f" I get the following error:
    lppchk: 0504-206 File /var/opt/csm/lic/default could not be located.
    Why is this?

    A: This packaging glitch happens when the following sequence of operations occurs:
    • You install csm.server without first installing the prerequisite RPMs.
    • CSM fails to install, and you realize that you forgot to install the prerequisite RPMs.
    • You install the prerequisite RPMs.
    • You reinstall csm.server.
    • You run csmconfig -L to install the "try and buy" license and it cannot be found.

    To work around this, uninstall the csm.server installp image and then force install CSM again. If you have the purchased license, you can simply ignore this problem, because the license is on the CD and you can run "csmconfig -L <license file>".

    Q: Can the definenode command be used to redefine a node that already exists?
    A: Yes. The definenode command supports the "-m" flag which allows the command to
    modify the definitions of a node.

    Q: Why do some CSM commands require the use of the -n flag when specifying nodes, and some commands do not require the -n flag?
    A: The -n flag is used by all CSM commands to specify the nodes to act upon. As a convenience, for commands whose primary input is node names, CSM supports node names as positional arguments, and the -n flag is not required. Some of the commands that do not require the -n flag to specify nodes are lsnode, chnode, updatenode and installnode. See each command's man page or usage message to determine each command's syntax.

    Q: Will CSM format and install all my storage devices during a kickstart install?
    A: Yes, the default kickstart configuration file shipped by CSM installs to all drives
    found on your node, including all externally attached storage devices such as ServeRAID.
    If you have some drives that should not be formatted and installed during a kickstart install you must modify the kickstart configuration template file to list the specific drives to install.
    Use the "--ondisk" directive, for example:
    part / --size 1024 --grow --fstype ext3 --ondisk sda
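
    As a rough illustration of the edit described above, the sketch below appends an --ondisk directive to every "part" line in a kickstart text that does not already have one. It is only an assumption about how you might script the change; the function name is hypothetical, it handles only lines that start with the "part" keyword, and you should review the result before using it as a template.

```python
def pin_partitions(kickstart_text: str, disk: str) -> str:
    """Append '--ondisk <disk>' to each 'part' line that lacks an --ondisk option."""
    pinned = []
    for line in kickstart_text.splitlines():
        words = line.split()
        is_part_line = bool(words) and words[0] == "part"
        has_ondisk = any(word.startswith("--ondisk") for word in words)
        if is_part_line and not has_ondisk:
            line = f"{line.rstrip()} --ondisk {disk}"
        pinned.append(line)
    return "\n".join(pinned)


if __name__ == "__main__":
    sample = "part / --size 1024 --grow --fstype ext3\nlang en_US"
    print(pin_partitions(sample, "sda"))
```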

    Q: After the operating system installed, the reboot to the local disk failed. How can I fix this?
    A: In certain cases, the level of syslinux may need to be updated on your management server. Follow this procedure:
    1. Download the latest version of the syslinux RPM from here.
    2. Install the new syslinux RPM using the "rpm" command.
    3. Rerun installms -x
    4. Rerun csmsetupks -x or csmsetupsis -x
    5. Rerun installnode for the failed nodes.


    Q: How can I check whether the OS install of a node is hung?
    A: If a node takes an unusually long time to install (more than a half hour or so), the operating system install might be hung or waiting for user input. Open a remote console to that node using the rconsole command to see the progress.

    Q: Can I run csmsetupyast, csmsetupks, or csmsetupinstall while an install is running, or while another instance of csmsetupyast, csmsetupks, or csmsetupinstall is running?
    A: No, while an install is running, or while MAC addresses are being collected by csmsetupyast, csmsetupks or getadapters, you cannot run another instance of csmsetupyast, csmsetupks, csmsetupinstall, getadapters, or installnode.

    Q: Where can I download drivers needed for x335 and x345 nodes?
    A: http://techsupport.services.ibm.com/server/cluster2/fixes/csmdriverdownload.html

    Q: I see the message:
    Error opening kickstart file: /tmp/ks.cfg

    A: There are several reasons why you might see this message:
    1. The management server cannot be reached from the node, so the kickstart file cannot be found. Check the following files on the management server:

      • /etc/hosts
      • /etc/nsswitch.conf
      • /etc/resolv.conf

      If the management server's IP address is wrong in the /etc/hosts file, and the "hosts:" line in the /etc/nsswitch.conf file lists "files" before "dns", then the management server's hostname will be mapped to the wrong IP address. Change or remove the management server's entry in /etc/hosts, and then rerun csmsetupks and installnode.

    2. You changed the name of the kickstart file to look for, and that file cannot be found. When this happens, the node by default looks for /tmp/ks.cfg.

    3. Your switch's spanning tree protocol is getting in the way. We have found this to be a very common problem with Cisco switches. Without the 'spanning-tree portfast' option in the switch configuration, DHCP may fail because it takes too long for a port to come online after a machine powers up. Do not set spanning-tree portfast on ports that connect to other switches; set it only on ports that connect to nodes.
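
    For the name-resolution check in item 1, the lookup order can be read from the "hosts:" line programmatically. The following is a minimal sketch, assuming the usual nsswitch.conf layout of a database name followed by whitespace-separated sources; the function names are hypothetical.

```python
def hosts_sources(nsswitch_text: str):
    """Return the lookup sources on the 'hosts:' line, in order, ignoring [action] items."""
    for raw_line in nsswitch_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            words = line[len("hosts:"):].split()
            return [word for word in words if not word.startswith("[")]
    return []


def files_before_dns(nsswitch_text: str) -> bool:
    """True when both sources are listed and 'files' is consulted before 'dns'."""
    sources = hosts_sources(nsswitch_text)
    if "files" not in sources or "dns" not in sources:
        return False
    return sources.index("files") < sources.index("dns")


if __name__ == "__main__":
    sample = "passwd: files\nhosts: files dns  # local file first\n"
    print(hosts_sources(sample), files_before_dns(sample))
```

    When files_before_dns returns True, a wrong management server entry in /etc/hosts takes precedence over DNS, which is the case described in item 1.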


    Q: What do I do if one of the install commands displays errors and I don't understand why?
    A: Look at the command's log file to see the verbose messages. The installms, csmsetupks, updatenode and installnode commands create a log file in /var/log/csm/<commandname>.log. You can also see the verbose messages on the screen by running the command again with the -v (verbose) option.

    Q: What do I do when a node that I want to add to the cluster is said to be already defined?
    A: Run the definenode command with the -m flag. This flag allows you to modify attributes for existing nodes, and at the same time add new node definitions.

    Q: How do I update the cluster if I change the host name of a managed node?
    A: From the management server, first run the chnode command to set the ManagementServer attribute appropriately for all the nodes, for example chnode -a ManagementServer=<newhostname>. (The ManagementServer attribute specifies what network hostname the node should try to reach the management server by. If the nodes are partitioned into more than one subnet, with multiple network adapters on the management server, then the ManagementServer setting will have to be different for nodes in different subnets.) Then run updatenode -k -a. Note that you may have to update the remote shell access files (for example, the .rhosts file) on the nodes because the management server host name has changed.

    Q: How do I pass in a null value for an attribute in the definenode command?
    A: Just specify the attribute name with nothing after the equal sign. Here is an example of setting the PowerMethod and ConsoleMethod attributes to null:


      definenode -n clsn01 PowerMethod= ConsoleMethod=

    Q: Why does installms prompt for the operating system CDs? Why can't CSM just use RPM prerequisite functionality to get the software installed?
    A: Installms does the following: installs CSM and needed prerequisites on the management server and copies CSM and prerequisite RPMs into /csminstall directories. The actual RPM packages are copied, so they will be available during CSM node installation. In order to copy the prerequisite RPMs, the operating system CDs are needed.

    Q: Can I have more than one management server in a CSM cluster?
    A: Normally, you can only have 1 management server per CSM cluster, but there are 2 exceptions to this rule:
    1. Using the CSM HA MS product (Highly Available Management Server) you can have a primary (active) management server and a backup (passive) management server. When the primary management server fails, the backup will automatically become the new active management server.
    2. Starting with CSM 1.6, you can create hierarchical clusters. You still have one management server per first line cluster, but then you can tie all of those clusters together using an additional, top-level, management server. See the Hierarchical CSM chapter of the CSM Administration Guide for more information.


    Q: Is it possible to add the management server as a node in the cluster?
    A: Yes, this is a supported configuration.

    Q: Does deleting a node from my cluster also uninstall the CSM RPMs from the node?
    A: By default, the CSM software is left on the node. Use the rmnode -u command to delete the node from the cluster and also remove the CSM software from the node.

    Q: Why am I getting errors while running updatenode to an AIX 5.1 ML 3 node?
    A: Look through the updatenode error log, /var/log/csm/updatenode.log. You may see an error like:
    node1.cluster.com: MISSING REQUISITES: The following filesets are required by one or more of the selected filesets listed above. They are not currently installed and could not be found on the installation media.
    node1.cluster.com: rsct.core.rmc 2.2.1.21

    This means that RSCT PTF1 is not installed on your node. Install the RSCT PTF through the normal AIX procedure.

    Q: How can I run csmsetupks without having hardware control or a remote console?
    A: csmsetupks and csmsetupyast attempt to gather MAC addresses or UUIDs for the nodes. For certain types of hardware, this requires hardware control and remote console. CSM uses a number of methods to try to gather the MAC addresses or the UUIDs, including querying the HMC (for System p), querying the management module (for BladeCenter), rebooting the node and using the remote console to read the MAC address off the remote console (for System x), querying the RSA or management module for the UUID, or using dsh to find the MAC address on an already-installed node. If none of these methods work, you should manually set either the InstallAdapterMacaddr or UUID attribute for each node before running csmsetupks or csmsetupyast.

    To set each node's InstallAdapterMacaddr attribute to the node's MAC address, run:


    chnode <node> InstallAdapterMacaddr=00:00:00:00:00:00

    (where <node> is the node's hostname and 00:00:00:00:00:00
    is the MAC address)

    A node's MAC addresses can be found by running ifconfig on the node. If the node is not installed yet, the MAC address can be found by looking at the node's console as it does a network boot and broadcasts its MAC address. The MAC address can also usually be found in the BIOS.
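
    MAC addresses copied from a console or BIOS screen often use different separators. The sketch below normalizes such a value to the colon-separated form shown in the chnode example before you paste it into the command. The function name is hypothetical, and the choice of uppercase output is an assumption rather than a documented CSM requirement.

```python
import re


def normalize_mac(mac: str) -> str:
    """Return a MAC address as six colon-separated, uppercase hex pairs.

    Accepts ':', '-', '.' or whitespace as separators; raises ValueError otherwise.
    """
    digits = re.sub(r"[:.\-\s]", "", mac)
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    digits = digits.upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    # Example value only; substitute the address read from your node.
    print(normalize_mac("00-0d-60-1a-2b-3c"))
```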



    Q: How can I perform a full install of CSM and the operating system on the nodes without having hardware control?
    A: Hardware control is required to automatically reboot the nodes during a full install. If you do not have hardware control, you should either set the HWControlPoint attribute to blank, or use the --noreboot flag on the installnode command. This will prompt you when it is time to reboot the nodes and allow you to do it manually.

    Q: I already use kickstart (or AutoYaST). How can I use my kickstart or AutoYaST configuration with CSM?
    A: You can create a custom kickstart or AutoYaST template file and associate it with the nodes in your cluster:
    1. Locate the default kickstart or AutoYaST template file in the directory /opt/csm/install/. The file name is kscfg.tmpl.<distro> or yastcfg.<distro>.xml.
    2. Copy the file and modify the copy with your unique configuration: for example, define particular disk partition tables, generate an encrypted root password, modify the list of packages, components, and package groups to install, and customize the post-install script. The template files contain variables that are replaced with actual values when processed by csmsetupks or csmsetupyast. The variables are in the format #CSMVAR:variable#. If you do not want CSM to fill in these variables, you can replace them with your own hardcoded values. The template files also contain lines like #INCLUDE:filename#. Do not remove these lines, or else the CSM install will not work properly.
    3. Set each node's InstallTemplate attribute to the full pathname of your copy of the template file.
    4. Run csmsetupks or csmsetupyast and installnode as usual.
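
    Before editing a copy of the template in step 2, it can help to list which #CSMVAR:...# variables and #INCLUDE:...# lines it contains, so that none of the include lines are removed by accident. The following is a sketch under the assumption that both markers appear exactly in the format quoted above; the function name and the sample variable names are made up for illustration.

```python
import re

# Marker formats as quoted in the answer above.
CSMVAR_PATTERN = re.compile(r"#CSMVAR:([^#\s]+)#")
INCLUDE_PATTERN = re.compile(r"#INCLUDE:([^#\s]+)#")


def summarize_template(text: str) -> dict:
    """Return the distinct variable names and include targets found in a template."""
    return {
        "variables": sorted(set(CSMVAR_PATTERN.findall(text))),
        "includes": sorted(set(INCLUDE_PATTERN.findall(text))),
    }


if __name__ == "__main__":
    # Hypothetical template fragment; real templates use CSM's own names.
    sample = (
        "rootpw #CSMVAR:EXAMPLE_ONE#\n"
        "#INCLUDE:/example/path#\n"
        "url #CSMVAR:EXAMPLE_TWO# #CSMVAR:EXAMPLE_ONE#\n"
    )
    print(summarize_template(sample))
```

    Comparing the summary of the original template with the summary of your modified copy is a quick way to confirm that every #INCLUDE line is still present.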


    Q: I am having network and DHCP problems that are causing installnode to fail. What could be the problem?
    A: With some ethernet switches (for example, the ones that come in the IBM 1350 Cluster), there may be DHCP problems, caused by the Spanning-Tree protocol setting in the ethernet switch. To work around this problem, you can either turn off Spanning-Tree (usually not a good idea) or enable the portFast setting for the ports (recommended).

    Q: When I run the installnode command, the nodes reboot, but the network install process doesn't start. Why?
    A: If you have another DHCP server running on the same subnet as the CSM management server and nodes, it is probably intercepting the DHCP broadcasts from the nodes. This will cause the network install process not to start. The CSM management server must be the only DHCP server running on the subnet.

    Q: How do I move a node from one cluster to another?
    A: First, use the rmnode command to remove the node from the old management server, then use the definenode command to add the node to the new management server.

    For Linux clusters, if the two management servers are in the same DHCP domain and you have previously done a full install of the node from the first management server, the node must be removed from the first management server's /etc/dhcpd.conf file, or else the two DHCP servers will both respond to the node's DHCP broadcast. Remove the stanza for that node, and then run service dhcpd restart.

    Q: The getadapters command can not successfully get the MAC addresses for some or all of the nodes. What are the possible reasons?
    A:
    1. Two or more DHCP servers serving the same subnet will cause problems. You can only have one DHCP server on the cluster network, because when the node reboots it needs to get its PXE files, as well as a dynamic IP address, from the management server.
    2. If the ConsoleSerialDevice attribute is set to an incorrect value, the MAC address will not be displayed on the terminal. If you suspect this is your problem, you can watch the process manually by doing the following:

      cp /tftpboot/pxelinux.cfg/<nodename>.getmacs /tftpboot/pxelinux.cfg/<node's future IP address in HEX>
      rconsole -n <nodename> # this will pull up the rconsole screen and allow you to monitor the node.
      rpower -n <nodename> reboot # this will reboot the node.

      Watch to see what happens. After the node comes back up you should see it receive the getmacs ramdisk and kernel. You should see it load the kernel, try a few ethernet drivers and then display the MAC address.

      If you do not see anything after the ramdisk and kernel are uncompressed, then you have the wrong ConsoleSerialDevice (for example, ttyS0 or ttyS1) in the pxelinux.cfg/HEX file. You should do a chnode to the node to fix this.


    3. Perhaps the most common problem is that rconsole or rpower is not set up correctly. You can test these by running:

      rpower -n <nodename> query # it should report the status: on, off, etc.
      rconsole -n <nodename> # a new window should pop up.

      If one of these does not work, then getadapters will not work either. It is also important that the attributes are set correctly in the CSM database. If you want to get the MAC address of node2, it won't help to reboot node1 and have the serial console opened on node3. The HW and Console attributes must correspond to node2.
    4. For x345s and x335s it is important that the drivers for the ethernet adapters be available in the ramdisk that getadapters creates. Look in /opt/csm/install/drivers. There should be a subdirectory for the kernel level you are running, with the ethernet drivers for the x335 (bcm5700) and x345 (e1000) in it. If there isn't, you need to obtain the proper drivers and put them there.
    5. The getadapters command was run during a period of heavy network traffic. This may cause timeouts to occur and getadapters to fail. Run the csmsetupks command again when the network traffic has subsided.
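
    The cp command in item 2 needs the node's future IP address in hexadecimal. PXELINUX names per-client configuration files after the client's IPv4 address written as eight uppercase hex digits, so the value can be computed as in this short sketch. The function name is hypothetical and the address is an example only.

```python
import ipaddress


def pxelinux_hex_name(ipv4: str) -> str:
    """Return an IPv4 address as the 8-digit uppercase hex string PXELINUX looks for."""
    return "%08X" % int(ipaddress.IPv4Address(ipv4))


if __name__ == "__main__":
    # Example address from the documentation range; use your node's future IP.
    print(pxelinux_hex_name("192.0.2.91"))  # prints C000025B
```

    The resulting string is the file name to use under /tftpboot/pxelinux.cfg/ for that node.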


    Q: Why does getadapters -D report incorrect information sometimes?
    A: The getadapters command, when invoked with the -D (Discovery) option for AIX nodes, may occasionally report No adapters found for an LPAR when, in fact, the LPAR does contain an adapter that can reach the given server. In this case, the getadapters log file in the /var/log/csm/getadapters directory will contain a message indicating that the given client IP address already exists on the network. This is a known issue with system firmware, and is being addressed by IBM. This site will be updated when a fix is available. As a workaround, use the getadapters command without the -D option, which will return all the adapters installed in an LPAR.

    Q: How do I update the cluster if I change the host name of the management server?
    A: Please refer to the section "Changing the host name or IP address of the management server" in Chapter 10, "Reconfiguring a cluster", of the "IBM Cluster Systems Management for AIX and Linux Administration Guide".

    Q: When I run installms I am getting a message that the device is busy. What do I do?
    A: Check that your current directory is not the CD or DVD mount directory (for example, /media/cdrom, /media/dvd, or the directory specified in /etc/fstab) when running the installms command. Running installms from that directory causes a device-busy condition because it interferes with the mounting and unmounting.


    Q: Can I use other Korn shell distributions with CSM?
    A: Yes. Before running installms, make sure that your Korn shell executable is /bin/ksh, or has been linked to /bin/ksh. The installms command will not install the ksh package when the file /bin/ksh exists.


    Edited by: sunjing_admin on May 5, 2009 10:22 PM

    Edited by: sunjing_admin on May 5, 2009 10:28 PM

  • SystemAdmin

    CSM Migration, Coexistence and Interoperability

    ‏2009-04-30T09:24:49Z  in response to SystemAdmin
    This is the CSM 1.5 migration FAQ

    FAQ Index

    Q: Are there any coexistence issues between xCAT and CSM when both are installed?
    A: One issue is that the atftp daemon installed by xCAT conflicts with the tftp-hpa tftp daemon that CSM uses. This conflict does not appear until you attempt to install a node. A solution is to stop atftpd before using CSM to install nodes.
    Run this xCAT command: "rcatftpd stop", or run "chkconfig --level 345 atftp off"

    Q: After I updated CSM from 1.7.0.x to 1.7.0.10, most of the CSM commands fail with the error message "2651-095 CSM license has expired or has not been accepted. Run csmconfig -L if you have installed a new release." Why?
    A: Some packaging changes were made in CSM 1.7.0.10 to fix several issues, including one security issue. As a result, CSM 1.7.0.10 was shipped as base filesets instead of bff updates on AIX, so you need to run csmconfig -L to accept the license again after updating from 1.7.0.x to 1.7.0.10.

    Edited by: zhaoyang_admin on Jun 3, 2008 4:12 AM

    Edited by: sunjing on Apr 30, 2009 5:23 AM
  • SystemAdmin

    CSM Security FAQ

    ‏2009-04-30T09:37:33Z  in response to SystemAdmin
    This is the CSM security FAQ

    FAQ Index

    Q: Why does the following message appear: "Warning: No xauth data; using fake authentication data for X11 forwarding."? Is this an error?
    A: This message appears when OpenSSH is configured as the CSM remote shell.
    OpenSSH provides a secure shell session between a server and a client, and includes a feature to extend its security to X windows-based applications via X11Forwarding.
    When X11Forwarding is enabled, OpenSSH uses key-based X authentication to ensure that the connection from the OpenSSH client to the X windows display (specified by the DISPLAY environment variable on the client) has been authenticated. OpenSSH does this by examining the local user's .Xauthority file for an entry that corresponds to the value set in the DISPLAY environment variable. If it cannot find such an entry, then the above warning message is displayed. It is important to note that this message is only a warning, not an error message. OpenSSH will still function normally. It is simply reporting that it was unable to find key-based authentication data in the .Xauthority file, and cannot guarantee an authenticated connection between the OpenSSH client and the X windows display. (For additional information, please refer to the OpenSSH documentation.)


    To prevent the warning message from appearing, try one of the following:
    1. Ensure that the local user's .Xauthority file includes appropriate X authentication data for the display specified by the DISPLAY environment variable. The authentication data needs to be defined before the OpenSSH client is called. Refer to X Windows documentation and the xauth command syntax for more information.

    2. If an OpenSSH session does not involve the execution of any X Windows oriented commands, either turn off X11Forwarding in the OpenSSH client configuration file or execute ssh with the -x option. The -x flag can also be passed to OpenSSH through the -o flag on the dsh command. This causes OpenSSH to disable X11Forwarding and skip the examination of the local user's .Xauthority file for an X authentication key, which eliminates the message. Refer to the OpenSSH documentation for more information on configuration and command syntax.


    Q: What if updatenode displays errors because it can't dsh to the node (when trying to do a CSM-only install of the node)?
    A: The updatenode command uses dsh to run its operations on the nodes. CSM supports using dsh over rsh, OpenSSH, or another remote shell command. The dsh command uses the remote shell stored by (and changed with) the csmconfig command. This value can be temporarily overridden by setting the DSH_REMOTE_CMD environment variable to the path of the desired remote shell. To view the remote shell currently used by dsh, run csmconfig. To change to another shell, run:
    csmconfig RemoteShell="path to remote shell"

    Updatenode can perform setup of some remote shells (if the SetupRemoteShell attribute stored by csmconfig is set to 1). Updatenode can set up rsh and OpenSSH authentication to AIX nodes (the OpenSSH packages have to be installed on the machines for its setup to be automated). Updatenode can also set up OpenSSH authentication to Linux nodes. Normally, remote authentication is set up during the first updatenode to the node. However, if you wish to have it redone later, run updatenode -k node_name.

    If updatenode is unable to set up remote authentication, or you wish to set it up manually, use the following procedures (note: $HOME is the root user's home directory):

    RSH

    AIX nodes
    Add the management server's hostname and "root" to the $HOME/.rhosts file. Here is an example:

    echo ms_hostname root >> $HOME/.rhosts

    RedHat Linux Nodes
    1. Ensure that the rsh-server RPM is installed on the node.
    2. Turn on the rsh daemon by running chkconfig rsh on. To check to make sure that the daemon is running, run chkconfig --list rsh.
    3. Add the management server's hostname followed by the word root to a new line in the $HOME/.rhosts file (you may have to create the $HOME/.rhosts file; it should be owned by user and group root and have permissions of 644).
    4. Append the line rsh to /etc/securetty (this allows root to have rsh access to the node).

    SuSE or SuSE SLES Linux nodes
    1. Uncomment all the lines that start with # shell in the /etc/inetd.conf file.
    2. Run /etc/rc.d/inetd restart.
    3. Add the management server's host name and root to a new line in the $HOME/.rhosts file.
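
    The .rhosts step is the same on all three node types, so it can be scripted. The following is a minimal sketch and not part of CSM: add_rhosts_entry is a hypothetical helper name, and ms01.example.com in the note below is a placeholder for your management server's hostname. It appends the entry only if it is not already present and applies the 644 permissions mentioned above. Run it as root on the node so that the file ends up owned by root.

```shell
#!/bin/sh
# Hypothetical helper (not a CSM command): add "<ms_hostname> root" to an
# .rhosts file unless that exact line is already there.
add_rhosts_entry() {
    ms_host=$1
    rhosts=${2:-$HOME/.rhosts}
    entry="$ms_host root"

    # Create the file if it does not exist yet.
    [ -f "$rhosts" ] || : > "$rhosts"

    # -x matches the whole line, -F treats the entry as a fixed string.
    if ! grep -qxF -e "$entry" "$rhosts"; then
        echo "$entry" >> "$rhosts"
    fi

    # Permissions as given in the procedure above.
    chmod 644 "$rhosts"
}
```

    Running add_rhosts_entry ms01.example.com on a node updates root's $HOME/.rhosts; running it a second time leaves the file unchanged.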

    OpenSSH (all node types)

    Complete the following tasks on the management server:

    1. Ensure the openssh and openssh-clients packages are installed.
    2. Run the ssh-keygen command to generate a public/private key pair. Save the key in the default location and, when prompted for a passphrase, press the Enter key (both times). It is important that the private key not be encrypted with a passphrase so that dsh can access the nodes without a prompt.

    3. Edit the ssh_config file so that it contains the line Protocol 1,2. This instructs ssh to try to connect with ssh protocol 1 first, which corresponds to the default key type generated by the ssh-keygen command.

    Complete the following tasks on each node:


    1. Ensure the openssh and openssh-server packages are installed. If you just installed the packages, check to see if sshd is up and running. If it is not, you may have to reboot (or run the correct init file) in order to start the daemon.
    2. If you generated an rsa1 public/private key pair on the management server (no command line options on the ssh-keygen command), append the public key stored in identity.pub (in the management server's $HOME/.ssh directory) to each node's $HOME/.ssh/authorized_keys file (you may have to create this file on each node).
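
    Appending the key by hand on many nodes is error-prone, so here is a minimal sketch of that step as a shell function. The name install_pubkey and the path /tmp/identity.pub in the note below are illustrative assumptions; the sketch expects that you have already copied the management server's identity.pub to the node. It creates the .ssh directory and authorized_keys file if needed, uses the restrictive permissions that sshd normally expects, and skips keys that are already present.

```shell
#!/bin/sh
# Hypothetical helper (not a CSM command): append a public key file to an
# authorized_keys file, creating the file and its directory if necessary.
install_pubkey() {
    pubkey_file=$1
    auth_keys=${2:-$HOME/.ssh/authorized_keys}
    ssh_dir=$(dirname "$auth_keys")

    [ -r "$pubkey_file" ] || { echo "cannot read $pubkey_file" >&2; return 1; }

    # sshd usually refuses keys when these are writable by other users.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    [ -f "$auth_keys" ] || : > "$auth_keys"
    chmod 600 "$auth_keys"

    # Append the key only if the identical line is not already present.
    key=$(cat "$pubkey_file")
    grep -qxF -e "$key" "$auth_keys" || echo "$key" >> "$auth_keys"
}
```

    On a node, install_pubkey /tmp/identity.pub would add the key to root's authorized_keys when run as root.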

    Perform this last step on the management server:

    1. Run an ssh command to each node’s full hostname (or the hostname that the lsnode command reports) and accept the key fingerprint for each node.
    2. At this point you should be able to run a dsh command to all the nodes. To verify that this works, run dsh -a /bin/hostname. You should receive a list of all the nodes in the cluster with their respective hostnames. If dsh appears to hang, it is probably being prompted by one of the nodes. Cancel the dsh command and run it again with streaming enabled, for example dsh -s -a hostname. You will then see output from the hosts that were not configured correctly. Correct those configurations and reverify that dsh is working to all nodes.

    3. Finally, rerun updatenode.


    Q: Why is updatenode just hanging?
    A: Updatenode may hang if dsh is not configured correctly. To verify that dsh can connect to all of the nodes, run dsh to all the nodes with streaming output enabled, for example dsh -a -s hostname. If you see any lines besides the hostname output from each node, then you may have a problem connecting to the nodes with dsh. See the CSM Planning and Installation Guide for more information on configuring dsh.
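
    On a large cluster it is hard to spot the odd lines by eye. The filter below is a rough sketch under two assumptions that you should check against your own output: that dsh prefixes every output line with "<node>: ", and that the node name and the hostname the node reports share the same short name. The function name flag_unexpected_dsh_lines is made up for this example. It prints only the lines that do not look like plain hostname output, such as password or host key prompts.

```shell
#!/bin/sh
# Hypothetical filter: read "dsh -a -s hostname" output on stdin and print
# every line that is NOT of the form "<node>: <hostname>".
flag_unexpected_dsh_lines() {
    awk '
        # Expected shape: two fields, the first ending in ":", and the
        # short names (text before the first dot) on both sides equal.
        NF == 2 && $1 ~ /:$/ {
            node = $1; sub(/:$/, "", node); sub(/\..*/, "", node)
            host = $2; sub(/\..*/, "", host)
            if (node == host) next
        }
        # Anything else is worth a closer look.
        { print }
    '
}
```

    A possible invocation is dsh -a -s hostname 2>&1 | flag_unexpected_dsh_lines. No output suggests that every node answered with just its hostname.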

    If you suspect that dsh is not the problem, run updatenode in verbose mode for more diagnostic information.



    Edited by: sunjing on Apr 30, 2009 5:31 AM
  • SystemAdmin
    SystemAdmin
    476 Posts
    ACCEPTED ANSWER

    CSM Hardware & Remote Control FAQ

    ‏2009-04-30T09:51:35Z  in response to SystemAdmin
    This is the CSM Hardware & Remote Control FAQ

    FAQ Index

    Q: Why is the message 'INIT: Id "s1" respawning too fast : disabled for 5 minutes' written to the node's console over and over?
    A: The ConsoleSerialDevice attribute in the CSM database for this node contains a value that does not correspond to a valid serial port on the node. This can occur if the node has no serial port, or if the serial port has been disabled in the node's BIOS settings. It can also occur if the serial port maps to a tty device other than the one defined in the ConsoleSerialDevice attribute. Verify that the tty device defined maps to one of the node's serial ports. One way to do this is to scan the node's log file, /var/log/messages, for the tty devices loaded at the most recent system boot. If no tty devices were loaded, reboot the node into its BIOS Setup and Configuration menus and verify that the serial port has been enabled. Once the correct tty device has been identified, use the chnode command to update the ConsoleSerialDevice attribute. Then edit the file /etc/inittab and change the line labeled "s1" to reflect the correct tty device. Once /etc/inittab has been edited, run the command "telinit q" to refresh the init process. NOTE: The updatenode command cannot be used in this scenario because it would only add a new entry for the updated device; it would not remove the incorrect entry.
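
    The /etc/inittab edit described above can be previewed with a small sed wrapper before you touch the real file. This is only a sketch: set_s1_tty is a hypothetical name, and it assumes that the s1 entry names its device as a ttyS<N> token, as in the sample line in the comment. Your inittab may look different, so review the output before using it.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of an inittab file with the tty device
# on the "s1" line replaced. The input file itself is not modified.
# Assumed shape of the s1 entry (verify against your own file):
#   s1:12345:respawn:/sbin/agetty -L 19200 ttyS0 vt100
set_s1_tty() {
    inittab=$1
    new_tty=$2    # device name without /dev/, for example ttyS1
    # Only lines starting with "s1:" are touched; the first ttyS<N> token
    # on such a line is replaced by the new device name.
    sed "/^s1:/ s/ttyS[0-9][0-9]*/$new_tty/" "$inittab"
}
```

    For example, set_s1_tty /etc/inittab ttyS1 > /tmp/inittab.new writes the proposed file to /tmp/inittab.new. After checking it and copying it into place, run telinit q as described above.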

    Q: What servers and BIOS levels are supported by the remote BIOS configuration utility rfwcfg?
    A: For CSM 1.7.0, rfwcfg supports the following servers and BIOS levels:
    Server        BIOS Level
    -----------   ----------------------------------------------
    BMC           all
    RSA           all
    e326m         1.29A, 1.31A, 1.34A
    x335          1.09, 1.11, 1.12, 1.13, 1.14, 1.15
    x336          1.04, 1.09, 1.10, 1.13, 1.14
    x345          1.15, 1.16, 1.17, 1.18, 1.19, 1.21
    x346          1.08, 1.09, 1.10, 1.11, 1.12, 1.13, 1.14, 1.16
    x366          1.08, 1.09, 1.11
    x440          1.11, 1.12
    x445          1.12, 1.15, 1.17, 1.19
    x3455         1.27, 1.28, 1.32, 1.35
    x3550         1.00, 1.01, 1.03, 1.04, 1.05, 1.06
    x3650         1.00, 1.01, 1.03, 1.04, 1.05, 1.06
    x3655         1.00, 1.01, 1.03
    x3755         1.01, 1.06, 1.09
    HS20-7981     1.01
    HS20-8678     1.07, 1.08, 1.09
    HS20-8832     1.04, 1.05, 1.06, 1.07, 1.09, 1.10, 1.11, 1.12
    HS20-8843     1.01, 1.03, 1.04, 1.07, 1.08, 1.09, 1.10
    HS21-8853     1.00, 1.04, 1.06
    HS21XM-7995   1.02, 1.03
    LS20-8850     1.05, 1.22, 1.25, 1.26
    LS21/LS41     1.00, 1.01, 1.03, 1.04

    Q: Why does rfwscan show the same service release level for "Active" and "Accepted" for a POWER5 server after running the rfwflash command?
    A: This can occur when the microcode on a POWER5 system is flashed to a lower level than it was previously running. Normally, when rfwflash is used to install a newer microcode level, it will write the new code to the temporary side of the flash only, and leave the permanent side untouched. However, when flashing a lower microcode level, the lower level will be written to both the temporary and permanent sides of the flash.

    Q: Why can't I see all the console output on the remote console?
    A: This depends on the operating system that is being installed, and the console server that is being used. All console output appears on the local video, but some of this output does not appear on the remote console. For example, the items that appear on the remote console during a Red Hat Linux install are:
    • PXELinux output
    • Kickstart installation progress (blue screens)
    • Login prompt (after the node is installed)

    During a Linux install or reboot, the following items do not appear:
    • Kernel messages
    • GRUB output
    • Virtual terminals (Alt-F2, Alt-F3, etc.)

    These items must be viewed from the local console.

    Q: If an updated BIOS becomes available, is a CSM update needed for rfwcfg to support it?
    A: If the update is for a server listed in the above table, a CSM update is not required. The ASU patch file for the new BIOS may be downloaded and applied by following the instructions in the CSM Administration Guide. Refer to the "Remote hardware inventory and maintenance" chapter for more information.
    If the server is not listed in the table, a CSM update is required.

    Q: How can I enable hardware flow control on my remote console login sessions?
    A: Hardware flow control uses the "Request To Send" and "Clear To Send" (RTS/CTS) signals of the RS-232 serial interface to pace the flow of data over the line. It is generally considered more reliable than software flow control, which uses the XON and XOFF characters to control the flow of data.

    In CSM for Linux on System x, the remote console login session is controlled by the agetty program, which runs on each node. When a node is installed with Red Hat Linux, CSM generates a line in /etc/inittab that invokes agetty with the -h option, causing it to use hardware flow control. When a node is installed with SLES, the operating system adds the agetty line to /etc/inittab during the full install. CSM does not add another line, as that would result in conflicting agetty sessions and would prevent login. For these nodes, simply run CSM's updatenode command following the full install for the SLES nodes in question. The updatenode command will remove the operating system's agetty line from /etc/inittab and replace it with CSM's agetty line, which contains the -h option. Once the updatenode command is complete, use dsh to run the telinit q command on the affected nodes to activate the change, or else reboot the nodes.

    Q: When I open an rconsole window, I sometimes see a command prompt instead of a login prompt. Why does this occur? Is there any way to change this behavior?
    A: This behavior can occur when CSM accesses the node's console through a console server device. The default behavior of some console server devices is to keep the RS-232 Data Terminal Ready (DTR) signal high at all times. As a result, a console session to a node can remain active even after the rconsole window it was opened with is closed. To make the console session always terminate when the rconsole window is closed, change the configuration of the console server's serial port that connects to the node. Refer to the configuration documentation for the console server device for instructions on how to configure the serial port to drop the DTR signal when the session to the serial port is closed.

    Q: Why does my xSeries node error LED come on and the event log continuously fill up?
    A: On System x nodes that do not support events, CSM Hardware Control periodically polls for the current power status. The default polling is on a 300 second interval. Each poll represents an additional entry in the node's event log as remote logons are recorded in the log. To prevent the event log filling up too quickly, you can either increase your Hardware Control polling interval, which will cause the event log to fill more slowly; or you can start a cron job to periodically clear the event log before it fills using "reventlog -n clear". The latter, however, will completely clear the event log, so more significant log entries may be lost as well.

    Q: Why can't I specify node names as positional arguments to the chrconsolecfg command?
    A: The chrconsolecfg command does not accept node names as positional arguments. You can use the "-n" flag to provide a comma-separated list of node names or node ranges, the "-N" flag to provide a comma-separated list of node groups, or the "-a" flag to specify all the managed nodes in the cluster.

    Q: Why do rpower on, off, and query return success even after the target xSeries node has been removed from the RSA daisy-chain?
    A: This situation should not occur; but in the event that it does, restart the RSA. This will cause the RSA to resynchronize its internal cache of nodes on the daisy-chain.

    Q: Where do I find information on how to set the node attributes such as HWControlPoint and ConsoleServer?
    A: This information is found in the CSM for Linux Planning and Installation Guide and the CSM for AIX Planning and Installation Guide. If you have already installed CSM on the management server, the documentation is located in /opt/csm/doc in both PDF and HTML formats. If you have not yet installed CSM, but have the CD, the documentation is in the /doc subdirectory of the CD.

    You can also access the documentation from here.

    Q: How can I check the hardware control settings for all my nodes?
    A: Use the lsnode command. For example:

    lsnode -a HWControlPoint,HWControlNodeId,PowerMethod

    Refer to the CSM Command and Technical Reference for more information and details.

    Q: How can I check the console server settings for all my nodes?
    A: Use the lsnode command. For example:

    lsnode -a ConsoleServerName,ConsoleMethod,ConsolePortNum,ConsoleSerialDevice,ConsoleSerialSpeed

    Refer to the CSM Command and Technical Reference for more information and details.

    Q: For a few of my nodes, rpower does not work correctly. Why?
    A: Verify that the HWControlNodeId attribute for each affected node in the CSM database matches the HWControlNodeId on the Hardware Control Point. Running lshwinfo for the node's Hardware Control Point will give you a listing of the HWControlNodeId values.

    Q: Why do I get an error message about "invalid service processor hostname" when running rpower?
    A: First verify that the HWControlNodeId attribute for this node in the CSM database matches the text ID of the node. If this is correct, then the most likely explanation for the problem is that the password for the RSA or node was specified to CSM incorrectly. (This is specified via the systemid command.) If CSM passes an incorrect password to the RSA, the RSA shuts down the port that CSM uses to communicate with it. The port can be reopened by resetting the RSA via the telnet or web interface.

    Q: What versions of BIOS and service processor firmware do I need for my xSeries machines, and where can I get them?
    A: Go here to obtain the latest information regarding versions of BIOS and service processor firmware.

    Q: How could I update Baseboard Management Controller (BMC) firmware on target nodes running Linux Operating System versions lower than Red Hat 4 QU3 or SLES 9 SP3?
    A: The rfwflash command cannot be used to flash the Baseboard Management Controller (BMC) firmware on target nodes running Linux Operating System versions lower than Red Hat 4 QU3 or SLES 9 SP3.
    Customers that require BMC updates on older versions of the Operating System will need to manually install the OSA IPMI Driver and the IBM IPMI Mapping Layer for Linux packages on the target node before flashing the BMC. Note that the Mapping Layer package for Linux will be a source rpm.
    These packages are available for download from the IBM Support Web Site at http://www.ibm.com/support. To install the driver and mapping layer software, follow these steps:
    1. Ensure that the packages for rpm, gcc, glibc, and kernel-source appropriate for the Operating System on the target node are installed.
    2. Remove any existing installations of the OSA IPMI Driver and IBM IPMI Mapping Layer software.
    3. Copy the new OSA Driver and IBM IPMI Mapping Layer packages to a temporary directory.
    4. Use the rpm -i command to build and install the OSA IPMI drivers. If the build or installation fails, refer to the build.log file in the /usr/osa/<OSA Driver package name> directory for more information. Once the errors detailed in the build.log file have been corrected, run the build_osadrv script located in the same directory to try the build and install steps again.
    5. Run the ipmi_load script.
    6. Build the IBM IPMI Mapping Layer installation package by running the command rpmbuild --rebuild <IBM IPMI Mapping Layer Source RPM name>. Note the name of the output Mapping Layer rpm written by the rpmbuild command.
    7. Install the IBM IPMI Mapping Layer rpm.


    Q: I am using "fsp" as the power method and console method to control the Virtual I/O Server (VIOS) partition of a System p server that uses the Integrated Virtualization Manager to create and manage logical partitions. An rconsole session to the VIOS partition does not display any output even though the VIOS itself is running. How do I enable the console output?
    A: 1. Run the command "rpower -n <node_name> fsp_vty" to switch the console output from the physical serial port to the virtual port. Be sure all LPARs are shut down before executing this command, since it will force the System p server to be recycled.
    2. Ensure the VIOS console is configured to use a virtual tty device. Once the system is back up, telnet into the VIOS partition. Run the "oem_setup_env" command to get to the administrator command prompt, then run "lscons" to see which device the console is set to. If the result is "NULL", or some device other than "/dev/vtyn", assign the console to an available vty device using either the "chcons" command or SMIT. Reboot the VIOS partition to activate the change.



    Edited by: sunjing on Apr 30, 2009 5:50 AM
  • SystemAdmin
    SystemAdmin
    476 Posts
    ACCEPTED ANSWER

    CSM Monitoring FAQ

    ‏2009-04-30T10:04:49Z  in response to SystemAdmin
    This is the CSM Monitoring FAQ

    FAQ Index


    Q: My responses to conditions don't seem to be running when the condition occurs. How do I determine the problem?
    A: First run lscondresp to verify that the state of the condition/response association is "Active". If it is not active, run startcondresp <condition> <response> to make it active. If the response is still not being run when the condition occurs, run lsaudrec to see if ERRM is recognizing that the event occurred and to view any errors returned by the response script.

    Q: I would like to monitor some events on the management server itself, but the conditions shipped with CSM have MgtScope="m" (for example, the condition "AnyNodeAnyLoggedError" for error log and syslog monitoring). How can I enable monitoring for the management server itself?
    A: The following two solutions can be used in this scenario:
    1. Make a copy of the CSM-shipped condition with MgtScope changed to "l", using the command "mkcondition -c CSM_Shipped_Condition -m l New_Condition_Name", and then associate any response with the new condition.
    2. Define the CSM management server as a managed node of itself: run the updatenode command to put the management server in "Managed" mode as a node. Monitoring for the management server itself will then work.

    Q: To monitor some events on the management server itself, I have defined the management server as a managed node of itself. How can I exclude the management server when running CSM commands, such as hardware control commands, against all the managed nodes?
    A: To prevent the management server from being affected by hardware control commands, you can unset the hardware control attributes HWControlPoint, HWControlNodeId, and PowerMethod for the management server; CSM hardware control commands will then skip it. Alternatively, you can create a node group that contains all the nodes except the management server and use that node group as the target in the future. You can also use a node range to exclude the management server, for example: "rpower -n '+AllNode,-ms_host_name' off".

    Q: How can I monitor the reachability of a node?
    A: CSM ships a condition named NodeReachability. It is designed to detect only the case where an existing online node goes offline. If you would like a condition that is triggered whenever a node goes online or offline, make one with an EventExpression of "Status!=1" and a rearm expression of "Status=1". Example:
    mkcondition -r IBM.ManagedNode -e "Status!=1" -E "Status=1" NodeStatus

    Edited by: sunjing on Apr 30, 2009 5:54 AM
  • SystemAdmin
    SystemAdmin
    476 Posts
    ACCEPTED ANSWER

    CSM Node Commands FAQ

    ‏2009-04-30T10:20:17Z  in response to SystemAdmin
    This is the CSM Node Commands FAQ

    FAQ Index


    Q: My file name contains a special character such as "(". Can this file be passed to CSM commands through the -f or --file flag?
    A: No. CSM commands do not support special characters in a file name passed through the -f or --file flag. In this case, rename the file to a name that does not contain special characters and then rerun the command with the new file name.

    Q: Why am I having trouble removing nodes from dynamic node groups or node groups that contain sub-groups?
    A: In these two cases, the complete member list of the node group is not what is stored as the definition of the group. In the case of a dynamic group, the select string is stored. In the case of a group that contains sub-groups, the sub-group names are stored. The definition of the group is not evaluated to get the complete member list until the containing group is queried. The groups are designed this way so that the group membership changes dynamically as the things it is based on change. Thus, even if you run nodegrp <groupname> and see a particular node listed that you want to remove from it, the command nodegrp -x <nodename> <groupname> will not work if that node is not specifically listed in the definition of the group. To see the definition of a node group (which may be different from its current member list), run nodegrp -L <groupname>.

    Q: How do I make a copy of a node group so that I can modify it slightly?
    A: Run: nodegrp -n `nodegrp -d, <fromgroup>` <togroup>
    Note: those are backticks around the inner nodegrp command.

    Q: The lsnode and nodegrp commands are giving me errors. How do I figure out what is wrong?
    A: Make sure RMC is working properly using the procedure outlined in the RMC FAQ. After that, use the -v (verbose) option of the command to see the RMC command it is trying to run.

    Q: How do I save my node definitions?
    A: Run lsnode -F and redirect the output to a file. This will create a nodedef file with the current node definitions.

    Q: How do I restore my saved node definitions?
    A: Run definenode -f saved_nodedef_file.
    Please note, this will only define nodes that do not already exist in the cluster, so if you want to use the nodedef file to restore existing nodes, you will have to remove them from the cluster first.

    Q: How do I save off all of my CSM data?
    A: You can use the csmbackup command to save all of your CSM data to a directory on your system. This directory can later be given to the csmrestore command, which will restore the data on the current management server.
    This backup and restore functionality is very useful during CSM and OS upgrades of your system. See the CSM Library website and locate the latest version of the CSM Planning and Installation Guide for more information.

    Q: How do I make a node group that contains other node groups?
    A: Run nodegrp -a as normal, but prefix a + sign in front of any node group names. For example,
    to make a node group DevNodes that contains node groups Rack1 and Rack2, run nodegrp -a +Rack1,+Rack2 DevNodes.


    Edited by: sunjing on Apr 30, 2009 6:11 AM
  • SystemAdmin
    SystemAdmin
    476 Posts
    ACCEPTED ANSWER

    CSM RMC FAQ

    ‏2009-04-30T10:24:25Z  in response to SystemAdmin
    This is the CSM RMC FAQ

    FAQ Index

    Q: RMC doesn't seem to be working properly. What can I do?
    A: First check to see if the RMC daemon (ctrmc) and the resource managers are running by running lssrc -a. At least ctrmc, IBM.ERRM, and IBM.DMSRM should be running. (Some of the other resource managers are not started until they are needed.) If you need to restart RMC, run /usr/sbin/rsct/bin/rmcctrl -z and then /usr/sbin/rsct/bin/rmcctrl -s. After you are sure that RMC is running, try lsrsrc to see if the RMC daemon is responding and what resource classes it knows about. Some of the required classes are: IBM.ManagedNode, IBM.NodeGroup, IBM.NodeHwCtrl, IBM.HwCtrlPoint, IBM.Condition, IBM.EventResponse, and IBM.Association. The contents of any of these classes can be listed using lsrsrc <classname>.
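
    The first check above can be automated by scanning the lssrc -a listing. The function below is a sketch under the assumption that lssrc prints one subsystem per line with the subsystem name in the first column and its status ("active" or "inoperative") in the last column; check_rmc_subsystems is a made-up name. It reports any of the three subsystems named above that is not active and returns a non-zero status in that case.

```shell
#!/bin/sh
# Hypothetical helper: read "lssrc -a" output on stdin and report which of
# ctrmc, IBM.ERRM and IBM.DMSRM are not listed as active.
check_rmc_subsystems() {
    awk '
        BEGIN {
            need["ctrmc"] = 1
            need["IBM.ERRM"] = 1
            need["IBM.DMSRM"] = 1
        }
        # Remember every required subsystem whose last column is "active".
        ($1 in need) && $NF == "active" { up[$1] = 1 }
        END {
            missing = 0
            for (s in need) {
                if (!(s in up)) {
                    print s " is not active"
                    missing = 1
                }
            }
            exit missing
        }
    '
}
```

    A possible invocation is lssrc -a | check_rmc_subsystems. If it reports a missing subsystem, restart RMC with the rmcctrl commands given above.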

    Q: Why is the node status reported as 0 (down) on some nodes even though the nodes can be reached via ping and the RMC subsystem on the nodes is active?
    A: On large clusters, the TCP/IP buffer size on the management server needs to be increased to overcome this problem. On Linux, the tunables can be increased as follows, and then RMC should be recycled (see below):
    echo 262142 > /proc/sys/net/core/rmem_max
    echo 262142 > /proc/sys/net/core/rmem_default

    For AIX, the tunable can be set via the no command. This again needs to be followed by a recycle of RMC:
    no -o udp_recvspace=262142


    The effects of these commands will not survive a reboot, so it is necessary to place these commands in a script which will always be executed at start up time. On Linux (RedHat) the best way to make these buffer changes early in the boot sequence is to add them to the /etc/sysctl.conf file:
    # CSM support. Increase socket buffer size to allow large incoming messages
    net.core.rmem_max = 262142
    net.core.rmem_default = 262142

    The associated sysctl command gets run by the rc.sysinit script before any other startup scripts are run.
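
    If you script the /etc/sysctl.conf change, it helps to make it repeatable so that a second run does not add duplicate lines. The following is a minimal sketch; set_sysctl_conf is a hypothetical helper, not a CSM or sysctl command. It replaces an existing line for the key or appends a new one.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a sysctl.conf-style file,
# replacing an existing setting for the key or appending a new line.
set_sysctl_conf() {
    conf=$1
    key=$2
    value=$3

    # Escape the dots in the key so they match literally in the patterns.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')

    if grep -q "^[[:space:]]*${key_re}[[:space:]]*=" "$conf" 2>/dev/null; then
        # Rewrite through a temporary file, then copy the contents back so
        # the ownership and permissions of the original file are kept.
        tmp=$(mktemp) || return 1
        sed "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        echo "${key} = ${value}" >> "$conf"
    fi
}
```

    For the values above, you would run set_sysctl_conf /etc/sysctl.conf net.core.rmem_max 262142 and the same for net.core.rmem_default, then recycle RMC as described below.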

    On AIX, put the no command in a script that gets called out of /etc/inittab, for example /etc/rc.local.

    To recycle RMC on both Linux and AIX run:

    /usr/sbin/rsct/bin/rmcctrl -z
    /usr/sbin/rsct/bin/rmcctrl -s


    Edited by: sunjing on Apr 30, 2009 6:23 AM
  • SystemAdmin
    SystemAdmin
    476 Posts
    ACCEPTED ANSWER

    CSM Configuration File Manager FAQ

    ‏2009-04-30T10:39:18Z  in response to SystemAdmin
    This is the CSM Configuration File Manager FAQ

    FAQ Index

    Q: With an AIX management server, does CFM work for Linux MinManaged nodes?
    A: CFM may fail from an AIX management server to Linux MinManaged nodes because CSM uses rdist on the AIX management server to distribute files by default, and Linux MinManaged nodes may not have rdist installed. There are two ways to solve this problem:
    1) Continue to use rdist to distribute files.
    You will need to manually install rdist on the Linux MinManaged nodes.
    2) Change CSM to use rsync to distribute files.
    a. Ensure rsync is installed on the MinManaged node. It is installed by default.
    b. Install rsync on the AIX management server.
    c. Run the command "csmconfig SyncCmd=/usr/bin/rsync" before running cfmupdatenode.

    Q: Where do I put configuration files that I want distributed to all the nodes?
    A: Put them under /cfmroot on the management server and give the file the same name that it should have on the nodes. For example, if you are trying to distribute the /etc/services file, place a copy of it in /cfmroot/etc/services.

    Q: How do I mark files that should only be distributed to specific nodes?
    A: Use ._nodegroup notation to mark files that should only be distributed to nodes in certain node groups. For example, if you only want /etc/passwd distributed to nodes in the RedHat73Nodes node group, name the file /cfmroot/etc/passwd._RedHat73Nodes. You can also use ._hostname notation to mark files that should only be distributed to a specific machine.
    For example, if you want /etc/profile to only be distributed to node1.clusters.com name the file /cfmroot/etc/profile._node1.clusters.com.
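
    The naming rules from the last two answers can be summarized in a few lines of shell. This is only an illustration of the convention; cfm_source_path is a hypothetical helper and not a CFM command. Given the path a file should have on the nodes, and optionally a node group name or hostname, it prints where the file belongs under /cfmroot.

```shell
#!/bin/sh
# Hypothetical helper: print the /cfmroot path for a file that should be
# distributed to <target> on the nodes.
cfm_source_path() {
    target=$1       # absolute path on the node, for example /etc/passwd
    scope=${2:-}    # optional node group name or hostname
    if [ -n "$scope" ]; then
        # Restricted distribution: append "._<nodegroup>" or "._<hostname>".
        printf '/cfmroot%s._%s\n' "$target" "$scope"
    else
        # Distribution to all nodes: same name as on the nodes.
        printf '/cfmroot%s\n' "$target"
    fi
}
```

    With the examples from the text, cfm_source_path /etc/passwd RedHat73Nodes prints /cfmroot/etc/passwd._RedHat73Nodes, and cfm_source_path /etc/services prints /cfmroot/etc/services.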

    Q: How do I distribute the configuration files in /cfmroot to all the nodes?
    A: Run updatenode -a on the management server.

    Q: Can I put symbolic links in /cfmroot to my real configuration files?
    A: Yes, if you put a symbolic link in /cfmroot to a file somewhere else on the management server, CFM will copy to the node the contents of the file that the symbolic link points to.

    Q: What if I want the symbolic link itself distributed to the nodes by CFM?
    A: Symbolic links should be created in <filename>.post files. These files are actually scripts that CFM will run after transferring <filename> to the node. The following scenario is an example:

    The administrator would like to distribute:
    /etc/rc.d/rc
    /etc/rc -> /etc/rc.d/rc (a symbolic link to /etc/rc.d/rc)

    The administrator should create the /cfmroot/etc/rc.d/rc file. Then create the file /cfmroot/etc/rc.d/rc.post which will be run every time /cfmroot/etc/rc.d/rc is copied to a node. The rc.post file should have contents similar to:

    #!/bin/sh
    # Recreate the /etc/rc symbolic link after rc has been copied to the node.
    # rm -f succeeds whether or not /etc/rc exists, and also removes a dangling link.
    /bin/rm -f /etc/rc
    ln -s /etc/rc.d/rc /etc/rc

    These <filename>.post (and <filename>.pre) files can be written in any scripting language available on the nodes, and they are run only when the <filename> file is updated. Therefore, the rc.post script above would only get run on a node if someone changed and distributed the /cfmroot/etc/rc.d/rc file.


    Q: How many files can I transfer with CFM?
    A: There is no limit on the number of files you can transfer with CFM. However, transferring large numbers of files or very large files can affect your network traffic.

    Edited by: sunjing on Apr 30, 2009 6:30 AM
  • SystemAdmin

    CSM Software Maintenance FAQ

    ‏2009-04-30T10:51:45Z  in response to SystemAdmin
    This is the CSM Software Maintenance FAQ

    FAQ Index

    Q: Where can I find the Autoupdate RPM?
    A: http://www.mat.univie.ac.at/~gerald/ftp/autoupdate/

    Q: Once I have downloaded the Autoupdate RPM, where do I put it?
    A: Autoupdate needs to be installed on the nodes with CSM, in order for SMS to function. Therefore you should place the Autoupdate RPM in the directory where the CSM RPMs are stored:

    /csminstall/Linux/<InstallDistributionName>/csm/<InstallCSMVersion>/packages/.


    In the above path, <InstallDistributionName> and <InstallCSMVersion> match the node attributes for the nodes on which you wish to install Autoupdate. Once you have placed Autoupdate in the directory, run updatenode to install it on the nodes.

    Q: I keep getting these warnings from my nodes when I use software maintenance:
    ": warning: : V3 DSA signature: NOKEY, key ID ..."
    but the RPM installs fine. What can I do to stop these warnings?

    A: If you get this warning on RedHat nodes, it is most likely the result of a known bug in the perl-RPM2 package. You can stop this warning message by running:
    dsh -n <node_list> "rpm --import /usr/share/rhn/RPM-GPG-KEY"

    Q: Can I use SMS to download software updates?
    A: You can use the open source package Autoupdate, which is used by SMS on the nodes, to download RPM updates and store them in /csminstall. To do this, install the Autoupdate RPM on the management server. Then use the autodld command to download RPM updates into the /csminstall/.../updates directory, so that SMS can apply them automatically the next time it is run.

    A nice way to use this feature is to write a cron job that runs autodld, and then runs smsupdatenode to all the nodes.

    Here is an example of the autodld command that downloads all RedHat 7.3 updates from the RedHat site:

    /usr/sbin/autodld --noinstall --noupdate --nodldaddinstall \
      --updatedir /csminstall/Linux/RedHat/7.3/i386/updates \
      --rpmdir /csminstall/Linux/RedHat/7.3/i386/RedHat/RPMS \
      --url http://ftp.redhat.com/pub/redhat/linux/updates/7.3/en/os/i386


    Note: This currently works best on a Linux management server. If you want to use an AIX management server to download updates for your Linux nodes, you may have to install additional Perl packages and add "--arch i386" to the command.

    For more information on the autodld command, run "man autoupdate", or visit the Autoupdate web page.
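
    The cron job idea mentioned above could look roughly like the following wrapper. It reuses the autodld options and RedHat 7.3 paths from the example above, and smsupdatenode -a to reach all nodes. The DRY_RUN switch and the run and sync_updates function names are made up for this sketch; by default it only prints the commands, so nothing is downloaded or installed until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a cron-driven wrapper: download updates with autodld, then push
# them to all nodes with smsupdatenode. Paths and URL are the RedHat 7.3
# values from the example above.
BASE=/csminstall/Linux/RedHat/7.3/i386
URL=http://ftp.redhat.com/pub/redhat/linux/updates/7.3/en/os/i386

run() {
    # DRY_RUN is a hypothetical switch: print the command unless it is 0.
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

sync_updates() {
    run /usr/sbin/autodld --noinstall --noupdate --nodldaddinstall \
        --updatedir "$BASE/updates" \
        --rpmdir "$BASE/RedHat/RPMS" \
        --url "$URL" &&
    run smsupdatenode -a
}

sync_updates
```

    Saved under a name of your choosing on the management server, it could be scheduled with a crontab entry that sets DRY_RUN=0 before the script path.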

    Q: How do I update RPMs on my Linux nodes?
    A: Put updates to RPMs under the /csminstall/.../updates directory on the management server that corresponds to the node attributes:

    /csminstall/<InstallOSName>/<InstallDistributionName>/<InstallDistributionVersion>/<InstallPkgArchitecture>/updates


    For example:

    /csminstall/Linux/RedHat/7.3/i386/updates/example-1.1-0.noarch.rpm
    Then run updatenode -a on the management server.

    Q: Updatenode seems to only update RPMs that already exist on the nodes. How do I install new RPMs on my Linux nodes?
    A: Place the desired RPMs under the /csminstall/.../install directory on the management server that corresponds to the node attributes:

    /csminstall/<InstallOSName>/<InstallDistributionName>/<InstallDistributionVersion>/<InstallPkgArchitecture>/install



    For example:
    /csminstall/Linux/RedHat/7.3/i386/install/newpkg-1.1-0.noarch.rpm
    Then run smsupdatenode -a -i newpkg-1.1-0.noarch.rpm on the management server. This will install the RPM and any needed prerequisites (gathered from /csminstall/.../RPMS) on all of the nodes.
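
    The updates and install directories described in the two answers above follow the same pattern, which can be sketched as a small helper. The function name sms_dir and the CSMINSTALL variable are assumptions for illustration; the attribute values come from the examples above.

```shell
# Hypothetical helper: build the SMS directory for a set of node attributes.
CSMINSTALL="${CSMINSTALL:-/csminstall}"   # assumption: overridable only to ease testing

sms_dir() {
    # $1 = InstallOSName              $2 = InstallDistributionName
    # $3 = InstallDistributionVersion $4 = InstallPkgArchitecture
    # $5 = "updates" (upgrade existing RPMs) or "install" (new RPMs)
    case "$5" in
        updates|install) ;;
        *) echo "sms_dir: kind must be 'updates' or 'install'" >&2; return 1 ;;
    esac
    printf '%s/%s/%s/%s/%s/%s\n' "$CSMINSTALL" "$1" "$2" "$3" "$4" "$5"
}

sms_dir Linux RedHat 7.3 i386 updates
sms_dir Linux RedHat 7.3 i386 install
```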

    Q: How do I upgrade or install RPMs on specific nodes (instead of all the nodes that match a certain operating system and architecture)?
    A: To install or upgrade RPMs only on certain machines, create node group subdirectories in /csminstall/.../updates/. Each subdirectory name should consist only of the name of the node group, for example:

    /csminstall/Linux/RedHat/7.2/i386/updates/KickstartNodes/
    Any RPMs placed in a node group subdirectory will only be installed or updated on nodes in that node group.
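
    As a rough sketch, staging RPMs for a single node group amounts to creating the subdirectory and copying the files into it. The function name stage_for_group is hypothetical; the directory and node group in the usage comment are the ones from the example above.

```shell
# Hypothetical helper: copy RPMs into a node group subdirectory of an
# updates directory, creating the subdirectory if needed.
stage_for_group() {
    # $1 = updates directory, $2 = node group name, remaining args = RPM files
    updates_dir=$1
    group=$2
    shift 2
    mkdir -p "$updates_dir/$group" &&
    cp -- "$@" "$updates_dir/$group/"
}

# Usage (on the management server):
#   stage_for_group /csminstall/Linux/RedHat/7.2/i386/updates KickstartNodes example-1.1-0.noarch.rpm
```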

    Q: How do I update the kernel on all of my nodes?
    A: Updating the kernel is similar to updating an RPM on all of your nodes. However, you may need to enable kernel updates in the Autoupdate configuration file. See the CSM Administration Guide for exact details on how to upgrade the kernel on your nodes.

    Also note that on some machine models, you will need to install special device drivers along with the kernel. These drivers should be put into RPMs and also placed in the /csminstall/.../updates directory. Drivers for IBM machines can be downloaded from here.

    Q: How do I install a custom kernel on all of my nodes?
    A: In order to install a custom kernel on all of your nodes, you must create an RPM that contains the custom bzimage and any needed modules. Also, please ensure your custom kernel has a higher version number than the one already installed on your nodes. The steps required to build a kernel RPM are usually distribution specific. See the CSM Administration Guide and your Linux distribution's documentation for more details.

    Once a custom kernel RPM has been made, it can be installed on the machine according to the normal kernel upgrade procedure.

    Also note that on some machine models, you will need to install special device drivers along with the kernel. These drivers can be put into the RPM you create for the kernel. Drivers for IBM machines can be downloaded from here.

    Q: I upgraded to Perl 5.6.1 (from Perl 5.6.0) on my nodes and now I'm getting errors when I run updatenode or smsupdatenode. What should I do?
    A: If you are getting an error like:


    node1.cluster.com: Can't locate DB_File.pm in @INC (@INC contains:
    /usr/lib/perl5/5.6.1/i386-linux ... .) at /usr/sbin/autoupd line 15.


    You will need to install perl-DB_File-*.rpm and db3-*.rpm on your nodes to provide DB_File.pm. Unfortunately, you will not be able to use smsupdatenode to do this install, since it needs DB_File.pm to run.
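
    One way to check whether a node already has the module is to ask perl to load it. This is a generic perl check rather than a CSM feature, and the function name has_perl_module is made up for the sketch.

```shell
# Hypothetical helper: succeeds only if perl can load the named module.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if has_perl_module DB_File; then
    echo "DB_File.pm is available"
else
    echo "DB_File.pm is missing; install perl-DB_File and db3 with rpm directly"
fi
```

    The check has to run on each node itself, for example through dsh in the same way as the rpm --import command shown earlier.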

    Edited by: sunjing on Apr 30, 2009 6:48 AM