ego.cluster reference

The ego.cluster file is the cluster configuration file. There is one for each cluster, called ego.cluster.cluster_name. The cluster_name suffix is the name of the cluster that is defined in the Cluster section of ego.shared. All IBM® Spectrum Conductor hosts are listed in this file, along with the list of IBM Spectrum Conductor administrators and the installed IBM Spectrum Conductor features.

The ego.cluster.cluster_name file contains configuration information that affects all IBM Spectrum Conductor Spark instance groups. It defines cluster administrators, hosts that make up the cluster, attributes of each individual host such as host type or host model, and resources using the names defined in ego.shared.

Changing ego.cluster configuration

After making any changes to ego.cluster.cluster_name, run the following command to restart the primary host:

egosh ego restart primary_host

File location

This file is typically installed in the directory defined by EGO_CONFDIR.

Structure

The ego.cluster.cluster_name file contains the following configuration sections:
  • Parameters section
  • ClusterAdmins section
  • Host section
  • ResourceMap section

Parameters section

Parameters

  • ELIM_POLL_INTERVAL
  • EGO_HOST_ADDR_RANGE
  • EXINTERVAL
  • HOST_INACTIVITY_LIMIT
  • MASTER_INACTIVITY_LIMIT

ELIM_POLL_INTERVAL

Description: Time interval, in seconds, at which the LIM samples external load index information. If your elim executable is programmed to report values more frequently than every 5 seconds, set ELIM_POLL_INTERVAL so that the LIM samples information at a corresponding rate.

Syntax: ELIM_POLL_INTERVAL=seconds

Valid values: 1 to 5

Default value: 5 seconds

EGO_HOST_ADDR_RANGE

Description:
  • Identifies the range of IP addresses that are allowed to be IBM Spectrum Conductor hosts, which can be dynamically added to or removed from the cluster.
    CAUTION:
    To enable dynamically added hosts after installation, you must define EGO_HOST_ADDR_RANGE in ego.cluster.cluster_name, and EGO_DYNAMIC_HOST_WAIT_TIME in ego.conf. If you enable dynamic hosts during installation, you must define an IP address range after installation to enable security.
  • If a value is defined, security for dynamically adding and removing hosts is enabled, and only hosts with IP addresses within the specified range can be added to or removed from a cluster dynamically.
  • Specify an IP address or range of addresses, using either a dotted quad notation (IPv4) or IP Next Generation (IPv6) format. IBM Spectrum Conductor supports both formats; you do not have to map IPv4 addresses to an IPv6 format. Multiple ranges can be defined, separated by spaces.
    Note: To use IPv6 addresses, you must define the parameter EGO_ENABLE_SUPPORT_IPV6 in ego.conf.
  • If there is an error in the configuration of EGO_HOST_ADDR_RANGE (for example, an address range is not in the correct format), no host will be allowed to join the cluster dynamically and an error message will be logged in the LIM log. Address ranges are validated at startup, reconfiguration, or restart, so they must conform to the required format.
  • If the IP address of a requesting host falls within the specified range, the host is accepted as a dynamic IBM Spectrum Conductor host.
  • IP addresses are separated by spaces and are considered "OR" alternatives.
  • If you define the parameter EGO_HOST_ADDR_RANGE with:
    • No range specified, all IPv4 and IPv6 clients are allowed.
    • Only an IPv4 range specified, only IPv4 clients within the range are allowed.
    • Only an IPv6 range specified, only IPv6 clients within the range are allowed.
    • Both an IPv6 and IPv4 range specified, IPv6 and IPv4 clients within the ranges are allowed.
  • The asterisk (*) character indicates any value is allowed.
  • The hyphen (-) indicates an explicit range of values. For example, 1-4 indicates that 1, 2, 3, and 4 are allowed.
  • Open ranges such as *-30, or 10-*, are allowed.
  • For IPv6 addresses, the double colon symbol (::) indicates one or more groups of 16 bits of zeros. You can also use (::) to compress leading and trailing zeros in an address filter, as shown in the following example: EGO_HOST_ADDR_RANGE=1080::8:800:20fc:*
  • This definition allows hosts with addresses 1080:0:0:0:8:800:20fc:* (the :: expands to three groups of zeros).
  • You cannot use the double colon (::) more than once within an IP address. You cannot use a zero before or after (::). For example, 1080:0::8:800:20fc:* is not a valid address.
  • If a range is specified with fewer fields than a full IP address, such as 10.161, it is treated as 10.161.*.*.
  • This parameter is limited to 2048 characters.

Syntax: EGO_HOST_ADDR_RANGE=IP_address

Note: After you configure EGO_HOST_ADDR_RANGE, check the lim.log.host_name file to make sure that this parameter is correctly set. If this parameter is not set or is wrong, this is indicated in the log file.

Examples:
  • EGO_HOST_ADDR_RANGE=100 - All IPv4 and IPv6 hosts with a domain address starting with 100 are allowed access.
    • To specify only IPv4 hosts, set the value to 100.*
    • To specify only IPv6 hosts, set the value to 100:*
  • EGO_HOST_ADDR_RANGE=100-110.34.1-10.4-56 - All hosts belonging to a domain with an address having the first number 100 - 110, then 34, then a number 1 - 10, then, a number 4 - 56 are allowed access. No IPv6 hosts are allowed. Example: 100.34.9.45, 100.34.1.4, 102.34.3.20, etc.
  • EGO_HOST_ADDR_RANGE=100.172.1.13 100.*.30-54 124.24-*.1.*-34 - The host with the address 100.172.1.13 is allowed access. All hosts belonging to domains starting with 100, then any number, then a range of 30 to 54 are allowed access. All hosts belonging to domains starting with 124, then from 24 onward, then 1, then 0 - 34 are allowed access. No IPv6 hosts are allowed.
  • EGO_HOST_ADDR_RANGE=12.23.45.* - All hosts belonging to domains starting with 12.23.45 are allowed. No IPv6 hosts are allowed.
  • EGO_HOST_ADDR_RANGE=100.*43 - The * character can only be used to indicate any value. The format of this example is not correct, so an error is written to the LIM log and no hosts are able to join the cluster dynamically. No IPv6 hosts are allowed.
  • EGO_HOST_ADDR_RANGE=100.*43 100.172.1.13 - Although one address range is correct, *43 is not in the correct format, so the entire line is considered invalid. An error is written to the LIM log and no hosts are able to join the cluster dynamically. No IPv6 hosts are allowed.
  • EGO_HOST_ADDR_RANGE = 3ffe - All client IPv6 hosts with a domain address starting with 3ffe are allowed access. No IPv4 hosts are allowed.
  • EGO_HOST_ADDR_RANGE = 3ffe:fffe::88bb:* - Expands to 3ffe:fffe:0:0:0:0:88bb:*. All IPv6 client hosts belonging to domains starting with 3ffe:fffe::88bb:* are allowed. No IPv4 hosts are allowed.
  • EGO_HOST_ADDR_RANGE = 3ffe-4fff:fffe::88bb:aa-ff 12.23.45.* - All IPv6 client hosts belonging to domains starting with 3ffe up to 4fff, then fffe::88bb, and ending with aa up to ff are allowed. IPv4 client hosts belonging to domains starting with 12.23.45 are allowed.
  • EGO_HOST_ADDR_RANGE = 3ffe-*:fffe::88bb:*-ff - All IPv6 client hosts belonging to domains starting with 3ffe up to ffff, then fffe::88bb, and ending with 0 up to ff are allowed. No IPv4 hosts are allowed.

Default value: Undefined (dynamic host feature disabled). If you enable dynamic hosts during installation, no security is enabled and all hosts can join the cluster.

See also: EGO_ENABLE_SUPPORT_IPV6
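The IPv4 matching rules above (the asterisk, hyphen ranges, open ranges, missing trailing fields, OR alternatives, and "one bad range invalidates the whole line") can be illustrated with a small checker. This is a minimal sketch, not EGO's implementation: it handles dotted-quad IPv4 addresses only, ignores IPv6 and the single-field case (such as 100) that also matches IPv6 hosts, and the function names parse_ranges and host_allowed are hypothetical.

```python
import re
from typing import List, Tuple

# Inclusive (low, high) bounds for one dotted-quad field.
Field = Tuple[int, int]
Range = List[Field]

_NUM = r"[0-9]+"
_HYPHEN = re.compile(rf"({_NUM}|\*)-({_NUM}|\*)")


def parse_field(text: str) -> Field:
    """Parse one field: '12', '*', '1-10', '*-34', or '24-*'."""
    if text == "*":
        low, high = 0, 255
    elif re.fullmatch(_NUM, text):
        low = high = int(text)
    else:
        m = _HYPHEN.fullmatch(text)
        if m is None:
            raise ValueError(f"invalid field: {text!r}")  # e.g. '*43'
        low = 0 if m.group(1) == "*" else int(m.group(1))
        high = 255 if m.group(2) == "*" else int(m.group(2))
    if not 0 <= low <= high <= 255:
        raise ValueError(f"field out of range: {text!r}")
    return (low, high)


def parse_range(spec: str) -> Range:
    """Parse one IPv4 range; missing trailing fields are treated as '*'."""
    fields = spec.split(".")
    if len(fields) > 4:
        raise ValueError(f"too many fields: {spec!r}")
    fields += ["*"] * (4 - len(fields))  # 10.161 -> 10.161.*.*
    return [parse_field(f) for f in fields]


def parse_ranges(value: str) -> List[Range]:
    """Parse a space-separated value; one bad range invalidates the whole line."""
    return [parse_range(token) for token in value.split()]


def host_allowed(address: str, ranges: List[Range]) -> bool:
    """True if the IPv4 address falls in any range (ranges are OR alternatives)."""
    if not ranges:
        return True  # no range specified: every client is allowed
    octets = [int(part) for part in address.split(".")]
    return any(
        all(low <= octet <= high for octet, (low, high) in zip(octets, rng))
        for rng in ranges
    )
```

For example, with ranges = parse_ranges("100.172.1.13 100.*.30-54 124.24-*.1.*-34"), the address 124.30.1.20 is accepted, while 124.30.1.50 is rejected because its last field is outside 0 - 34.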

EXINTERVAL

Description: Time interval, in seconds, at which the LIM daemons exchange load information. On extremely busy hosts or networks, or in clusters with a large number of hosts, load may interfere with the periodic communication between LIM daemons. Setting EXINTERVAL to a longer interval can reduce network load and slightly improve reliability, at the cost of slower reaction to dynamic load changes.
Note: If you define the time interval as less than 5 seconds, EGO automatically resets it to 5 seconds.

Syntax: EXINTERVAL=time_in_seconds

Default value: 15 seconds

HOST_INACTIVITY_LIMIT

Description:
  • An integer that is multiplied by EXINTERVAL (the interval you set for communication between the primary and secondary LIMs, which ensures that all components are functioning).
  • A secondary LIM can send its load information any time from EXINTERVAL to (HOST_INACTIVITY_LIMIT-1)*EXINTERVAL seconds. A primary LIM sends a primary announce to each host at least every EXINTERVAL*HOST_INACTIVITY_LIMIT seconds.
  • The HOST_INACTIVITY_LIMIT must be greater than or equal to 2.
  • Increase or decrease the host inactivity limit to adjust for your tolerance for communication between primary and secondary hosts. For example, if you have hosts that frequently become inactive, decrease the host inactivity limit. Note that to get the correct interval, you may also have to adjust EXINTERVAL.

Syntax: HOST_INACTIVITY_LIMIT=integer

Default value: 5
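The timing bounds described for HOST_INACTIVITY_LIMIT are simple products of the two parameters. The following sketch works through that arithmetic; lim_timing is a hypothetical helper, and it also applies the 5-second minimum that EGO enforces for EXINTERVAL.

```python
from typing import Dict

MIN_EXINTERVAL = 5  # EGO resets EXINTERVAL values below 5 seconds to 5


def lim_timing(exinterval: int = 15, host_inactivity_limit: int = 5) -> Dict[str, int]:
    """Return the LIM communication bounds, in seconds, for the given settings."""
    if host_inactivity_limit < 2:
        raise ValueError("HOST_INACTIVITY_LIMIT must be greater than or equal to 2")
    ex = max(exinterval, MIN_EXINTERVAL)
    return {
        # A secondary LIM sends its load information somewhere in this window.
        "secondary_report_min": ex,
        "secondary_report_max": (host_inactivity_limit - 1) * ex,
        # The primary LIM announces itself to each host at least this often.
        "primary_announce_max": host_inactivity_limit * ex,
    }


if __name__ == "__main__":
    print(lim_timing())
```

With the defaults (EXINTERVAL=15, HOST_INACTIVITY_LIMIT=5), a secondary LIM reports every 15 to 60 seconds, and the primary LIM announces itself at least every 75 seconds.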

MASTER_INACTIVITY_LIMIT

Description: An integer reflecting a multiple of EXINTERVAL. A secondary host will attempt to become primary if it does not hear from the previous primary after (HOST_INACTIVITY_LIMIT + host_number*MASTER_INACTIVITY_LIMIT)*EXINTERVAL seconds, where host_number is the position of the host in ego.cluster.cluster_name. The primary host is host_number 0.

Syntax: MASTER_INACTIVITY_LIMIT=integer

Default value: 2
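The takeover formula can be evaluated directly to see how the list order staggers failover. This sketch assumes the default values for all three parameters; takeover_delay is a hypothetical helper, not part of EGO.

```python
MIN_EXINTERVAL = 5  # EGO resets EXINTERVAL values below 5 seconds to 5


def takeover_delay(host_number: int, exinterval: int = 15,
                   host_inactivity_limit: int = 5,
                   master_inactivity_limit: int = 2) -> int:
    """Seconds without hearing from the primary before this host tries to take over."""
    ex = max(exinterval, MIN_EXINTERVAL)
    return (host_inactivity_limit + host_number * master_inactivity_limit) * ex


if __name__ == "__main__":
    # host_number is the host's position in the Host section; 0 is the primary.
    for n in range(1, 4):
        print(f"host_number {n}: {takeover_delay(n)} s")
```

With the defaults, the host at position 1 attempts to take over after 105 seconds, position 2 after 135 seconds, and position 3 after 165 seconds, so hosts earlier in the list always try first.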

ClusterAdmins section

(Optional) The ClusterAdmins section defines the IBM Spectrum Conductor administrators for the cluster. The only keyword is ADMINISTRATORS.

If the ClusterAdmins section is not present, the default IBM Spectrum Conductor administrator is root. Using root as the primary IBM Spectrum Conductor administrator is not recommended.

ADMINISTRATORS

Description:
  • Specify Linux user names.
  • The first administrator of the expanded list is considered the primary IBM Spectrum Conductor administrator. The primary administrator is the owner of the EGO configuration files, as well as the working files under EGO_SHAREDIR/cluster_name. If the primary administrator is changed, make sure the owner of the configuration files and the files under EGO_SHAREDIR/cluster_name are changed as well.
  • Administrators other than the primary IBM Spectrum Conductor administrator have the same privileges as the primary IBM Spectrum Conductor administrator except that they do not have permission to change EGO configuration files. They can perform cluster-wide operations on jobs, queues, or hosts in the system.
  • For flexibility, each cluster may have its own IBM Spectrum Conductor administrators, who are identified by a user name, although the same administrators can be responsible for several clusters.

Syntax: ADMINISTRATORS=administrator_name ...

Example: The following gives an example of a cluster with two IBM Spectrum Conductor administrators. The user listed first, user2, is the primary administrator.
Begin ClusterAdmins 
ADMINISTRATORS = user2 user7
End ClusterAdmins 

Default value: egoadmin

Host section

The Host section is the last section in ego.cluster.cluster_name and is the only required section. It lists all the hosts in the cluster and gives configuration information for each host.

The order in which the hosts are listed in this section is important, because the first host listed becomes the IBM Spectrum Conductor primary host. Since the primary LIM makes all placement decisions for the cluster, it should be on a fast machine.

The LIM on the first host that is listed becomes the primary LIM if this host is up; otherwise, the second host becomes the primary if it is up, and so on. Also, to avoid the delays that are involved in switching primaries if the first machine goes down, the primary should be on a reliable machine. It is desirable to arrange the list such that the first few hosts in the list are always in the same subnet. This avoids a situation where the second host takes over as primary when there are communication problems between subnets.

Configuration information is of two types:
  • Some fields in a host entry simply describe the machine and its configuration.
  • Other fields set thresholds for various resources.

Example Host section

This example Host section contains descriptive and threshold information for three hosts:
Begin Host 
HOSTNAME   model    type   server r1m pg tmp RESOURCES        RUNWINDOW
hostA      SparcIPC Sparc  1      3.5 15   0 (sunos frame)    ()
hostD      Sparc10  Sparc  1      3.5 15   0 (sunos)          (5:18:30-1:8:30)
hostD      !        !      1      2.0 10   0 ()               () 
hostE      !        !      1      2.0 10   0 (linux !bigmem)  () 
End Host
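Each row of a column-style section such as Host is a set of whitespace-separated fields matched to the header line, with a parenthesized value such as (sunos frame) counted as one field. The following is a minimal parsing sketch, not EGO's own parser; it assumes that # starts a comment and that field values never contain a closing parenthesis. The names parse_section and SAMPLE are hypothetical. The same layout applies to the ResourceMap section.

```python
import re
from typing import Dict, List

# A field is either a parenthesized group such as "(sunos frame)" or a bare word.
_TOKEN = re.compile(r"\([^)]*\)|\S+")


def parse_section(text: str, name: str) -> List[Dict[str, str]]:
    """Return the rows of a column-style Begin/End section as dicts keyed by header."""
    rows: List[Dict[str, str]] = []
    header: List[str] = []
    inside = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        words = line.split()
        if words == ["Begin", name]:
            inside = True
        elif words == ["End", name]:
            break
        elif inside and not header:
            header = words  # first line inside the section holds the keywords
        elif inside:
            tokens = _TOKEN.findall(line)
            if len(tokens) != len(header):
                raise ValueError(f"expected {len(header)} fields: {line!r}")
            rows.append(dict(zip(header, tokens)))
    return rows


SAMPLE = """\
Begin Host
HOSTNAME   model    type   server r1m pg tmp RESOURCES        RUNWINDOW
hostA      SparcIPC Sparc  1      3.5 15   0 (sunos frame)    ()
hostD      Sparc10  Sparc  1      3.5 15   0 (sunos)          (5:18:30-1:8:30)
End Host
"""

if __name__ == "__main__":
    hosts = parse_section(SAMPLE, "Host")
    # The first host listed is the one that becomes primary if it is up.
    print(hosts[0]["HOSTNAME"], hosts[0]["RESOURCES"])
```

Running the script prints hostA (sunos frame), the first-listed host and its resource list.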

Descriptive fields

The following fields are required in the Host section:
  • HOSTNAME
  • RESOURCES
  • type
  • model
The following fields are optional:
  • server
  • nd
  • RUNWINDOW
  • REXPRI

HOSTNAME

Description: Official name of the host as returned by hostname(1). The name must be listed in ego.shared as belonging to this cluster.

RESOURCES

Description:
  • The static Boolean resources and static or dynamic numeric and string resources available on this host.
  • The resource names are strings that are defined in the Resource section of ego.shared. You may list any number of resources, which are enclosed in parentheses and separated by blanks or tabs. For example:
    (fs frame hpux)
    
  • Optionally, you can specify an exclusive resource by prefixing the resource with an exclamation mark (!). For example, the resource bigmem is defined in ego.shared and is specified as an exclusive resource for hostE:
    Begin Host
    HOSTNAME   model    type   server r1m pg tmp RESOURCES        RUNWINDOW
    ...
    hostE      !        !      1      2.0 10   0 (linux !bigmem)  ()
    ...
    End Host
  • Opening and closing brackets [ ] are not valid and the resource name must be alphanumeric.
  • You can specify static and dynamic numeric and string resources in the resource column of the Host clause. For example:
    Begin   Host
    HOSTNAME  model type server r1m  mem  swp RESOURCES  #Keywords
    hostA     !     !    1      3.5  ()   ()  (mg elimres patchrev=3 owner=user1)
    hostB     !     !    1      3.5  ()   ()  (specman=5 switch=1 owner=test)
    hostC     !     !    1      3.5  ()   ()  (switch=2 rack=rack2_2_3 owner=test)
    hostD     !     !    1      3.5  ()   ()  (switch=1 rack=rack2_2_3 owner=test)
    End     Host
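The RESOURCES column combines three kinds of entries: plain resource names, exclusive resources prefixed with !, and numeric or string resources written as name=value. The following sketch splits a column value along those lines. It is an illustration under the rules stated above, not EGO's parser, and HostResources and parse_resources are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class HostResources(NamedTuple):
    names: List[str]        # resources listed by name only, e.g. linux
    exclusive: List[str]    # names prefixed with "!", e.g. bigmem
    valued: Dict[str, str]  # numeric or string resources written name=value


def parse_resources(field: str) -> HostResources:
    """Split a RESOURCES column value such as '(linux !bigmem owner=test)'."""
    text = field.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("RESOURCES must be enclosed in parentheses")
    names: List[str] = []
    exclusive: List[str] = []
    valued: Dict[str, str] = {}
    for token in text[1:-1].split():  # blanks or tabs separate resources
        if "[" in token or "]" in token:
            raise ValueError(f"brackets are not valid here: {token!r}")
        if "=" in token:
            name, _, value = token.partition("=")
            valued[name] = value
        elif token.startswith("!"):
            exclusive.append(token[1:])
        else:
            names.append(token)
    return HostResources(names, exclusive, valued)
```

For example, parse_resources("(switch=2 rack=rack2_2_3 owner=test)").valued returns {'switch': '2', 'rack': 'rack2_2_3', 'owner': 'test'}.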

type

Description:
  • Host type as defined in the HostType section of ego.shared
  • The strings that are used for host types are determined by the system administrator: for example, SUNSOL, DEC, or HPPA. The host type is used to identify binary-compatible hosts.
  • The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request, the task is run on a host of the same type as the sending host.
  • Often one host type can be used for many machine models. For example, the host type name SUNSOL6 might be used for any computer with a SPARC processor running SunOS 6. This would include many Sun models and quite a few from other vendors as well.
  • Optionally, specify the ! keyword in the model or type column to indicate that the host model or type is to be detected automatically by the LIM running on the host.

model

Description:
  • Host model
  • The name must be defined in the HostModel section of ego.shared. This determines the CPU speed scaling factor that is applied in load and placement calculations.
  • Optionally, specify the ! keyword in the model or type column to indicate that the host model or type is to be detected automatically by the LIM running on the host.

server

Description:
  • Indicates whether the host can receive jobs from other hosts
  • Specify 1 if the host can receive jobs from other hosts; specify 0 otherwise. Hosts with server set to 0 are IBM Spectrum Conductor clients. Client hosts do not run the IBM Spectrum Conductor daemons. Client hosts can submit interactive and batch jobs to the cluster, but they cannot execute jobs sent from other hosts.

Default value: 1

nd

Description:
  • Number of local disks
  • This corresponds to the ndisks static resource. On most host types, IBM Spectrum Conductor automatically determines the number of disks, and the nd parameter is ignored.
  • nd should only count local disks with file systems on them. Do not count either disks that are used only for swapping or disks that are mounted with NFS.

Default value: The number of disks that are determined by the LIM, or 1 if the LIM cannot determine this.

RUNWINDOW

Description:
  • Dispatch window for interactive tasks.
  • When the host is not available for remote execution, the host status is lockW (locked by run window). LIM does not schedule interactive tasks on hosts that are locked by dispatch windows. Run windows only apply to interactive tasks placed by LIM. The IBM Spectrum Conductor batch system uses its own (optional) host dispatch windows to control batch job processing on batch server hosts.

Format: A dispatch window consists of one or more time windows in the format begin_time-end_time. No blanks can separate begin_time and end_time. Time is specified in the form [day:]hour[:minute]. If only one field is specified, IBM Spectrum Conductor assumes it is an hour. Two fields are assumed to be hour:minute. Use blanks to separate time windows.

Default value: Always accept remote jobs
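The RUNWINDOW format can be read mechanically: blanks separate windows, a hyphen joins begin_time and end_time, and the number of colon-separated fields decides whether a time is hour, hour:minute, or day:hour:minute. The following sketch only splits and types the fields. It does not interpret day numbers as particular weekdays or range-check hours and minutes, and TimeSpec and parse_run_window are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class TimeSpec(NamedTuple):
    day: Optional[int]  # None when the optional day field is omitted
    hour: int
    minute: int


def parse_time(text: str) -> TimeSpec:
    """Parse [day:]hour[:minute]: 1 field = hour, 2 = hour:minute, 3 = day:hour:minute."""
    fields = [int(part) for part in text.split(":")]
    if len(fields) == 1:
        return TimeSpec(None, fields[0], 0)
    if len(fields) == 2:
        return TimeSpec(None, fields[0], fields[1])
    if len(fields) == 3:
        return TimeSpec(fields[0], fields[1], fields[2])
    raise ValueError(f"too many fields in time {text!r}")


def parse_run_window(value: str) -> List[Tuple[TimeSpec, TimeSpec]]:
    """Return (begin, end) pairs; an empty list means "()", i.e. always accept."""
    text = value.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    windows = []
    for token in text.split():  # blanks separate time windows
        begin, sep, end = token.partition("-")
        # Each window must be one token, begin_time-end_time, with no blanks.
        if not (sep and begin and end):
            raise ValueError(f"expected begin_time-end_time, got {token!r}")
        windows.append((parse_time(begin), parse_time(end)))
    return windows
```

Applied to the hostD entry in the example Host section, parse_run_window("(5:18:30-1:8:30)") yields a single window from day 5 at 18:30 to day 1 at 8:30.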

REXPRI

Description:
  • Default execution priority for interactive remote jobs that run under the RES
  • The range is from -20 to 20. REXPRI corresponds to the BSD-style nice value used for remote jobs. For hosts with System V-style nice values with the range 0 - 39, a REXPRI of -20 corresponds to a nice value of 0, and +20 corresponds to 39. Higher values of REXPRI correspond to lower execution priority; -20 gives the highest priority, 0 is the default priority for login sessions, and +20 is the lowest priority.

Default value: 0

Threshold fields

Description:
  • The LIM uses these thresholds to determine whether to place remote jobs on a host. If one or more IBM Spectrum Conductor load indices cross the corresponding threshold (for example, too many users or not enough swap space), the host is regarded as busy, and LIM does not recommend jobs to that host.
  • The CPU run queue length threshold values (r15s, r1m, and r15m) are taken as effective queue lengths as reported by egosh resource view.
  • All of these fields are optional; you only need to configure thresholds for load indices that you wish to use for determining whether hosts are busy. Fields that are not configured are not considered when determining host status. The keywords for the threshold fields are not case-sensitive.
  • Thresholds can be set for any of the following:
    • The built-in IBM Spectrum Conductor load indexes (r15s, r1m, r15m, ut, pg, it, io, ls, swp, mem, tmp)
    • External load indexes defined in the Resource section of ego.shared
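The busy test can be sketched as a comparison of each configured threshold against the host's current load. The direction of each comparison is an assumption here: the sketch treats it, swp, mem, and tmp as indices where a lower value means a more heavily loaded host (which matches the "not enough swap space" case), and every other index as busy when it rises above the threshold. busy_indices is a hypothetical helper; unconfigured indices are ignored, as described above.

```python
from typing import Dict, List

# Assumption (not stated in this file): for these built-in indices a LOWER value
# means a more loaded host, so they count as busy when the value drops below the
# threshold. Every other index counts as busy when it rises above the threshold.
DECREASING_INDICES = {"it", "swp", "mem", "tmp"}


def busy_indices(load: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the configured indices whose threshold the host's load has crossed."""
    current = {name.lower(): value for name, value in load.items()}
    crossed = []
    for name, limit in thresholds.items():
        key = name.lower()  # threshold keywords are not case-sensitive
        if key not in current:
            continue  # assumption: an index with no reading cannot make the host busy
        value = current[key]
        if key in DECREASING_INDICES:
            over = value < limit
        else:
            over = value > limit
        if over:
            crossed.append(key)
    return crossed
```

Under these assumptions, a host is regarded as busy whenever the returned list is non-empty; indices without a configured threshold never appear in it.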

ResourceMap section

Description: The ResourceMap section defines shared resources in your cluster. This section specifies the mapping between shared resources and their sharing hosts. When you define resources in the Resource section of ego.shared, there is no distinction between a shared and non-shared resource. By default, no resources are shared; each resource is local to each host. By defining the ResourceMap section, you can define resources that are shared by all hosts in the cluster or define resources that are shared by only some of the hosts in the cluster.

This section must appear after the Host section of ego.cluster.cluster_name, because it has a dependency on host names defined in the Host section.

Structure: The first line consists of the keywords RESOURCENAME and LOCATION. Subsequent lines describe the hosts that are associated with each configured resource.

Example:
Begin ResourceMap 
RESOURCENAME   LOCATION 
verilog        (5@[all])
local          ([host1 host2] [others])
End ResourceMap

The resource verilog must already be defined in the Resource section of the ego.shared file. It is a static numeric resource that is shared by all hosts. The value for verilog is 5. The resource local is a numeric shared resource that contains two instances in the cluster. The first instance is shared by two machines, host1 and host2. The second instance is shared by all other hosts.

LOCATION

Description:
  • Defines the hosts that share the resource
  • For a static resource, you must define an initial value here as well. Do not define a value for a dynamic resource.
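  • Each instance is a list of host names, enclosed in square brackets, that share an instance of the resource; for a static resource, the initial value and @ precede the brackets, as in 5@[all]. The reserved words all, others, and default can be specified for the instance: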
    • all: indicates that there is only one instance of the resource in the whole cluster and that this resource is shared by all of the hosts
    • Use the not operator (~) to exclude hosts from the all specification. For example:
      (2@[all ~host3 ~host4])
      
      means that 2 units of the resource are shared by all server hosts in the cluster made up of host1 host2 ... hostn, except for host3 and host4. This is useful if you have a large cluster but only want to exclude a few hosts.
    • The parentheses are required in the specification. The not operator can only be used with the all keyword. It is not valid with the keywords others and default.
    • others: indicates that the rest of the server hosts not explicitly listed in the LOCATION field comprise one instance of the resource. For example:
      (2@[host1] 4@[others])
      
      indicates that there are 2 units of the resource on host1 and 4 units of the resource that are shared by all other hosts.
    • default: indicates an instance of a resource on each host in the cluster.
    • This specifies a special case where the resource is, in effect, not shared and is local to every host. default means at each host. Normally, you should not need to use default, because by default all resources are local to each host. You might want to use ResourceMap for a non-shared static resource if you need to specify different values for the resource on different hosts.
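The LOCATION rules can be read as a two-step process: split the value into instances (an optional value@ prefix followed by a bracketed host list), then expand all, ~host, others, and default against the cluster's server hosts. The following is a sketch of that reading, not EGO's implementation; parse_location and resolve are hypothetical helpers, and server_hosts stands for the server hosts listed in the Host section.

```python
import re
from typing import List, Optional, Tuple

# (initial value or None, host terms inside the brackets)
Instance = Tuple[Optional[str], List[str]]

# "5@[all]" or "[host1 host2]": optional value@ prefix, then a bracketed list.
_INSTANCE = re.compile(r"(?:([^\s@\[\]]+)@)?\[([^\]]*)\]")
RESERVED = ("all", "others", "default")


def parse_location(location: str) -> List[Instance]:
    """Split a LOCATION value such as '(2@[host1] 4@[others])' into instances."""
    text = location.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("the parentheses are required in a LOCATION value")
    found = [(m.group(1), m.group(2).split()) for m in _INSTANCE.finditer(text[1:-1])]
    if not found:
        raise ValueError(f"no instances in {location!r}")
    return found


def resolve(instances: List[Instance], server_hosts: List[str]) -> List[Instance]:
    """Expand all, ~host, others, and default into concrete host lists."""
    listed = {t for _, terms in instances for t in terms
              if t not in RESERVED and not t.startswith("~")}
    resolved: List[Instance] = []
    for value, terms in instances:
        if terms[:1] == ["all"]:
            excluded = {t[1:] for t in terms[1:] if t.startswith("~")}
            resolved.append((value, [h for h in server_hosts if h not in excluded]))
        elif any(t.startswith("~") for t in terms):
            raise ValueError("the not operator (~) can only be used with all")
        elif terms == ["others"]:
            resolved.append((value, [h for h in server_hosts if h not in listed]))
        elif terms == ["default"]:
            # One separate instance on every host: effectively a local resource.
            resolved.extend((value, [h]) for h in server_hosts)
        else:
            resolved.append((value, terms))
    return resolved
```

For a cluster of host1 through host4, resolve(parse_location("(2@[all ~host3 ~host4])"), hosts) gives a single instance of 2 units shared by host1 and host2.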

RESOURCENAME

Description:
  • Name of the resource
  • This resource name must be defined in the Resource section of ego.shared. You must specify at least a name and description for the resource, using the keywords RESOURCENAME and DESCRIPTION.
    • A resource name cannot begin with a number.
    • A resource name cannot contain any of the following characters:
      :  .  (  )  [  +  -  *  /  !  &  |  <  >  @  =
      
    • A resource name cannot be any of the following reserved names:
      cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it 
      mem ncpus define_ncpus_cores define_ncpus_procs
      define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
    • To avoid conflict with inf and nan keywords in third-party libraries, resource names should not begin with inf or nan (uppercase or lowercase). Resource requirement strings such as -R "infra" or -R "nano" cause an error. Use -R "defined(infxx)" or -R "defined(nanxx)" to specify these resource names.
    • Resource names are case-sensitive.
    • Resource names can be up to 39 characters in length.
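The naming rules above can be checked before a resource is added to ego.shared. This sketch encodes only the rules listed here (leading digit, forbidden characters, reserved names, inf/nan prefixes, and the 39-character limit); resource_name_problems is a hypothetical helper. Because resource names are case-sensitive, it assumes reserved names must match exactly, which is an interpretation rather than a documented rule.

```python
from typing import List

FORBIDDEN_CHARS = set(":.()[+-*/!&|<>@=")
RESERVED_NAMES = set("""
cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it
mem ncpus define_ncpus_cores define_ncpus_procs
define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
""".split())
MAX_LENGTH = 39


def resource_name_problems(name: str) -> List[str]:
    """Return the rules that `name` breaks; an empty list means no problem was found."""
    if not name:
        return ["name is empty"]
    problems = []
    if name[0].isdigit():
        problems.append("begins with a number")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append(f"contains forbidden characters: {''.join(bad)}")
    if name in RESERVED_NAMES:  # assumption: exact, case-sensitive comparison
        problems.append("is a reserved name")
    if name.lower().startswith(("inf", "nan")):  # either case is rejected
        problems.append("begins with inf or nan")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems
```

For example, resource_name_problems("verilog") returns an empty list, while resource_name_problems("infra") reports the inf prefix.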