ego.shared reference

The ego.shared file contains common definitions that are shared by the clusters defined in ego.cluster.cluster_name files. These definitions include lists of cluster names, host types, host models, the special resources available, and external load indices.

This file is installed by default in the directory defined by EGO_CONFDIR.

Changing ego.shared configuration

After making any changes to ego.shared, run the following command:
egosh ego restart

Cluster section

Description: (Required) Lists the cluster names recognized by IBM® Spectrum Conductor.

Structure: The first line must contain the mandatory keyword ClusterName. The other keyword is optional.

Each subsequent line defines one cluster.

Example:
Begin Cluster
ClusterName  # Keyword
cluster1     
End Cluster
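Every section in ego.shared follows the same layout: a Begin line, a line of keywords, one line per entry, and an End line. The following Python sketch reads that layout into a dictionary. It is an illustration only, not the parser that IBM Spectrum Conductor uses. It assumes that # starts a comment, that each entry supplies one value per keyword, and that a parenthesized list such as (i86pc_400 i686_400) or () counts as a single value. The function name parse_sections is hypothetical.

```python
import re

# A token is either a parenthesized group, such as "(i86pc_400 i686_400)" or "()",
# or a run of non-whitespace characters.
TOKEN = re.compile(r"\([^)]*\)|\S+")


def parse_sections(text):
    """Parse 'Begin <Name> ... End <Name>' blocks into {name: [row dicts]}."""
    sections = {}
    name, keywords, rows = None, None, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue  # blank lines are ignored
        words = line.split()
        if words[0] == "Begin":
            name, keywords, rows = words[1], None, []
            continue
        if words[0] == "End":
            sections[name] = rows
            name = None
            continue
        if name is None:
            continue  # text outside any section is ignored in this sketch
        tokens = TOKEN.findall(line)
        if keywords is None:
            keywords = tokens  # first line inside a section holds the keywords
        else:
            rows.append(dict(zip(keywords, tokens)))
    return sections


example = """Begin Cluster
ClusterName  # Keyword
cluster1
End Cluster
"""
print(parse_sections(example))  # {'Cluster': [{'ClusterName': 'cluster1'}]}
```

The same function also handles sections whose values contain parenthesized lists, because the regular expression keeps each group together as one token.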

HostType section

Description: (Required) Lists the valid host types in the cluster. All hosts that can run the same binary executable are in the same host type.

Structure: The first line consists of the mandatory keyword TYPENAME.

Subsequent lines name valid host types.

Example:
Begin HostType
TYPENAME

X86_64

End HostType

TYPENAME

Description: Host type names are usually based on a combination of the hardware name and operating system. If your site already has a system for naming host types, you can use the same names for IBM Spectrum Conductor.

HostModel section

Description: (Required) Lists models of machines and gives the relative CPU scaling factor for each model. All hosts of the same relative speed are assigned the same host model.

IBM Spectrum Conductor uses the relative CPU scaling factor to normalize the CPU load indices so that tasks are more likely to be sent to faster hosts. The CPU factor affects the calculation of task execution time limits and accounting. Using large or inaccurate values for the CPU factor can cause confusing results when CPU time limits or accounting are used.

Structure: The first line consists of the mandatory keywords MODELNAME, CPUFACTOR, and ARCHITECTURE.

Subsequent lines define a model and its CPU factor.

Example:
Begin HostModel
MODELNAME  CPUFACTOR     ARCHITECTURE
PC400        13.0        (i86pc_400 i686_400)
PC450        13.2        (i86pc_450 i686_450)
Sparc5F       3.0        (SUNWSPARCstation5_170_sparc)
Sparc20       4.7        (SUNWSPARCstation20_151_sparc)
Ultra5S      10.3        (SUNWUltra5_270_sparcv9 SUNWUltra510_270_sparcv9)
End HostModel

ARCHITECTURE

Description: (Reserved for system use only) Indicates automatically detected host models that correspond to the model names.

CPUFACTOR

Description: Although it is not required, you typically assign a CPU factor of 1.0 to the slowest machine model in your system and higher numbers to the others. For example, assign a factor of 2.0 to a machine model that executes at twice the speed of your slowest model.
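The rule of thumb above amounts to dividing each model's speed by the speed of the slowest model. The following Python sketch assumes that you already have a relative speed figure for each model, for example benchmark results where a larger number means a faster machine. The function name cpu_factors, the model names, and the input numbers are hypothetical.

```python
def cpu_factors(relative_speeds):
    """Map each model name to a CPU factor, with the slowest model at 1.0.

    relative_speeds: {model_name: speed}, where a larger speed means faster.
    Factors are rounded to one decimal place, like the values in the examples.
    """
    slowest = min(relative_speeds.values())
    return {model: round(speed / slowest, 1)
            for model, speed in relative_speeds.items()}


# Hypothetical figures: FastModel runs twice as fast as SlowModel.
print(cpu_factors({"SlowModel": 100.0, "FastModel": 200.0}))
# {'SlowModel': 1.0, 'FastModel': 2.0}
```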

MODELNAME

Description: Generally, first identify the distinct host types in your system, such as MIPS and SPARC, and then identify the machine models within each type, such as SparcIPC, Sparc1, Sparc2, and Sparc10.

Automatically detected host models and types

Description: When you first install IBM Spectrum Conductor, you do not have to assign models and types to hosts in ego.cluster.cluster_name. For any host without an assigned model and type, LIM detects the model and type automatically.

Automatic detection of host model and type is useful because you do not need to change the configuration files and reconfigure the cluster when you upgrade the operating system or hardware of a host; IBM Spectrum Conductor detects the change automatically.

Mapping to CPU factors

Description: Automatically detected models are mapped to the short model names (MODELNAME) in ego.shared through the model strings listed in the ARCHITECTURE column. Model strings in the ARCHITECTURE column are used only for this mapping.

Example in the ego.shared file:
Begin HostModel
MODELNAME   CPUFACTOR     ARCHITECTURE
SparcU5     5.0           (SUNWUltra510_270_sparcv9)
PC486       2.0           (i486_33 i486_66)
PowerPC     3.0           (PowerPC12 PowerPC16 PowerPC31)
End HostModel

If an automatically detected host model does not exactly match any model string in the ARCHITECTURE column, it is mapped to the best partial match and a warning message is generated.

If a host model cannot be detected or is not supported, it is assigned the DEFAULT model name and an error message is generated.
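The mapping rules above can be sketched in Python as follows. The lookup table is built from the example ego.shared entries. The exact partial-matching rule is not documented here, so this sketch assumes a longest-common-prefix comparison, and it also assumes that a model sharing no prefix with any ARCHITECTURE string counts as unsupported. The function name map_model is hypothetical.

```python
import logging
import os.path

log = logging.getLogger("hostmodel")

# Built from the example HostModel section: ARCHITECTURE string -> MODELNAME.
ARCH_TO_MODEL = {
    "SUNWUltra510_270_sparcv9": "SparcU5",
    "i486_33": "PC486",
    "i486_66": "PC486",
    "PowerPC12": "PowerPC",
    "PowerPC16": "PowerPC",
    "PowerPC31": "PowerPC",
}


def map_model(detected):
    """Return the short model name for an automatically detected model string."""
    if not detected:
        log.error("host model could not be detected; using DEFAULT")
        return "DEFAULT"
    if detected in ARCH_TO_MODEL:
        return ARCH_TO_MODEL[detected]  # exact match, no message

    def shared_prefix(arch):
        return len(os.path.commonprefix([arch, detected]))

    # Assumed partial-match rule: the longest shared prefix wins
    # (on a tie, the entry listed first wins).
    best = max(ARCH_TO_MODEL, key=shared_prefix)
    if shared_prefix(best) == 0:
        log.error("host model %r is not supported; using DEFAULT", detected)
        return "DEFAULT"
    log.warning("no exact match for %r; using best partial match %r",
                detected, ARCH_TO_MODEL[best])
    return ARCH_TO_MODEL[best]


print(map_model("i486_66"))    # PC486 (exact match)
print(map_model("PowerPC20"))  # PowerPC (partial match, logs a warning)
```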

Naming convention

Description: Models that are automatically detected are named according to the following convention:
hardware_platform[_processor_speed[_processor_type]]
where:
  • hardware_platform is the only mandatory component
  • processor_speed is the optional clock speed and is used to differentiate computers within a single platform
  • processor_type is the optional processor manufacturer that is used to differentiate processors with the same speed
  • Underscores (_) between hardware_platform, processor_speed, and processor_type are mandatory.
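A detected model string can be split into its components by following the convention above. This Python sketch assumes that hardware_platform itself contains no underscore; under that assumption, a platform name such as x86_64 would be split incorrectly. The names parse_model_name and DetectedModel are hypothetical.

```python
from typing import NamedTuple, Optional


class DetectedModel(NamedTuple):
    hardware_platform: str                  # mandatory
    processor_speed: Optional[str] = None   # optional clock speed
    processor_type: Optional[str] = None    # optional processor manufacturer


def parse_model_name(name: str) -> DetectedModel:
    """Split hardware_platform[_processor_speed[_processor_type]] at underscores."""
    # maxsplit=2 yields at most three parts; anything after the second
    # underscore stays in processor_type.
    parts = name.split("_", 2)
    if not parts[0]:
        raise ValueError(f"missing hardware_platform in {name!r}")
    return DetectedModel(*parts)


print(parse_model_name("SUNWUltra510_270_sparcv9"))
# DetectedModel(hardware_platform='SUNWUltra510', processor_speed='270', processor_type='sparcv9')
```

With the model strings from the HostModel examples, i86pc_400 yields a platform and a speed only, and PowerPC12 yields a platform only.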

Resource section

Description: Optional. Defines resources. Resources must be defined by the cluster administrator.

Resource section structure

Description: The first line consists of the keywords. RESOURCENAME and DESCRIPTION are mandatory. The other keywords are optional. Subsequent lines define resources.

Example:
Begin Resource
RESOURCENAME             TYPE    INTERVAL INCREASING  DESCRIPTION        # Keywords
   fs                    Boolean   ()       ()          (File server)
   cs                    Boolean   ()       ()          (Compute server)
   frame                 Boolean   ()       ()          (Hosts with FrameMaker licence)
   bigmem                Boolean   ()       ()          (Hosts with very big memory)
   diskless              Boolean   ()       ()          (Diskless hosts)
       
   linux                 Boolean   ()       ()          (LINUX UNIX)
   nt                    Boolean   ()       ()          (Windows NT)
   mg                    Boolean   ()       ()          (Management hosts)
   scode                 Numeric   5        Y           (Host harvesting code)
   scvg                  Boolean   ()       ()          (Resource tag identifying harvest-capable hosts)
   agent_control         String    5        ()          (Host harvesting flag)
   cit                   Numeric   5        N           (Amount of time in minutes that a CPU has been idle) 
   uit_t                 Numeric   5        Y           (Idle time threshold, in minutes)
   cu_t                  Numeric   5        Y           (Adjusted CPU utilization threshold, as a percentage)
   cit_t                 Numeric   5        Y           (CPU idle time threshold, in minutes)
   define_ncpus_procs    Boolean   ()       ()          (ncpus := procs)
   define_ncpus_cores    Boolean   ()       ()          (ncpus := cores)
   define_ncpus_threads  Boolean   ()       ()          (ncpus := threads)
   svrscvg               Boolean   ()       ()          (Resource tag identifying harvest-capable server hosts)
   vmscvg                Boolean   ()       ()          (Resource tag identifying harvest-capable virtual server hosts)
   acu                   Numeric   5        Y           (Adjusted CPU utilization, which does not include CPU usage of ASC and the exempt process list, as a percentage)
   exempt_process        String    5        ()          (Process list to exclude when calculating CPU usage)
   close_process         String    5        ()          (Process list that triggers closing the host or prevents it from opening)
End Resource

RESOURCENAME

Description: The name that you assign to the new resource: an arbitrary character string, subject to the following rules:
  • A resource name cannot begin with a number.
  • A resource name cannot contain any of the following characters:
    :  .  (  )  [  +  -  *  /  !  &  |  <  >  @  =
  • A resource name cannot be any of the following reserved names:
    cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it mem ncpus define_ncpus_cores define_ncpus_procs define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
  • To avoid conflicts with the inf and nan keywords in third-party libraries, resource names should not begin with inf or nan (in uppercase or lowercase). Resource requirement strings such as -R "infra" or -R "nano" cause an error. To specify such resource names, use -R "defined(infxx)" or -R "defined(nanxx)".
  • Resource names are case sensitive.
  • Resource names can be up to 39 characters in length.
  • For Solaris machines, the keyword int is reserved and cannot be used.
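The naming rules above can be checked before you add a resource. The following Python sketch encodes the forbidden characters, reserved names, length limit, inf/nan prefix rule, and Solaris int keyword as listed. It assumes that an empty name is invalid and that reserved names are compared case-sensitively, because resource names are case sensitive. The function name resource_name_problems is hypothetical, and IBM Spectrum Conductor may apply additional checks.

```python
import string

FORBIDDEN_CHARS = set(":.()[+-*/!&|<>@=")
RESERVED_NAMES = {
    "cpu", "cpuf", "io", "logins", "ls", "idle", "maxmem", "maxswp", "maxtmp",
    "type", "model", "status", "it", "mem", "ncpus", "define_ncpus_cores",
    "define_ncpus_procs", "define_ncpus_threads", "ndisks", "pg", "r15m",
    "r15s", "r1m", "swap", "swp", "tmp", "ut",
}
MAX_LENGTH = 39


def resource_name_problems(name, solaris=False):
    """Return a list of rule violations for a proposed RESOURCENAME (empty if none)."""
    if not name:
        return ["name is empty"]  # assumption: an empty name is never valid
    problems = []
    if name[0] in string.digits:
        problems.append("begins with a number")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    # Case-sensitive comparison; "int" is reserved only on Solaris.
    if name in RESERVED_NAMES or (solaris and name == "int"):
        problems.append("is a reserved name")
    if name[:3].lower() in ("inf", "nan"):
        problems.append("begins with inf or nan")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


print(resource_name_problems("infra"))  # ['begins with inf or nan']
```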

TYPE

Description: The type of resource:
  • Boolean: Resources that have a value of 1 on hosts that have the resource and 0 otherwise.
  • Numeric: Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor.
  • String: Resources that take string values, such as host type, host model, or host status. The resource name can be up to 39 characters in length, and the resource value can be up to 4096 characters.
    Note: For resources used in multidimensional scheduling, add the [MDS] prefix to the resource DESCRIPTION. String values do not take effect for multidimensional allocations without the [MDS] prefix in the resource description. For example:
    docker_active    String  5  Y        ([MDS]Used by elim to identify Docker active hosts)

Default: If TYPE is not given, the default type is Boolean.

INTERVAL

Description: Optional. Applies to dynamic resources only.

Defines the time interval (in seconds) at which the resource is sampled by the ELIM.

If INTERVAL is defined for a numeric resource, it becomes an external load index.

Default: If INTERVAL is not given, the resource is considered static.
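Together with the TYPE default, the INTERVAL rules determine how a resource is treated. The following Python sketch shows that decision. It assumes that the caller passes None when a column is omitted or given as (); the function name classify_resource is hypothetical.

```python
from typing import Optional


def classify_resource(res_type: Optional[str], interval: Optional[int]) -> str:
    """Classify a Resource-section row from its TYPE and INTERVAL columns.

    res_type: "Boolean", "Numeric", "String", or None when TYPE is omitted.
    interval: sampling interval in seconds, or None when INTERVAL is not given.
    """
    res_type = res_type or "Boolean"  # TYPE defaults to Boolean
    if interval is None:
        return "static"  # no INTERVAL: the resource is static
    if res_type == "Numeric":
        return "external load index"  # numeric resource with an INTERVAL
    return "dynamic"  # sampled by the ELIM every `interval` seconds


# Rows from the example Resource section:
print(classify_resource("Boolean", None))  # fs -> static
print(classify_resource("Numeric", 5))     # scode -> external load index
print(classify_resource("String", 5))      # agent_control -> dynamic
```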

INCREASING

Description: Applies to numeric resources only.

If a larger value means greater load, INCREASING should be defined as Y. If a smaller value means greater load, INCREASING should be defined as N.
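INCREASING sets the direction in which a value counts as heavier load. The following Python sketch, with the hypothetical function name less_loaded, only illustrates that direction; it is not how the scheduler compares hosts. It uses two resources from the example Resource section: acu (INCREASING Y) and cit (INCREASING N).

```python
def less_loaded(value_a: float, value_b: float, increasing: bool) -> bool:
    """Return True if value_a indicates a lighter load than value_b.

    increasing=True corresponds to INCREASING Y (a larger value means more load);
    increasing=False corresponds to INCREASING N (a smaller value means more load).
    """
    return value_a < value_b if increasing else value_a > value_b


# acu (INCREASING Y): 20% adjusted CPU utilization is lighter than 80%.
print(less_loaded(20.0, 80.0, increasing=True))   # True
# cit (INCREASING N): 30 idle minutes is lighter load than 5 idle minutes.
print(less_loaded(30.0, 5.0, increasing=False))   # True
```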

DESCRIPTION

Description: Brief description of the resource.
Note: For resources used in multidimensional scheduling, String-type values do not take effect without the [MDS] prefix in DESCRIPTION. Ensure that you add the [MDS] prefix, as in the following example:
docker_active    String  5  Y        ([MDS]Used by elim to identify Docker active hosts)