Customizing perf loaders to collect user-defined fields

Explorer nodes support customized perf loaders that collect user-defined fields using Elasticsearch.

About this task

The example in the following procedure adds a department_name field to the lsf_job_acct index based on the user_name field. The procedure loads the user_name and department_name mapping data into the user_department_mapping index.

Procedure

  1. Define the Elasticsearch template for the user_department_mapping index using the following command. Substitute ${ES_IP_PORT} with the correct Elasticsearch host and port. For example, 192.168.1.1:9200.
    
    curl -H "Expect:" -H "Content-Type: application/json" -XPUT http://"${ES_IP_PORT}"/_template/user_department_mapping -s -d '{
         "order": 0,
         "template": "user_department_mapping",
         "settings": {
         "index": {
             "number_of_shards": "1",
             "number_of_replicas": "0"
         }
         },
         "mappings": {
         "dynamic_templates": [
             {
             "not_analyzed_string": {
                 "mapping": {
                 "index": "true",
                 "type": "keyword"
                 },
                 "match_mapping_type": "string",
                 "match": "*"
             }
             }
         ],
         "date_detection": false
         },
         "aliases": {}
      }'
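To sanity-check the template body before sending it, you can parse it locally. The following is a minimal sketch in Python; the template JSON is copied from the curl command above, and nothing beyond it is assumed:

```python
import json

# Template body copied from the curl command above.
template = '''{
    "order": 0,
    "template": "user_department_mapping",
    "settings": {"index": {"number_of_shards": "1", "number_of_replicas": "0"}},
    "mappings": {
        "dynamic_templates": [
            {"not_analyzed_string": {
                "mapping": {"index": "true", "type": "keyword"},
                "match_mapping_type": "string",
                "match": "*"}}
        ],
        "date_detection": false
    },
    "aliases": {}
}'''

parsed = json.loads(template)  # raises ValueError if the JSON is malformed
assert parsed["template"] == "user_department_mapping"
assert parsed["settings"]["index"]["number_of_shards"] == "1"
print("template JSON is valid")
```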
    
  2. Load the user_name and department_name mapping data. You can use a third-party tool, or create your own script or tool. For example, Logstash can be downloaded from https://www.elastic.co/downloads/logstash
    1. Prepare the CSV file containing user_name and department_name in the following format and save it in the /tmp/user_department.csv file.
      
      user_name,department_name
      u1,d1
      u2,d2
      ...
      u9,d9
    2. Prepare the Logstash configuration file as follows and save it in the /tmp/user_department.conf file.
      
      input {
          file {
              path => ["/tmp/user_department.csv"]  
              start_position => "beginning"
          }
       }
       filter {
              csv {
                  separator => ","
                  columns => ["user_name","department_name"]
              }
              mutate {
                  convert => {
                      "user_name" => "string"
                      "department_name" => "string"
                  }
      
                  remove_field => ["@version", "message", "host", "@timestamp", "path"]
              }
      }
      output {
          elasticsearch {
                  hosts => ["192.168.1.1:9200"]
                  index => "user_department_mapping"
                  document_id => "%{user_name}_%{department_name}"
          }
      }
    3. Run Logstash:
      
      cd /YOUR_LOGSTASH_DIR
      ./bin/logstash -f /tmp/user_department.conf
    4. Run the following command in another terminal to check whether the data has been loaded.

      curl http://"${ES_IP_PORT}"/user_department_mapping/_count

      The result should be similar to the following:

      {"count":6,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

      The "count" value is the number of documents that were inserted.

    5. When the data has been loaded successfully, stop Logstash by pressing Ctrl+C.
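The expected "count" equals the number of data rows that were loaded. As a quick consistency check, you can compare the two locally; the following is a minimal sketch in Python (the sample CSV contents and _count response below are illustrative, not output from a live cluster):

```python
import csv
import io
import json

# Illustrative CSV contents in the format from the earlier substep.
csv_text = "user_name,department_name\nu1,d1\nu2,d2\nu9,d9\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))  # header row is consumed

# Illustrative _count response from Elasticsearch.
response = '{"count":3,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}'
count = json.loads(response)["count"]

assert count == len(rows), "loaded document count should match CSV data rows"
print(count)  # → 3
```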
  3. Modify the lsbacct.xml file to define the department_name field.
    1. Add the <Dependencies> element to the top level of the <Writer> element.
      
       + The "Name" attribute in the <Dependency> element is the name of the index 
       + that stores the mapping data. In this sample, it is "user_department_mapping". 
       + The "Name" attribute in the <Key> element is the "join" key name between the 
       + lsf_job_acct and user_department_mapping indices. A field with this name must 
       + exist in both indices.
       + The "Name" attribute in the <Value> element is the name of the newly added 
       + field. A field with this name must exist in the user_department_mapping index.
      
         <Dependencies>
                     <Dependency Name="user_department_mapping">
                             <Keys>
                                     <Key Name="user_name"/>
                             </Keys>
                             <Values>
                                     <Value Name="department_name"/>
                             </Values>
                     </Dependency>
         </Dependencies>
      
    2. Add the <Extra> element to the <SQL> element, after the last <PK> element.
      Note: The "Name" attribute in the <Extra> element must be exactly the same as the "Name" attribute defined in the <Value> element of the related <Dependency> element.
      
      <SQL Input="LSB_EVENTS">
             <Statement>
             </Statement>
             <Field Name="host_name" Column="HOST_NAME"/>
                  ...
             <Field Name="cluster_name" Column="CLUSTER_NAME"/>
             <PK Name="cluster_name"/>
                  ...
             <PK Name="time_stamp_utc"/>
             <Extra Name="department_name"/>
      </SQL>
      
  4. Restart the perf loaders. Substitute PERF_TOP with the PERF installation directory:
    
    source PERF_TOP/conf/profile.perf
    perfadmin stop all
    perfadmin start all
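Conceptually, the <Dependency> definition tells the loader to enrich each lsf_job_acct record by looking up the <Key> field (user_name) in the mapping index and copying the <Value> field (department_name) into the record. The following is a minimal sketch of that lookup join in Python (illustrative only; the sample records are hypothetical and this is not the loader's actual implementation):

```python
# Contents of the user_department_mapping index, keyed by the <Key> field.
mapping_index = {
    "u1": {"department_name": "d1"},
    "u2": {"department_name": "d2"},
}

# Hypothetical lsf_job_acct records produced by the <SQL> statement.
job_records = [
    {"user_name": "u1", "cluster_name": "c1"},
    {"user_name": "u3", "cluster_name": "c1"},  # no mapping entry
]

def enrich(record, key="user_name", value="department_name"):
    """Copy the <Value> field from the mapping index, matching on the <Key> field."""
    match = mapping_index.get(record[key], {})
    record[value] = match.get(value)  # None when no mapping entry exists
    return record

enriched = [enrich(dict(r)) for r in job_records]
print(enriched[0]["department_name"])  # → d1
print(enriched[1]["department_name"])  # → None
```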