Netcool event load splitting calculation

When you deploy IBM Netcool Operations Insight ObjectServer integrations, it is important to understand the event rate of domain alerts and calculate the number of integrations that are required to share the overall load. A high event load can lead to a breakdown when the integration pod's file storage fills up with unprocessed Store-and-Forward (SAF) files. Planning and sizing your Netcool Operations Insight integrations deployment helps ensure that it runs smoothly and efficiently.

IDUC and SAF files

  • The Netcool Operations Insight integration triggers an IDUC query to the ObjectServer for the latest alert events every 30 seconds.

  • The events that are fetched by IDUC are buffered in SAF files. This activity is known as SAF staging.

  • When a SAF file passes the staging phase, it is consumed by the integration's data forwarding process.

  • Each SAF file can hold up to 1,000 entries of Netcool Operations Insight events, but the maximum size of a fully filled SAF file is determined by the table columns that are involved in IDUC.

  • The columns of interest for IDUC data replication are configured in the integration's mapping. More columns increase the space that each SAF file uses, but file size alone does not deplete file storage. The unmitigated growth in the number of SAF files is the main cause of the storage issue.

  • Multiple SAF files might be produced in one IDUC cycle. These files might be partially filled because the SAF staging process marks a file as ready for SAF processing (the data forwarding process) to consume when either of the following conditions is met (see the sketch after this list):

    • The SAF file is fully filled, for example, with 1,000 rows.
    • The wait time for more incoming rows expires.
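
For illustration, the following minimal bash sketch condenses the two flush conditions into a single decision function. It is a conceptual sketch only: the function name (saf_ready), the wait-timeout value, and the input values are hypothetical, and the actual staging logic is internal to the integration.

    #!/bin/bash
    
    MAX_ROWS=1000      # a fully filled SAF file holds 1,000 entries
    WAIT_TIMEOUT=5     # hypothetical wait time (in seconds) for more rows
    
    # Return 0 (ready) when the SAF file can be handed over to the data
    # forwarding process, or 1 (keep staging) otherwise
    saf_ready() {
       local row_count=$1 idle_secs=$2
       if [ "$row_count" -ge "$MAX_ROWS" ]; then
          return 0   # condition 1: the file is fully filled
       fi
       if [ "$idle_secs" -ge "$WAIT_TIMEOUT" ]; then
          return 0   # condition 2: the wait for more rows expired
       fi
       return 1
    }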

Optimal rate of SAF file growth

  • To prevent a pile-up of SAF files, all the SAF files that are produced in the N-th IDUC cycle must be processed by the data forwarding process before the end of the (N+1)-th cycle.
  • The data forwarding process is running in parallel to IDUC and SAF staging.
  • If more than 1,000 changed rows are detected in an IDUC query, then multiple batches are required to pull all the changed rows from the ObjectServer, as shown in the example after this list.
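
For example, because each SAF file holds at most 1,000 entries, the number of batches (and therefore SAF files) that one IDUC cycle produces is the changed row count divided by 1,000, rounded up. A quick ceiling-division check in bash, with an illustrative row count:

    changed_rows=2500
    batches=$(( (changed_rows + 999) / 1000 ))   # ceiling division
    echo "SAF files produced this IDUC cycle: ${batches}"   # prints 3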

Understanding IDUC, SAF staging, and SAF processing

Figure. Cycles of IDUC, SAF staging, and processing projected in clock dial view

Figure. Cycles of IDUC, SAF staging, and processing projected in linear time view

Collecting event load statistics

  1. Edit the following objsvr_cred.sh file to configure the ObjectServer username and password.

    #!/bin/bash
    
    # Replace these values with your ObjectServer credentials
    USER=root
    PASS=password
    
    # Print the credentials so that the calling script can read them
    echo $USER $PASS
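
    Both scripts must be executable and reside in the same working directory, because os_get_event_count.sh reads the credentials by running ./objsvr_cred.sh. For example:

       chmod +x objsvr_cred.sh os_get_event_count.sh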
    
  2. Edit the following os_get_event_count.sh file:

    • In the file, specify the time span for the event count query by configuring the query_interval_min field.
    • In the file, specify an alert group for the event count query by configuring the MY_ALERT_GROUP field.
    #!/bin/bash
    
    me="$(basename "$(test -L "$0" && readlink "$0" || echo "$0")")"
    
    if [ $# -ne 1 ]; then
       echo "Illegal number of parameters"
       echo "Usage: $me <object_server_name>"
       echo
       exit 1
    fi
    
    OBJSERV_NAME=$1
    
    CRED=`./objsvr_cred.sh`
    USER=`echo $CRED | awk -F' ' '{ print $1 }'`
    PASS=`echo $CRED | awk -F' ' '{ print $2 }'`
    
    # Configure the alert criteria of your choice
    MY_ALERT_GROUP="AlertGroup='ConnectionStatus'"
    
    omnihome="${OMNIHOME}"
    sqllocation="./tmpsql"
    mkdir -p $sqllocation
    
    total_all_event_count=0
    total_specific_event_count=0
    
    start_time=0
    query_interval_min=10 #in minutes
    prev_epoch=0
    sleep_interval=20 #in seconds
    
    echo "Event count query for $query_interval_min mins"
    query_interval_secs=$(( $query_interval_min * 60 ))
    
    curr_timestamp=`date`
    curr_epoch=`date +%s`
    time_to_stop=$(( $curr_epoch + $query_interval_secs ))
    echo "Current time: "${curr_epoch} "("${curr_timestamp}")"
    echo "Time to stop: "${time_to_stop}
    
    while :
    do
       # Generate a random suffix for the temporary file names
       rand=`echo $RANDOM | base64 | head -c 20`
       curr_epoch=`date +%s`
    
       if [[ $curr_epoch -gt $time_to_stop ]]
       then
          curr_timestamp=`date`
          echo "Query ends here. Time: "${curr_timestamp}
          exit 0
       fi
       # On the first pass, start the window one sleep interval in the past
       if [[ $prev_epoch -eq 0 ]]
       then
          start_time=$curr_epoch
          prev_epoch=$(( $curr_epoch - $sleep_interval ))
       fi
    
       # Query window: rows whose StateChange falls in (prev_epoch, curr_epoch]
       sttchg_window="StateChange>${prev_epoch} AND StateChange<=${curr_epoch}"
       select_condition=${sttchg_window}
    
       get_count_sqlfile_all="${sqllocation}/${rand}.get_count_all.sql"
       get_count_sqlfile_specific="${sqllocation}/${rand}.get_count_specific.sql"
    
       echo "select count(*) from alerts.status where ${select_condition}" >> ${get_count_sqlfile_all}
       echo "go" >> ${get_count_sqlfile_all}
    
       echo "select count(*) from alerts.status where ${select_condition} AND ${MY_ALERT_GROUP}" >> ${get_count_sqlfile_specific}
       echo "go" >> ${get_count_sqlfile_specific}
    
       echo "Show ${get_count_sqlfile_all} content:"
       cat ${get_count_sqlfile_all}
       echo
       echo "Show ${get_count_sqlfile_specific} content:"
       cat ${get_count_sqlfile_specific}
       echo
    
       all_count_result_file="${sqllocation}/${rand}.all_count_result.txt"
       ${omnihome}/bin/nco_sql -server ${OBJSERV_NAME} -user "${USER}" -password "${PASS}" < ${get_count_sqlfile_all} > ${all_count_result_file}
    
       specific_count_result_file="${sqllocation}/${rand}.specific_count_result.txt"
       ${omnihome}/bin/nco_sql -server ${OBJSERV_NAME} -user "${USER}" -password "${PASS}" < ${get_count_sqlfile_specific} > ${specific_count_result_file}
    
       # nco_sql prints the count value on line 3 of its output
       all_evt_cnt_str=`sed -n '3p' < ${all_count_result_file}`
       all_curr_event_count=$(($all_evt_cnt_str))
       total_all_event_count=$(( $total_all_event_count + $all_curr_event_count ))
    
       specific_evt_cnt_str=`sed -n '3p' < ${specific_count_result_file}`
       specific_curr_event_count=$(($specific_evt_cnt_str))
       total_specific_event_count=$(( $total_specific_event_count + $specific_curr_event_count ))
    
       window_len=$(( $curr_epoch - $prev_epoch ))
       echo "Time window interval (secs): "${window_len}
       echo "All group event count for StateChange (${prev_epoch}:${curr_epoch}]: "${all_curr_event_count}
       echo "Total all group event count: "${total_all_event_count}
       echo "Specific group event count for StateChange (${prev_epoch}:${curr_epoch}]: "${specific_curr_event_count}
       echo "Total specific group event count: "${total_specific_event_count}
    
       rm -f $get_count_sqlfile_specific
       rm -f $get_count_sqlfile_all
       rm -f $all_count_result_file
       rm -f $specific_count_result_file
    
       prev_epoch=$curr_epoch
    
       echo "Sleep for "$sleep_interval" secs..."
       sleep $sleep_interval
    done
    
    
  3. Run the os_get_event_count.sh script to collect event count statistics for all alerts and specific alerts.

    os_get_event_count.sh <ObjectServer_Name> > <output_file>
    

    Where

    • <ObjectServer_Name> is the name of your IBM Netcool Operations Insight ObjectServer. For example, NCOMS.
    • <output_file> is the name of the output file where the script log is saved. For example, NCOMS_event_count.log.
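
    For example, to collect statistics from the NCOMS ObjectServer and save the script log:

       ./os_get_event_count.sh NCOMS > NCOMS_event_count.log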

Calculating the number of integrations that are required for a given load of domain alerts

Figure. Number of integrations required to process the given load of domain alerts

Use the following steps to calculate the number of integrations that are required for a given load of domain alerts:

  1. Enter the event count reported by the script.

  2. Enter the interval configured in the script.

  3. Calculate the overall event rate in events per second: total events / (measured duration in minutes × 60).

  4. Calculate the domain event rate in events per second: total domain events / (measured duration in minutes × 60).

  5. Project the number of events per IDUC.

  6. Calculate the percentage of rows in an SAF.

  7. Calculate the number of integrations required.

    Note: The following calculation assumes that you are using the default IDUC interval, which is 30 seconds. The formula is as follows:

    ceiling(events_per_IDUC / 50 * IDUC interval) => ceiling(events_per_IDUC / 50 * 30)
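
    As a worked example, the following bash sketch runs the complete calculation. It assumes that the division applies to the product 50 × IDUC interval, that is, that one integration can clear about 50 events per second (1,500 events per 30-second IDUC cycle). The input values are illustrative:

       #!/bin/bash
       total_domain_events=24000   # illustrative: domain event count from the script
       duration_min=10             # query_interval_min that is configured in the script
       iduc_interval=30            # default IDUC interval, in seconds
       
       # Steps 4 and 5: domain event rate per second, projected per IDUC cycle
       events_per_sec=$(( total_domain_events / (duration_min * 60) ))
       events_per_iduc=$(( events_per_sec * iduc_interval ))
       
       # Step 7: integrations required, assuming 50 events per second per integration
       capacity_per_iduc=$(( 50 * iduc_interval ))
       integrations=$(( (events_per_iduc + capacity_per_iduc - 1) / capacity_per_iduc ))
       echo "Integrations required: ${integrations}"   # prints 1 for these inputs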