Netcool event load splitting calculation
When you deploy IBM Netcool Operations Insight ObjectServer integrations, it is important to understand the event rate of domain alerts and to calculate the number of integrations that are required to share the overall load. A high event load can cause failures when the integration pod's file storage fills up with unprocessed Store-and-Forward (SAF) files. Planning and sizing your Netcool Operations Insight integrations deployment helps ensure that it runs smoothly and efficiently.
IDUC and SAF files
- The Netcool Operations Insight integration triggers an IDUC query to the ObjectServer for the latest alert events every 30 seconds.
- The events that are fetched by IDUC are buffered in SAF files. This activity is known as SAF staging.
- When a SAF file passes the staging phase, it is consumed by the integration's data forwarding process.
- Each SAF file can hold up to 1,000 entries of Netcool Operations Insight events, but the maximum size of a fully filled SAF file is determined by the table columns that are involved in IDUC.
- The columns of interest for IDUC data replication are configured in the integration's mapping. More columns increase the space usage per SAF file, but file size alone does not deplete the file storage. The unmitigated growth in the number of SAF files is the main cause of the storage issue.
- Multiple SAF files might be produced in one IDUC cycle, and these files might be only partially filled (see the sketch after this list), because the SAF staging process marks a file as ready for SAF processing (the data forwarding process) to consume when either of the following conditions is met:
  - The SAF file is fully filled, for example, with 1,000 rows.
  - The wait time for more incoming rows expires.
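For illustration, the following shell sketch estimates how many SAF files a single IDUC cycle produces under the 1,000-entry limit. The row count is a hypothetical sample value, not a measured figure.

```
#!/bin/bash
# Hypothetical example: estimate the SAF files that one IDUC cycle produces.
changed_rows=2350    # sample number of changed rows fetched in one IDUC cycle
saf_capacity=1000    # maximum entries per SAF file

# Ceiling division: full files plus one partially filled file for any remainder
saf_files=$(( (changed_rows + saf_capacity - 1) / saf_capacity ))
last_file_rows=$(( changed_rows % saf_capacity ))
if [ $last_file_rows -eq 0 ]; then
    last_file_rows=$saf_capacity
fi

echo "SAF files produced: ${saf_files}"          # 3
echo "Rows in the last (partial) file: ${last_file_rows}"  # 350
```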
Optimal rate of SAF file growth
- To prevent a pile-up of SAF files, all SAF files that are produced in the Nth IDUC cycle must be processed by the data forwarding process before the end of the (N+1)th cycle (see the sketch after this list).
- The data forwarding process runs in parallel with IDUC and SAF staging.
- If more than 1,000 changed rows are detected in an IDUC query, multiple batches are required to pull all the changed rows from the ObjectServer.
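To make the keep-up requirement concrete, the following sketch checks whether one integration can drain a cycle's events before the next cycle ends. The per-second forwarding rate of 50 is taken from the sizing formula at the end of this topic, and the event count is a hypothetical sample.

```
#!/bin/bash
# Hypothetical check: can one integration drain an IDUC cycle's events in time?
events_per_iduc=1200   # sample number of events fetched in one IDUC cycle
iduc_interval=30       # default IDUC interval, in seconds
forward_rate=50        # assumed events per second that data forwarding consumes

capacity_per_cycle=$(( forward_rate * iduc_interval ))   # 1500 events per cycle

if [ $events_per_iduc -le $capacity_per_cycle ]; then
    echo "OK: ${events_per_iduc} events fit within the ${capacity_per_cycle}-event capacity."
else
    echo "Overload: SAF files accumulate; split the load across more integrations."
fi
```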
Figure: Understanding IDUC, SAF staging, and SAF processing
Collecting event load statistics
- Edit the following objsvr_cred.sh file to configure the ObjectServer username and password:

```
#!/bin/bash
USER=root
PASS=password

echo $USER $PASS
```
- Edit the following os_get_event_count.sh file:
  - In the file, specify the time span for the event count query by configuring the query_interval_min field.
  - In the file, specify an alert group for the event count query by configuring the MY_ALERT_GROUP field.

```
#!/bin/bash
me="$(basename "$(test -L "$0" && readlink "$0" || echo "$0")")"

if [ $# -ne 1 ]; then
    echo "illegal number of parameters"
    echo "Usage: $me <object_server_name>"
    echo
    exit 0
fi

OBJSERV_NAME=$1
CRED=`./objsvr_cred.sh`
USER=`echo $CRED | awk -F' ' '{ print $1 }'`
PASS=`echo $CRED | awk -F' ' '{ print $2 }'`

# Configure the alert criteria of your choice
MY_ALERT_GROUP="AlertGroup='ConnectionStatus'"

omnihome="${OMNIHOME}"
sqllocation="./tmpsql"
mkdir -p $sqllocation

total_all_event_count=0
total_specific_event_count=0
start_time=0
query_interval_min=10   #in minutes
prev_epoch=0
sleep_interval=20       #in seconds

echo "Event count query for $query_interval_min mins"
query_interval_secs=$(( $query_interval_min * 60 ))

curr_timestamp=`date`
curr_epoch=`date +%s`
time_to_stop=$(( $curr_epoch + $query_interval_secs ))
echo "Current time: "${curr_epoch} "("${curr_timestamp}")"
echo "Time to stop: "${time_to_stop}

while :
do
    rand=`echo $RANDOM | base64 | head -c 20; echo`
    curr_epoch=`date +%s`

    if [[ $curr_epoch -gt $time_to_stop ]]
    then
        curr_timestamp=`date`
        echo "Query ends here. Time: "${curr_timestamp}
        exit 1
    fi

    if [[ $prev_epoch -eq 0 ]]
    then
        start_time=$curr_epoch
        prev_epoch=$(( $curr_epoch - $sleep_interval ))
    fi

    sttchg_window="StateChange>${prev_epoch} AND StateChange<=${curr_epoch}"
    select_condition=${sttchg_window}

    get_count_sqlfile_all="${sqllocation}/${rand}.get_count_all.sql"
    get_count_sqlfile_specific="${sqllocation}/${rand}.get_count_specific.sql"

    echo "select count(*) from alerts.status where ${select_condition}" >> ${get_count_sqlfile_all}
    echo "go" >> ${get_count_sqlfile_all}
    echo "select count(*) from alerts.status where ${select_condition} AND ${MY_ALERT_GROUP}" >> ${get_count_sqlfile_specific}
    echo "go" >> ${get_count_sqlfile_specific}

    echo "Show ${get_count_sqlfile_all} content:"
    cat ${get_count_sqlfile_all}
    echo
    echo "Show ${get_count_sqlfile_specific} content:"
    cat ${get_count_sqlfile_specific}
    echo

    all_count_result_file="${sqllocation}/${rand}.all_count_result.txt"
    ${omnihome}/bin/nco_sql -server ${OBJSERV_NAME} -user "${USER}" -password "${PASS}" < ${get_count_sqlfile_all} > ${all_count_result_file}

    specific_count_result_file="${sqllocation}/${rand}.specific_count_result.txt"
    ${omnihome}/bin/nco_sql -server ${OBJSERV_NAME} -user "${USER}" -password "${PASS}" < ${get_count_sqlfile_specific} > ${specific_count_result_file}

    all_evt_cnt_str=`sed -n '3p' < ${all_count_result_file}`
    all_curr_event_count=$(($all_evt_cnt_str))
    total_all_event_count=$(( $total_all_event_count + $all_curr_event_count ))

    specific_evt_cnt_str=`sed -n '3p' < ${specific_count_result_file}`
    specific_curr_event_count=$(($specific_evt_cnt_str))
    total_specific_event_count=$(( $total_specific_event_count + $specific_curr_event_count ))

    window_len=$(( $curr_epoch - $prev_epoch ))
    echo "Time window interval (secs): "${window_len}
    echo "All group event count for StateChange (${prev_epoch}:${curr_epoch}]: "${all_curr_event_count}
    echo "Total all group event count: "${total_all_event_count}
    echo "Specific group event count for StateChange (${prev_epoch}:${curr_epoch}]: "${specific_curr_event_count}
    echo "Total specific group event count: "${total_specific_event_count}

    rm -f $get_count_sqlfile_specific
    rm -f $get_count_sqlfile_all
    rm -f $all_count_result_file
    rm -f $specific_count_result_file

    prev_epoch=$curr_epoch
    echo "Sleep for "$sleep_interval" secs..."
    sleep $sleep_interval
done
```
- Run the os_get_event_count.sh script to collect event count statistics for all alerts and for the specific alert group:

```
os_get_event_count.sh <ObjectServer_Name> > <output_file>
```

Where:
- <ObjectServer_Name> is the name of your IBM Netcool Operations Insight ObjectServer. For example, NCOMS.
- <output_file> is the name of the output file where the log of the script is saved. For example, NCOMS_event_count.log.
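After the script completes, you can summarize the running totals from the log. The following snippet is a minimal sketch that pulls the final totals out of the example log file; it relies on the "Total ... event count" lines that the script prints on every iteration.

```
# Extract the final running totals from the script's log
grep "Total all group event count:" NCOMS_event_count.log | tail -1
grep "Total specific group event count:" NCOMS_event_count.log | tail -1
```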
Calculating the number of integrations that are required for a given load of domain alerts

Use the following steps to calculate the number of integrations that are required for a given load of domain alerts:

- Enter the event count that is reported by the script.
- Enter the interval that is configured in the script.
- Calculate the event rate (total events / measured duration in seconds * 60).
- Calculate the domain event rate (total domain events / measured duration in seconds * 60).
- Project the number of events per IDUC cycle.
- Calculate the percentage of rows in a SAF file.
- Calculate the number of integrations that are required (see the worked sketch after the note).
Note: The following calculation assumes that you are using the default IDUC interval, which is 30 seconds. The formula is:

ceiling(events_per_IDUC / (50 * IDUC_interval)) => ceiling(events_per_IDUC / (50 * 30))

where 50 represents the number of events per second that a single integration's data forwarding process can consume, so 50 * 30 is the event capacity of one integration per IDUC cycle.
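The following sketch walks through the calculation steps with hypothetical numbers; the event counts and the 10-minute interval are placeholders for the values that your own run of os_get_event_count.sh reports.

```
#!/bin/bash
# Worked example of the sizing steps, with hypothetical inputs.
total_events=24000          # all-group event count from the script log
total_domain_events=9000    # specific-group event count from the script log
duration_sec=600            # query_interval_min (10 min) in seconds
iduc_interval=30            # default IDUC interval, in seconds
forward_rate=50             # events per second, from the formula above

# Event rates per minute
event_rate=$(( total_events * 60 / duration_sec ))                # 2400/min
domain_event_rate=$(( total_domain_events * 60 / duration_sec ))  # 900/min

# Projected domain events per IDUC cycle (rate per second * interval)
events_per_iduc=$(( total_domain_events * iduc_interval / duration_sec ))  # 450

# Percentage of a 1,000-row SAF file that one cycle fills
saf_fill_pct=$(( events_per_iduc * 100 / 1000 ))                  # 45%

# Integrations required = ceiling(events_per_IDUC / (50 * 30))
capacity=$(( forward_rate * iduc_interval ))                      # 1500/cycle
integrations=$(( (events_per_iduc + capacity - 1) / capacity ))   # 1

echo "Event rate: ${event_rate}/min, domain event rate: ${domain_event_rate}/min"
echo "Events per IDUC: ${events_per_iduc}, SAF fill: ${saf_fill_pct}%"
echo "Integrations required: ${integrations}"
```

With these sample numbers, one integration is sufficient; if events_per_iduc exceeded 1,500, the ceiling division would report two or more integrations to share the load.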