
Continuous monitoring of server resources utilization using custom scripts

A lightweight, pull-based methodology using PHP scripting and MariaDB to monitor resource utilization of virtual machines

Prerequisites

The custom monitoring data collector scripts are written in PHP and run from a server. They connect to the Linux target systems and extract processor, RAM, HDD, and network utilization monitoring data. To enable data collection using this method, you need:

  • Two Linux servers, virtual machines (VMs), or Docker containers
  • A recent version of PHP installed
  • A database instance (for example, MariaDB)
  • A database client

These monitoring scripts can be modified to collect any kind of resource utilization data from the target systems at any point in time.

Understanding the custom script monitoring method

The key components in this method are one or more monitoring data collectors and a receiver, each of which can be a server, a VM, or a container. Each data collector runs the monitoring data extraction script as a cron job (say, every 15 minutes). On each run, the script connects to the target systems (the list of servers to be monitored) and fetches the resource utilization data (currently processor, memory, disk, and network). This data is written to .csv files and compressed, so each run produces three files: one for processor and memory, one for disk, and one for network. These files are then uploaded to the receiver.
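For reference, a collector cron entry might look like the following sketch; the script path and log location here are assumptions, not taken from the article:

```shell
# Hypothetical crontab entry on a data collector (add with "crontab -e"):
# run the monitoring data extraction script every 15 minutes and log output.
#
#   */15 * * * * /usr/bin/php /opt/monitoring/collect.php >> /var/log/monitor-collect.log 2>&1
```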

The receiver runs another data processing script and a DB instance with tables to store monitoring data. The receiver script extracts the files, processes them, and writes to the respective tables. Refer to the block diagram in Figure 1 for details.

Figure 1. High-level block diagram to monitor data extraction from multiple data collectors
Figure 2. Detailed block diagram – Monitoring data extraction from single data collector

Creating and maintaining the list of servers (target systems) to be monitored

The monitoring data extraction script running on the data collector needs, as input, a list of the servers to be monitored. This list should be maintained in a database table. The source for this table depends on the business scenario.

Logically, it should be the organization's server discovery or inventory solution, as that holds real-time data for all live resources. But in some scenarios, only certain critical server resources need to be monitored.

In either case, the table requires the following columns:

  • Serial number of the row
  • Server serial number
  • Host name
  • IP address
  • Type
  • Platform
  • Operating system
  • Location

If you are going to test this script for learning purposes, or if you plan to customize it further, you can manually create a table with sample data and feed it as input.
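As a sketch, the following writes out a schema and one sample row for such a table. The table and column names are assumptions; align them with the query in the extraction script, which selects id, name, and function where platform='x':

```shell
# Write the "to be monitored" table schema and a sample row to a .sql file.
cat > servers_to_monitor.sql <<'SQL'
CREATE TABLE servers_to_monitor (
  id INT AUTO_INCREMENT PRIMARY KEY,  -- serial number of the row
  server_sno VARCHAR(64),             -- server serial number
  name VARCHAR(255),                  -- host name
  ip_address VARCHAR(45),
  `function` VARCHAR(32),             -- type of the server
  platform VARCHAR(32),
  os VARCHAR(64),
  location VARCHAR(64)
);
INSERT INTO servers_to_monitor
  (server_sno, name, ip_address, `function`, platform, os, location)
VALUES
  ('SN-001', 'testvm1', '192.0.2.10', 'compute', 'x', 'Ubuntu', 'lab');
SQL
# Load it with: mysql -u <DB-username> -p <DB-name> < servers_to_monitor.sql
```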

Enabling SSH key-based connectivity from the data collectors to the target systems

Secure Shell (SSH) is an open and highly trusted network protocol. It can be used to log in remotely to Linux servers (target systems) to run commands, and to transfer files between computers using Secure Copy Protocol (SCP).

The next step is to identify the data collector (server or VM) that runs the Linux operating system. SSH-based password-less login has to be enabled from the data collectors to the identified Linux target systems. This lets the monitoring script connect to the target systems and run commands to extract monitoring data.

Perform the following steps to enable SSH key-based connectivity from the data collectors to the target systems:

  • Log in to the data collector using any user name (from which the monitoring script can run).
  • Run the ssh-keygen -t rsa command to generate an RSA key pair.
  • Use SSH from the data collector to connect to the target systems using a common user name and create a .ssh directory under it.
    ssh <common-username>@<target-system-ip> mkdir -p .ssh
  • Use SSH from the data collector and upload the collector's public SSH key to the <common-username>'s .ssh directory of all target systems as a file named authorized_keys.
    cat .ssh/id_rsa.pub | ssh <common-username>@<target-system-ip> 'cat >> .ssh/authorized_keys'
  • Note that there can be different SSH versions on the data collectors and target systems. So, set permissions on the .ssh directory and the authorized_keys file using the following command:
    ssh <common-username>@<target-system-ip> "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"

Note: Use a <common-username> for the connection across the target systems, because only then can the monitoring script use one common ssh <user-name>@<target-system-ip> command line to connect to the target systems and run commands remotely.
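The steps above can be combined into a small shell sketch. The function name and arguments are illustrative, and the <common-username>@<target-system-ip> placeholder must be substituted before calling it:

```shell
# Sketch: enable SSH key-based login from a data collector to one target system.
setup_ssh_keys() {
    target="$1"                       # e.g. monitor@192.0.2.10
    key="${2:-$HOME/.ssh/id_rsa}"
    # Generate an RSA key pair on the data collector if one does not exist.
    mkdir -p "$(dirname "$key")"
    [ -f "$key" ] || ssh-keygen -t rsa -N "" -f "$key"
    # Create the remote .ssh directory, append the collector's public key,
    # and set permissions that work across SSH versions.
    ssh "$target" 'mkdir -p .ssh'
    cat "$key.pub" | ssh "$target" 'cat >> .ssh/authorized_keys'
    ssh "$target" 'chmod 700 .ssh; chmod 640 .ssh/authorized_keys'
}
# Usage: setup_ssh_keys <common-username>@<target-system-ip>
```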

Creating directories for monitoring data files on the data collectors and the receiver

Three directories (named 'cpumem', 'disk', and 'network') are needed on the data collectors to store the three .zip files generated by the monitoring data extraction script every 15 minutes. Post generation, these files are uploaded to the receiver which also has three similar directories to receive and store them for processing.

The receiver script will extract these files, process them, and write the data to the respective monitoring data tables (three tables: one each for processor and memory, disk, and network).
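A quick way to create these directories is sketched below. BASE defaults to a local scratch path here; in production, set BASE to /var/www/html/monitoring-data to match the paths used by the scripts later in this article. The Failed/ subdirectories are where the collector script parks uploads whose retries were exhausted:

```shell
# Create the monitoring data directories on the collectors and the receiver.
BASE="${BASE:-$PWD/monitoring-data}"
for d in cpumem disk network; do
    mkdir -p "$BASE/$d"          # holds the per-interval .zip files
    mkdir -p "$BASE/Failed/$d"   # collector-side fallback for failed uploads
done
```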

Creation of monitoring data tables on the receiver

Every 15 minutes, the data receiver script writes the data from the files to database tables. Three tables are needed: one each for processor and memory utilization, disk utilization, and network utilization. The structure of these three tables can be designed as follows:

A table for processor and memory utilization data with the following columns is recommended:

  • Serial number of the row
  • Server serial number
  • Host name
  • CPU percent idle
  • CPU percent used
  • Total available memory
  • Memory free
  • Memory used
  • Memory percent free
  • Memory percent used
  • Time stamp

A table for disk utilization data with the following columns is recommended:

  • Serial number of the row
  • Server serial number
  • Mount points
  • Percent utilization
  • Time stamp

A table for network utilization data with the following columns is recommended:

  • Serial number of the row
  • Server serial number
  • Interface name
  • Interface type
  • Percent utilization
  • Time stamp
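The three lists above can be turned into one possible MariaDB schema as sketched below. The table and column names and the data types are assumptions; align them with the INSERT statements in your receiver script before loading:

```shell
# Write one possible schema for the three monitoring data tables to a file.
cat > monitoring_tables.sql <<'SQL'
CREATE TABLE cpumem_utilization (
  sno INT AUTO_INCREMENT PRIMARY KEY,
  server_sno VARCHAR(64),
  host_name VARCHAR(255),
  cpu_percent_idle DECIMAL(5,2),
  cpu_percent_used DECIMAL(5,2),
  mem_total BIGINT,
  mem_free BIGINT,
  mem_used BIGINT,
  mem_percent_free DECIMAL(5,2),
  mem_percent_used DECIMAL(5,2),
  record_time DATETIME
);
CREATE TABLE disk_utilization (
  sno INT AUTO_INCREMENT PRIMARY KEY,
  server_sno VARCHAR(64),
  mount_point VARCHAR(255),
  percent_utilization DECIMAL(5,2),
  record_time DATETIME
);
CREATE TABLE network_utilization (
  sno INT AUTO_INCREMENT PRIMARY KEY,
  server_sno VARCHAR(64),
  interface_name VARCHAR(64),
  interface_type VARCHAR(32),
  percent_utilization DECIMAL(5,2),
  record_time DATETIME
);
SQL
# Load it with: mysql -u <DB-username> -p <DB-name> < monitoring_tables.sql
```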

Understanding the monitoring data extraction script to be run on the data collectors

Refer to the following PHP code to monitor data extraction. Comments are added to make the code self-explanatory.

<?php
$record_time= date("Y-m-d H:i:s", time());
$record_time1=date("Y-m-d-H-i-s",time());

/*Include the DB connection code script*/
include '/var/www/html/consqlp.php';
$ssql="select id, name,function from <discovery-or-inventory-table> where platform='x'";
$result=$conn->query($ssql);
$count=0;
$myf="mon-cpumem-$record_time1";
$myfile="$myf.csv";
$heading="server_id,server_name,current_server_datetime,start_time,cpu_percent_idle,cpu_percent_used,mem_total,mem_free,mem_used,mem_cached,mem_percent_free,mem_percent_available,mem_percent_used\n";
$fh=fopen($myfile,"w");
fwrite($fh,$heading);
fclose($fh);
$myfd="mon-disk-$record_time1";
$myfiled="$myfd.csv";
$fhd=fopen($myfiled,"w");
fwrite($fhd,"");
fclose($fhd);
$myfn="mon-network-$record_time1";
$myfilen="$myfn.csv";
$fhn=fopen($myfilen,"w");
fwrite($fhn,"");
fclose($fhn);

/* For each server from the "To be monitored table", fetch CPU, memory, and network  utilization and write them to three .csv files */ 
foreach ($result as $row) {
	$server_id=$row['id'];
	$server_name=trim(strtolower($row['name']));
	$server_type=$row['function'];
	//Skip entries with an empty server name
	if ($server_name == "") {
		continue;
		}
	//Skip servers that do not respond to ping
	if (!ping($server_name)) {
		continue;
		}
	//Verify that SSH reached the intended host
	$cmdd="sudo ssh $server_name hostname -s";
	$output = strtolower(trim(shell_exec($cmdd)));
	if ($output != $server_name) {
		continue;
		}

	#Get the current time on the server so the same time base is used for all entries in the database
	$cmdd="sudo ssh $server_name 'date \"+%Y-%m-%d %H:%M:%S\"'";
	$current_server_datetime = trim(shell_exec($cmdd));
	//Start the sar report 20 minutes back to cover the last full interval
	$time1 = strtotime($current_server_datetime) - (20 * 60);
	$date1 = date("Y-m-d H:i:s", $time1);
	$date1_split=explode(' ',$date1);
	$start_time=array_pop($date1_split);

 		//Get CPU and Memory data, call relevant function
 		$cpu_percent_idle=getCpuUsage($server_name,$start_time);
 		$cpu_percent_used= 100 - intval($cpu_percent_idle);
 		$memory_usage=getMemoryUsage($server_name,$start_time);

 	    //Format CPU and memory data and write to a file, call relevant functions
 		$memory_details=explode(",",$memory_usage);
 		$mem_free=intval($memory_details[0]);
 		$mem_used=intval($memory_details[1]);
 		$mem_cached=intval($memory_details[2]);
 		$mem_total=$mem_free+$mem_used;
		$mem_percent_free=($mem_free/$mem_total)*100;
		$mem_percent_available=(($mem_free+$mem_cached)/$mem_total)*100;
		$mem_percent_used=($mem_used/$mem_total)*100;
		$data="$server_id,$server_name,$current_server_datetime,$start_time,$cpu_percent_idle,$cpu_percent_used,$mem_total,$mem_free,$mem_used,$mem_cached,$mem_percent_free,$mem_percent_available,$mem_percent_used\n";
 		$fh1=fopen($myfile,"a");
 		fwrite($fh1,$data);
 		fclose($fh1);
 
 		//Get disk utilization data, call relevant function
 		getDiskUsage($server_name,$start_time,$server_id,$myfiled); 
 
 		//Get network utilization data, call relevant function
 		$storageNetwork=getStorageNetwork($server_name);
	getNetworkUtlization($server_name,$server_id,$storageNetwork,$start_time,'data',$myfilen);
 	    getBridgeUtilizations($server_name,$start_time,$server_id,$myfilen);
  		$count++;
  		}

/*Zip the text file created for CPU & Memory utilization data*/
$zip = new ZipArchive;
$res = $zip->open("$myf.zip", ZipArchive::CREATE);
if ($res === TRUE) {
		$zip->addFile($myfile);
		$zip->close();
		} else {
 		echo 'zip-failed';
		}

/*Zip the text file created for disk utilization data*/
$zipd = new ZipArchive;
$resd = $zipd->open("$myfd.zip", ZipArchive::CREATE);
if ($resd === TRUE) {
		$zipd->addFile($myfiled);
		$zipd->close();
		} else {
 		echo 'zip-failed';
		}

/*Zip the text file created for network utilization data*/
$zipn = new ZipArchive;
$resn = $zipn->open("$myfn.zip", ZipArchive::CREATE);
if ($resn === TRUE) {
		$zipn->addFile($myfilen);
		$zipn->close();
		} else {
 		echo 'zip-failed';
		}

/*copy the zip files to the receiver and delete them*/
$remote_file_url = "<Receiver-address>:/var/www/html/monitoring-data/cpumem";
$remote_file_urld = "<Receiver-address>:/var/www/html/monitoring-data/disk";
$remote_file_urln = "<Receiver-address>:/var/www/html/monitoring-data/network";
#New file name and path for this file
$local_file = "/root/$myf.zip";
$local_filed = "/root/$myfd.zip";
$local_filen = "/root/$myfn.zip";
#Copy the CPU & Mem utilization file from data collector to receiver 
$copy = "scp $local_file $remote_file_url";
$c=1;
a:
$copystatus=shell_exec("$copy 2>&1"); 
#echo "Here is - $copystatus";
#If not copied try after 5 minutes
if($copystatus!="") {
		echo "SCP ERROR: has to be retried\n";
		$c++;
		if ($c<=3) {
			sleep(300);
			goto a;
			}
		$cmdmove1="mv $myf.zip /var/www/html/monitoring-data/Failed/cpumem/";
		shell_exec($cmdmove1);
		}
#Copy the disk utilization file from data collector to receiver 
$copyd = "scp $local_filed $remote_file_urld";
$c1=1;
b:
$copystatusd=shell_exec("$copyd 2>&1");
#echo "Here is - $copystatus";
#If not copied try after 5 minutes
if($copystatusd!="") {
		echo "SCP ERROR: has to be retried\n";
		$c1++;
		if ($c1<=3) {
			sleep(300);
			goto b;
			}
		$cmdmove1d="mv $myfd.zip /var/www/html/monitoring-data/Failed/disk/";
		shell_exec($cmdmove1d);
		}
#Copy the network utilization file from data collector to receiver 
$copyn = "scp $local_filen $remote_file_urln";
$c1=1;
c:
$copystatusn=shell_exec("$copyn 2>&1");
if($copystatusn!="") {
		echo "SCP ERROR: has to be retried\n";
		$c1++;
		if ($c1<=3) {
			sleep(300);
			goto c;
			}
		$cmdmove1n="mv $myfn.zip /var/www/html/monitoring-data/Failed/network/";
		shell_exec($cmdmove1n);
		}

/*delete the zip file(s) once copied to the receiver*/
$cmddel="rm -f $myf.zip";
shell_exec($cmddel);
$cmddeld="rm -f $myfd.zip";
shell_exec($cmddeld);
$cmddeln="rm -f $myfn.zip";
shell_exec($cmddeln);

/*Function definitions*/
// 1) SAR (System Activity Report) command in Linux is used to collect,report & save CPU,Memory, Network utilization
// 2) Here, the script connects to target systems via ssh and executes SAR command
//Function to connect to target system and extract CPU utilization data

function getCpuUsage($server_name,$start_time) {
	#This one-line command connects to the target system without a password and runs sar remotely
	$cmdd="sudo ssh $server_name sar -p -s $start_time | grep Average | tail -n1";
	$output = trim(shell_exec($cmdd));
	if($output=="") {
		exit;
		}
	//The last field of the sar "Average" line is the percent idle value
	$result_array = explode(' ', $output);
	$cpu_usage_percent = intval(array_pop($result_array));
	return($cpu_usage_percent);
	}

//Function to connect to target system and extract Memory utilization data
function getMemoryUsage($server_name,$start_time) {
	$cmdd="sudo ssh $server_name sar -r -s $start_time | grep Average | tail -n1";
	$output = removeExtraSpaces(trim(shell_exec($cmdd)));
	$result_array = explode(' ', $output);
	return("$result_array[1],$result_array[2],$result_array[5]"); 
	}

//Function to connect to target system and extract disk utilization data, format (mount-point wise) data and write to a .csv file
function getDiskUsage($server_name,$start_time,$server_id,$myfiled) { 
	global $current_server_datetime;
	global $record_time;
	global $record_time1;
	$cmdd="sudo ssh $server_name sar -d -s $start_time | grep Average";
	$output = trim(shell_exec($cmdd));
	$disk_array=explode("\n",$output);
	foreach ($disk_array as $disk_line) {
		$disk_line=trim(str_replace("Average:"," ",$disk_line));
		$disk_line=removeExtraSpaces($disk_line);
		$disk_line_values=explode(' ',$disk_line);
		$mount_point=$disk_line_values[0];
		$percent_utilization=intval($disk_line_values[8]);
		$datad="$server_id,$mount_point,$percent_utilization,$current_server_datetime\n";
		$fh1d=fopen($myfiled,"a");
		fwrite($fh1d,$datad);
		fclose($fh1d);
		}
}

//Function to connect to target system and extract network utilization data, format (interface wise) data and write to a .csv file
function getNetworkUtlization($server_name,$server_id,$interface_type,$start_time,$interface_name,$myfilen) {
	global $conn;
	global $current_server_datetime;
	global $record_time;
	$cmdd="sudo ssh $server_name sar -n DEV -s $start_time | grep $interface_type | grep Average | tail -n1";
	$output = shell_exec($cmdd);
	$result_array=explode(" ",$output);
	$storage_network_utilization=trim(end($result_array));
	if ($storage_network_utilization != "") {
		//Cap reported utilization at 100 percent
		if (intval($storage_network_utilization) > 100) {
			$storage_network_utilization=100;
			}
		$datan="$server_id,$interface_name,$interface_type,$storage_network_utilization,$record_time\n";
		$fh1n=fopen($myfilen,"a");
		fwrite($fh1n,$datan);
		fclose($fh1n);
		}
}

function getBridgeUtilizations($server_name,$start_time,$server_id,$myfilen) {
	global $conn;
	global $current_server_datetime;
	global $record_time;
	$bridge_array=explode("\n",getBridgeNames($server_name));
	$grep_string="";
	foreach ($bridge_array as $bridge_name) {
		$bridge_name=trim(str_replace("auto ","",$bridge_name));
		$grep_string.= "$bridge_name ".'\|';
		}
	//Drop the trailing '\|' separator
	$grep_string=trim(substr($grep_string,0,strlen($grep_string)-2));
	$result="";
	if ($grep_string != "") {
		$cmdd="sudo ssh $server_name sar -n DEV -s $start_time | grep Average | grep '$grep_string'";
		$output = trim(shell_exec($cmdd));
		$output=removeExtraSpaces($output);
		$output=strtolower($output);
		$bridges_all_utilization_array=explode("\n",trim($output));
		foreach ($bridges_all_utilization_array as $bridge_each_utilization_array) {
			$bridge_each_utilization_array=trim(str_replace("average:","",$bridge_each_utilization_array));
			$bridge_each_utilization_details=explode(" ",$bridge_each_utilization_array);
			$bridge_each_utilization_details_array=explode(".",current($bridge_each_utilization_details));
			$bridge_name = trim(end($bridge_each_utilization_details_array));
			$vlan_id=str_replace('br','',$bridge_name);
			$bridge_usage = trim(end($bridge_each_utilization_details));
			$datan1="$server_id,$bridge_name,bridge,$vlan_id,$bridge_usage,$record_time\n";
			$fh2n=fopen($myfilen,"a");
			fwrite($fh2n,$datan1);
			fclose($fh2n);
			}
		}
	}

function getBridgeNames($server_name) {
	$cmdd="sudo ssh $server_name cat /etc/network/interfaces | grep auto | grep '\.'";
	$output = shell_exec($cmdd);
	return(trim($output));
	}

function getStorageNetwork($server_name) {
	$storage_net="";
	$cmdd="sudo ssh $server_name ip a | grep '10.0.0\|10.0.2\|10.0.4\|10.0.5'";
	$output = shell_exec("$cmdd");
	$output_array=explode(" ",$output);
	$storage_net = trim(end($output_array));
	return($storage_net);
	}

//Function called while formatting to collapse repeated blank spaces in the extracted data before writing it to the file
function removeExtraSpaces($my_string) {
	$my_string=preg_replace('/ +/', ' ', $my_string);
	return($my_string);
	}

//Function called to check if all servers in the "to be monitored" data table are responding for ping request if its source is not the discovery solution
function ping($host) {
exec(sprintf('ping -c 1 -W 5 %s', escapeshellarg($host)), $res, $rval);
	return $rval === 0;
	}

function appendfile($file,$strline) {
	file_put_contents($file, $strline, FILE_APPEND );
	}

function convert_seconds($uptime) {
	$dt1 = new DateTime("@0");
	$dt2 = new DateTime("@$uptime");
return $dt1->diff($dt2)->format('%a days-%h hours-%i minutes-%s seconds');
	}
exit;
?>

Now, for each 15-minute interval, three files are created, as listed below:

  • mon-cpumem-<current-timestamp>.zip
  • mon-disk-<current-timestamp>.zip
  • mon-network-<current-timestamp>.zip

Understanding the monitoring data receiver script to be run on the receiver

Refer to the following PHP code for processing the received monitoring data on the receiver node. Comments are added to make the code self-explanatory. One instance of the script must run for each type of .csv file received (that is, three instances run as cron jobs, one for each of the three types of .csv files received: cpumem, disk, and network).


<?php

/*Include the DB connection code script*/
include '/var/www/html/info/consqlp.php';
$findfile="find /var/www/html/monitoring-data/cpumem/ -cmin +0 -cmin -15 -name '*.zip'";
$latestzipfile=shell_exec($findfile);
$split=explode("\n",$latestzipfile);
for ($i=0;$i<count($split)-1;$i++) {
	$temp=$split[$i];
	if($temp == "" ) {
		exit;
 		}
	$namesplit=explode('/',$temp);
	$fullfilename=array_pop($namesplit);
	$onlyfilename=substr($fullfilename,0,strpos($fullfilename,'.'));

/*Unzip the received file, process the .csv file and write to table. Here the script handles only CPU and Memory utilization data*/
	$zip = new ZipArchive;
	$zip1=$zip->open("/var/www/html/monitoring-data/cpumem/$onlyfilename.zip");
	if ($zip1 === true) {
		$zip->extractTo('/var/www/html/monitoring-data/cpumem/');
		$zip->close();
		//Delete the .zip file after unzipping it
		unlink("/var/www/html/monitoring-data/cpumem/$onlyfilename.zip");
		} else {
		echo "failed\n";
		}

//Read the file and write the contents to the CPU & Memory utilization data table
	$handle = fopen("/var/www/html/monitoring-data/cpumem/$onlyfilename.csv", "r");
	// loop over the rows in the data file
	while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {
		// trim the first column and skip the heading row
		$data[0] = trim($data[0]);
		if ($data[0] == "server_id") {
			continue;
			}
		$SQL="insert into <cpumem-utilization-data-table>(ServerSNo,server_name,cpu_percent_idle,cpu_percent_used,mem_total,mem_free,mem_used,mem_percent_free,mem_percent_used,timestamp) values('$data[0]','$data[1]','$data[4]','$data[5]','$data[6]','$data[7]','$data[8]','$data[10]','$data[12]','$data[2]')";
		// insert into the database
		$result=$conn->query($SQL);
		}
	fclose($handle);
}
?>
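The three receiver instances can be scheduled with cron, as sketched below. The script file names and paths are assumptions; each is a copy of the script above with the directory, file prefix, and target table changed for its data type:

```shell
# Hypothetical receiver crontab entries (add with "crontab -e"):
#
#   */15 * * * * /usr/bin/php /var/www/html/receive-cpumem.php
#   */15 * * * * /usr/bin/php /var/www/html/receive-disk.php
#   */15 * * * * /usr/bin/php /var/www/html/receive-network.php
```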

DB connection code for MariaDB or MySQL

Refer to the following PHP code to create DB connections:

<?php
$servername = "<Receiver-address>";
$username = "<DB-username>";
$password = "<DB-password>";
$db_database ="<DB-name>";
try {
$conn = new PDO("mysql:host=$servername;dbname=$db_database", $username, $password);
	// set the PDO error mode to exception
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
	}
catch(PDOException $e) {
	echo "Connection failed: " . $e->getMessage();
	}
?>

Conclusion

In this article, you learned how to plan and build a custom server utilization monitoring solution for your environment. You can use this architecture as a reference to create your own, or modify it to fit your business requirements. You can also use the given code to perform basic data collection and customize it further as needed.


ArticleID=1046852
ArticleTitle=Continuous monitoring of server resources utilization using custom scripts
publish-date=06212017