Deploying the Telegraf plug-in to monitor cloud and virtualization clusters
Monitoring cloud platforms is critical to organizations. Telco Network Cloud Manager - Performance introduces Cloud Monitoring Technology Packs, which collect performance metrics across the cloud system and visualize them in the built-in dashboards.
Cluster setup
- Cluster where Telco Network Cloud Manager - Performance is installed. This is also where the Monitoring Technology Pack is installed.
- Server where the Telegraf plug-in is installed, referred to as the agent environment. This server must be outside the managed cluster that is being monitored.
- Kubernetes cluster that you want to monitor and collect performance metrics from. These metrics are collected and sent to the Telco Network Cloud Manager - Performance database for visualization. This cluster is referred to as the managed cluster.
Telegraf setup tasks
Before you begin
- You must install the Telegraf agent and external Kafka in the same network as the managed environment.
- Get the Telegraf configuration files from the Technology Packs.
Packs are extracted to the /installers/core folder, which is referred to as <DIST_DIR>.
The M06VTML.tar.gz bundle has the following technology packs:
- cloud-kubernetes-1.8.0.jar
  Note: You need this technology pack to monitor the performance of a Kubernetes cluster. You can see the following files in its /plugin folder:
  - remote_monitoring.yaml
  - telegraf.conf
  - telegraf_linux_amd64
- cloud-vmware-vsphere-1.1.0.jar
  Note: You need this technology pack to monitor the performance of a VMware cluster.
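The .jar packs are standard ZIP archives, so one way to get at the /plugin folder is with unzip; a minimal sketch, assuming the files sit under plugin/ inside the archive as the layout above suggests (the target directory name is illustrative):
cd <DIST_DIR>
unzip cloud-kubernetes-1.8.0.jar 'plugin/*' -d cloud-kubernetes-1.8.0
ls cloud-kubernetes-1.8.0/plugin
# remote_monitoring.yaml  telegraf.conf  telegraf_linux_amd64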
Configure the managed Kubernetes cluster to communicate with the Telegraf agent
- Copy the remote_monitoring.yaml file from the <DIST_DIR> where you extracted the cloud-kubernetes-1.8.0.jar file. This file is a Custom Resource Definition that creates the namespace, serviceaccount, ClusterRole, and ClusterRoleBinding objects for the managed cloud cluster. These objects are needed to connect the Telegraf agent with the managed cloud cluster.
- Run the following command to apply the Custom Resource Definition:
kubectl apply -f remote_monitoring.yaml
The following objects are created:
- Namespace – remote-telegraf-ns
- Service account – remote-telegraf-account
- ClusterRole – remote-telegraf-roles
- ClusterRoleBindings – remote-telegraf-rolebind, remote-telegraf-scc-rolebind, remote-telegraf-kubelet-rolebind
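As an optional sanity check (not part of the pack's documented steps), you can confirm that these objects exist before you continue:
kubectl get namespace remote-telegraf-ns
kubectl get serviceaccount -n remote-telegraf-ns
kubectl get clusterrole remote-telegraf-roles
kubectl get clusterrolebinding | grep remote-telegraf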
- Verify that the token is generated under the remote-telegraf-ns namespace with the following command:
kubectl get secret -n remote-telegraf-ns
A token is generated, for example, remote-telegraf-account-token-<value>.
- Use the token name from the previous step to get the secret with the following command:
kubectl describe secret remote-telegraf-account-token-q9qq4 -n remote-telegraf-ns
Note: The token is used as the bearer_token_string during the configuration of Telegraf.
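If you prefer to pull the token value non-interactively, a one-liner along these lines works with standard kubectl (the secret name will differ in your cluster):
SECRET=$(kubectl get secret -n remote-telegraf-ns -o name | grep remote-telegraf-account-token)
kubectl get -n remote-telegraf-ns "$SECRET" -o jsonpath='{.data.token}' | base64 -d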
- To monitor the etcd component, copy the etcd certificates and key from the managed Kubernetes environment to the system where the Telegraf agent is installed. The certificates (apiserver-etcd-client.crt and apiserver-etcd-client.key) are usually at /etc/kubernetes/pki.
- Run the following commands to get the managed cluster details:
  - To get the apiserver URL, run the kubectl cluster-info command.
    Note: The apiserver URL is needed in both the Kubernetes plug-in and the kube_inventory plug-in during Telegraf setup.
  - To get the node IP, run the kubectl get nodes -o wide command.
- To get the node ports, run the kubectl cluster-info dump > dump.txt command. Search for daemonEndpoints in the dump.txt file. For each node, one daemonEndpoints block is available, which contains the port of kubelet.
Note: The apiserver URL, node IP, and node ports are used to configure the Kubernetes plug-in of the Telegraf agent.
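The dump file is large; a quick way to pull out just the kubelet ports is to filter it with grep (shown as an illustration):
grep -A 3 '"daemonEndpoints"' dump.txt
# Each match contains a kubeletEndpoint block with the node's kubelet Port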
- To get the apiserver_url, scheduler_url, and controller_url, follow these steps:
  - In the managed Kubernetes cluster, go to /etc/kubernetes/manifests and locate the following files:
    - etcd.yaml: the default port for etcd is 2379.
    - kube-apiserver.yaml: the default port for the apiserver is 6443.
    - kube-scheduler.yaml: the default port for the scheduler is 10259.
    - kube-controller-manager.yaml: the default port for the controller is 10257.
  - Get the IP address and port details from the yaml files.
- Go to the agent environment and check that the apiserver, scheduler, and controller IP addresses are accessible by using this command:
telnet <masterNode_IP> <port>
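To check all of the ports gathered above in one pass, you can use a small bash loop instead of repeated telnet calls; a sketch, with a placeholder master node IP:
MASTER=<masterNode_IP>   # replace with your master node IP
for port in 6443 10257 10259 2379; do
  if timeout 3 bash -c "</dev/tcp/${MASTER}/${port}" 2>/dev/null; then
    echo "port ${port}: reachable"
  else
    echo "port ${port}: NOT reachable"
  fi
done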
Configure the Telegraf agent plug-ins
Configure the Kubernetes, kube_inventory, and kube_admin plug-ins for your cluster monitoring in the agent environment.
- Copy the /plugin/telegraf.conf and /plugin/telegraf_linux_amd64 files from the Technology Pack to a location of your choice in the agent environment. For example, /opt/<remote_monitor_setup>.
The telegraf.conf file has different input blocks, and each block represents one input plug-in.
- Get the Kubernetes nodes by using the following command on the managed Kubernetes system:
kubectl get nodes -o wide
- Configure the telegraf.conf file to enter the following values for each block:

[[inputs.kubernetes]]
kubelet_url = "https://10.10.10.10:10250"
apiserver_url = "https://10.10.10.10:6443"
bearer_token_string = "aaabbbccc"
insecure_skip_verify = true

Note: The number of [[inputs.kubernetes]] blocks depends on the number of nodes in your managed cluster. For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/kubernetes

[[inputs.kube_inventory]]
insecure_skip_verify = true
namespace = ""
url = "https://10.10.10.10:6443"
bearer_token_string = "aaabbbccc"

You can get the list of nodes in your managed cluster with the following command: kubectl get nodes -o wide
Note: Every managed cluster must have one [[inputs.kube_inventory]] block. For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/kube_inventory

[[inputs.kube_admin]]
apiserver_urls = [ "https://10.10.10.10:6443" ] # list of all apiserver URLs (comma-separated) present in your cluster setup
insecure_skip_verify = true
bearer_token_string = "aaabbbccc"
controller_urls = [ "https://10.10.10.10:10257" ] # list of all controller URLs (comma-separated) present in your cluster setup
scheduler_urls = [ "https://10.10.10.10:10259" ] # list of all scheduler URLs (comma-separated) present in your cluster setup
#etcd_urls = [ "https://10.10.10.10:2379" ]
# path of certificates stored in agent environment
#tls_cert = "D:/__TELEGRAPH/CODEBASE/etcd_certs/apiserver-etcd-client.crt"
#tls_key = "D:/__TELEGRAPH/CODEBASE/etcd_certs/apiserver-etcd-client.key"

Note: Every managed cluster must have one [[inputs.kube_admin]] block.

[[inputs.vsphere]]
## List of vCenter URLs to be monitored.
vcenters = [ "https://10.10.100.10" ]
username = "abc"
password = "abc@123"
insecure_skip_verify = true
datastore_instances = true

Note: This block is needed only to configure the VMware (vSphere) plug-in and must be configured for the vSphere client. For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/vsphere
Set up Kafka on the agent environment
Install Kafka
Use the following steps to install Kafka:
- Install Java™ so that Apache Kafka runs without errors:
yum -y install java-1.8.0-openjdk
java -version
- Download the most recent stable version of Apache Kafka from the official website, or use the following wget command to download it directly, and extract it:
wget https://mirrors.estointernet.in/apache/kafka/2.7.0/kafka_2.13-2.7.0.tgz
tar -xzf kafka_2.13-2.7.0.tgz
Note: If the wget command fails, use wget https://archive.apache.org/dist/kafka/2.8.1/kafka-2.8.1-src.tgz
- Create a symbolic link for the Kafka package, add the Kafka environment path to the .bash_profile file, and then initialize it as shown:
ln -s kafka_2.13-2.7.0 kafka
echo "export PATH=$PATH:/root/kafka_2.13-2.7.0/bin" >> ~/.bash_profile
source ~/.bash_profile
- Start ZooKeeper, which comes built in with the Kafka package. Because this is a single-node cluster, you can start ZooKeeper with the default properties:
zookeeper-server-start.sh -daemon /root/kafka/config/zookeeper.properties
- Validate that ZooKeeper is accessible by connecting to its port, 2181:
telnet localhost 2181
- Create a topic:
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <topic_name>
- Verify that the topic is created:
kafka-topics.sh --zookeeper localhost:2181 --list
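As an optional smoke test before wiring Telegraf in, you can produce and consume a message on the new topic with the console tools that ship with Kafka (port 9092 is Kafka's default listener; adjust it if your broker differs):
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic <topic_name>
# type a test message, then press Ctrl+C
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic_name> --from-beginning
# the test message is printed back; press Ctrl+C to exit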
Configure the Kafka output plug-in
Add the following block in the telegraf.conf file to send the metrics to the Kafka server. For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/outputs/kafka

[[outputs.kafka]]
brokers = ["10.46.43.195:9093"] # host:port of the Kafka broker
topic = "minikube" # Kafka topic
Telegraf agent maintenance
Start the Telegraf agent
- After the configuration of the input plug-ins and the Kafka output plug-in is complete, start the Telegraf agent by using the following command:
./telegraf_linux_amd64 -config ./telegraf.conf
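You can also do a one-shot dry run first. Telegraf's -test flag gathers metrics once and prints them to stdout instead of sending them to outputs, which is a quick way to confirm that the input plug-ins can reach the managed cluster:
./telegraf_linux_amd64 -config ./telegraf.conf -test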
- If you want to run the Telegraf agent as a background service, go to /etc/systemd/system/ and create telegraf.service with the following content:
[Unit]
Description=Telegraf Service
[Service]
Type=simple
Restart=always
RestartSec=1
User=root
ExecStart=/opt/<remote_monitor_setup>/telegraf_linux_amd64 -config /opt/<remote_monitor_setup>/telegraf.conf
[Install]
WantedBy=multi-user.target
- Run the following commands to start, stop, or check the Telegraf service:
systemctl start telegraf
systemctl stop telegraf
systemctl status telegraf
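After you create or edit the unit file, reload systemd so that it picks up the new unit; enabling the unit also makes it start on boot. Both are standard systemd practice rather than steps specific to this pack:
systemctl daemon-reload
systemctl enable telegraf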
Clean up the Telegraf agent
If you need to clean up the Telegraf agent, delete the binary files and configuration files from the agent environment.
Note: If you created the background service, delete the /etc/systemd/system/telegraf.service file.
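A cleanup might look like the following; the install location is the example path used earlier and might differ in your environment:
systemctl stop telegraf
systemctl disable telegraf
rm /etc/systemd/system/telegraf.service
systemctl daemon-reload
rm -rf /opt/<remote_monitor_setup>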
Troubleshooting
If the scheduler or controller ports are not reachable from the agent environment, those components are probably bound to 127.0.0.1. Update their manifests as follows:
- Open the /etc/kubernetes/manifests/kube-scheduler.yaml file and modify the following lines:
  - Remove the line (under spec > containers > command) that contains this phrase: - --port=0
  - Change - --bind-address=127.0.0.1 to - --bind-address=<masterNodeIP>
  - Change the host to <masterNodeIP> and the port to 10259 under livenessProbe and startupProbe.
- Open the /etc/kubernetes/manifests/kube-controller-manager.yaml file and update the following lines:
  - Remove the line (under spec > containers > command) that contains this phrase: - --port=0
  - Change - --bind-address=127.0.0.1 to - --bind-address=<masterNodeIP>
  - Change the host to <masterNodeIP> and the port to 10257 under livenessProbe and startupProbe.
- Restart the kubelet service with the following command:
sudo systemctl restart kubelet.service
- Verify that you are now able to connect to the managed environment:
telnet <masterNodeIP> <port>
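On the master node itself you can also confirm that the components now listen on the node IP rather than the loopback address; ss is part of iproute2 and available on most distributions:
ss -ltn | grep -E '10257|10259'
# the local address column should show <masterNodeIP>:10257 and <masterNodeIP>:10259, not 127.0.0.1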