Deploying Telegraf plug-in to monitor cloud and virtualization clusters

Monitoring cloud platforms is critical to organizations. Telco Network Cloud Manager - Performance introduces Cloud Monitoring Technology Packs. These packs collect performance metrics across the cloud system, which can be visualized in the built-in dashboards.

Cluster setup

You require three environments:
  • Cluster where Telco Network Cloud Manager - Performance is installed. It is where the Monitoring Technology Pack is installed.
  • Server where the Telegraf plug-in is installed. It is referred to as the agent environment. This server must be outside the managed cluster that is being monitored.
  • Kubernetes cluster that you want to monitor and collect performance metrics from. These metrics are collected and sent to the Telco Network Cloud Manager - Performance database for visualization. This cluster is referred to as the managed cluster.

Telegraf setup tasks

Before you begin

  • You must install the Telegraf agent and external Kafka in the same network as the managed environment.
  • Get the Telegraf configuration files from the Technology Packs.

    Packs are extracted to the /installers/core folder, which is referred to as <DIST_DIR>.

    The M06VTML.tar.gz bundle has the following technology packs:
    • cloud-kubernetes-1.8.0.jar
      Note: You need this technology pack to monitor the performance of a Kubernetes cluster.
      You can see the following files in the /plugin folder:
      • remote_monitoring.yaml
      • telegraf.conf
      • telegraf_linux_amd64
    • cloud-vmware-vsphere-1.1.0.jar
      Note: You need this technology pack to monitor the performance of a VMware cluster.

Configure the managed Kubernetes cluster to communicate with Telegraf agent

Follow these steps on the Kubernetes cluster that you are trying to monitor:
  1. Copy the remote_monitoring.yaml file from the <DIST_DIR> where you extracted the cloud-kubernetes-1.8.0.jar file.

    This file contains the Custom Resource Definition that creates the namespace, ServiceAccount, ClusterRole, and ClusterRoleBinding objects for the managed cloud cluster. These objects are needed to connect the Telegraf agent with the managed cloud cluster.

  2. Run the following command to apply the Custom Resource definition:
    kubectl apply -f remote_monitoring.yaml
    The following objects are created:
    • Namespace – remote-telegraf-ns
    • Service account – remote-telegraf-account
    • ClusterRole – remote-telegraf-roles
    • ClusterRoleBindings – remote-telegraf-rolebind, remote-telegraf-scc-rolebind, remote-telegraf-kubelet-rolebind
  3. Verify that the token is generated under the remote-telegraf-ns namespace with the following command:
    kubectl get secret -n remote-telegraf-ns

    A token secret is listed. For example, remote-telegraf-account-token-<value>.

  4. Use the name of the token secret from the previous step to display the token with the following command:
    kubectl describe secret remote-telegraf-account-token-q9qq4 -n remote-telegraf-ns
    Note: The token is used as the bearer_token_string value during the configuration of Telegraf.
  5. To monitor the etcd component, copy the etcd certificate and key from this managed Kubernetes environment to the system where the Telegraf agent is installed. The files (apiserver-etcd-client.crt and apiserver-etcd-client.key) are usually in the /etc/kubernetes/pki folder.
  6. Run the following commands to get the managed cluster details.
    To get apiserver URL, run the kubectl cluster-info command.
    Note: The apiserver URL is needed in both the Kubernetes plug-in and the kube_inventory plug-in during Telegraf setup.

    To get nodeIP, run the kubectl get nodes -o wide command.

  7. To get nodePorts, run the kubectl cluster-info dump > dump.txt command.
    Search for daemonEndpoints in the dump.txt file. Each node has one daemonEndpoints block, which contains the kubelet port.
    Note: Apiserver URL, NodeIP, and nodePorts are used to configure the Kubernetes plug-in of Telegraf agent.
  8. To get the apiserver_url, scheduler_url, and controller_url, follow these steps:
    • In the managed Kubernetes cluster, go to /etc/kubernetes/manifests and locate the following files:
      • etcd.yaml

        The default port for etcd is 2379.

      • kube-apiserver.yaml

        The default port for the API server is 6443.

      • kube-scheduler.yaml

        The default port for scheduler is 10259.

      • kube-controller-manager.yaml

        The default port for controller is 10257.

    • Get the IP address and port details from the yaml files.
  9. Go to the agent environment and check whether the apiserver, scheduler, and controller IP addresses and ports are accessible by using this command:
    telnet <masterNode_IP> <port>
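
Steps 3 and 4 can also be scripted. The following is a minimal sketch, assuming that the token is stored in the data.token field of the secret (where it is base64-encoded) and that the base64 utility from GNU coreutils is available. The function names decode_token and get_bearer_token are illustrative and are not part of the Technology Pack.

```shell
# Decode a base64-encoded token that is read from standard input.
decode_token() {
  base64 -d
}

# Print the decoded bearer token of a secret.
# Arguments: secret name, namespace (defaults to remote-telegraf-ns).
# Requires kubectl access to the managed cluster.
get_bearer_token() {
  secret_name=$1
  namespace=${2:-remote-telegraf-ns}
  kubectl get secret "$secret_name" -n "$namespace" \
    -o jsonpath='{.data.token}' | decode_token
}
```

For example, get_bearer_token remote-telegraf-account-token-q9qq4 prints the value to use as bearer_token_string. Unlike kubectl describe, which shows the token already decoded, kubectl get returns the encoded value, so the decode step is needed.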

Configure the Telegraf agent plug-ins

Configure the Kubernetes, kube_inventory, and kube_admin plug-ins in the agent environment to monitor your cluster.

  1. Copy the /plugin/telegraf.conf and /plugin/telegraf_linux_amd64 files from the Technology Pack to a location of your choice in the agent environment. For example, /opt/<remote_monitor_setup>.

    The telegraf.conf file has different input blocks, and each block represents one input plug-in.

  2. Get the Kubernetes nodes by using the following command on the managed Kubernetes system:
    kubectl get nodes -o wide
  3. Edit the telegraf.conf file and enter the following values for each block:
    Block Values
    [[inputs.kubernetes]]
    kubelet_url = "https://10.10.10.10:10250"
    apiserver_url = "https://10.10.10.10:6443"
    bearer_token_string = "aaabbbccc"
    insecure_skip_verify = true
    Note: The number of inputs.kubernetes blocks depends on the number of nodes you have in your managed cluster.

    For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/kubernetes

    [[inputs.kube_inventory]]
    insecure_skip_verify = true
    namespace = ""
    url = "https://10.10.10.10:6443"
    bearer_token_string = "aaabbbccc"
    You can get the list of nodes in your managed cluster with the following command:
    kubectl get nodes -o wide
    Note: Every managed cluster must have one [[inputs.kube_inventory]] block.

    For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/kube_inventory

    [[inputs.kube_admin]]
    apiserver_urls = [ "https://10.10.10.10:6443" ]   # list of all API server URLs (separated by semicolon) present in your cluster setup
    insecure_skip_verify = true
    bearer_token_string = "aaabbbccc"
    controller_urls = [ "https://10.10.10.10:10257" ] # list of all controller URLs (separated by semicolon) present in your cluster setup
    scheduler_urls = [ "https://10.10.10.10:10259" ]  # list of all scheduler URLs (separated by semicolon) present in your cluster setup
    # etcd_urls = [ "https://10.10.10.10:2379" ]
    # Path of the etcd certificates that are stored in the agent environment
    # tls_cert = "D:/__TELEGRAPH/CODEBASE/etcd_certs/apiserver-etcd-client.crt"
    # tls_key = "D:/__TELEGRAPH/CODEBASE/etcd_certs/apiserver-etcd-client.key"
    	
    Note: Every managed cluster must have one [[inputs.kube_admin]] block.
    [[inputs.vsphere]]

    This block is needed to configure the VMware (vSphere) plug-in.

    ## List of vCenter URLs to be monitored.
    vcenters = [ "https://10.10.100.10" ]
    username = "abc"
    password = "abc@123"
    insecure_skip_verify = true
    datastore_instances = true
    Note: This block must be configured for the vSphere Client.

    For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/vsphere.
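
Because one [[inputs.kubernetes]] block is needed for each node, the blocks can be generated instead of typed by hand. The following is a minimal sketch that uses the same keys as the example values above. The function name gen_kubernetes_blocks and the KUBELET_PORT variable are illustrative, and the kubelet port is assumed to be 10250 unless the daemonEndpoints value of your nodes says otherwise.

```shell
# Print one [[inputs.kubernetes]] block for each node IP address.
# Arguments: API server URL, bearer token, then one or more node IP addresses.
# The kubelet port defaults to 10250; set KUBELET_PORT to override it.
gen_kubernetes_blocks() {
  apiserver_url=$1
  token=$2
  shift 2
  for node_ip in "$@"; do
    printf '[[inputs.kubernetes]]\n'
    printf '  kubelet_url = "https://%s:%s"\n' "$node_ip" "${KUBELET_PORT:-10250}"
    printf '  apiserver_url = "%s"\n' "$apiserver_url"
    printf '  bearer_token_string = "%s"\n' "$token"
    printf '  insecure_skip_verify = true\n\n'
  done
}
```

For example, gen_kubernetes_blocks "https://10.10.10.10:6443" "aaabbbccc" 10.10.10.10 10.10.10.11 >> telegraf.conf appends two blocks, one for each node. Review the generated blocks before you start the agent.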

Set up Kafka on the agent environment

Install Kafka
Use the following steps to install Kafka:
  1. Install Java™, which is required to run Apache Kafka, and verify the installation:
    yum -y install java-1.8.0-openjdk
    java -version
  2. Download a stable version of Apache Kafka from the official website, or use the following wget command to download it directly, and then extract it:
    wget https://mirrors.estointernet.in/apache/kafka/2.7.0/kafka_2.13-2.7.0.tgz
    tar -xzf kafka_2.13-2.7.0.tgz
    Note: If the wget command fails, download the same package from the Apache archive:
    wget https://archive.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz
  3. Create a symbolic link for the Kafka package, add the Kafka bin directory to the PATH variable in the .bash_profile file, and then reload the profile:
    ln -s kafka_2.13-2.7.0 kafka
    echo 'export PATH=$PATH:/root/kafka_2.13-2.7.0/bin' >> ~/.bash_profile
    source ~/.bash_profile
  4. Start ZooKeeper, which is included in the Kafka package. Because it is a single-node cluster, you can start ZooKeeper with the default properties.
    zookeeper-server-start.sh -daemon /root/kafka/config/zookeeper.properties
  5. Telnet to port 2181 to validate that ZooKeeper is accessible.
    telnet localhost 2181
  6. Create a topic.
    kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <topic_name>
  7. Verify that the topic is created.
    kafka-topics.sh --zookeeper localhost:2181 --list
For more information, see https://kafka.apache.org/quickstart.
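Steps 6 and 7 can be combined so that the topic is created only when it does not already exist. The following is a minimal sketch, assuming the --zookeeper option that the commands above use for Kafka 2.7.0. The function name ensure_topic and the KAFKA_TOPICS variable are illustrative. KAFKA_TOPICS only selects which command to run and defaults to kafka-topics.sh.

```shell
# Create a Kafka topic unless it is already listed.
# Arguments: topic name, ZooKeeper address (defaults to localhost:2181).
ensure_topic() {
  topic=$1
  zk=${2:-localhost:2181}
  cmd=${KAFKA_TOPICS:-kafka-topics.sh}
  # -x matches the whole line and -F treats the topic name as a fixed string.
  if "$cmd" --zookeeper "$zk" --list | grep -qxF "$topic"; then
    echo "exists"
  elif "$cmd" --create --zookeeper "$zk" --replication-factor 1 \
      --partitions 1 --topic "$topic" >/dev/null; then
    echo "created"
  else
    echo "failed" >&2
    return 1
  fi
}
```

For example, ensure_topic minikube prints "created" on the first run and "exists" on later runs.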
Configure Kafka output plug-in
Add the following block to the telegraf.conf file to send the metrics to the Kafka server:
    [[outputs.kafka]]
    brokers = ["10.46.43.195:9093"]   # Address and port of the Kafka broker
    topic = "minikube"   # Kafka topic
For more information, see https://github.com/influxdata/telegraf/tree/master/plugins/outputs/kafka.

Telegraf agent maintenance

Start the Telegraf agent
  • After the configuration of the input plug-ins and the Kafka output plug-in is complete, start the Telegraf agent by using the following command:
    ./telegraf_linux_amd64 -config ./telegraf.conf
  • To run the Telegraf agent as a background service, go to the /etc/systemd/system/ folder and create a telegraf.service file with the following content:
    [Unit]
    Description=Telegraf Service
    [Service]
    Type=simple
    Restart=always
    RestartSec=1
    User=root
    ExecStart=/opt/<remote_monitor_setup>/telegraf_linux_amd64 -config /opt/<remote_monitor_setup>/telegraf.conf
    [Install]
    WantedBy=multi-user.target
    
  • Run the following commands to start, stop, and check the status of the Telegraf service:
    systemctl start telegraf
    systemctl stop telegraf
    systemctl status telegraf
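The service file can also be generated from the installation folder that you chose. The following is a minimal sketch that writes the same unit content as shown above. The function name write_telegraf_unit is illustrative, and the installation folder is passed as an argument in place of the /opt/<remote_monitor_setup> placeholder.

```shell
# Write a telegraf.service unit file for an installation folder.
# Arguments: installation folder, unit file path
# (defaults to /etc/systemd/system/telegraf.service).
write_telegraf_unit() {
  install_dir=$1
  unit_path=${2:-/etc/systemd/system/telegraf.service}
  cat > "$unit_path" <<EOF
[Unit]
Description=Telegraf Service
[Service]
Type=simple
Restart=always
RestartSec=1
User=root
ExecStart=$install_dir/telegraf_linux_amd64 -config $install_dir/telegraf.conf
[Install]
WantedBy=multi-user.target
EOF
}
```

After the file is written, systemctl daemon-reload makes systemd read the new unit, and systemctl enable telegraf starts the service at boot.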
Clean up the Telegraf agent
If you need to clean up the Telegraf agent, delete the binary files and configuration files from the agent environment.
Note: If you created the background service, delete the /etc/systemd/system/telegraf.service file.

Troubleshooting

If you have issues connecting to the managed environment and you are unable to telnet to the master node, follow these steps on all the master nodes in your cluster:
  1. Open the /etc/kubernetes/manifests/kube-scheduler.yaml file and modify the following lines:
    • Remove the line (spec->containers->command) that contains this phrase: - --port=0
    • Change - --bind-address=127.0.0.1 to - --bind-address=masterNodeIP
    • Change the host to masterNodeIP and port to 10259 under livenessProbe and startupProbe.
  2. Open the /etc/kubernetes/manifests/kube-controller-manager.yaml file and update the following lines:
    • Remove the line (spec->containers->command) that contains this phrase: - --port=0
    • Change - --bind-address=127.0.0.1 to - --bind-address=masterNodeIP
    • Change the host to masterNodeIP and the port to 10257 under livenessProbe and startupProbe.
  3. Restart the kubelet service with the following command:
      sudo systemctl restart kubelet.service
  4. Verify that you are now able to connect to the managed environment:
      telnet <masterNodeIP> <port>
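
If telnet is not installed in the agent environment, the same check can be scripted. The following is a minimal sketch, assuming that bash and the timeout utility from GNU coreutils are available. It uses the /dev/tcp redirection feature of bash to attempt a TCP connection. The function names check_port and report_ports are illustrative.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Arguments: host, port.
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print the status of each port for a host.
# Arguments: host, then one or more ports.
report_ports() {
  host=$1
  shift
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port unreachable"
    fi
  done
}
```

For example, report_ports <masterNodeIP> 6443 10259 10257 checks the default API server, scheduler, and controller ports in one run.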