SD-WAN Viptela Collector Use Cases Guide

About

This document provides useful use-cases and troubleshooting details for the SD-WAN Viptela collector.

Important: Do not run the sevone-cli command from a subdirectory under /opt/SevOne/upgrade or /var/log/pods. It can be run from any other directory.

Use-Cases

Note: Use-Cases are listed in alphabetical order.

Change Hostname

Important: Please make sure to set the hostname for all k3s nodes in lowercase when deploying or upgrading the collector.

Teardown Kubernetes

To change a node's hostname, you must first tear down your Kubernetes cluster.

$ sevone-cli cluster down

Update ansible Inventory

  1. Run the following command on every node to change its hostname.
    $ sudo hostnamectl set-hostname "sdwan-node<##>"
  2. On the control plane node, update /etc/ansible/hosts with your new hostname.
    Note: For a single-node cluster, the deployment node is local.

    Example

    [server]
    sdwan-node01 ansible_connection=local
  3. If you have agent nodes, update their hostnames as well.

    Example

    [server]
    sdwan-node01 ansible_connection=local
    
    [agent]
    sdwan-node02 ansible_user=sevone ansible_host=10.123.45.68
    sdwan-node03 ansible_user=sevone ansible_host=10.123.45.69

Provision Kubernetes

$ sevone-cli cluster up
Important: The message FAILED - RETRYING: Wait for k3s server to be up means that k3s is trying to come up, which may take a long time. If all retries are exhausted and k3s is unable to come up, the command fails automatically; in that case, please contact IBM SevOne Support for help.
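Once the command completes, you can verify (assuming kubectl is configured for the sevone user, as described later in this guide) that every node in the cluster reports a Ready status.

$ kubectl get nodes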

Customize SD-WAN Collector

Using ssh, log into SD-WAN collector control plane node as sevone.

$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>

SD-WAN collector runs as a helm chart deployed within the Kubernetes cluster. The helm chart is configured with a base set of configuration options that can be overridden as needed. A new file, /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml, must be created to contain only the settings you want to override.

Create /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file

$ touch /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml
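For illustration, a minimal sketch of an override file that changes only the node-affinity settings described later in this guide; any key you omit keeps its base-chart default.

collectorService:
  nodeAffinity:
    values:
      - sdwan-node02

flowAugmentorService:
  nodeAffinity:
    values:
      - sdwan-node03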

Handle IP Conflicts

The following are the default IP ranges used by Solutions.

Flag             Description                                IP Address        IP Range
--cluster-cidr   Pod IP addresses                           192.168.80.0/20   192.168.80.0 - 192.168.95.255
--service-cidr   Service IP addresses                       192.168.96.0/20   192.168.96.0 - 192.168.111.255
--cluster-dns    Cluster DNS (must be in Service's range)   192.168.96.10     n/a
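If you need to confirm the pod addressing your running cluster currently uses, one option (assuming kubectl access on the control plane node) is to read the per-node pod CIDR from the node spec; each node is assigned a smaller slice (typically a /24) carved out of the --cluster-cidr range.

$ kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'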

Teardown Kubernetes

  1. To change the default IP ranges, you must tear down your Kubernetes cluster.
    $ sevone-cli cluster down
  2. Ensure that the old IP address ranges are not left behind in any of your node’s routing tables.
    $ ansible all --become -a "ip route del 192.168.96.0/24"

Adjust IP Ranges

Create a file ip_ranges.yaml in /etc/ansible/group_vars/all directory with your new IP ranges.

Example

$ echo 'k3s_cluster_cidr: "192.168.0.0/24"' >> \
/etc/ansible/group_vars/all/ip_ranges.yaml
  
$ echo 'k3s_service_cidr: "192.168.1.0/24"' >> \
/etc/ansible/group_vars/all/ip_ranges.yaml
  
$ echo 'k3s_cluster_dns: "192.168.1.10"' >> \
/etc/ansible/group_vars/all/ip_ranges.yaml
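Before redeploying, you can confirm that the file contains the expected values.

$ cat /etc/ansible/group_vars/all/ip_ranges.yaml

k3s_cluster_cidr: "192.168.0.0/24"
k3s_service_cidr: "192.168.1.0/24"
k3s_cluster_dns: "192.168.1.10"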

You may then redeploy or proceed with your deployment as normal.

$ sevone-cli playbook up

Deploy Collector and Augmentor on Specific Nodes

To bind the augmentor / collector to specific nodes (as flows are directly routed to worker nodes), add the following variables in your configuration file.

  • collectorService.nodeAffinity.values
  • flowAugmentorService.nodeAffinity.values

Example: To run the collector on host 'sdwan-node02' and the augmentor on host 'sdwan-node03'

collectorService:
  nodeAffinity:
    values:
      - {hostname}

Example

collectorService:
  nodeAffinity:
    values:
      - sdwan-node02


flowAugmentorService:
  nodeAffinity:
    values:
      - {hostname}

Example

flowAugmentorService:
  nodeAffinity:
    values:
      - sdwan-node03
Important:
  • In a multi-node setup, using the master node hostname in nodeAffinity.values is not supported.
  • When using affinity, ensure that both collectorService.nodeAffinity.values and flowAugmentorService.nodeAffinity.values are configured. If affinity is set for only the collector or augmentor, it may result in both services being scheduled on the same node.
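After the updated configuration has been applied to the collector deployment, you can confirm that the collector and augmentor pods were scheduled on the intended nodes by checking the NODE column (a quick check; pod names vary by deployment).

$ kubectl get pods -o wide | grep -E 'collector|aug'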

Run Agent On-Demand

The collector allows an agent to be run on demand. In other words, the collector allows an agent to be run manually, without waiting for the collector's scheduler to run it automatically based on the crontab. To run an agent on demand, execute the following command in the environment where the collector is running.

$ sevone-cli solutions run_agent [--deployment_name] DEPLOYMENT_NAME \
[--agent_name] AGENT_NAME [--log_level LOG_LEVEL]
Note: The square brackets indicate optional arguments or options. The words in uppercase must be replaced with actual values.
Note:
  • --deployment_name: It indicates the deployment name against which you are planning to run the agent on demand (refer to examples 1 and 2 below). It is optional, so you can remove --deployment_name from the command if you wish (refer to examples 3 and 4 below).
  • DEPLOYMENT_NAME: Replace it with the actual deployment name. The deployment name is in the following format.
    • Single tenant: solutions-<solution name>-<collector name>. For example, solutions-sdwan-viptela.
    • Multi-tenant: solutions-<solution name>-<collector name>-<a number>. For example, solutions-sdwan-viptela-1.
  • --agent_name: It indicates the agent name to run on demand (refer to examples 1 and 2 below). It is optional, so you can remove --agent_name from the command if you wish (refer to examples 3 and 4 below).
  • AGENT_NAME: Replace it with the actual agent name.
  • --log_level: It indicates the log level against which you are planning to run the agent on demand (refer to example 1 below). It is optional, so you can remove --log_level from the command if you wish (refer to examples 2, 3, and 4 below).
  • LOG_LEVEL: Replace it with the actual log level to run the agent with (refer to examples 1 and 3 below). It is optional. If you do not define the log level, the collector uses info as the log level (refer to examples 2 and 4 below). The following are the valid log levels.
    • debug
    • info
    • warn
    • error
Important: Please do not run any streaming agents on demand, as they run indefinitely; doing so would leave two instances of the streaming agent running against the collector.

Examples

Example-1

$ sevone-cli solutions run_agent --deployment_name solutions-sdwan-viptela-1 \
--agent_name InstallerAgent --log_level debug

Example-2

$ sevone-cli solutions run_agent --deployment_name solutions-sdwan-viptela-1 \
--agent_name InstallerAgent

Example-3

$ sevone-cli solutions run_agent solutions-sdwan-viptela-1 InstallerAgent debug

Example-4

$ sevone-cli solutions run_agent solutions-sdwan-viptela-1 InstallerAgent

Enable AlarmStatAgent

To enable the AlarmStatAgent, perform the following steps.

  1. Using ssh, log into SD-WAN Viptela collector control plane node as sevone.
    $ ssh sevone@<SD-WAN collector 'control plane' node IP address or hostname>
  2. Navigate to /opt/SevOne/chartconfs directory.
    $ cd /opt/SevOne/chartconfs
  3. Add a valid filter for the AlarmStatAgent by adding the following configuration in /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file.
    $ vi /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml
    
    collectorConfig:
      vendor:
        alarm_stat:
          filter:
            filter_on: vmanage_severity
            filter_value:
              - Major
              - Medium
              - Minor
              - Critical
  4. Remove AlarmStatAgent from the exclude list by adding the following configuration in /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file.
    $ vi /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml
    
    collectorConfig:
      agent:
        exclude:
          - None
  5. Apply the changes made in the configuration (solutions-sdwan-viptela_custom_guii.yaml) file.
    $ sevone-cli solutions reload
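After the reload completes, a quick check (pod names depend on your deployment) that the collector pods came back up cleanly:

$ kubectl get pods | grep collector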

Rotate Kubernetes Certificates

During an SD-WAN solution upgrade, the k3s service automatically rotates certificates that are due to expire within 90 days. If they expire before k3s is able to rotate them, you must rotate them manually. In other words, if you see the error message x509: certificate has expired when running kubectl commands, your certificates have expired and need to be rotated manually.


$ kubectl get pods

Unable to connect to the server: x509: certificate has expired or is not yet valid

Backup TLS Directory

As a precautionary measure, backup the TLS directory.

$ sudo tar -czvf /var/lib/rancher/k3s/server/tls.tgz /var/lib/rancher/k3s/server/tls
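You can optionally list the archive contents to confirm that the backup was created successfully.

$ sudo tar -tzf /var/lib/rancher/k3s/server/tls.tgz | head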

Generate New Certificates

  1. Remove the cached certificate from a Kubernetes secret.

    $ sudo rm /var/lib/rancher/k3s/server/tls/dynamic-cert.json
  2. Restart k3s service to rotate the certificates.

    $ sudo systemctl restart k3s
    Note: You can now run Kubernetes commands. This will allow you to back up your all-important security keys in case you have not done so already.
  3. After rotating the Kubernetes certificates, the Kubernetes configuration file must be refreshed to apply the new certificates.

    Refresh Kubernetes config file

    for 'root' user

    $ sudo cp /etc/rancher/k3s/k3s.yaml /root/.kube/config

    for 'sevone' user

    
    $ sudo cp /etc/rancher/k3s/k3s.yaml /home/sevone/.kube/config
    
    $ sudo chown -R sevone:sevone /home/sevone/.kube
  4. To verify the certificates, execute the following commands.
    
    $ sudo -i
    
    $ for i in `ls /var/lib/rancher/k3s/server/tls/*.crt`; \
    do echo $i; openssl x509 -enddate -noout -in $i; \
    echo "---"; done
  5. Validate pod status.
    
    $ kubectl get pods
    
    Important: If the command continues to fail due to a certificate issue as shown below, then continue with the next step.
    
    Output:
    Unable to connect to the server: x509: certificate has expired or is not yet valid
  6. Execute the following steps.
    1. Generate k3s certificates if they have not already been generated.
      for SD-WAN < 2.13
      
      $ cd /opt/SevOne/upgrade/ansible/playbook
      
      $ ansible-playbook reset.yaml
      
      $ ansible-playbook up.yaml
      for SD-WAN >= 2.13
      
      $ sevone-cli cluster down
      
      $ sevone-cli cluster up
    2. Confirm whether the node on which the augmentor is deployed is receiving flows or not.
      Important: Skip this step if k3s certificates are generated using Step 4.
      1. Check the augmentor pod(s).

        Example

        
        $ kubectl get pods -o wide  
        
        NAME                                                 READY   STATUS      RESTARTS        AGE     IP              NODE        NOMINATED NODE   READINESS GATES
        solutions-sdwan-viptela-redis-master-0                 1/1     Running     1               22h     192.168.80.18   sevonek8s   <none>           <none>
        solutions-sdwan-viptela-redis-replicas-0               1/1     Running     1               22h     192.168.80.20   sevonek8s   <none>           <none>
        solutions-sdwan-viptela-upgrade-sn78p                  0/1     Completed   0               5h34m   192.168.80.21   sevonek8s   <none>           <none>
        solutions-sdwan-viptela-aug-decoder-58fc5dfc6d-9l6kw   1/1     Running     0               5h34m   10.49.12.2      sevonek8s   <none>           <none>
        solutions-sdwan-viptela-create-keys-2-cf252            0/1     Completed   0               5h34m   192.168.80.24   sevonek8s   <none>           <none>
        solutions-sdwan-viptela-collector-5c6f7fd4b8-g6k8x     1/1     Running     0               5h34m   192.168.80.23   sevonek8s   <none>           <none>
        
      2. Using ssh, log into augmentor node as sevone.
        $ ssh sevone@<SD-WAN collector augmentor node IP address>

        Example

        $ ssh sevone@10.49.12.2
      3. Check whether the augmentor node is receiving flows or not.
        $ sudo tcpdump -i any port <receiver_port_number> -vv
        Note: To find the augmentor receiver port number, see the value of the variable flowAugmentorService.receiverPort in the /opt/SevOne/chartconfs/solutions-sdwan-viptela*.yaml file.

        Example

        
        $ sudo tcpdump -i any port 9992 -vv
        
        tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
        08:45:17.950805 IP (tos 0x0, ttl 61, id 13462, offset 0, flags [DF], proto UDP (17), length 360)
            192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 332
        08:45:17.950850 IP (tos 0x0, ttl 61, id 13463, offset 0, flags [DF], proto UDP (17), length 152)
            192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
        08:45:17.950856 IP (tos 0x0, ttl 61, id 13464, offset 0, flags [DF], proto UDP (17), length 152)
            192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
        08:45:17.950859 IP (tos 0x0, ttl 61, id 13465, offset 0, flags [DF], proto UDP (17), length 152)
            192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
        08:45:17.950863 IP (tos 0x0, ttl 61, id 13466, offset 0, flags [DF], proto UDP (17), length 152)
            192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
        08:45:17.950867 IP (tos 0x0, ttl 61, id 13467, offset 0, flags [DF], proto UDP (17), length 152)
            192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
        ...
        ...
        ...
      4. If you are unable to see flows, repeat Step 6b, bullets ii and iii, on the other nodes. Once you find the node on which the flows are arriving, delete the augmentor pod until it gets scheduled on that node, as shown in the sketch below.
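A sketch of deleting the augmentor pod so that Kubernetes reschedules it; the pod name shown here is taken from the example output above and will differ in your environment. Repeat the delete and re-check with kubectl get pods -o wide until the pod lands on the node that is receiving flows.

$ kubectl delete pod solutions-sdwan-viptela-aug-decoder-58fc5dfc6d-9l6kw

$ kubectl get pods -o wide | grep aug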

Install SD-WAN 7.0 on SD-WAN < 7.0 Virtual Machine

Important: It is recommended to use the appropriate OVA for the Viptela collector installation.

Execute the following steps to install SD-WAN 7.0 Viptela Collector on SD-WAN < 7.0 virtual machine.

  1. Using ssh, log into SD-WAN Viptela collector control plane node as sevone.
    $ ssh sevone@<SD-WAN collector 'control plane' node IP address or hostname>

    Example: Currently on SD-WAN Viptela Collector < 7.0

    $ ssh sevone@10.128.11.150
  2. Navigate to /opt/SevOne/upgrade directory.
    $ cd /opt/SevOne/upgrade
  3. Remove all files present in the /opt/SevOne/upgrade directory.
    $ rm -rf /opt/SevOne/upgrade/*
  4. Download the following (latest) files from IBM Passport Advantage (https://www.ibm.com/software/passportadvantage/pao_download_software.html) via Passport Advantage Online. If you are on a legacy / flexible SevOne contract and do not have access to IBM Passport Advantage but have an active Support contract, please contact IBM SevOne Support for the latest files. You must place these files in the /opt/SevOne/upgrade directory; an optional checksum-verification sketch appears after these steps.
    1. sevone_solutions_sdwan_viptela-v7.0.0-build.<###>.tgz
    2. sevone_solutions_sdwan_viptela-v7.0.0-build.<###>.tgz.sha256.txt
    3. signature-tools-<latest version>-build.<latest>.tgz
    4. signature-tools-<latest version>-build.<latest>.tgz.sha256.txt
  5. Extract the latest build.
    $ tar xvfz $(ls -Art /opt/SevOne/upgrade/sevone_*.tgz | \
    tail -n 1) -C /opt/SevOne/upgrade/ ./utilities
  6. You are now ready to deploy SD-WAN 7.0 collector. Please refer to SD-WAN Viptela Collector Deployment / Configuration Guide for details on how to perform the deployment.
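Optional checksum verification referenced in step 4: before extracting, you can compare the checksum of each downloaded archive against the value shipped in its companion .sha256.txt file (a manual comparison; the exact layout of the .txt files may vary).

$ cd /opt/SevOne/upgrade

$ sha256sum *.tgz

$ cat *.sha256.txt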

Move SD-WAN Devices from One Peer to Another

After moving the necessary devices from the source peer to the destination peer(s) in the SevOne NMS UI (Device Mover), perform the following steps to make sure that the destination peer(s) is/are added to the collector configuration.

  1. Using ssh, log into SD-WAN Viptela collector control plane node as sevone.
    $ ssh sevone@<SD-WAN collector 'control plane' node IP address or hostname>

    Example

    $ ssh sevone@10.128.11.150
  2. Change directory to /opt/SevOne/chartconfs/.
    $ cd /opt/SevOne/chartconfs/
  3. Using a text editor of your choice, update the flag distribution_peers_list with all peer IDs in the /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file and then save the file (see the lookup sketch after these steps).
    $ vi /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml
  4. Redeploy the collector.
    $ sevone-cli solutions reload
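If you are unsure where distribution_peers_list is currently set, one way to locate it (assuming the flag appears in the shipped chart configuration files under /opt/SevOne/chartconfs) is a recursive search, as referenced in step 3.

$ grep -rn distribution_peers_list /opt/SevOne/chartconfs/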

Standalone Vs Multi-Tenant Deployments

This section contains information about the pros and cons of deploying standalone and multi-tenant clusters.

  • Add a new vManage as a tenant in the existing cluster
    Pros:
      • No need to provide additional resources to k3s.
      • No need to set up a standalone cluster for the new vManage.
      • Adding a new tenant is easier than creating a standalone cluster.
    Cons:
      • If the k3s cluster encounters any issues, it impacts the data collection process for all tenants.
      • Cannot use SSU for deployment.
  • Standalone cluster for a new vManage
    Pros:
      • Any issue with one tenant's data collection due to k3s cluster problems does not affect the other tenants.
      • Deploying the augmentor on a separate node is easier due to the smaller number of nodes.
      • Can use SSU for deployment.
    Cons:
      • Need to provide additional resources to k3s.
      • Need to set up a standalone cluster for the new vManage.

Stop Collecting Data from Tenant

Execute the following steps.

  1. To stop collecting data for a particular tenant, execute the following command.
    $ helm uninstall <helm_deployment_name>

    where <helm_deployment_name> is the name of the helm deployment for the tenant whose data collection you want to stop (see the lookup sketch after these steps).

  2. After stopping data collection for the tenant, manually clean up the tenant-specific data in the NMS. To purge the flow data, from the navigation bar, go to Administration, select Flow Configuration, and then select Flow Interface Manager. Select the devices from the list specific to the tenant, click on the wrench icon, and select Purge Device Flows.
  3. To delete the flow data permanently, select the devices specific to the tenant from the list on the Flow Interface Manager and click on the Delete Device Rules button.
  4. To delete the tenant device group, from the navigation bar, go to Devices, select Grouping, and then select Device Groups. Hover over the tenant device group and click on the Trash can icon. Click on OK to confirm.
  5. To delete the tenant object groups, from the navigation bar, go to Devices, select Grouping, and then select Object Groups. Hover over the tenant object group and click on the Trash can icon. Click on OK to confirm.
  6. To delete the devices for this tenant, from the navigation bar, go to Devices and then select Device Manager. Select the devices for this tenant and click on Delete Selected. Then click on OK to confirm.
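To find the helm deployment name referenced in step 1, you can list the helm releases in the cluster (add --all-namespaces if the deployment was installed in a namespace other than your current one).

$ helm list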

Replace Faulty Node in a Multi-Node Cluster

In a multi-node cluster, if one of the nodes is faulty and must be replaced, ensure that the image deployed on the replacement node matches the release version of the existing nodes in the collector configuration.

Note: The sevone-cli cluster up command in the step below will ensure that the node is updated to the same patch level as the existing collector nodes.

Execute the following steps to add a new node in the SD-WAN collector cluster using the .ova file. Please refer to section Deploy OVA in SD-WAN Viptela Collector Pre-Deployment Guide for details.

  1. Run the kubectl command to retrieve the node information from the cluster.
    $ kubectl get nodes
  2. Remove the faulty node from the cluster.
    $ sevone-cli cluster worker remove <IP address of worker node>
    
  3. Add a new node to the cluster.
    $ sevone-cli cluster worker add <IP address of worker node>
    
  4. Reset the Kubernetes cluster.
    $ sevone-cli cluster down
    
  5. Spin up the Kubernetes cluster.
    $ sevone-cli cluster up
    
  6. Verify that the new agent node is Ready and has been added to the Kubernetes cluster.
    $ kubectl get nodes