SD-WAN Viptela Collector Use Cases Guide
About
This document provides useful use cases and troubleshooting details for the SD-WAN Viptela collector.
Use-Cases
Change Hostname
Teardown Kubernetes
To change a node's hostname, you must first tear down your Kubernetes cluster.
$ sevone-cli cluster down
Update ansible Inventory
- Run the following command on every node to change its hostname.
$ sudo hostnamectl set-hostname "sdwan-node<##>"
- On the control plane node, update /etc/ansible/hosts with your new hostname.
Note: For a single-node cluster, the deployment node is local.
Example
[server]
sdwan-node01 ansible_connection=local
- If you have agent nodes, update their hostnames as well.
Example
[server]
sdwan-node01 ansible_connection=local

[agent]
sdwan-node02 ansible_user=sevone ansible_host=10.123.45.68
sdwan-node03 ansible_user=sevone ansible_host=10.123.45.69
Provision Kubernetes
$ sevone-cli cluster up
Customize SD-WAN Collector
Using ssh, log into SD-WAN collector control plane node as sevone.
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>
The SD-WAN collector runs as a helm chart deployed within the Kubernetes cluster. The helm chart ships with a base set of configuration options that can be overridden as needed. Create a new file, /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml, containing only the settings you want to override.
Create /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file
$ touch /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml
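As an illustration, an override file containing a single setting might look like the following. The key shown here (collectorService.nodeAffinity.values) is the same one described in section Deploy Collector and Augmentor on Specific Nodes below; the hostname is a placeholder, and all other chart values keep their defaults.
Example
collectorService:
  nodeAffinity:
    values:
      - sdwan-node02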
Handle IP Conflicts
The following are the default IP ranges used by Solutions.
Flag | Description | IP Address | IP Range |
---|---|---|---|
--cluster-cidr | Pod IP addresses | 192.168.80.0/20 | 192.168.80.0 - 192.168.95.255 |
--service-cidr | Service IP addresses | 192.168.96.0/20 | 192.168.96.0 - 192.168.111.255 |
--cluster-dns | Cluster DNS (must be in Service's range) | 192.168.96.10 | n/a |
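Before deployment, you may optionally confirm that these defaults do not collide with networks already reachable from your nodes. A simple check (the grep pattern below is only a convenience; inspect the full output if in doubt) is to list the existing routes and addresses on each node and verify that nothing falls inside 192.168.80.0/20 or 192.168.96.0/20.
$ ip route | grep 192.168.
$ ip addr | grep 192.168.
If any overlap exists, adjust the default IP ranges as described below.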
Teardown Kubernetes
- To change the default IP ranges, you must tear down your Kubernetes cluster.
$ sevone-cli cluster down
- Ensure that the old IP address ranges are not left behind in any of your nodes' routing tables.
$ ansible all --become -a "ip route del 192.168.96.0/24"
Adjust IP Ranges
Create a file ip_ranges.yaml in /etc/ansible/group_vars/all directory with your new IP ranges.
Example
$ echo 'k3s_cluster_cidr: "192.168.0.0/24"' >> \
/etc/ansible/group_vars/all/ip_ranges.yaml
$ echo 'k3s_service_cidr: "192.168.1.0/24"' >> \
/etc/ansible/group_vars/all/ip_ranges.yaml
$ echo 'k3s_cluster_dns: "192.168.1.10"' >> \
/etc/ansible/group_vars/all/ip_ranges.yaml
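The resulting /etc/ansible/group_vars/all/ip_ranges.yaml file then contains the three keys written above.
k3s_cluster_cidr: "192.168.0.0/24"
k3s_service_cidr: "192.168.1.0/24"
k3s_cluster_dns: "192.168.1.10"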
You may then redeploy or proceed with your deployment as normal.
$ sevone-cli playbook up
Deploy Collector and Augmentor on Specific Nodes
To bind the augmentor / collector to specific nodes (flows are routed directly to worker nodes), add the following variables to your configuration file.
- collectorService.nodeAffinity.values
- flowAugmentorService.nodeAffinity.values
For example, to run the collector on host 'sdwan-node02' and the augmentor on 'sdwan-node03', set each service's nodeAffinity values as shown below.
collectorService:
nodeAffinity:
values:
- {hostname}
Example
collectorService:
nodeAffinity:
values:
- sdwan-node02
flowAugmentorService:
nodeAffinity:
values:
- {hostname}
Example
flowAugmentorService:
nodeAffinity:
values:
- sdwan-node03
- In a multi-node setup, nodeAffinity.values does not support the control plane (master) node hostname.
- When using affinity, configure both collectorService.nodeAffinity.values and flowAugmentorService.nodeAffinity.values. If affinity is set for only the collector or only the augmentor, both services may end up scheduled on the same node. A combined example is shown below.
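Putting the two settings together, a custom configuration file that pins the collector to 'sdwan-node02' and the augmentor to 'sdwan-node03' (the hostnames from the examples above) contains both keys.
Example
collectorService:
  nodeAffinity:
    values:
      - sdwan-node02
flowAugmentorService:
  nodeAffinity:
    values:
      - sdwan-node03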
Run Agent On-Demand
The collector allows the agent to run on demand. In other words, the collector allows an agent to be run manually without waiting for the collector's scheduler to run it automatically based on the crontab. To run an agent on demand, execute the following command in the environment where the collector is running.
$ sevone-cli solutions run_agent [--deployment_name] DEPLOYMENT_NAME \
[--agent_name] AGENT_NAME [--log_level LOG_LEVEL]
- --deployment_name: Indicates the deployment name against which to run the agent on demand (refer to examples 1 and 2 below). This flag is optional; you can omit --deployment_name from the command if you wish (refer to examples 3 and 4 below).
- DEPLOYMENT_NAME: Replace it with the actual deployment name. The deployment name is in the following format.
- Single tenant: solutions-<solution name>-<collector name>. For example, solutions-sdwan-viptela.
- Multi-tenant: solutions-<solution name>-<collector name>-<a number>. For example, solutions-sdwan-viptela-1.
- --agent_name: Indicates the name of the agent to run on demand (refer to examples 1 and 2 below). This flag is optional; you can omit --agent_name from the command if you wish (refer to examples 3 and 4 below).
- AGENT_NAME: Replace it with the actual agent name.
- --log_level: Indicates the log level at which to run the agent on demand (refer to example 1 below). This flag is optional; you can omit --log_level from the command if you wish (refer to examples 2, 3, and 4 below).
- LOG_LEVEL: Replace it with the actual log level at which to run the agent on demand (refer to examples 1 and 3 below). It is optional; if you do not define the log level, the collector uses info (refer to examples 2 and 4 below). The following is the list of valid values for log level.
- debug
- info
- warn
- error
Examples
Example-1
$ sevone-cli solutions run_agent --deployment_name solutions-sdwan-viptela-1 \
--agent_name InstallerAgent --log_level debug
Example-2
$ sevone-cli solutions run_agent --deployment_name solutions-sdwan-viptela-1 \
--agent_name InstallerAgent
Example-3
$ sevone-cli solutions run_agent solutions-sdwan-viptela-1 InstallerAgent debug
Example-4
$ sevone-cli solutions run_agent solutions-sdwan-viptela-1 InstallerAgent
Enable AlarmStatAgent
To enable the AlarmStatAgent, perform the following steps.
- Using ssh, log into SD-WAN Viptela collector control plane node as sevone.
$ ssh sevone@<SD-WAN collector 'control plane' node IP address or hostname>
- Navigate to /opt/SevOne/chartconfs directory.
$ cd /opt/SevOne/chartconfs
- Add a valid filter for the AlarmStatAgent by adding the following configuration in /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file.
$ vi /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml

collectorConfig:
  vendor:
    alarm_stat:
      filter:
        filter_on: vmanage_severity
        filter_value:
          - Major
          - Medium
          - Minor
          - Critical
- Remove AlarmStatAgent from the exclude list by adding the following configuration in /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file.
$ vi /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml

collectorConfig:
  agent:
    exclude:
      - None
- Apply the changes made in the configuration (solutions-sdwan-viptela_custom_guii.yaml) file.
$ sevone-cli solutions reload
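Optionally, after the reload you can confirm that the collector pods come back up in Running state. The command below is the same generic check used elsewhere in this guide; pod names vary by environment.
$ kubectl get pods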
Rotate Kubernetes Certificates
During an SD-WAN solution upgrade, the k3s service automatically rotates certificates that are due to expire within 90 days. If they expire before k3s is able to rotate them, you will need to rotate them manually. In other words, if you see the error message x509: certificate has expired when running kubectl commands, your certificates have expired and need to be rotated manually.
$ kubectl get pods
Unable to connect to the server: x509: certificate has expired or is not yet valid
Backup TLS Directory
As a precautionary measure, back up the TLS directory.
$ sudo tar -czvf /var/lib/rancher/k3s/server/tls.tgz /var/lib/rancher/k3s/server/tls
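Optionally, confirm that the backup archive was created before proceeding.
$ ls -lh /var/lib/rancher/k3s/server/tls.tgz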
Generate New Certificates
- Remove the cached certificate from a Kubernetes secret.
$ sudo rm /var/lib/rancher/k3s/server/tls/dynamic-cert.json
- Restart k3s service to rotate the certificates.
$ sudo systemctl restart k3s
Note: You can now run Kubernetes commands. This allows you to back up your all-important security keys in case you have not done so already.
- After rotating the Kubernetes certificates, the Kubernetes configuration file must be refreshed to apply the new certificates.
Refresh Kubernetes config file
for 'root' user
$ sudo cp /etc/rancher/k3s/k3s.yaml /root/.kube/config
for 'sevone' user
$ sudo cp /etc/rancher/k3s/k3s.yaml /home/sevone/.kube/config
$ sudo chown -R sevone:sevone /home/sevone/.kube
- To verify the certificates, execute the following commands.
$ sudo -i
$ for i in `ls /var/lib/rancher/k3s/server/tls/*.crt`; \
  do echo $i; openssl x509 -enddate -noout -in $i; \
  echo "---"; done
- Validate pod status.
$ kubectl get pods
Important: If the command continues to fail due to a certificate issue as shown below, then continue with the next step.
Output: Unable to connect to the server: x509: certificate has expired or is not yet valid
- Execute the following steps.
- Generate k3s certificates if they have still not been generated.
for SD-WAN < 2.13
$ cd /opt/SevOne/upgrade/ansible/playbook
$ ansible-playbook reset.yaml
$ ansible-playbook up.yaml
for SD-WAN >= 2.13
$ sevone-cli cluster down
$ sevone-cli cluster up
- Confirm whether or not the node on which the augmentor is deployed is receiving flows.
Important: Skip this step if k3s certificates were generated using Step 4.
- Check the augmentor pod(s).
Example
$ kubectl get pods -o wide
NAME                                                   READY   STATUS      RESTARTS   AGE     IP              NODE        NOMINATED NODE   READINESS GATES
solutions-sdwan-viptela-redis-master-0                 1/1     Running     1          22h     192.168.80.18   sevonek8s   <none>           <none>
solutions-sdwan-viptela-redis-replicas-0               1/1     Running     1          22h     192.168.80.20   sevonek8s   <none>           <none>
solutions-sdwan-viptela-upgrade-sn78p                  0/1     Completed   0          5h34m   192.168.80.21   sevonek8s   <none>           <none>
solutions-sdwan-viptela-aug-decoder-58fc5dfc6d-9l6kw   1/1     Running     0          5h34m   10.49.12.2      sevonek8s   <none>           <none>
solutions-sdwan-viptela-create-keys-2-cf252            0/1     Completed   0          5h34m   192.168.80.24   sevonek8s   <none>           <none>
solutions-sdwan-viptela-collector-5c6f7fd4b8-g6k8x     1/1     Running     0          5h34m   192.168.80.23   sevonek8s   <none>           <none>
- Using ssh, log into augmentor node as sevone.
$ ssh sevone@<SD-WAN collector augmentor node IP address>
Example
$ ssh sevone@10.49.12.2
- Check whether the augmentor node is receiving flows or not.
$ sudo tcpdump -i any port <receiver_port_number> -vv
Note: To find the augmentor receiver port number, see the value of the variable flowAugmentorService.receiverPort in the /opt/SevOne/chartconfs/solutions-sdwan-viptela*.yaml file.
Example
$ sudo tcpdump -i any port 9992 -vv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
08:45:17.950805 IP (tos 0x0, ttl 61, id 13462, offset 0, flags [DF], proto UDP (17), length 360)
    192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 332
08:45:17.950850 IP (tos 0x0, ttl 61, id 13463, offset 0, flags [DF], proto UDP (17), length 152)
    192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
08:45:17.950856 IP (tos 0x0, ttl 61, id 13464, offset 0, flags [DF], proto UDP (17), length 152)
    192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
08:45:17.950859 IP (tos 0x0, ttl 61, id 13465, offset 0, flags [DF], proto UDP (17), length 152)
    192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
08:45:17.950863 IP (tos 0x0, ttl 61, id 13466, offset 0, flags [DF], proto UDP (17), length 152)
    192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
08:45:17.950867 IP (tos 0x0, ttl 61, id 13467, offset 0, flags [DF], proto UDP (17), length 152)
    192.168.13.1.42383 > 10.128.26.25.palace-4: [udp sum ok] UDP, length 124
...
- If you are unable to see flows, repeat Step 6b, bullets ii & iii, on the other nodes. Once you find the node on which the flows are arriving, delete the augmentor pod until it gets deployed on that node.
Install SD-WAN 7.0 on SD-WAN < 7.0 Virtual Machine
Execute the following steps to install SD-WAN 7.0 Viptela Collector on SD-WAN < 7.0 virtual machine.
- Using ssh, log into SD-WAN Viptela collector control plane node as sevone.
$ ssh sevone@<SD-WAN collector 'control plane' node IP address or hostname>
Example: Currently on SD-WAN Viptela Collector < 7.0
$ ssh sevone@10.128.11.150
- Navigate to /opt/SevOne/upgrade directory.
$ cd /opt/SevOne/upgrade
- Remove all files present in directory /opt/SevOne/upgrade.
$ rm -rf /opt/SevOne/upgrade
- Download the following (latest) files from IBM Passport Advantage (https://www.ibm.com/software/passportadvantage/pao_download_software.html) via Passport Advantage Online. However, if you are on a legacy / flexible SevOne contract and do not have access to IBM Passport Advantage but have an active Support contract, please contact IBM SevOne Support for the latest files. You must place these files in the /opt/SevOne/upgrade directory.
- sevone_solutions_sdwan_viptela-v7.0.0-build.<###>.tgz
- sevone_solutions_sdwan_viptela-v7.0.0-build.<###>.tgz.sha256.txt
- signature-tools-<latest version>-build.<latest>.tgz
- signature-tools-<latest version>-build.<latest>.tgz.sha256.txt
- Extract the latest build.
$ tar xvfz $(ls -Art /opt/SevOne/upgrade/sevone_*.tgz | \
  tail -n 1) -C /opt/SevOne/upgrade/ ./utilities
- You are now ready to deploy SD-WAN 7.0 collector. Please refer to SD-WAN Viptela Collector Deployment / Configuration Guide for details on how to perform the deployment.
Move SD-WAN Devices from One Peer to Another
After moving the necessary devices from the source peer to the destination peer(s) in the SevOne NMS UI (Device Mover), perform the following steps to make sure that the destination peer(s) is/are added in the collector config.
- Using ssh, log into SD-WAN Viptela collector control plane node as sevone.
$ ssh sevone@<SD-WAN collector 'control plane' node IP address or hostname>
Example
$ ssh sevone@10.128.11.150
- Change directory to /opt/SevOne/chartconfs/.
$ cd /opt/SevOne/chartconfs/
- Using a text editor of your choice, update the flag distribution_peers_list with all peer IDs in the /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file and then save the file (see the illustrative sketch after these steps).
$ vi /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml
- Redeploy the collector.
$ sevone-cli solutions reload
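The following sketch shows what the distribution_peers_list entry might look like. The peer IDs are placeholders, and the exact nesting of the flag inside the file can differ between releases; keep whatever structure is already present in your solutions-sdwan-viptela_custom_guii.yaml file and only extend the list of peer IDs.
Example
distribution_peers_list:
  - 1
  - 2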
Standalone Vs Multi-Tenant Deployments
This section contains information about the pros and cons of deploying standalone and multi-tenant clusters.
- Add a new vManage as a tenant in the existing cluster

Pros | Cons |
---|---|
No need to provide additional resources to k3s. | If the k3s cluster encounters any issues, it will impact the data collection process for tenants. |
No need to set up a standalone cluster for the new vManage. | Cannot use SSU for deployment. |
Adding a new tenant is easier than creating a standalone cluster. | |

- Standalone cluster for a new vManage

Pros | Cons |
---|---|
Any issues with one tenant's data collection due to k3s cluster problems will not affect the other tenant. | Need to provide additional resources to k3s. |
Deploying the augmentor on a separate node will be easier due to the smaller number of nodes. | Need to set up a standalone cluster for the new vManage. |
Can use SSU for deployment. | |
Stop Collecting Data from Tenant
Execute the following steps.
- To stop collecting data for a particular tenant, execute the following command.
$ helm uninstall <helm_deployment_name>
where <helm_deployment_name> is the name of the helm deployment for the tenant you want to stop collecting data for (see the illustrative example after these steps).
- After stopping the collection of data for the tenant, manually clean up to remove the tenant-specific data from the NMS. To purge the flow data, from the navigation bar, go to Administration, select Flow Configuration, and then select Flow Interface Manager. Select the devices from the list specific to the tenant, click on the wrench icon, and select Purge Device Flows.
- To finally delete the data, select the devices specific to the tenant from the list on the Flow Interface Manager and click on the Delete Device Rules button.
- To delete the tenant device group, from the navigation bar, go to Devices, select Grouping, and then select Device Groups. Hover over the tenant device group and click on the icon. Click on OK to confirm.
- To delete the tenant object groups, from the navigation bar, go to Devices, select Grouping, and then select Object Groups. Hover over the tenant object group and click on the icon. Click on OK to confirm.
- To delete the devices for this tenant, from the navigation bar, go to Devices and then select Device Manager. Select the devices for this tenant and click on Delete Selected. Then click on OK to confirm.
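For the helm uninstall command in the first step, an illustrative invocation for a multi-tenant deployment would be the following. The deployment name follows the format described in section Run Agent On-Demand and is an example only.
$ helm uninstall solutions-sdwan-viptela-1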
Replace Faulty Node in a Multi-Node Cluster
In a multi-node cluster, if one of the nodes is faulty, it must be replaced; ensure that the image deployed on the replacement node matches the release version of the existing nodes in the collector configuration.
Execute the following steps to add a new node in the SD-WAN collector cluster using the .ova file. Please refer to section Deploy OVA in SD-WAN Viptela Collector Pre-Deployment Guide for details.
- Run the kubectl command to retrieve the node information from the cluster.
$ kubectl get nodes
- Remove the faulty node from the cluster.
$ sevone-cli cluster worker remove <IP address of worker node>
- Add a new node to the cluster.
$ sevone-cli cluster worker add <IP address of worker node>
- Reset the Kubernetes cluster.
$ sevone-cli cluster down
- Spin up the Kubernetes cluster.
$ sevone-cli cluster up
- Verify that the new agent node is Ready and has been added to the Kubernetes cluster.
$ kubectl get nodes
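The output should list the replacement node with STATUS Ready alongside the existing nodes. The sample below is illustrative only; node names, ages, and k3s versions will differ in your environment.
Example
$ kubectl get nodes
NAME           STATUS   ROLES                  AGE   VERSION
sdwan-node01   Ready    control-plane,master   45d   <k3s version>
sdwan-node02   Ready    <none>                 45d   <k3s version>
sdwan-node03   Ready    <none>                 10m   <k3s version>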