SD-WAN Viptela Collector Troubleshooting Guide
SevOne Documentation
All documentation is available from the IBM SevOne Support customer portal.
© Copyright International Business Machines Corporation 2023.
All right, title, and interest in and to the software and documentation are and shall remain the exclusive property of IBM and its respective licensors. No part of this document may be reproduced by any means nor modified, decompiled, disassembled, published or distributed, in whole or in part, or translated to any electronic medium or other means without the written consent of IBM.
IN NO EVENT SHALL IBM, ITS SUPPLIERS, NOR ITS LICENSORS BE LIABLE FOR ANY DAMAGES, WHETHER ARISING IN TORT, CONTRACT OR ANY OTHER LEGAL THEORY EVEN IF IBM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, AND IBM DISCLAIMS ALL WARRANTIES, CONDITIONS OR OTHER TERMS, EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, ON SOFTWARE AND DOCUMENTATION FURNISHED HEREUNDER INCLUDING WITHOUT LIMITATION THE WARRANTIES OF DESIGN, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT.
IBM, the IBM logo, and SevOne are trademarks or registered trademarks of International Business Machines Corporation, in the United States and/or other countries. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on ibm.com/trademark.
- About
- Troubleshooting
- Debug SD-WAN Collector
- Helpful CLI commands
- Pod Stuck in a Terminating State
- Redeploy / Update Configuration
- Review / Collect Logs
- Start Collector
- Stop Collector
- Upgrade Collector
- 'Agent' Nodes in a Not Ready State after Rebooting
- Domain Name Resolution (DNS) not working
- ERROR: Failed to open ID file '/home/sevone/.pub': No such file or directory
- Change Collector Log Level
- SSU (Self Service Upgrade)
About
This document provides useful use cases and troubleshooting details for the SD-WAN Viptela collector.
In this guide, any reference to master, whether in prose, in a CLI command (for NMS, Kubernetes, or Redis), or in command output, means control plane. Similarly, any reference to slave or worker means follower or agent.
Troubleshooting
Debug SD-WAN Collector
The following are some scenarios on how you may debug an issue while deploying SD-WAN Viptela Collector.
Question | Description / Command(s)
---|---
Are you logged in as root or sevone user? | All commands must be run as the sevone user.
If you are performing a fresh deployment, have you checked that there are no IP-range conflicts? | Refer to SD-WAN Viptela Collector Use-Cases Guide > Use-Cases > section Handle IP Conflicts for details on IP address ranges.
How can the SD-WAN collector, Kubernetes, Helm, SOA, and SevOne NMS application versions be obtained? | On the SD-WAN collector, execute: $ cat /SevOne.info, $ kubectl version --short, and $ helm version. On the SevOne NMS appliance, execute: $ docker ps \| grep soa
Is the Kubernetes cluster healthy? | Execute: $ kubectl get nodes, $ kubectl get pods, and $ kubectl get pods -n kube-system
How can the application logs for a suspected pod be obtained? | On the SD-WAN 'viptela' collector machine: $ kubectl logs deploy/solutions-sdwan-viptela-collector --tail 100. On the SevOne NMS appliance, the logs are in /var/log/soa.log; open the file with a text editor of your choice, for example $ vi /var/log/soa.log
How can traces for a request be obtained if it involves retrieving data from SevOne NMS? | On the SevOne NMS appliance, execute $ supervisorctl start jaeger, trigger the event in the SD-WAN collector, then run $ /usr/local/bin/jaeger
What should you do if an issue is related to the User Interface? | Collect the console logs from your browser.
Helpful CLI commands
-
After creating the virtual machine, you may want to change its name. Execute the following steps.
$ ssh sevone@<SD-WAN Viptela collector node IP address>
$ sudo hostnamectl set-hostname "<enter hostname>"
$ sudo reboot
-
When provisioning of the control plane node is complete via the user interface, ensure from the CLI that the control plane node is correctly provisioned.
$ ssh sevone@<SD-WAN Viptela 'control plane' node IP address>
$ kubectl get nodes
NAME           STATUS   ROLES                  AGE     VERSION
sdwan-node01   Ready    control-plane,master   2m45s   v1.27.1+k3s1
-
When the agent nodes have joined the Kubernetes cluster, execute the following command to confirm.
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address>
$ kubectl get nodes
NAME           STATUS   ROLES                  AGE     VERSION
sdwan-node01   Ready    control-plane,master   2m45s   v1.27.1+k3s1
sdwan-node02   Ready    <none>                 2m45s   v1.27.1+k3s1
sdwan-node03   Ready    <none>                 2m45s   v1.27.1+k3s1
-
To check the status of the deployment, ensure that all the pods are in Running status.
$ kubectl get pods
NAME                                                READY   STATUS      RESTARTS   AGE
solutions-sdwan-viptela-redis-master-0              1/1     Running     0          6d3h
solutions-sdwan-viptela-redis-replicas-0            1/1     Running     0          6d3h
solutions-sdwan-viptela-upgrade-kpbdd               0/1     Completed   0          15m
solutions-sdwan-viptela-aug-5496ccccbd-7txnt        1/1     Running     0          15m
solutions-sdwan-viptela-create-keys-2-rcbpc         0/1     Completed   0          15m
solutions-sdwan-viptela-collector-8795594c9-2v7gr   1/1     Running     0          15m
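The readiness check above can be scripted. The following is a minimal sketch, using canned sample output in place of a live `kubectl get pods --no-headers` call (the pod names are taken from the examples in this guide); it fails if any pod is in a state other than Running or Completed.

```shell
# Sketch: fail if any pod is in a state other than Running or Completed.
# In practice, pipe in the output of `kubectl get pods --no-headers`;
# canned sample output stands in for it here.
check_pods() {
  awk '$3 != "Running" && $3 != "Completed" { bad++; print "Not ready: " $1 " (" $3 ")" }
       END { exit bad > 0 }'
}

sample_output='solutions-sdwan-viptela-redis-master-0   1/1   Running     0   6d3h
solutions-sdwan-viptela-upgrade-kpbdd    0/1   Completed   0   15m'

if printf '%s\n' "$sample_output" | check_pods; then
  echo "all pods healthy"
else
  echo "some pods are not ready"
fi
```

The awk program inspects the third column (STATUS) of each row and exits non-zero if any pod is in an unexpected state.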
Pod Stuck in a Terminating State
If a pod is ever stuck in a Terminating state and you want to force it to restart, append --grace-period=0 --force to the end of your delete pod command.
Example
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>
$ kubectl delete pod $(kubectl get pods | grep 'dsm' | awk '{print $1}') --grace-period=0 --force
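The `$(kubectl get pods | grep ... | awk ...)` substitution in the example above simply extracts the first column (the pod name) from matching rows. A self-contained sketch of that extraction, with canned output standing in for `kubectl get pods` (pod names are from this guide):

```shell
# Sketch of the name extraction used above: grep the pod listing for a
# substring and print the first column (the pod name).
# Canned output stands in for a live `kubectl get pods` call.
pods='solutions-sdwan-viptela-collector-8795594c9-2v7gr   1/1   Running   0   15m
solutions-sdwan-viptela-redis-master-0                  1/1   Running   0   6d3h'

name=$(printf '%s\n' "$pods" | grep 'redis-master' | awk '{print $1}')
echo "$name"   # this value is what gets passed to `kubectl delete pod`
```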
Redeploy / Update Configuration
If you are deploying the same build again or have updated the /opt/SevOne/chartconfs/solutions-sdwan-viptela_custom_guii.yaml file, the following commands must be executed.
This applies only when the configuration has been updated. The helm command uninstalls the deployment along with the base configuration that, by default, is available with the .ova image file.
$ sevone-cli playbook precheck
$ sevone-cli solutions reload
The configuration fields that may have been updated include:
- collectorService
- affinity
- flowAugmentorService
- augmentor service
- receiverPort
Review / Collect Logs
Logs can be collected at the pod level. The status of pods must be Running.
By default, resource-type = pod. For logs where resource-type = pod, you may choose to pass only the pod-name; resource-type is optional.
Using ssh, log into SD-WAN collector control plane node as sevone.
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>
Ensure that all pods are either Running or Completed.
Example: Get 'pod' names
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
solutions-sdwan-viptela-redis-master-0 1/1 Running 0 6d3h
solutions-sdwan-viptela-redis-replicas-0 1/1 Running 0 6d3h
solutions-sdwan-viptela-upgrade-kpbdd 0/1 Completed 0 15m
solutions-sdwan-viptela-aug-5496ccccbd-7txnt 1/1 Running 0 15m
solutions-sdwan-viptela-create-keys-2-rcbpc 0/1 Completed 0 15m
solutions-sdwan-viptela-collector-8795594c9-2v7gr 1/1 Running 0 15m
Get resource types
Get 'all' resource types
$ kubectl get all | more
Get resource type for a pod
$ kubectl get all | grep <pod-name>
Example: Get resource type for pod-name containing 'solutions-sdwan'
$ kubectl get all |grep solutions-sdwan
pod/solutions-sdwan-viptela-redis-master-0 1/1 Running 1 (4h14m ago) 6d6h
pod/solutions-sdwan-viptela-redis-replicas-0 1/1 Running 1 (4h14m ago) 6d6h
pod/solutions-sdwan-viptela-upgrade-kpbdd 0/1 Completed 0 3h38m
pod/solutions-sdwan-viptela-aug-5496ccccbd-7txnt 1/1 Running 0 3h38m
pod/solutions-sdwan-viptela-create-keys-2-rcbpc 0/1 Completed 0 3h38m
pod/solutions-sdwan-viptela-collector-8795594c9-2v7gr 1/1 Running 0 3h38m
service/solutions-sdwan-viptela-redis-headless ClusterIP None <none> 6379/TCP 6d6h
service/solutions-sdwan-viptela ClusterIP 192.168.100.137 <none> 80/TCP 6d6h
service/solutions-sdwan-viptela-redis-replicas ClusterIP 192.168.110.165 <none> 6379/TCP 6d6h
service/solutions-sdwan-viptela-redis-master ClusterIP 192.168.103.82 <none> 6379/TCP 6d6h
service/solutions-sdwan-viptela-flowservice NodePort 192.168.99.27 <none> 9995:9995/UDP 3h38m
deployment.apps/solutions-sdwan-viptela-aug 1/1 1 1 3h38m
deployment.apps/solutions-sdwan-viptela-collector 1/1 1 1 6d6h
replicaset.apps/solutions-sdwan-viptela-collector-7c645d9f8f 0 0 0 6d6h
replicaset.apps/solutions-sdwan-viptela-aug-5496ccccbd 1 1 1 3h38m
replicaset.apps/solutions-sdwan-viptela-collector-8795594c9 1 1 1 3h38m
statefulset.apps/solutions-sdwan-viptela-redis-master 1/1 6d6h
statefulset.apps/solutions-sdwan-viptela-redis-replicas 1/1 6d6h
job.batch/solutions-sdwan-viptela-upgrade 1/1 3s 3h38m
job.batch/solutions-sdwan-viptela-create-keys-2 1/1 8s 3h38m
solutions-sdwan-viptela in the example above is the pod name prefix.
Get logs
$ kubectl logs <resource-type>/<pod-name>
Example: Get logs for pod-name 'solutions-sdwan-viptela-collector'
$ kubectl logs deployment.apps/solutions-sdwan-viptela-collector
OR
$ kubectl logs deploy/solutions-sdwan-viptela-collector
Example: Get logs for pod-name 'solutions-sdwan-viptela-redis-master'
$ kubectl logs statefulset.apps/solutions-sdwan-viptela-redis-master
OR
$ kubectl logs sts/solutions-sdwan-viptela-redis-master
Example: Get logs for pod-name 'solutions-sdwan-viptela-redis-master' with timestamps
$ kubectl logs statefulset.apps/solutions-sdwan-viptela-redis-master --timestamps
OR
$ kubectl logs sts/solutions-sdwan-viptela-redis-master --timestamps
By default, resource-type = pod.
In the example below, to obtain the logs for <resource-type>/<pod-name> = pod/solutions-sdwan-viptela-redis-master-0, <resource-type> pod is optional.
Example: <resource-type> = pod; <resource-type> is optional
$ kubectl logs pod/solutions-sdwan-viptela-redis-master-0
OR
$ kubectl logs solutions-sdwan-viptela-redis-master-0
Collect Logs for a Pod with One Container
-
Using ssh, log into SD-WAN collector control plane node as sevone.
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>
-
Obtain the list of containers that belong to a pod.
Example
Example: Pod 'solutions-sdwan-viptela-redis-master-0' contains one container, 'redis'
$ kubectl get pods solutions-sdwan-viptela-redis-master-0 -o jsonpath='{.spec.containers[*].name}{"\n"}'
redis
-
Collect logs.
For pods with only one container, -c <container-name> in the command below is optional.
$ kubectl logs <pod-name> -c <container-name>
OR
$ kubectl logs <pod-name>
Example
$ kubectl logs solutions-sdwan-viptela-redis-master-0 -c redis
OR
$ kubectl logs solutions-sdwan-viptela-redis-master-0
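When a pod has more than one container, the jsonpath command in step 2 prints a space-separated list of container names. A sketch of iterating over that list to collect logs per container (the container names here are illustrative, and a canned string stands in for the kubectl output):

```shell
# Sketch: iterate over the space-separated container names that
# `kubectl get pods <pod> -o jsonpath='{.spec.containers[*].name}'` prints,
# collecting logs per container. A canned string stands in for kubectl here.
containers='redis metrics-exporter'   # illustrative container names

collected=""
for c in $containers; do
  # In a live cluster: kubectl logs <pod-name> -c "$c" > "<pod-name>-$c.log"
  collected="$collected $c"
done
echo "would collect logs for:$collected"
```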
Start Collector
To start the collector, execute the following commands.
-
Using ssh, log into SD-WAN collector control plane node as sevone.
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>
-
Start the collector.
$ sevone-cli solutions reload
Scenario-1 (no changes to the collectorConfig file or to secrets)
- The create-keys pods restart to set up the NMS v2 and v3 API keys.
- The collector pod does not restart.
Scenario-2 (no changes to secrets, but changes to the collectorConfig file)
- The create-keys pods restart.
- Only the changed collector pod restarts.
Scenario-3 (changes to secrets, but no changes to the collectorConfig file)
- The create-keys pods restart.
- Only the changed collector pod restarts.
Scenario-4 (changes to both secrets and the collectorConfig file)
- The create-keys pods restart.
- Only the changed collector pod restarts.
Stop Collector
To stop the collector, execute the following commands.
-
Using ssh, log into SD-WAN collector control plane node as sevone.
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>
-
Stop the collector.
$ sevone-cli solutions stop_collector
Upgrade Collector
If you are upgrading from SD-WAN 2.9 to a version later than 2.9, execute the following steps. By default, the .ova image is already running a base configuration of the collector. If the configuration has been modified, first execute the following command.
$ sevone-cli solutions reload
- Extract the latest tar files provided to you by IBM SevOne Production or IBM SevOne Support into the /opt/SevOne/upgrade folder.
- Run the following commands.
- $ rm -rf /opt/SevOne/upgrade/utilities
- $ tar xvfz $(ls -Art /opt/SevOne/upgrade/sevone_solutions_sdwan_*.tgz | tail -n 1) -C /opt/SevOne/upgrade/ ./utilities
- $ sudo rpm -Uvh /opt/SevOne/upgrade/utilities/sevone-cli-*.rpm
- Execute the $ sevone-cli cluster down command.
- Run the $ sevone-cli solutions upgrade --no_guii command. When upgrading via the GUI installer, run the $ sevone-cli solutions upgrade command instead.
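The `$(ls -Art ... | tail -n 1)` substitution in the tar step above selects the most recently modified tarball. A self-contained sketch of that selection, using a temporary directory with dummy files (the file names are illustrative) in place of /opt/SevOne/upgrade:

```shell
# Sketch: `ls -Art` lists files oldest-first by modification time, so
# `tail -n 1` yields the newest one, i.e. the tarball the upgrade step
# extracts. Dummy files in a temp directory stand in for real tarballs.
dir=$(mktemp -d)
touch -d '2023-01-01' "$dir/sevone_solutions_sdwan_v2.9.tgz"
touch -d '2023-06-01' "$dir/sevone_solutions_sdwan_v2.10.tgz"

latest=$(ls -Art "$dir"/sevone_solutions_sdwan_*.tgz | tail -n 1)
echo "$latest"
rm -rf "$dir"
```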
'Agent' Nodes in a Not Ready State after Rebooting
Perform the following action if the agent nodes are in a Not Ready state after rebooting.
Ensure SD-WAN collector is 100% deployed
Check the status of the deployment by running the following command. Ensure that everything is in Running status.
$ ssh sevone@<SD-WAN Viptela collector 'control plane' node IP address or hostname>
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
solutions-sdwan-viptela-redis-master-0 1/1 Running 0 6d3h
solutions-sdwan-viptela-redis-replicas-0 1/1 Running 0 6d3h
solutions-sdwan-viptela-upgrade-kpbdd 0/1 Completed 0 15m
solutions-sdwan-viptela-aug-5496ccccbd-7txnt 1/1 Running 0 15m
solutions-sdwan-viptela-create-keys-2-rcbpc 0/1 Completed 0 15m
solutions-sdwan-viptela-collector-8795594c9-2v7gr 1/1 Running 0 15m
Restart SOA
If SevOne NMS has been upgraded or downgraded, please make sure that the SOA container is restarted after a successful upgrade/downgrade. Execute the following command.
From SevOne NMS appliance,
$ ssh root@<NMS appliance>
$ supervisorctl restart soa
Domain Name Resolution (DNS) not working
A working DNS configuration is a requirement for any SD-WAN collector deployment.
The DNS server must be able to resolve the SD-WAN collector's hostname on both the control plane and the agent nodes; otherwise, SD-WAN collector will not work. DNS servers can be added via nmtui or by editing the /etc/resolv.conf file directly, as shown in the steps below. In this example, the following nodes are used.
Hostname | IP Address | Role |
---|---|---|
sdwan-node01 | 10.123.45.67 | control plane |
sdwan-node02 | 10.123.45.68 | agent |
Also in this example, the following DNS configuration is used, with DNS search records sevone.com and nwk.sevone.com.
Nameserver | IP Address |
---|---|
nameserver | 10.168.16.50 |
nameserver | 10.205.8.50 |
-
Using ssh, log into the designated SD-WAN collector control plane node and agent node as sevone from two different terminal windows.
SSH to 'control plane' node from terminal window 1
$ ssh sevone@10.123.45.67
SSH to 'agent' node from terminal window 2
$ ssh sevone@10.123.45.68
-
Obtain the list of DNS entries in the /etc/resolv.conf file on both the control plane and agent nodes.
From terminal window 1
$ cat /etc/resolv.conf
# Generated by NetworkManager
search sevone.com nwk.sevone.com
nameserver 10.168.16.50
nameserver 10.205.8.50
From terminal window 2
$ cat /etc/resolv.conf
# Generated by NetworkManager
search sevone.com nwk.sevone.com
nameserver 10.168.16.50
nameserver 10.205.8.50
-
Ensure that the DNS server can resolve the SD-WAN collector's hostname / IP address on both the control plane and the agent nodes, and verify the DNS entries in the /etc/resolv.conf file (see the search line and nameserver(s)).
From terminal window 1
The following output shows that the DNS server can resolve hostname / IP address on both the control plane and the agent nodes.
Check if 'nslookup' resolves the 'control plane' IP address
$ nslookup 10.123.45.67
67.45.123.10.in-addr.arpa	name = sdwan-node01.sevone.com.
Check if 'nslookup' resolves the 'control plane' hostname
$ nslookup sdwan-node01.sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	sdwan-node01.sevone.com
Address: 10.123.45.67
Check if 'nslookup' resolves the 'agent' IP address
$ nslookup 10.123.45.68
68.45.123.10.in-addr.arpa	name = sdwan-node02.sevone.com.
Check if 'nslookup' resolves the 'agent' hostname
$ nslookup sdwan-node02.sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	sdwan-node02.sevone.com
Address: 10.123.45.68
nslookup name 'sevone.com' in search line in /etc/resolv.conf
$ nslookup sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	sevone.com
Address: 23.185.0.4
nslookup name 'nwk.sevone.com' in search line in /etc/resolv.conf
$ nslookup nwk.sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	nwk.sevone.com
Address: 25.185.0.4
nslookup nameserver '10.168.16.50' in /etc/resolv.conf
$ nslookup 10.168.16.50
50.16.168.10.in-addr.arpa	name = infoblox.nwk.sevone.com.
nslookup nameserver '10.205.8.50' in /etc/resolv.conf
$ nslookup 10.205.8.50
50.8.205.10.in-addr.arpa	name = infoblox.colo2.sevone.com.
From terminal window 2
The following output shows that the DNS server can resolve hostname / IP address on both the control plane and the agent nodes.
Check if 'nslookup' resolves the 'agent' IP address
$ nslookup 10.123.45.68
68.45.123.10.in-addr.arpa	name = sdwan-node02.sevone.com.
Check if 'nslookup' resolves the 'agent' hostname
$ nslookup sdwan-node02.sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	sdwan-node02.sevone.com
Address: 10.123.45.68
Check if 'nslookup' resolves the 'control plane' IP address
$ nslookup 10.123.45.67
67.45.123.10.in-addr.arpa	name = sdwan-node01.sevone.com.
Check if 'nslookup' resolves the 'control plane' hostname
$ nslookup sdwan-node01.sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	sdwan-node01.sevone.com
Address: 10.123.45.67
nslookup name 'sevone.com' in search line in /etc/resolv.conf
$ nslookup sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	sevone.com
Address: 23.185.0.4
nslookup name 'nwk.sevone.com' in search line in /etc/resolv.conf
$ nslookup nwk.sevone.com
Server:		10.168.16.50
Address:	10.168.16.50#53
Name:	nwk.sevone.com
Address: 25.185.0.4
nslookup nameserver '10.168.16.50' in /etc/resolv.conf
$ nslookup 10.168.16.50
50.16.168.10.in-addr.arpa	name = infoblox.nwk.sevone.com.
nslookup nameserver '10.205.8.50' in /etc/resolv.conf
$ nslookup 10.205.8.50
50.8.205.10.in-addr.arpa	name = infoblox.colo2.sevone.com.
If any of the nslookup commands in terminal window 1 or terminal window 2 above fail or return output such as the following, you must first resolve the name-resolution issue; otherwise, SD-WAN collector will not work.
Example
** server can't find 67.45.123.10.in-addr.arpa.: NXDOMAIN
OR
** server can't find 68.45.123.10.in-addr.arpa.: NXDOMAIN
OR
*** Can't find nwk.sevone.com: No answer
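Failures like these can also be detected mechanically. The sketch below scans nslookup output for the failure markers quoted above (the sample strings are taken from this guide; the function name is illustrative):

```shell
# Sketch: succeed only if nslookup output contains none of the common
# failure markers (NXDOMAIN, "No answer", SERVFAIL).
dns_ok() {
  ! grep -Eq "NXDOMAIN|No answer|SERVFAIL"
}

good="Name: sdwan-node01.sevone.com
Address: 10.123.45.67"
bad="** server can't find 67.45.123.10.in-addr.arpa.: NXDOMAIN"

printf '%s\n' "$good" | dns_ok && echo "resolution OK"
printf '%s\n' "$bad"  | dns_ok || echo "resolution FAILED"
```

In practice you would pipe live `nslookup <host>` output into the function for each node in the cluster.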
If name resolution fails for any reason after the SD-WAN collector has been deployed, normal collector operations may also fail. It is therefore recommended to ensure that the DNS configuration is always working.
ERROR: Failed to open ID file '/home/sevone/.pub': No such file or directory
As a security measure, fresh installations do not ship with pre-generated SSH keys.
-
Using ssh, log into the SD-WAN Viptela collector control plane node as sevone.
$ ssh sevone@<SD-WAN collector 'control plane' node IP address or hostname>
Example
$ ssh sevone@10.123.45.67
-
Execute the following command to generate unique SSH keys for your cluster.
$ sevone-cli cluster setup-keys
Change Collector Log Level
To change the collector log level for a particular agent without redeploying the collector, perform the following steps.
-
Get a redis-cli shell on the Redis master pod.
$ kubectl exec -it {redis-pod} -- redis-cli
-
Publish debug message to the loggingCommand channel.
PUBLISH loggingCommand "{agent-name}:{loglevel}:{logtype}:{minutes}"
Where,
- agent-name - Defines the name of the agent.
- loglevel - Defines the log-level for the collector. Value can be info, debug, warning, or error.
- logtype - Defines the type of logs. Value can be nms / vendor / all.
- nms - Only NMS API response will be printed in debug logs.
- vendor - Only vendor (vManage / vDirector) API logs will be printed in debug logs.
- all - NMS and vendor logs will be printed in debug logs.
- minutes - Defines the time (in minutes) for which logs are printed based on this message. For example, if minutes is set to 5, debug logs are printed for 5 minutes after the message is published to Redis.
Example
PUBLISH loggingCommand "DeviceHealthStreamingAgent:debug:all:1"
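The message format can be sanity-checked before publishing. A sketch (the validation function is illustrative, not part of the product) that checks a message against the {agent-name}:{loglevel}:{logtype}:{minutes} pattern described above:

```shell
# Sketch: validate a loggingCommand message of the form
# {agent-name}:{loglevel}:{logtype}:{minutes} before publishing it,
# e.g. via: kubectl exec -it {redis-pod} -- redis-cli PUBLISH loggingCommand "$msg"
valid_logging_msg() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]+:(info|debug|warning|error):(nms|vendor|all):[0-9]+$'
}

msg="DeviceHealthStreamingAgent:debug:all:1"
if valid_logging_msg "$msg"; then
  echo "message OK: $msg"
fi
```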
SSU (Self Service Upgrade)
General
Unable to Start SSU
When installing via the GUI, if you see the following error, execute the command export LC_ALL='en_US.utf8' and retry.
Unable to Login
If you see the following error after entering the correct credentials, change the API port to any accessible port from the browser. For more details, please refer to SD-WAN Viptela Collector Upgrade Process Guide > section FAQs > section Change Ports.
Pre-check Stage Failures
Invalid vManage Credentials
When doing a pre-check, if you see the following error, please provide the correct Base64 credentials for vManage.
Invalid NMS API Credentials
When doing a pre-check, if you see the following error, please provide the correct Base64 API credentials for SevOne NMS.
Invalid NMS SSH Credentials
When doing a pre-check, if you see the following error, please provide the correct Base64 SSH credentials for SevOne NMS.
Invalid DI API Credentials
When doing a pre-check, if you see the following error, please provide the correct Base64 API credentials for SevOne Data Insight.
Credentials are not in Base64 Format
When doing a pre-check, if you see the following error, please provide credentials (username & password) in Base64 format for the controller, NMS SSH, NMS API, & DI API.
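Base64-encoded credentials can be produced on any node with the standard base64 utility. A sketch with placeholder credentials (admin / changeme are illustrative, not real values):

```shell
# Sketch: encode placeholder credentials to Base64 with the coreutils
# base64 utility. Use printf '%s' (no trailing newline) so the encoded
# value decodes back to exactly the original string.
user_b64=$(printf '%s' 'admin' | base64)
pass_b64=$(printf '%s' 'changeme' | base64)

echo "username (Base64): $user_b64"
echo "password (Base64): $pass_b64"

# Round-trip check: decoding must return the original value
printf '%s' "$pass_b64" | base64 -d
```

Note that `echo` without `-n` would append a newline to the value before encoding, producing a different Base64 string that may fail the pre-check.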
PAS Sizing Issue
When doing a pre-check, if you see the following error, please review the sizing details and reconfigure the PAS in the collector and then re-trigger the pre-check.
Post-check Stage Failures
Fail to Import OOTB Reports
When doing a post-check, if you see the following error, please provide the correct Base64 API credentials for SevOne Data Insight.