Troubleshooting
The following are some troubleshooting scenarios.
General Issues
Early Users of 'unrhel'
The following message is returned when the environment is migrated over from RHEL using unrhel@v5.1.1 or below.
FAILED => Missing sudo password
To fix this, execute the following commands before performing an upgrade.
$ ssh sevone@<'agent' IP address>
$ sudo -i
$ echo "sevone ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
ACCESS DENIED in GraphQL Logs
If SOA apikeys are outdated or expired, you are likely to get this error. To fix this, update the datasource keys.
- Execute the following command to ensure that the GraphQL pod is Running and not in an Errored or CrashLoopBackOff state.
Example
$ kubectl get pods | grep graphql
di-graphql-7d88c8c7b5-fbwgc     1/1     Running     0     22h
- If the third column reads Errored or CrashLoopBackOff, using a text editor of your choice, edit /opt/SevOne/chartconfs/di_custom.yaml to include the following environment variable and then save it.
Example
graphql:
  env:
    SKIP_REPORT_MIGRATION_DRY_RUN: true
- Apply the change made to the /opt/SevOne/chartconfs/di_custom.yaml file.
$ sevone-cli playbook up --tags apps
- Once the GraphQL pod state is running / healthy, generate new SOA API keys for each affected datasource. Execute the following command.
$ sevone-cli exec graphql -- npm run reconfig-datasource
Note: You will be prompted several times. Keep all the default values except enter y when prompted to Login instead of providing an API key.
The username must be admin and the password must be the admin Graphical User Interface password for the datasource.
Important: Repeat this step for each datasource.
Example
$ sevone-cli exec graphql -- npm run reconfig-datasource

> insight-server@6.7.0 reconfig-datasource /insight-server
> NODE_PATH=./dist/libs node dist/scripts/database-init/reconfigure-datasource.js

Datasource config:
  Name: Data Insight API
  Address: https://staging.soa.sevone.doc
  API key: eyJ1dWlkIjoiYzMxNTQzZWUtMTgxZC00NWMyLTlkNjctNTUwZWRhODQ2MGFkIiwiYXBwbGljYXRpb24iOiJEYXRhIEluc2lnaHQgKGNrcmwxdmhqZzAwMDA1M3MwM3Z5bmRlZXMpIiwiZW50cm9weSI6IjZVaWxuQStzVDk2ZUFKeG92WW1Nak1odS9nZ29JSWhLNVBDZ05yZHBBT1lrSE11ZlM0eU9CbCs4YWxEUXd3a1MifQ==
  Dstype: METRICS/FLOW
Datasource name [Data Insight API]:
[1] METRICS/FLOW
[2] splunk-datasource
[3] elastic-datasource
[0] Keep: METRICS/FLOW
Datasource dstype [1, 2, 3, 0]: 0
Datasource address [https://staging.soa.sevone.doc]:
Login instead of providing an API key? [y/n]: y
Username: admin
Password: ******
info: [Data Insight API@reconfigure-datasource] SOA request (SOA-1) post https://staging.soa.sevone.doc/api/v3/users/signin
(node:347) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
(Use `node --trace-warnings ...` to show where the warning was created)
info: [Data Insight API@reconfigure-datasource] SOA response (SOA-1) elapsed 1755ms.
info: [Data Insight API@reconfigure-datasource] SOA request (SOA-2) post https://staging.soa.sevone.doc/api/v3/users/apikey
info: [Data Insight API@reconfigure-datasource] SOA response (SOA-2) elapsed 277ms.
New datasource config:
  Name: Data Insight API
  Address: https://staging.soa.sevone.doc
  API key: eyJ1dWlkIjoiMzgyMDdhMjItNzE2Mi00OWRlLTk5NTYtYmI3OTVkYjc5NzZkIiwiYXBwbGljYXRpb24iOiJEYXRhIEluc2lnaHQgKGNrczUyYnd0YzAwMDA5bnMxNnQ3aWcxYnQpIiwiZW50cm9weSI6IkNoYS9tbDFWbVVyYThQcHVsLzIzY05JZk94QXcxWFQrVnEyM0hPSzYzSTdPNGNMbkJTVjQyWUVRSW1FeGtDaEoifQ==
  Dstype: METRICS/FLOW
Is this config correct? [y/n]: y
info: [Data Insight API@create-datasource] SOA request (SOA-3) get https://staging.soa.sevone.doc/api/v3/users/self
info: [Data Insight API@create-datasource] SOA response (SOA-3) elapsed 275ms.
Datasource config updated!
Datasource reconfiguration complete.
- Once all the datasources have been updated, restart the GraphQL pod.
$ kubectl delete pods -l app.kubernetes.io/component=graphql
Error Fetching Widget when Loading Report
This error occurs when wdkserver is not serving the widgets to the user interface. In most cases, it is caused by an invalid cookie value set in your browser. You may inspect the network activity using the browser's Developer Tools and look for requests to /wdkserver. If you are unable to inspect the network activity, please contact IBM SevOne Support.
If you observe the following error message coming back from /wdkserver, remove the offending cookie or disable strict headers in wdkserver. If you need assistance with this, please contact IBM SevOne Support.
{"statusCode":400,"error":"Bad Request","message":"Invalid cookie value"}
- Using a text editor of your choice, edit /opt/SevOne/chartconfs/di_custom.yaml to include the following environment variable and then save it.
wdkserver:
  env:
    DISABLE_STRICT_HEADER: true
- Apply the change made to the /opt/SevOne/chartconfs/di_custom.yaml file.
$ sevone-cli playbook up --tags apps
Unable to connect to the server: x509: certificate has expired
If you see the error message x509: certificate has expired when running kubectl commands, your certificates have expired and must be rotated manually. Please refer to SevOne Data Insight Administration Guide > section Rotate Kubernetes Certificates for details.
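Before rotating, you can confirm which certificates have expired. The sketch below assumes the default k3s certificate location on the control plane node (/var/lib/rancher/k3s/server/tls); adjust the path if your installation differs.
$ for cert in /var/lib/rancher/k3s/server/tls/*.crt; do \
    echo "$cert: $(sudo openssl x509 -enddate -noout -in $cert)"; done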
[ WARN ] No upgrade available
The No upgrade available warning usually occurs when attempting to retry a failed upgrade or if the upgrade .tgz file is placed in an incorrect directory.
- Ensure the .tgz file is in the correct directory as outlined in SevOne Data Insight Upgrade Process Guide > section Confirm SevOne Data Insight Version.
- Using ssh, log into SevOne Data Insight as sevone.
$ ssh sevone@<SevOne Data Insight 'control plane' node IP address or hostname>
- Revert your SevOne Data Insight major / minor version in /SevOne.info to a prior / lower version using a text editor of your choice.
$ vi /SevOne.info
Example# 1
Assume the current SevOne Data Insight version is 6.7.0. The version prior to SevOne Data Insight 6.7.0 is SevOne Data Insight 6.6. In this case, to go to the prior / lower version, you must change the minor version and the build number.
major = 6   # e.g.: if this is `6` then leave it as-is
minor = 7   # e.g.: if this is `7` then set it to `6`
patch = 0
build = 160 # e.g.: enter the build number for the prior version i.e. 139
The prior version is,
major = 6
minor = 6
patch = 0
build = 139
Example# 2
Assume the current SevOne Data Insight version is 6.5.0. The version prior to SevOne Data Insight 6.5.0 is SevOne Data Insight 3.14. In this case, to go to the prior / lower version, you must change the major and minor versions and the build number.
major = 6  # e.g.: if this is `6` then set it to `3`
minor = 5  # e.g.: if this is `5` then set it to `14` or `13` or a lower version
patch = 0
build = 67 # e.g.: enter the build number for the prior version i.e. 162
The prior version is,
major = 3
minor = 14
patch = 0
build = 162
Example# 3
Assume the current SevOne Data Insight version is 3.14. The version prior to SevOne Data Insight 3.14 is SevOne Data Insight 3.13. In this case, to go to the prior / lower version, you must change the minor version and the build number.
major = 3
minor = 14  # e.g.: if this is `14` then set it to `13`
patch = 0
build = 162 # e.g.: enter the build number for the prior version i.e. 54
The prior version is,
major = 3
minor = 13
patch = 0
build = 54
Domain Name Resolution (DNS) not working
The DNS server must be able to resolve SevOne Data Insight's hostname on both the control plane and the agent nodes; otherwise, SevOne Data Insight will not work. This can be done by adding your DNS servers via nmtui or by editing the /etc/resolv.conf file directly, as shown in the steps below.
In the example below, let's use the following SevOne Data Insight IP addresses.
Hostname | IP Address | Role |
---|---|---|
sdi-node01 | 10.123.45.67 | control plane |
sdi-node02 | 10.123.45.68 | agent |
Also, in this example, the following DNS configuration is used, along with the DNS search records sevone.com and nwk.sevone.com.
Nameserver | IP Address |
---|---|
nameserver | 10.168.16.50 |
nameserver | 10.205.8.50 |
- Using ssh, log into the designated SevOne Data Insight control plane node and agent node as sevone from two different terminal windows.
SSH to 'control plane' node from terminal window 1
$ ssh sevone@10.123.45.67
SSH to 'agent' node from terminal window 2
$ ssh sevone@10.123.45.68
- Obtain a list of DNS entries in the /etc/resolv.conf file for both the control plane and agent nodes in this example.
From terminal window 1
$ cat /etc/resolv.conf
# Generated by NetworkManager
search sevone.com nwk.sevone.com
nameserver 10.168.16.50
nameserver 10.205.8.50
From terminal window 2
$ cat /etc/resolv.conf
# Generated by NetworkManager
search sevone.com nwk.sevone.com
nameserver 10.168.16.50
nameserver 10.205.8.50
- Ensure that the DNS server can resolve SevOne Data Insight's hostname / IP address on both the control plane and the agent nodes, along with the DNS entries in the /etc/resolv.conf file (see the search line and nameserver(s)).
From terminal window 1
The following output shows that the DNS server can resolve the hostname / IP address of both the control plane and the agent nodes.
Check if 'nslookup' resolves the 'control plane' IP address
$ nslookup 10.123.45.67
67.45.123.10.in-addr.arpa    name = sdi-node01.sevone.com.
Check if 'nslookup' resolves the 'control plane' hostname
$ nslookup sdi-node01.sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      sdi-node01.sevone.com
Address:   10.123.45.67
Check if 'nslookup' resolves the 'agent' IP address
$ nslookup 10.123.45.68
68.45.123.10.in-addr.arpa    name = sdi-node02.sevone.com.
Check if 'nslookup' resolves the 'agent' hostname
$ nslookup sdi-node02.sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      sdi-node02.sevone.com
Address:   10.123.45.68
nslookup name 'sevone.com' in search line in /etc/resolv.conf
$ nslookup sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      sevone.com
Address:   23.185.0.4
nslookup name 'nwk.sevone.com' in search line in /etc/resolv.conf
$ nslookup nwk.sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      nwk.sevone.com
Address:   25.185.0.4
nslookup nameserver '10.168.16.50' in /etc/resolv.conf
$ nslookup 10.168.16.50
50.16.168.10.in-addr.arpa    name = infoblox.nwk.sevone.com.
nslookup nameserver '10.205.8.50' in /etc/resolv.conf
$ nslookup 10.205.8.50
50.8.205.10.in-addr.arpa    name = infoblox.colo2.sevone.com.
From terminal window 2
The following output shows that the DNS server can resolve the hostname / IP address of both the control plane and the agent nodes.
Check if 'nslookup' resolves the 'agent' IP address
$ nslookup 10.123.45.68
68.45.123.10.in-addr.arpa    name = sdi-node02.sevone.com.
Check if 'nslookup' resolves the 'agent' hostname
$ nslookup sdi-node02.sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      sdi-node02.sevone.com
Address:   10.123.45.68
Check if 'nslookup' resolves the 'control plane' IP address
$ nslookup 10.123.45.67
67.45.123.10.in-addr.arpa    name = sdi-node01.sevone.com.
Check if 'nslookup' resolves the 'control plane' hostname
$ nslookup sdi-node01.sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      sdi-node01.sevone.com
Address:   10.123.45.67
nslookup name 'sevone.com' in search line in /etc/resolv.conf
$ nslookup sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      sevone.com
Address:   23.185.0.4
nslookup name 'nwk.sevone.com' in search line in /etc/resolv.conf
$ nslookup nwk.sevone.com
Server:    10.168.16.50
Address:   10.168.16.50#53
Name:      nwk.sevone.com
Address:   25.185.0.4
nslookup nameserver '10.168.16.50' in /etc/resolv.conf
$ nslookup 10.168.16.50
50.16.168.10.in-addr.arpa    name = infoblox.nwk.sevone.com.
nslookup nameserver '10.205.8.50' in /etc/resolv.conf
$ nslookup 10.205.8.50
50.8.205.10.in-addr.arpa    name = infoblox.colo2.sevone.com.
Note: If any of the nslookup commands in terminal window 1 or terminal window 2 above fail or return one or more of the following, you must first resolve the name resolution issue; otherwise, SevOne Data Insight will not work.
Examples
** server can't find 67.45.123.10.in-addr.arpa.: NXDOMAIN
or
** server can't find 68.45.123.10.in-addr.arpa.: NXDOMAIN
or
*** Can't find nwk.sevone.com: No answer
etc.
If name resolution fails for any reason after SevOne Data Insight has been deployed, normal operations in SevOne Data Insight can also fail. Hence, it is recommended to ensure that the DNS configuration is always working.
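As a quick recurring check, the forward and reverse lookups shown above can be combined into a small loop. This is a sketch using the example hostnames and IP addresses from this section; substitute your own values.
$ for name in sdi-node01.sevone.com sdi-node02.sevone.com 10.123.45.67 10.123.45.68; do \
    nslookup $name > /dev/null || echo ">> name resolution FAILED for $name"; done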
ERROR: Failed to open ID file '/home/sevone/.pub': No such file or directory
As a security measure, fresh installations do not ship with pre-generated SSH keys.
- Using ssh, log into SevOne Data Insight as sevone.
$ ssh sevone@<SevOne Data Insight 'control plane' node IP address or hostname>
- Execute the following command to generate unique SSH keys for your cluster.
$ sevone-cli cluster setup-keys
TimeShift between SevOne Data Insight & SevOne NMS
If the time difference between SevOne Data Insight and SevOne NMS appliances is more than 5 minutes, the following steps must be performed.
- Check the time on the SevOne Data Insight appliance.
$ date
- Check the time on the SevOne NMS appliance.
$ date
- If the time difference between SevOne Data Insight and SevOne NMS appliances is more than 5 minutes, then check the NTP configuration on both appliances. Both appliances must be time-synchronized to the NTP server.
Note: If the NTP server is unavailable, manually set the same time on both appliances as shown in the example below.
Example
$ date --set="6 OCT 2023 18:00:00"
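To verify that both appliances are actually synchronized to an NTP source, timedatectl can be checked on each appliance. This is a minimal sketch and assumes systemd-based appliances; the time daemon in use (chronyd, ntpd) may differ.
$ timedatectl | grep -Ei 'synchronized|ntp'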
Pre-check Failures
TASK [ Confirm free space ]
- If this task fails, you can try to clean up old installer files that may be found in various parts of the file system. For example,
- /root
- /home/sevone
- /opt/SevOne/upgrade
- /var/lib/rancher/k3s/agent/images
- Clear scheduled report caching. Execute the following command to delete files older than one week (604800 seconds).
Note: SevOne Data Insight maintains a cache of the printed PDFs for scheduled reports. Depending on your usage of report scheduling, it is recommended to occasionally clean up the cache to free up disk space.
$ sevone-cli exec graphql -- "npm run asset-sweeper -- --prefix=scheduledReports --age=604800"
- Running the command below helps track down which files in the system are taking up the most space.
$ du -sh /*
- In your investigation, if you find that the following directory is filling up your HDD (Hard Disk Drive), then a container or a pod is the culprit.
/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
You must continue running du -sh to further pinpoint the exact container or pod. In some cases, it may be the printer container taking up the space due to node.js core dump files. Execute the following command to identify those files.
$ find /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots \ -name "core\.*"
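To pinpoint the largest snapshots without walking the directory tree manually, the du output can be sorted; a sketch:
$ sudo du -sh /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/* \
    | sort -h | tail -n 10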
TASK [ FN000## ]
- If the pre-check fails to validate any of the Internal Field Notices (IFNs), apply the IFNs and reboot the appliance(s).
Note: If FN00068 and/or FN00070 needs to be applied, please contact IBM SevOne Support for the IFN's patch instructions / workaround / solution.
- Rerun the pre-check playbook to verify that the IFNs have been applied, or verify it manually.
$ ansible-playbook /opt/SevOne/upgrade/ansible/playbooks/precheck.yaml
- Validate that Internal Field Notices (IFNs) FN00068 and FN00070 have been applied on both the control plane and agent nodes.
Check if FN00068 is applied
This issue is due to a bug with the CentOS kernel reporting incorrect memory usage. Due to this, Kubernetes does not schedule or restart any pods on the affected node because it thinks there is no memory remaining. To check if the IFN needs to be applied, execute the following command.
$ cat /proc/cmdline | grep -qi 'cgroup.memory=nokmem' || \ echo ">> IFN 68 NOT APPLIED"
Check if FN00070 is applied
This issue only affects users who have been migrated over from RHEL (using the unrhel migration tool). To check if the IFN is applied, execute the following command.
$ nmcli dev | grep -i ^eth && (cat /proc/cmdline | \ grep -qi 'biosdevname=0 net.ifnames=0' || \ echo ">> IFN 70 NOT APPLIED") || \ echo ">> IFN 70 NOT NEEDED"
Install / Upgrade Failures
TASK [ k3s : Initialize the cluster ]
If this task fails, you can observe the status of the k3s service by using the following command.
$ systemctl status k3s
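If systemctl status does not show enough detail, the k3s journal usually contains the underlying error; for example:
$ journalctl -u k3s --no-pager --since "1 hour ago" | tail -n 100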
Unable to find suitable network address. No default routes found.
Check if there is a default route added to the routing table.
$ ip route | grep default
If this returns empty, you will need to add a default route.
Add default route
$ ip route add default via <default_gateway>
TASK [ Stop k3s-server if upgrading to new version ]
If this task does not complete within a minute, then you will have to apply the following workaround before continuing with the upgrade.
Stop API and Client processes
$ sudo systemctl status sevone-guii-@api
$ sudo systemctl status sevone-guii-@client
$ sudo systemctl start sevone-guii-@api
$ sudo systemctl start sevone-guii-@client
$ sudo systemctl stop sevone-guii-@api
$ sudo systemctl stop sevone-guii-@client
for SevOne Data Insight <= 3.9
$ sed -i 's/.*k3s-killall.sh.*/ echo noop/' \
/opt/SevOne/upgrade/ansible/playbooks/roles/k3s/tasks/02_setup.yaml
$ ansible-playbook /opt/SevOne/upgrade/ansible/playbooks/up.yaml \
--tags kube,apps,kernel
for SevOne Data Insight >= 3.10
$ sed -i 's/.*k3s-killall.sh.*/ echo noop/' \
/opt/SevOne/upgrade/ansible/playbooks/roles/k3s/tasks/02_setup.yaml
$ sevone-cli playbook up --tags kube,apps,kernel
This is due to an upstream issue with the k3s-killall.sh script hanging when attempting to shut down some running containerd processes.
TASK [ prep : Ensure hostname set ]
When attempting to run an upgrade, you may run into the following error.
TASK [prep : Ensure hostname set] ****************************************************************************************************************************************************
fatal: [sevonek8s]: FAILED! => {"changed": false, \
"msg": "Command failed rc=1, out=, err=Could not get property: \
Failed to activate service 'org.freedesktop.hostname1': timed out\n"}
This happens when hostnamed has likely crashed. Restart hostnamed.
$ sudo systemctl restart systemd-hostnamed
$ sudo reboot
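After the reboot, you can confirm that hostnamed responds again before retrying the upgrade; for example:
$ hostnamectl status
$ sudo systemctl status systemd-hostnamed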
TASK [ freight : Install centos-update-*.el7.tgz ]
When upgrading to SevOne Data Insight 3.8 or higher, the culprit is likely that the packages are too up-to-date. This can happen if your machine has internet access and can resolve yum package servers. The fix is to retry the upgrade while skipping the yum packages with broken dependencies.
- Remove lingering yum packages or package conflicts.
$ sudo yum clean all
$ sudo rm -rf /var/cache/yum/
- Retry the upgrade via the Command Line Interface (CLI).
$ sevone-cli playbook up --extra-vars "freight_install_skip_broken=yes"
TASK [ helm upgrade/install default/<chart_name> ]
There are several reasons why this task may fail. Unfortunately, Helm does not report errors or useful debug information, so further investigation is required. Look for the stderr key in the large JSON body that is returned in the task output.
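Two commands that often help narrow down the failure are the Helm release history and recent Kubernetes events. The release name di matches the one used in the rollback commands below; treat this as a sketch rather than an exhaustive procedure.
$ helm history di --max 5
$ kubectl get events --sort-by=.metadata.creationTimestamp | tail -n 20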
UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
This failure occurs when Helm cannot roll back a failed deployment automatically and the deployment must be rolled back manually. The failed deployment may have occurred before you initiated the upgrade, perhaps when configuring the SevOne Data Insight Helm chart. Please refer to SevOne Data Insight Administration Guide, section Helm Chart for details.
Execute the following command.
$ helm rollback di
$ helm rollback ingress
Upon completion of the command above, you may then resume the upgrade by executing the following command.
$ sevone-cli playbook up --tags apps,kernel
UPGRADE FAILED: to deploy apps
If you are upgrading between 3.5.x versions, for example, from 3.5.1 to 3.5.3, the upgrade will fail to deploy apps.
To fix this, execute the following commands before performing an upgrade.
$ sevone-cli playbook up --skip-tags apps,kernel
$ sudo systemctl restart k3s
$ ssh sevone@<'agent' IP address>
$ sudo systemctl restart k3s-agent
UPGRADE FAILED: current release manifest contains removed kubernetes api(s) for this kubernetes version
This is caused when upgrading from SevOne Data Insight 3.5.x directly to SevOne Data Insight 3.11 and above. Please refer to SevOne Data Insight Upgrade Process Guide > Pre-Upgrade Checklist > section Version Matrix for more information.
Execute the following steps.
- Go to /home/sevone
directory.
$ cd /home/sevone
- Create the fix-manifest.sh script file.
$ touch fix-manifest.sh
- Using a text editor of your choice, edit the /home/sevone/fix-manifest.sh script file, add the following, and then save it.
#!/bin/bash

# set up vars. change these as needed
release=di
namespace=default

# create temp file to output files to
tmp_dir=$(mktemp -d -t fix-manifest-XXXXX)

# grab helm release object and decode it
releaseObject=$(kubectl get secret -l owner=helm,status=deployed,name=$release --namespace $namespace | awk '{print $1}' | grep -v NAME)
kubectl get secret $releaseObject -n $namespace -o yaml > $tmp_dir/$release.release.yaml
cp $tmp_dir/$release.release.yaml $tmp_dir/$release.release.bak
cat $tmp_dir/$release.release.yaml | grep -oP '(?<=release: ).*' | base64 -d | base64 -d | gzip -d > $tmp_dir/$release.release.data.decoded
sed -i -e 's/networking.k8s.io\/v1beta1/networking.k8s.io\/v1/' $tmp_dir/$release.release.data.decoded
cat $tmp_dir/$release.release.data.decoded | gzip | base64 | base64 > $tmp_dir/$release.release.data.encoded

# patch the helm release object
tr -d "\n" < $tmp_dir/$release.release.data.encoded > $tmp_dir/$release.release.data.encoded.final
releaseData=$(cat $tmp_dir/$release.release.data.encoded.final)
sed 's/^\(\s*release\s*:\s*\).*/\1'$releaseData'/' $tmp_dir/$release.release.yaml > $tmp_dir/$release.final.release.yaml
kubectl apply -f $tmp_dir/$release.final.release.yaml -n $namespace

# clean up
rm -rf $tmp_dir
- Execute the fix-manifest.sh script file.
$ bash fix-manifest.sh
General Debugging Tips
There are several reasons why task helm upgrade/install default/<chart_name> may fail. Helm does not provide useful debug information and further investigation is required to understand the failure.
- Execute the following command to retry the upgrade.
$ sevone-cli playbook up --tags apps
- While the above command is in progress, from another terminal window, run k9s.
- Monitor the status of each pod and refer to the table below for some basic debugging techniques. If logs are not shown when observing the logs via k9s, press 0 to enable logs for all time.
Status | Action |
---|---|
CrashLoopBackOff | Check the pod logs by hovering over the pod and pressing 1. |
Error | Check the pod logs by hovering over the pod and pressing 1. |
Pending | Check the pod event log by hovering over it and pressing d. |
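If k9s is not available, the same information can be pulled directly with kubectl; a minimal sketch:
$ kubectl describe pod <pod-name>        # pod event log (equivalent to pressing 'd' in k9s)
$ kubectl logs <pod-name> --previous     # logs from the previous (crashed) container instance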
Other Issues
Configuration Check
SevOne Data Insight requires configuration of several components to operate properly. When troubleshooting issues, it can be cumbersome to check the configuration and health of the system because there are different tools and network requirements, such as exposing certain ports.
The administrator would benefit from a tool used to display the configuration and health of the Data Insight environment so that a misconfiguration or system error can be quickly identified.
CLI method on production Kubernetes
$ ssh sevone@<SevOne Data Insight 'control plane' node IP address or hostname>
$ sevone-cli exec graphql -- npm run health
GraphQL method
Here are some sample GraphQL queries.
Check Data Insight system health
query health {
health {
minio { ...componentHealthDetails }
mysql { ...componentHealthDetails }
rabbitMq { ...componentHealthDetails }
redis { ...componentHealthDetails }
reportScheduler { ...componentHealthDetails }
soa { ...componentHealthDetails }
}
}
fragment componentHealthDetails on ComponentHealthDetails {
host port error ok
}
Check a single datasource
query ds {
datasources(ids: [ 1 ]) {
id
name
address
}
}
Check all datasources on all tenants
query tenants {
tenants {
id
name
datasources {
id
name
address
}
}
}
Check datasources at authentication
mutation auth {
authentication(tenant: "MyTenant", username: "admin", password: "password") {
token
success
datasources {
id
name
address
}
}
}
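The sample queries above can also be sent from the command line. The sketch below assumes the GraphQL endpoint is exposed at /graphql on the SevOne Data Insight hostname and that the query you run does not require an additional authentication header; both are assumptions, so adjust to match your deployment.
$ curl -sk "https://<SevOne Data Insight hostname>/graphql" \
    -H "Content-Type: application/json" \
    -d '{"query": "query ds { datasources(ids: [1]) { id name address } }"}'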
Error getting NMS IP List
SOA must be on the latest version on all appliances in SevOne NMS cluster. Command Line Interface (CLI) must be used to upgrade SOA on all peers as the graphical user interface (GUI) only upgrades SOA for the NMS appliance you are connected to.
Error
$ sevone-cli soa upgrade /opt/SevOne/upgrade/utilities/SevOne-soa-*.rpm --all-peers
>> [INFO] ATTEMPTING TO AUTO-DETECT SOA DATASOURCES...
Defaulted container "mysql" out of: mysql, metrics
...
...
<returns an ERROR>
If you get this error, please make sure you are logged into SevOne Data Insight as sevone.
$ ssh sevone@<SevOne Data Insight IP address or hostname>
Now, re-run the command to upgrade SOA.
$ sevone-cli soa upgrade /opt/SevOne/upgrade/utilities/SevOne-soa-*.rpm --all-peers
Incorrect information entered at Bootstrap and/or Provisioning prompts?
If you entered incorrect information at bootstrap and/or provisioning prompts, execute the following commands to allow you to override the input. These commands can only be run once your SevOne Data Insight is up and running.
$ ssh sevone@<SevOne Data Insight IP address or hostname>
$ sevone-cli exec graphql -- npm run bootstrap -- -f
$ sevone-cli exec graphql -- npm run provision -- -f
Pod Stuck in a Terminating State
If a pod is ever stuck and you want it to reboot, you can append --grace-period=0 --force to the end of your delete pod command.
Example
$ ssh sevone@<SevOne Data Insight IP address or hostname>
$ kubectl delete pod $(kubectl get pods | grep 'dsm' | awk '{print $1}') --grace-period=0 --force
Review / Collect Logs
Logs can be collected at the pod level. The status of pods must be Running.
By default, resource-type = pod. For logs where resource-type = pod, you may choose to pass only the pod-name; resource-type is optional.
Using ssh, log into SevOne Data Insight as sevone.
$ ssh sevone@<SevOne Data Insight IP address or hostname>
Example: Get 'pod' names
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
di-create-secrets-xllfj 0/1 Completed 0 22h
di-upgrade-l2cs8 0/1 Completed 0 22h
clienttest-success-89lmt 0/1 Completed 0 22h
clienttest-fail-lb8mq 0/1 Completed 0 22h
di-report-version-sweeper-28276440-zpcxt 0/1 Completed 0 20h
ingress-ingress-nginx-controller-54dfdbc9cf-g9wdz 1/1 Running 0 22h
di-prometheus-node-exporter-shnxk 1/1 Running 0 22h
di-graphql-7d88c8c7b5-fbwgc 1/1 Running 0 22h
di-ui-5b8fbcfc54-rtwlq 1/1 Running 0 22h
di-kube-state-metrics-6f4fbc67cb-tsbbk 1/1 Running 0 22h
di-migrator-fdb9dd58b-29kl2 2/2 Running 0 22h
ingress-ingress-nginx-defaultbackend-69f644c9dc-7jvvs 1/1 Running 0 22h
di-printer-7888679b59-cqp9q 2/2 Running 0 22h
di-scheduler-7845d64d57-bdsm2 1/1 Running 0 22h
di-registry-68c7bbc47b-45l5v 1/1 Running 0 22h
di-djinn-api-5b4bbb446b-prsjd 1/1 Running 1 (22h ago) 22h
di-mysql-0 2/2 Running 0 22h
di-prometheus-server-7dc67cb6b5-bjzn5 2/2 Running 0 22h
di-redis-master-0 2/2 Running 0 22h
di-wdkserver-6db95bb9c9-5w2kt 2/2 Running 0 22h
di-assetserver-5c4769bd8-6f2hw 1/1 Running 0 22h
di-prometheus-node-exporter-mp5xf 1/1 Running 0 22h
di-report-tombstone-sweeper-28277040-kj227 1/1 Running 0 10h
datasource-operator-controller-manager-5cf6f7f675-h5lng 2/2 Running 3 (5h37m ago) 22h
di-asset-sweeper-28277645-tq6gb 0/1 Completed 0 12m
di-user-sync-28277645-dl6ks 0/1 Completed 0 12m
di-asset-sweeper-28277650-hxwvn 0/1 Completed 0 7m46s
di-user-sync-28277650-6kxf7 0/1 Completed 0 7m46s
di-asset-sweeper-28277655-gjtpr 0/1 Completed 0 2m46s
di-user-sync-28277655-chgxd 0/1 Completed 0 2m46s
Get resource types
Get 'all' resource types
$ kubectl get all | more
Get resource type for a pod
$ kubectl get all | grep <pod-name>
Example: Get resource type for pod-name containing 'printer'
$ kubectl get all | grep printer
pod/di-printer-68f6bddb6f-hkhdt 1/1 Running 2 (27h ago) 2d3h
deployment.apps/di-printer 1/1 1 1 2d3h
replicaset.apps/di-printer-68f6bddb6f 1 1 1 2d3h
Example: Get resource type for pod-name containing 'rabbitmq'
$ kubectl get all | grep rabbitmq
pod/di-rabbitmq-0 1/1 Running 2 (27h ago) 2d3h
service/di-rabbitmq-headless ClusterIP None <none> 4369/TCP,5672/TCP,25672/TCP,15672/TCP 2d3h
service/di-rabbitmq ClusterIP 192.168.108.109 <none> 5672/TCP,4369/TCP,25672/TCP,15672/TCP,9419/TCP 2d3h
statefulset.apps/di-rabbitmq 1/1 2d3h
di-printer, di-rabbitmq, etc. in the examples above are pod names.
Get logs
$ kubectl logs <resource-type>/<pod-name>
Example: Get logs for pod-name 'di-printer'
$ kubectl logs deployment.apps/di-printer
OR
$ kubectl logs deploy/di-printer
Example: Get logs for pod-name 'di-rabbitmq'
$ kubectl logs statefulset.apps/di-rabbitmq
OR
$ kubectl logs sts/di-rabbitmq
Example: Get logs for pod-name 'rabbitmq' with timestamps
$ kubectl logs statefulset.apps/di-rabbitmq --timestamps
OR
$ kubectl logs sts/di-rabbitmq --timestamps
By default, resource-type = pod.
In the example below, to obtain the logs for <resource-type>/<pod-name> = pod/di-mysql-0, <resource-type> pod is optional.
Example: <resource-type> = pod; <resource-type> is optional
$ kubectl logs pod/di-mysql-0
OR
$ kubectl logs di-mysql-0
Collect Logs for a Pod with One Container
- Using ssh, log into SevOne Data Insight as sevone.
$ ssh sevone@<SevOne Data Insight IP address or hostname>
- Obtain the list of containers that belong to a pod.
Example: Pod name 'di-mysql-0' contains one container, 'mysql'
$ kubectl get pods di-mysql-0 -o jsonpath='{.spec.containers[*].name}{"\n"}'
mysql metrics
- Collect logs.
Note: For pods with one container only, -c <container-name> in the command below is optional.
$ kubectl logs <pod-name> -c <container-name>
OR
$ kubectl logs <pod-name>
Example
$ kubectl logs di-mysql-0 -c mysql
OR
$ kubectl logs di-mysql-0
Collect Logs for a Pod with More Than One Container
- Using ssh, log into SevOne Data Insight as sevone.
$ ssh sevone@<SevOne Data Insight IP address or hostname>
- Obtain the list of containers that belong to a pod.
Example: Pod name 'svclb-ingress-ingress-nginx-controller-6fbfd' contains two containers, 'lb-port-80' and 'lb-port-443'
$ kubectl get pods svclb-ingress-ingress-nginx-controller-5pcm7 \
    -o jsonpath='{.spec.containers[*].name}{"\n"}'
lb-port-80 lb-port-443
- Collect logs.
Important: For pods with more than one container, -c <container-name> is required.
$ kubectl logs <pod-name> -c <container-name>
Example: Get logs for <container-name> = lb-port-80
$ kubectl logs svclb-ingress-ingress-nginx-controller-vzcqj -c lb-port-80
Example: Get logs for <container-name> = lb-port-443
$ kubectl logs svclb-ingress-ingress-nginx-controller-vzcqj -c lb-port-443
Collect All Logs
- To collect all the logs relevant for SevOne Data Insight pods and their containers, create a working directory where all the logs can be collected.
$ TMPDIR="/tmp/sdi_logs/$(date +%d%b%y)"
$ mkdir -p $TMPDIR
- Execute the following command to collect all logs for all SevOne Data Insight containers.
Note: The --timestamps option in the command below allows you to collect the logs with timestamps.
Example: Command to collect logs from all SevOne Data Insight Pods and containers
$ for POD in $(kubectl get pods --no-headers -n default | \
    awk '{print $1}'); do for CONTAINER in $(kubectl get pods \
    $POD -o jsonpath='{.spec.containers[*].name}{"\n"}'); \
    do echo "Collecting logs for POD: $POD - CONTAINER: $CONTAINER in log file $TMPDIR/${POD}_${CONTAINER}.log.gz" ; \
    kubectl logs $POD -c $CONTAINER --timestamps | \
    gzip > $TMPDIR/${POD}_${CONTAINER}.log.gz 2>&1; done ; done
The for command is shown here with indentations for clarity.
for POD in $(kubectl get pods --no-headers -n default | awk '{print $1}') ; do
    for CONTAINER in $(kubectl get pods $POD -o jsonpath='{.spec.containers[*].name}{"\n"}') ; do
        echo "Collecting logs for POD: $POD - CONTAINER: $CONTAINER in log file $TMPDIR/${POD}_${CONTAINER}.log.gz" ;
        kubectl logs $POD -c $CONTAINER --timestamps | gzip > $TMPDIR/${POD}_${CONTAINER}.log.gz 2>&1 ;
    done ;
done
Command to see files contained in $TMPDIR
$ ls -lh $TMPDIR
- Once the logs are collected, the contents can be put in a tar file. There is no need to compress again since the logs are already compressed.
$ tar -cf /tmp/sdi_logs-$(date +%d%b%y).tar $TMPDIR
$ ls -l /tmp/sdi_logs-$(date +%d%b%y).tar
$ md5sum /tmp/sdi_logs-$(date +%d%b%y).tar
- Delete the log directory to free up the space.
$ rm -rf $TMPDIR
- You may upload the tar file in /tmp/sdi_logs-$(date +%d%b%y).tar for further investigation.
'Agent' Nodes in a Not Ready State after Rebooting
Perform the following actions if the agent nodes are in a Not Ready state after rebooting.
Ensure Data Insight is 100% deployed
Check the status of the deployment by running the following command. Ensure that everything is in Running status.
$ ssh sevone@<SevOne Data Insight IP address or hostname>
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
di-create-secrets-xllfj 0/1 Completed 0 22h
di-upgrade-l2cs8 0/1 Completed 0 22h
clienttest-success-89lmt 0/1 Completed 0 22h
clienttest-fail-lb8mq 0/1 Completed 0 22h
di-report-version-sweeper-28276440-zpcxt 0/1 Completed 0 20h
ingress-ingress-nginx-controller-54dfdbc9cf-g9wdz 1/1 Running 0 22h
di-prometheus-node-exporter-shnxk 1/1 Running 0 22h
di-graphql-7d88c8c7b5-fbwgc 1/1 Running 0 22h
di-ui-5b8fbcfc54-rtwlq 1/1 Running 0 22h
di-kube-state-metrics-6f4fbc67cb-tsbbk 1/1 Running 0 22h
di-migrator-fdb9dd58b-29kl2 2/2 Running 0 22h
ingress-ingress-nginx-defaultbackend-69f644c9dc-7jvvs 1/1 Running 0 22h
di-printer-7888679b59-cqp9q 2/2 Running 0 22h
di-scheduler-7845d64d57-bdsm2 1/1 Running 0 22h
di-registry-68c7bbc47b-45l5v 1/1 Running 0 22h
di-djinn-api-5b4bbb446b-prsjd 1/1 Running 1 (22h ago) 22h
di-mysql-0 2/2 Running 0 22h
di-prometheus-server-7dc67cb6b5-bjzn5 2/2 Running 0 22h
di-redis-master-0 2/2 Running 0 22h
di-wdkserver-6db95bb9c9-5w2kt 2/2 Running 0 22h
di-assetserver-5c4769bd8-6f2hw 1/1 Running 0 22h
di-prometheus-node-exporter-mp5xf 1/1 Running 0 22h
di-report-tombstone-sweeper-28277040-kj227 1/1 Running 0 10h
datasource-operator-controller-manager-5cf6f7f675-h5lng 2/2 Running 3 (5h37m ago) 22h
di-asset-sweeper-28277645-tq6gb 0/1 Completed 0 12m
di-user-sync-28277645-dl6ks 0/1 Completed 0 12m
di-asset-sweeper-28277650-hxwvn 0/1 Completed 0 7m46s
di-user-sync-28277650-6kxf7 0/1 Completed 0 7m46s
di-asset-sweeper-28277655-gjtpr 0/1 Completed 0 2m46s
di-user-sync-28277655-chgxd 0/1 Completed 0 2m46s
Restart SOA
If SevOne NMS has been upgraded or downgraded, please make sure that the SOA container is restarted after a successful upgrade/downgrade. Execute the following command.
From SevOne NMS appliance,
$ ssh root@<NMS appliance>
$ supervisorctl restart soa
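To confirm that the SOA container came back up after the restart, you can check its state with the same tool; for example:
$ supervisorctl status soa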