SevOne SDN Collector Advanced Configuration & Troubleshooting Guide

This document provides detailed instructions for advanced configuration of the SDN collector using configuration variables. It also includes troubleshooting guidelines.

Advanced Configuration

Please refer to SDN Plugin in SevOne NMS User Guide for the APIC Connectivity details to configure the Cisco ACI solution.

Example

Example - create SDN apic device

In this example, a device named Apic is created with the SDN plugin enabled. After you configure the plugin and save the configuration, two files are created in the /config/SDN folder. The file names use the following format.
  1. <device-name>.yaml
  2. default-<device-name>.yaml
In this example, the device name is Apic, so the two files created in the /config/SDN folder after the device is configured with the SDN plugin are:
  1. Apic.yaml
  2. default-Apic.yaml
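
The naming convention above can be sketched as a small shell helper. The function name sdn_config_files is hypothetical and is not part of the SDN plugin; it only illustrates how the two file names derive from the device name, and it does not create or read any files.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the SDN plugin): print the two
# configuration file paths described above for a given device name.
sdn_config_files() {
    device_name="$1"
    config_dir="/config/SDN"   # folder named in this guide
    printf '%s/%s.yaml\n' "$config_dir" "$device_name"
    printf '%s/default-%s.yaml\n' "$config_dir" "$device_name"
}

# Usage: list the expected files for the device named Apic.
sdn_config_files Apic
```

For the device name Apic, the helper prints the same two file names listed above, prefixed with /config/SDN.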

Example: Apic.yaml (sample file)

Important: The Apic.yaml file is created from the SDN plugin configuration and the values entered in the plugin's fields. Do not modify Apic.yaml. To set or modify any configuration field, edit default-Apic.yaml instead.

The list of SDN plugin variables can be found in Configuration Variables table below.


deployment_name: Apic
version: 7.2.0
run_agents_immediately_and_exit_collector: true
log:
  level: debug
agent:
  include:
  - InstallerAgent
  - TopologyInstallerAgent
  - PodAgent
  - NodeAgent
  - PodExtendedAgent
  - NodeExtendedAgent
  - NodeInterfaceAgent
  - MetadataAgent
  - TopologyAgent
  - DeviceDescriptionAgent
  - FaultStreamingAgent
  - ObjectGroupAgent
  - ExternalSwitchAgent
  - HypervisorAndVirtualMachineAgent
vendor:
  is_multi_site_mode: false
  no_prefix: false
  site:
    name: Apic
    apic_url: https://10.52.0.171
    apic_uid: developer
    apic_password: DevTeam1234#
    device_name_prefix: MyPrefix
    fault_configuration_filename: ""
    timeout: 30s
  page_size: 10000
  sleep_time: 200
  dn_order: true
  do_nodes_traffic: true
  fault_prefix: ""
  do_pod_traffic: true
  do_virtual_traffic: false
  do_bytes: true
  do_packets: false
  skip_tunnel_if: true
  skip_off_vm: true
  skip_bad_nic: true
  pod_agent:
    schedule: ""
  node_agent:
    schedule: ""
  pod_extended_agent:
    schedule: ""
  node_interface_agent:
    schedule: ""
  node_extended_agent:
    schedule: ""
  external_switch_agent:
    schedule: ""
  hypervisor_and_virtual_machine_agent:
    schedule: ""
  topology_agent:
    schedule: ""
  object_group_agent:
    schedule: ""
nms:
  api:
    insecure_tls_connection: true
    host: 127.0.0.1
    v2_api_key: eyJhbGciOiJIUzUxMiJ9eyJpc3MiOiJhZG1pbiJ92wPJ-R9zaAoD3sJ95dSzN_irIaLn7E_o1SpHrkpVTOegoInNZ0r-s7zELy6GJS7bdLJuExqF9ksB4JfMHlcKJA
    v3_api_key: eyJ1dWlkIjoiYzNhMTc1NGEtZDBjMC00ZTczLWE1YzgtODk5OTBiMWMxZDQ3IiwiYXBwbGljYXRpb24iOiJTRE4iLCJlbnRyb3B5IjoiazNZN0JMWGIwWVBCbzhzcGlmdmpUbjdOOHlEenh0WFpPUktnZVZVWVRTTzQzTWtwMDZSVmozQ3p0RWFUYlZkbyJ9
fault_config:
  filter: []
  granular_fault_filter: []
  severity_mapping: []
 

Example: default-Apic.yaml (sample file)

Important:

After the SDN plugin is configured, you can set or modify the SDN plugin configuration variables by editing default-Apic.yaml with a text editor of your choice.

Please see the Configuration Variables table below for the list of SDN plugin configuration variables available.

Log rotations are performed automatically.

As of SDN 7.2.1, the log path directory has been changed from /var/log to /var/log/SDN. For example, /var/log/SDN/<site name provided when adding SDN device>/<v7.2.x>/


deployment_name: ""
version: 7.2.0
run_agents_immediately_and_exit_collector: true
log:
  level: debug
agent:
  include:
  - InstallerAgent
  - TopologyInstallerAgent
  - PodAgent
  - NodeAgent
  - PodExtendedAgent
  - NodeExtendedAgent
  - NodeInterfaceAgent
  - MetadataAgent
  - TopologyAgent
  - DeviceDescriptionAgent
  - FaultStreamingAgent
  - ObjectGroupAgent
vendor:
  is_multi_site_mode: false
  no_prefix: true
  site:
    name: ""
    apic_url: ""
    apic_uid: ""
    apic_password: ""
    device_name_prefix: SiteName
    fault_configuration_filename: ""
    timeout: 30s
  page_size: 10000
  sleep_time: 200
  dn_order: true
  do_nodes_traffic: true
  fault_prefix: ""
  do_pod_traffic: true
  do_virtual_traffic: false
  do_bytes: true
  do_packets: false
  skip_tunnel_if: true
  skip_off_vm: true
  skip_bad_nic: true
  pod_agent:
    schedule: ""
  node_agent:
    schedule: ""
  pod_extended_agent:
    schedule: ""
  node_interface_agent:
    schedule: ""
  node_extended_agent:
    schedule: ""
  external_switch_agent:
    schedule: ""
  hypervisor_and_virtual_machine_agent:
    schedule: ""
  topology_agent:
    schedule: ""
  object_group_agent:
    schedule: ""
nms:
  api:
    insecure_tls_connection: true
    host: ""
    v2_api_key: ""
    v3_api_key: ""
fault_config:
  filter: []
  granular_fault_filter: []
  severity_mapping: [] 
 

Filter Alerts

When a device is added through the SDN plugin, all alerts are generated by default. To generate only selected alerts, perform the following steps.
  1. SSH to SevOne NMS appliance as root user.
    ssh root@<NMS appliance>
  2. Change directory to /config/SDN.
    cd /config/SDN
  3. You will see two configuration files <device-name>.yaml and default-<device-name>.yaml for the device created through the SDN plugin. For example,
    
    ls
     
    Apic.yaml
    default-Apic.yaml
    
    where Apic is the device name of the device created in the example above.
     
  4. Using a text editor of your choice, edit and save the /config/SDN/default-<device-name>.yaml file. Please refer to the table below for details on the variables in the .yaml file.
    Note: If you are configuring the alerts for the first time, the fault_config values in the /config/SDN/default-<device-name>.yaml file will be blank.
    For example,
    
    vi /config/SDN/default-Apic.yaml
     
    fault_config:
      filter:
      - filter_on: aci_severity
        filter_value:
        - aci-severity-1
        - aci-severity-2
      - filter_on: aci_fault_code
        filter_value:
        - fault-code-1
        - fault-code-2
      granular_fault_filter:
      - code: fault-code-3
        aci_severity:
        - aci-severity-3
        - aci-severity-4
      - code: fault-code-4
        aci_severity:
        - aci-severity-4
        - aci-severity-5
      severity_mapping:
      - code:
        - fault-code-1
        - fault-code-2
        severity: nms-severity-1
      - code:
        - fault-code-3
        - fault-code-4
        - fault-code-5
        severity: nms-severity-2
     

    Save /config/SDN/default-Apic.yaml file.

Note: During the next iteration, the main configuration file, <device-name>.yaml (for example, Apic.yaml), incorporates the settings configured in default-Apic.yaml as shown in the example above.

Variable | Description

aci_severity | This sheet is used to provide attributes of a fault to filter on.

code: Contains ACI severities to create SevOne NMS alerts on.

Important:
  • Sheet name must be aci_severity
  • First row of every column must be a header. For example, aci_severity

fault_code | This sheet is used to provide attributes of a fault to filter on.

code: Contains fault codes to create SevOne NMS alerts on. To learn more about the fault codes, please refer to https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/all/syslog/guide/b_ACI_System_Messages_Guide.html

Important:
  • Sheet name must be fault_code
  • First row of every column must be a header. For example, code

granular | This sheet is used to provide attributes of a fault to filter on.

code: Contains fault codes to create SevOne NMS alerts on. To learn more about the fault codes, please refer to https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/all/syslog/guide/b_ACI_System_Messages_Guide.html

aci_severity: ACI severities that the faults with the above-mentioned fault codes need to be mapped to.

Important:
  • Sheet name must be granular
  • First row of every column must be a header. For example, code, aci_severity

severity_mapping | This sheet is used if the severity of faults with certain codes needs to be mapped to a particular SevOne NMS severity.

code: Contains fault codes mapped to the severity mentioned in severity.

severity: SevOne NMS severity that the faults with the above-mentioned fault codes need to be mapped to. Accepted keywords are emergency, alert, critical, error, warning, notice, info, or debug.

Important:
  • Sheet name must be severity_mapping
  • First row of every column must be a header. For example, severity, code
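
Because severity_mapping accepts only a fixed set of SevOne NMS severity keywords, it can help to check a value before saving it to the configuration file. The following is a minimal sketch; the function name is_valid_nms_severity is hypothetical, and the keyword list is copied from the severity description above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is one of the
# SevOne NMS severity keywords listed in the severity description above.
is_valid_nms_severity() {
    case "$1" in
        emergency|alert|critical|error|warning|notice|info|debug) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: "critical" is in the accepted list, "major" is not.
for severity in critical major; do
    if is_valid_nms_severity "$severity"; then
        echo "$severity: accepted"
    else
        echo "$severity: not an accepted SevOne NMS severity keyword"
    fi
done
```

The check is case-sensitive on purpose, since the accepted keywords are documented in lowercase.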

Troubleshooting

Upgrade from SevOne NMS version 6.8.x to 7.2.1 fails with Service and Pod issues

The upgrade process from SevOne NMS version 6.8.x to 7.2.1 results in a system initialization failure, potentially causing service outages and pods stuck in CrashLoopBackOff or Init status.

  1. Log in to SevOne NMS as the support user, and enter the NMS container to view the list of all running containers.
    
    ssh support@<NMS_IP_address>
    
    sudo su
    
    nms
    
    podman ps

    SDN Podman Container Id

  2. Locate the container ID of the faulty container in the displayed output.
  3. Using the same container ID, run the command to access the faulty container.
    podman exec -it <container_id> /bin/sh
  4. Run the revert script.
    ./sdn-plugin revert.sh
    Important: SevOne NMS version 7.2.1 includes a revert script that streamlines the upgrade process.
  5. Exit the container.
    exit
  6. Following successful execution, restart the impacted container.
    podman restart <container_id>
  7. To verify whether the container is running successfully, run the command below and monitor the container logs.
    podman logs <container_id>

After performing the revert operation and restarting the container, the system will successfully complete the upgrade process.

Where can the log files be found?

Log files can be found in /var/log/<device-name> folder. For example, /var/log/Apic folder.
  1. Go to the log folder.
    cd /var/log/Apic
  2. Go to the version folder. For example, 7.2.0.
    cd 7.2.0
    You are now in /var/log/Apic/7.2.0.
  3. You will find a folder for each supported agent.
  4. To find the log file for a specific agent, go to that agent's folder. For example, for DeviceDescriptionAgent, go to the DeviceDescriptionAgent folder.
    cd DeviceDescriptionAgent
    You are now in /var/log/Apic/7.2.0/DeviceDescriptionAgent.
  5. You will now find the log file <device-name>_DeviceDescriptionAgent_7.2.0.log. For example, Apic_DeviceDescriptionAgent_7.2.0.log
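
The folder layout walked through above can be sketched as a path-building helper. The function name sdn_agent_log_path is hypothetical, and the layout is assumed from the 7.2.0 example in these steps. The optional fourth argument for the log root is also an assumption, added because this guide notes that the log directory moved under /var/log/SDN as of SDN 7.2.1.

```shell
#!/bin/sh
# Hypothetical helper: build the expected log file path for one agent,
# following the <root>/<device>/<version>/<agent>/ layout shown above.
sdn_agent_log_path() {
    device_name="$1"
    version="$2"
    agent="$3"
    log_root="${4:-/var/log}"   # assumed default, matches the 7.2.0 example
    printf '%s/%s/%s/%s/%s_%s_%s.log\n' \
        "$log_root" "$device_name" "$version" "$agent" \
        "$device_name" "$agent" "$version"
}

# Usage: path of the DeviceDescriptionAgent log for the device named Apic.
sdn_agent_log_path Apic 7.2.0 DeviceDescriptionAgent
```

The printed path can then be passed to a pager or to tail to inspect the log.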

Why are the SDN TopN views unavailable on NMS version 7.0.x after upgrading from version 6.0.x?

If you are implementing a new SevOne SDN solution deployment in an NPM version 7.0.x environment that was upgraded from NMS version 6.0.x to version 7.0.x without the SDN solution configured beforehand, the SDN TopN OOTB views are not available on SevOne NMS 7.0.x after the upgrade. Because these views are absent, the SDN TopN OOTB reports in SevOne Data Insight are inaccessible.

Note: Upgrading SevOne NMS from version 6.0.x to 7.0.x with the SDN solution already configured ensures that the OOTB TopN reports remain accessible in SevOne Data Insight.
Important:

Do not import the .spk file for the SDN TopN views before configuring the SDN solution and adding the SDN devices. Execute these steps only after the SDN device has been enabled with the SDN plugin.

To import the .spk file for the SevOne SDN Solution TopN views, perform the steps on the cluster leader appliance of the NMS cluster running version 7.0.x.

To import the necessary TopN views for the new SevOne SDN Solution deployment, perform the following steps.
  1. Enter the SDN container to execute the commands.
    podman exec -it <nms-container_id_or_name> /bin/sh
    Example
    podman exec -it nms-collections-sdn-plugin /bin/sh
  2. Copy the .spk files from the /opt/reports/OOTB directory to the /config directory.
    cp -r /opt/reports/OOTB/* /config/
  3. Exit the current container.
    exit
  4. Log in to the NMS container.
    podman exec -it nms-nms-nms /bin/sh
  5. Import the .spk files by running the commands shown below.
    SevOne-import --file config/SDNSolution-ACI-Capacity-reports-NMS.spk 
    SevOne-import --file config/SDNSolution-ootb-reports-NMS.spk
    Example:
    
    SevOne-import --file config/SDNSolution-ACI-Capacity-reports-NMS.spk
    * Verifying the package manifest...
    * Done.
    Allow overwrite: no
    Import tags only: no
    Dry run: no
    Output CSV: no
    Importing items for core/TopnView.
    Ignoring existing Top N View 'SDN Solution - ACI Capacity Average'.
    Ignoring existing Top N View 'SDN Solution - ACI Capacity Maximum'.
    Ignoring existing Top N View 'SDN Solution - ACI Capacity Minimum'.
    Ignoring existing Top N View 'SDN Solution - ACI Switch Capacity Average'.
    Ignoring existing Top N View 'SDN Solution - ACI Switch Capacity Maximum'.
    Ignoring existing Top N View 'SDN Solution - ACI Switch Capacity Minimum'.
    === Import complete 
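
When more than one package needs to be imported, the two import commands above can be generated in a loop. The following is a minimal sketch that only prints the commands so they can be reviewed before being run inside the NMS container. The function name print_import_commands is hypothetical; the package names and the --file option are taken from the steps above.

```shell
#!/bin/sh
# Package names as listed in the import step above.
spk_files="SDNSolution-ACI-Capacity-reports-NMS.spk SDNSolution-ootb-reports-NMS.spk"

# Hypothetical helper: print one SevOne-import command per package.
# Nothing is imported here; the output is meant to be reviewed first.
print_import_commands() {
    for spk in $spk_files; do
        echo "SevOne-import --file config/$spk"
    done
}

print_import_commands
```

Printing instead of executing keeps the sketch safe to run anywhere, including on systems where SevOne-import is not installed.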
     

Why does the upgrade from NMS version 6.8.x to 7.0.x result in a loss of functionality and data, after migrating from SDN solution to plugin mode?

The loss of functionality and data during the upgrade from NMS version 6.8.x to 7.0.x, when the SDN solution fails to transition to plugin mode, could be due to specific configurations that were not present or correctly set up in version 6.8.x, leading to a malfunction when attempting to migrate.

To resolve this issue, perform the following steps:
  1. Using ssh, log in to the SevOne NMS appliance as root.
    ssh root@<SevOne NMS appliance IP address>
  2. Create the working directory and change to it.
    mkdir /tmp/clean_migration
    cd /tmp/clean_migration
  3. Download the following (latest) files from IBM Passport Advantage (https://www.ibm.com/software/passportadvantage/pao_download_software.html) via Passport Advantage Online. However, if you are on a legacy / flexible SevOne contract and do not have access to IBM Passport Advantage but have an active Support contract, please contact IBM SevOne Support for the latest files.

    In this case, you must download sevone_solutions_sdn_cleanMigration.tar.gz file and place it in /tmp/clean_migration directory.

  4. Retrieve the container ID.
    podman ps
    Example: In the podman ps output, the container ID is identified as 00bfd5a708e4.
  5. Copy sevone_solutions_sdn_cleanMigration.tar.gz, which contains the binary file, to the /tmp folder of the container.
    sudo podman cp sevone_solutions_sdn_cleanMigration.tar.gz <container_id>:/tmp
    Example:
    sudo podman cp sevone_solutions_sdn_cleanMigration.tar.gz 00bfd5a708e4:/tmp
  6. Enter the SDN container to execute the commands.
    podman exec -it <container_id> /bin/sh
  7. Change directory to access /tmp folder.
    cd /tmp/
  8. Extract the binary file from the .tar.gz file.
    tar -xvf sevone_solutions_sdn_cleanMigration.tar.gz
  9. Execute the revert command.
    ./cleanMigration revert
  10. After executing the previous commands, please restart the container using the following command.
    podman restart <container_id>
    Note: Wait for at least 15 minutes after restarting the container to allow for the changes to take effect.

SelfMon Policy 'Trigger Condition' modifications do not persist after first save on SevOne NMS

Modifications made to the Policy Trigger Condition for SDN SelfMon policies do not persist after the first save in NMS. Upon refreshing the browser or re-opening the policy, the updated condition is not reflected. However, saving the modification a second time typically resolves the issue.

This issue causes false positives in alerting.

To reproduce, execute the following steps.
  • Using a web browser of your choice, log in to SevOne NMS cluster.
  • From Events drop-down, click Configuration > Policy Browser.
  • Search for SDN.
  • Select siteN::SDN::SelfmonAvailability.
  • Select Trigger Conditions tab.
  • Under Conditions, edit the condition. By default, the values are set to:
    • Indicator = availability
    • Type = Time since newest data point
    • Threshold = 7200 seconds; 7200 seconds is the default OOTB value. If the threshold value is changed, the new value does not persist the first time.

      Workaround: To persist the value, you need to set the new value again.

    • Custom Message = Agent $objectName not available as $indicatorName is less than 100%, it is no longer running and/or is having communication issues writing to the NMS
  • Click Save button.

For details, please refer to Support Ticket DT444645.

Policy Name | Trigger Condition | Clear Condition | Description
SelfmonAvailability | 7200 seconds since newest data point | 100 seconds since newest data point | SDN Selfmon availability

Configuration Variables

YAML setting | Default Value | Description
msp_name | ORGANIZATION | MSP name for this instance. MSP is a grouping of one or more tenants. For example, ORGANIZATION.
version | 7.2.0 | Version of the build. For example, 7.2.0
run_agents_immediately_and_exit_collector | true | Runs all the agents in the include list sequentially and then exits the collector.
log.level | debug | Log output minimum level. May be one of: debug, info, warning, error.
agent.include | InstallerAgent, TopologyInstallerAgent, PodAgent, NodeAgent, PodExtendedAgent, NodeExtendedAgent, NodeInterfaceAgent, MetadataAgent, TopologyAgent, DeviceDescriptionAgent, FaultStreamingAgent, ObjectGroupAgent | Set to array of agent names to explicitly include.
vendor.site.name | (required) - <enter value> | Provide the site name.
vendor.site.apic_url | (required) - <enter value> | APIC IP address. For example, https://192.168.1.2
vendor.site.apic_uid | (required) - <enter value> | APIC username.
vendor.site.apic_password | (required) - <enter value> | APIC password.
vendor.site.device_name_prefix | Site Name | Common prefix name for all devices.
vendor.site.timeout | 30s | The amount of time to wait before timing out when attempting to connect to the APIC.
vendor.is_multi_site_mode | false | If set to true, run the collector in multisite mode. Default setting is false.
vendor.no_prefix | false | If set to true, prefixes will be provided to device names.
vendor.page_size | 10000 | The page size to use for paginating API requests.
vendor.sleep_time | 200 | The time to sleep after APIC API queries in milliseconds.
vendor.dn_order | true | Request objects to be sorted by DN in the APIC API query.
vendor.do_nodes_traffic | true | Enable Node device's network statistics.
vendor.fault_prefix | "" | Used to specify a prefix text in the summary field of alerts that are created from ACI faults.
vendor.do_pod_traffic | true | Enable POD device's network statistics.
vendor.do_bytes | true | Collect statistics in bytes.
vendor.do_packets | false | Collect statistics in packets.
vendor.do_virtual_traffic | false | Poll for network statistics of VMs and HVs.
vendor.skip_tunnel_if | true | Skip polling the POD for Tunnel Interfaces.
vendor.skip_off_vm | true | Skip VMs that have been powered off.
vendor.skip_bad_nic | true | Skip VM network interfaces with an IP address of 0.0.0.0.
vendor.pod_agent.schedule | "" | Poll pod agent devices every 10 mins.
vendor.node_agent.schedule | "" | Poll node agent devices every 10 mins.
vendor.pod_extended_agent.schedule | "" | Poll pod extended agent devices every 10 mins.
vendor.node_interface_agent.schedule | "" | Poll node interface agent devices every 10 mins.
vendor.node_extended_agent.schedule | "" | Poll node extended agent devices every 10 mins.
vendor.external_switch_agent.schedule | "" | Poll external switch agent devices every 10 mins.
vendor.hypervisor_and_virtual_machine_agent.schedule | "" | Poll hypervisor and virtual machine agent devices every 10 mins.
vendor.topology_agent.schedule | "" | Poll topology agent devices every 10 mins.
vendor.object_group_agent.schedule | "" | Poll object group agent devices every 10 mins.
nms.api.host | "" | The hostname or IP address for SOA and REST API endpoints.
nms.api.v2_api_key | "" | API key used for NMS REST API authentication.
nms.api.v3_api_key | "" | API key used for NMS SOA authentication.
nms.api.insecure_tls_connection | true | Set to true to enable an insecure TLS connection by skipping certificate verification.
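
Four of the settings above are marked as required: vendor.site.name, vendor.site.apic_url, vendor.site.apic_uid, and vendor.site.apic_password. The following is a minimal sketch of a check for empty values in a configuration file. The function name check_required_site_fields is hypothetical. It uses a simple line-based scan rather than a YAML parser, and it assumes each key appears once, indented, in the same form as in the sample files in this guide.

```shell
#!/bin/sh
# Hypothetical helper: report required vendor.site values that are
# missing or empty. Line-based scan, not a full YAML parser.
check_required_site_fields() {
    config_file="$1"
    missing=0
    for key in name apic_url apic_uid apic_password; do
        # Take the first "key: value" line, then drop quotes and whitespace.
        value="$(sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$config_file" \
            | head -n 1 | tr -d '"' | tr -d '[:space:]')"
        if [ -z "$value" ]; then
            echo "missing required value: vendor.site.$key"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on a temporary sample file with two empty required values.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
vendor:
  site:
    name: Apic
    apic_url: https://192.168.1.2
    apic_uid: ""
    apic_password: ""
EOF
check_required_site_fields "$sample" || echo "configuration is incomplete"
rm -f "$sample"
```

The sample file is created with mktemp so the sketch never touches the real files under /config/SDN.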