Clearing out topology data without reinstalling

You can clear all topology data without having to reinstall your system.

Before you begin

Ensure you have all the required administrator or management permissions to perform administration tasks.

Also ensure that you have your deployment-specific information to hand, such as your release name prefix and namespace. You set these as environment variables in step 3, after stopping the observer jobs and backing up any required configuration settings.

Important

Clearing out topology data as described here resets the topology manager databases so that you have a fresh system to work with, as though you had performed a fresh installation of the product.

  • This process does not delete data associated with the alert, log, policy or metric management components of AIOps.

  • This process does reset the topology database held in Cassandra and the corresponding inventory data held in Postgres. These databases work together to deliver product topology and inventory capabilities and a reset of them will remove all of the following:

      - resources and relationships
      - topology history and state
      - topology groups
      - topology group membership
      - topology manager configuration including rules
      - topology group templates
      - observer jobs
      - resource and relationship style settings
      - advanced configuration settings
    
  • You are strongly encouraged to back up your configuration using the documented process. The backup includes rules (merge, match, tag, and so on), custom icons, right-click tools, topology group templates, observer jobs, advanced configuration settings, and resource and relationship style configuration. This data is held in the Cassandra database.

  • The backup file should be stored in a safe location and not inside any of the AIOps topology pod directories. Instead, use a safe location in accordance with your company policies, such as an authorised source code repository.

    Note: File Observer files and Observer Job security certificates are not backed up as part of this process and should be handled separately.

  • Once the reset process is complete and the topology manager processes have been restarted, you will have empty databases to work with. You can either configure and start using the product from that point, or restore the configuration you previously backed up using the documented process.

    Note: The full restoration of observer jobs requires the reapplication of any applicable File Observer files and/or security certificates.

Clearing topology data summary

To clear all topology data without having to reinstall your system, you perform the following sequence of steps:

  1. Stop all observer jobs from running to ensure no new data is received until the old data has been removed.
  2. Back up any configurations if required.
  3. Set the environment variables manually. These are used by the validation and clean-up commands in the later steps.
  4. Pre-validate the topology and inventory databases to confirm that the data to be removed is present.
  5. Scale down the core topology services, then clear out the topology and inventory databases.
  6. Free up disk space by deleting the obsolete Cassandra data.
  7. Post-validate to confirm that the data has been removed from the topology and inventory databases.
  8. Scale up the topology pods to restart the system and any configured observer jobs.

To clear topology data

  1. Stop all observer jobs.

    a. Expand Define, click Integrations, click the Manage observer jobs tab, then click Configure, schedule, and manage observer jobs. The Observer jobs page is displayed.

    b. Ensure that no observer jobs are running or are scheduled to run.

  2. Back up IBM Cloud Pak for AIOps configurations if required. For more information, see Backing up and restoring UI topology configuration data.

  3. Set the required environment variables. These will be specific to your deployment.

    a. Set the release and namespace as required (as in the following examples):

    export release=aiops
    export NAMESPACE=cp4aiops
    

    b. Set the following variables:

    export POSTGRES_PASS=$(oc get secret $release-topology-postgres-user -o jsonpath='{.data.password}' | base64 -d)
    export POSTGRES_USER=$(oc get secret $release-topology-postgres-user -o jsonpath='{.data.username}' | base64 -d)
    export INVENTORY_DB=$(oc get asmformation $release-topology -o jsonpath='{.spec.helmValues.global.postgres.inventory.dbname}')
    export INVENTORY_SCHEMA=$(oc get asmformation $release-topology -o jsonpath='{.spec.helmValues.global.postgres.inventory.schema}')
    export CASSANDRA_PASS=$(oc get secret $release-topology-cassandra-auth-secret -o jsonpath='{.data.password}' | base64 -d)
    export CASSANDRA_USER=$(oc get secret $release-topology-cassandra-auth-secret -o jsonpath='{.data.username}' | base64 -d)
    export POSTGRES_POD=$(oc get clusters.postgresql.k8s.enterprisedb.io aiops-topology-postgres --namespace $NAMESPACE -o jsonpath="{.status['currentPrimary']}")
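
Before continuing, it can help to confirm that none of these variables came back empty, because an empty value (for example, from a mistyped secret name) would make the later commands fail in less obvious ways. The following is a minimal sketch rather than part of the documented procedure. The function name check_required_vars is hypothetical, and the sketch assumes the variables were exported as shown in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any step 3 variable that is not exported or is empty.
check_required_vars() {
  local missing=0 name
  for name in release NAMESPACE POSTGRES_PASS POSTGRES_USER INVENTORY_DB \
              INVENTORY_SCHEMA CASSANDRA_PASS CASSANDRA_USER POSTGRES_POD; do
    # printenv prints the value of an exported variable, or nothing if it is absent.
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
# Example:
#   check_required_vars && echo "All variables are set"
```

The function only reads the environment, so it is safe to run repeatedly while you correct any values it reports.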
    
  4. Pre-validate the topology and inventory databases to confirm the presence of the data to be cleared.

    a. Use the following command to verify that the topology database has the janusgraph keyspace:

    oc exec $release-topology-cassandra-0 -- bash -c "echo 'SELECT * FROM system_schema.keyspaces;exit' |cqlsh --ssl -u '$CASSANDRA_USER' -p '$CASSANDRA_PASS'"
    

    The following example output indicates that the janusgraph keyspace is present in the topology (Cassandra) database:

    | keyspace_name | durable_writes | replication                                                                           |
    | ------------- | -------------- | ------------------------------------------------------------------------------------- |
    |         aiops |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1'} |
    |        x_in_y |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1'} |
    |    janusgraph |           True |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} |
    |   system_auth |           True |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} |
    

    b. Use the following command to verify that the inventory database contains data (under types, edges and other tables).

    oc exec -it $POSTGRES_POD -- /bin/bash -c 'export PGPASSWORD='$POSTGRES_PASS'; psql --host localhost --username '$POSTGRES_USER' --dbname '$INVENTORY_DB' --command "select * from types"'    
    

    The following example output shows that data is present in the inventory (postgres) database:

    |  id |               type | description |
    | --- | ------------------ | ----------- |
    |   1 | undergroundStation |             | 
    |   2 |        railStation |             |
    |   5 | interchangeStation |             |
    |  31 |      completeGroup |             |
    |  32 |                TFL |             |
    | 550 |  waiopsApplication |             |
    | 552 |             server |             |
    | 553 |                 vm |             |
    | 554 |           database |             |
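
If you prefer not to check the keyspace list from sub-step a by eye, you can filter the command output. The following is a minimal sketch, assuming that the keyspace name appears as a cell followed by a '|' column separator, as in the example output. The function name has_keyspace is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named keyspace appears in cqlsh output on stdin.
has_keyspace() {
  local keyspace="$1"
  # Match the name as a whole cell: preceded by start-of-line, whitespace or '|',
  # and followed by optional whitespace and the column separator '|'.
  grep -Eq "(^|[[:space:]|])${keyspace}[[:space:]]*[|]"
}
# Example: pipe the output of the command from sub-step a into the helper.
#   <command from sub-step a> | has_keyspace janusgraph && echo "janusgraph is present"
```

The same helper can be reused during post-validation, where a non-zero exit status indicates that the keyspace is no longer listed.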
    
  5. Clear out the topology and inventory databases.

    Important: Make a note of the current scaling of the following core services that access the topology and search databases before you run the commands in this step:
    $release-topology-inventory
    $release-topology-topology
    $release-topology-layout
    $release-topology-merge
    $release-topology-status
    

    The following command provides the current scaling of core services:

    oc get pods --field-selector=status.phase=Running --no-headers=true --output=custom-columns="CNAME:.metadata.ownerReferences[0].name" | grep $release-topology | uniq --count
    
    Remember: The $release prefix for the topology services will be specific to your scenario, for example aiops-.
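
As an alternative way to note the scaling, you can read the configured replica count directly from each deployment and save it for the final scale-up step. The following is a minimal sketch under stated assumptions: the function name record_scaling and the output path are hypothetical, and the deployment names are the ones listed in the note above.

```shell
#!/usr/bin/env bash
# Core services named in the Important note above.
TOPOLOGY_SERVICES="inventory topology layout merge status"

# Hypothetical helper: print "<deployment> <replicas>" for each core service.
record_scaling() {
  local release="$1" svc deploy
  for svc in $TOPOLOGY_SERVICES; do
    deploy="$release-topology-$svc"
    printf '%s %s\n' "$deploy" \
      "$(oc get deploy "$deploy" -o jsonpath='{.spec.replicas}')"
  done
}
# Example (the output path is an assumption; choose a safe location):
#   record_scaling "$release" > "$HOME/topology-scaling.txt"
```

Run this before you scale anything down, because the value read is the configured replica count at the time of the call.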

    a. Scale down the core services which access the storage.

    oc scale deploy $release-topology-inventory --replicas=0
    oc scale deploy $release-topology-topology --replicas=0
    oc scale deploy $release-topology-layout --replicas=0
    oc scale deploy $release-topology-merge --replicas=0
    oc scale deploy $release-topology-status --replicas=0
    

    b. Drop the topology janusgraph keyspace.

    oc exec $release-topology-cassandra-0 -- bash -c "echo 'DROP KEYSPACE janusgraph;exit' |cqlsh --ssl -u '$CASSANDRA_USER' -p '$CASSANDRA_PASS'"
    

    c. Finally, clear out the inventory data:

    oc exec -it $POSTGRES_POD -- /bin/bash -c 'export PGPASSWORD='$POSTGRES_PASS'; psql --host localhost --username '$POSTGRES_USER' --dbname '$INVENTORY_DB' --command "CREATE FUNCTION pg_temp.clear_inventory(inventory_schema text) RETURNS void AS \
    \$\$ \
        DECLARE \
            statements CURSOR FOR \
            SELECT schemaname, tablename \
            FROM pg_tables \
            WHERE schemaname = FORMAT('\''%s'\'',inventory_schema) \
            AND NOT tablename IN ('\''flyway_schema_history'\'','\''spatial_ref_sys'\'' ); \
        BEGIN \
            FOR stmt IN statements LOOP \
                EXECUTE '\''TRUNCATE TABLE '\'' || string_agg(format('\''%I.%I'\'', stmt.schemaname, stmt.tablename), '\'', '\'') || '\'' CASCADE;'\''; \
            END LOOP; \
        END; \
    \$\$ \
    LANGUAGE plpgsql;" --command "SELECT pg_temp.clear_inventory('\'''$INVENTORY_SCHEMA''\'');"'
    
  6. Free up disk space.

    The previous step saves the cleared Cassandra data as a backup in the /opt/ibm/cassandra/data/data/janusgraph/ directory on each Cassandra node. To free disk space, delete this data.

    To establish which directories contain the current data, use the cqlsh tool to check the Cassandra nodes. The oldest directories contain the obsolete data to be deleted.

    Example

    # Change 'aiops' to the relevant release name if required
    oc exec -ti aiops-cassandra-0 -- bash
    
    CASSANDRA_USER=`cat $CASSANDRA_AUTH_USERNAME_FILE`
    CASSANDRA_PASS=`cat $CASSANDRA_AUTH_PASSWORD_FILE`
    cqlsh -u $CASSANDRA_USER -p $CASSANDRA_PASS
    
    hdm@cqlsh> select table_name,id FROM system_schema.tables WHERE keyspace_name='janusgraph';
    
     table_name              | id
    -------------------------+--------------------------------------
                   edgestore | 00762440-549b-11ed-a10d-a1a6d478dacc
             edgestore_lock_ | 00a21640-549b-11ed-a10d-a1a6d478dacc
                  graphindex | 00cf19b0-549b-11ed-a10d-a1a6d478dacc
            graphindex_lock_ | 00f120a0-549b-11ed-a10d-a1a6d478dacc
              janusgraph_ids | 005ad410-549b-11ed-a10d-a1a6d478dacc
           system_properties | ff9c3f00-549a-11ed-a10d-a1a6d478dacc
     system_properties_lock_ | 0175e100-549b-11ed-a10d-a1a6d478dacc
                   systemlog | 01466c90-549b-11ed-a10d-a1a6d478dacc
                       txlog | 01121620-549b-11ed-a10d-a1a6d478dacc
    
    (9 rows) 

    The above command shows the IDs of the current tables, which should be retained. The directory names contain the same ID, but without the '-' (hyphen) characters, for example:

    oc exec -ti aiops-cassandra-0 -- ls /opt/ibm/cassandra/data/data/janusgraph/
    
    edgestore-00762440549b11eda10da1a6d478dacc
    edgestore_lock_-00a21640549b11eda10da1a6d478dacc
    graphindex-00cf19b0549b11eda10da1a6d478dacc
    graphindex_lock_-00f120a0549b11eda10da1a6d478dacc
    janusgraph_ids-005ad410549b11eda10da1a6d478dacc
    system_properties-ff9c3f00549a11eda10da1a6d478dacc
    system_properties_lock_-0175e100549b11eda10da1a6d478dacc
    systemlog-01466c90549b11eda10da1a6d478dacc
    txlog-01121620549b11eda10da1a6d478dacc 

    Once you have identified the current data, you can delete the redundant data.
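
The mapping from table ID to directory name can be expressed as a small helper, which makes it easier to compare the two listings without reading the IDs by eye. The following is a minimal sketch. The function names table_dir_name and list_stale_dirs are hypothetical, and the sketch only lists candidate directories; it does not delete anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the data directory name for a table, for example
#   table_dir_name edgestore 00762440-549b-11ed-a10d-a1a6d478dacc
table_dir_name() {
  local table="$1" id="$2"
  # The directory name is "<table>-<id with hyphens removed>".
  printf '%s-%s\n' "$table" "$(printf '%s' "$id" | tr -d '-')"
}

# Hypothetical helper: read directory names on stdin and print those that are
# NOT among the current directory names given as arguments.
list_stale_dirs() {
  local dir current found
  while IFS= read -r dir; do
    found=0
    for current in "$@"; do
      if [ "$dir" = "$current" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then printf '%s\n' "$dir"; fi
  done
  return 0
}
```

Review the list that list_stale_dirs prints against the cqlsh output before you remove any directory.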

  7. Post-validate to confirm that the data has been removed from the topology and inventory databases.

    a. Use the following command to verify that the topology database no longer has the janusgraph keyspace:

    oc exec $release-topology-cassandra-0 -- bash -c "echo 'SELECT * FROM system_schema.keyspaces;exit' |cqlsh --ssl -u '$CASSANDRA_USER' -p '$CASSANDRA_PASS'"
    

    The following example output indicates that the janusgraph keyspace is no longer present in the topology (Cassandra) database:

    | keyspace_name | durable_writes | replication                                                                           |
    | ------------- | -------------- | ------------------------------------------------------------------------------------- |
    |         aiops |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1'} |
    |        x_in_y |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1'} |
    |   system_auth |           True |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'} |
    

    b. Use the following command to verify that the inventory database contains no data. The database name is taken from the INVENTORY_DB variable that you exported in step 3.b.

    oc exec -it $POSTGRES_POD -- /bin/bash -c 'export PGPASSWORD='$POSTGRES_PASS'; psql --host localhost --username '$POSTGRES_USER' --dbname '$INVENTORY_DB' --command "select * from types"'
    

    The following example output shows that no data is present in the inventory (postgres) database:

    | id | type | description |
    | -- | ---- | ----------- |
        
    (0 rows)
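
The inventory check can also be scripted by looking for the row-count footer that psql prints after a query result. The following is a minimal sketch, assuming the footer reads "(0 rows)" as in the example output. The function name reports_zero_rows is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if psql output on stdin reports zero rows.
reports_zero_rows() {
  # Allow surrounding whitespace, including a carriage return from a terminal session.
  grep -Eq '^[[:space:]]*\(0 rows\)[[:space:]]*$'
}
# Example: pipe the output of the command from sub-step b into the helper.
#   <command from sub-step b> | reports_zero_rows && echo "types table is empty"
```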
    
  8. Scale up the pods to the same state as before you cleared out the data.

    Note: The number of replicas can be found using the command mentioned in the Important note in Step 5.

    oc scale deploy $release-topology-inventory --replicas=<number_of_replicas>
    oc scale deploy $release-topology-topology --replicas=<number_of_replicas>
    oc scale deploy $release-topology-layout --replicas=<number_of_replicas>
    oc scale deploy $release-topology-merge --replicas=<number_of_replicas>
    oc scale deploy $release-topology-status --replicas=<number_of_replicas>
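
If you noted the original scaling in a file with one "<deployment> <replicas>" pair per line, the scale-up can be driven from that file instead of typing each command. The following is a minimal sketch. The function name restore_scaling, the file format, and the file path are assumptions for illustration and are not part of the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<deployment> <replicas>" lines on stdin and scale
# each deployment back to the recorded replica count.
restore_scaling() {
  local deploy replicas
  while read -r deploy replicas; do
    # Skip blank or incomplete lines.
    if [ -z "$deploy" ] || [ -z "$replicas" ]; then continue; fi
    # Redirect stdin so that oc cannot consume the remaining input lines.
    oc scale deploy "$deploy" --replicas="$replicas" </dev/null
  done
}
# Example (the path is an assumption):
#   restore_scaling < "$HOME/topology-scaling.txt"
```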