Threat Investigator troubleshooting

If you encounter an issue with the operation of the Threat Investigator application, see the following information on problem resolution.

The following issues, diagnosis, and resolutions are outlined to assist you in troubleshooting issues that are associated with Threat Investigator.

To view other common problems and workarounds with Threat Investigator, see Threat Investigator known problems and workarounds.

Threat Investigator automatic investigations fail

Automatic investigations of one or more cases fail.

Automatic investigations fail symptoms

  • Investigation status shows as Failed in the Threat Investigator investigations table for one or more cases.
  • Case investigation starts but does not complete successfully and you cannot view the investigation results to see the findings for the incident.
  • The user ID that configured the app is no longer valid (for example, a user is deleted). The automatic investigations use the identity of the user that configured the app. If the user is deleted, investigations fail.
  • There is a problem with the Universal Data Insights (UDI) component of QRadar® Suite Software or with the connected data sources. Threat Investigator uses UDI to run data mining queries on the configured data sources.
  • There is a problem with the AI Toolkit (AITK) component of QRadar Suite Software. Investigation is run as an AITK workflow and can fail if the AITK encounters any problems.
    • AITK is at the concurrent job limit and cannot start a new workflow. AITK limits the number of concurrent jobs to 100 by default. This limit is shared with DE (Data Explorer) search jobs and any other AITK jobs that are run by other components.
    • AITK jobs are failing because of low resources or connection problems to middleware such as etcd or Redis.
  • There is a problem with the Threat Investigator deployment.

Diagnosing the automatic investigations problem

In the following curl commands, replace <openshift_url> with the QRadar Suite Software application domain. Replace <username> with a QRadar Suite Software application user ID that has the user or admin role for the Threat Investigator app, and replace <password> with the login password of that user. Alternatively, use an API key for the user by replacing <username> with the API key and <password> with the API key secret that the user generates from the API keys page of QRadar Suite Software.
  1. Install the command-line interface (CLI) utility cpctl from the cp-serviceability pod. For more information, see Installing the cpctl utility.

  2. Run the QRadar Suite Software health checks for Threat Investigator to verify the health of its components and its dependencies, which include AITK, UDI, Cases, and postgres:

    cpctl diagnostics check_deployment --only threatinv --token "$(oc whoami -t)"
  3. Verify that the user ID that configured the app is still valid by checking for any warnings in the Threat Investigator user interface or by calling the Threat Investigator account API with the following command:

    curl -u <username>:<password> https://<openshift_url>/api/advisor/v1/account

    If the is_user_valid field is set to false, then the user is no longer valid and automatic investigations fail. If this is the problem, this will also be displayed as a warning in the Threat Investigator User Interface above the Investigations table.

  4. Check the value of the has_data_sources field in the account API response. If it is false, automatic investigations can fail. This value might mean that no data sources are configured or that the user that configured Threat Investigator does not have permission to use the data sources. If no data sources are configured, Threat Investigator cannot perform any investigations. If data sources are configured, make sure that the user ID (config_user field) has the Data sources user or Data sources admin role. The Data sources admin role grants access to all data sources. If the user has the Data sources user role, the user must also be given permissions for each data source that might be used in case investigations. If this is the problem, it is displayed as a warning in Threat Investigator above the Investigations table.

  5. If the value of the has_data_sources field is true, use Data Explorer or the UDI REST APIs to run a test query against the configured data sources. If the query fails or does not complete, there is a problem with the UDI component of QRadar Suite Software or with the configured data sources.

  6. To confirm whether there is a problem with the AI Toolkit (AITK) component, collect the logs from the AITK pod and check for any errors or warnings. See the AITK runbooks for more information.

  7. Investigations are run by the worker container of the threat-inv-api pod. Collect the logs from the worker container and check for any errors and warnings. If the logs show the following error, it indicates a problem with starting a new AITK job:

    Failure starting ATK workflow advisor-atk-investigation: <response_code>

    If the logs show the following error, it indicates a problem with getting the job status from AITK:

    Failure in ATK workflow advisor-atk-investigation: <response_code>

    If the logs show the following error, it indicates a problem with getting the job results from AITK:

    Failure getting results for ATK workflow advisor-atk-investigation: <response_code>

    If the response code is 429, AITK has reached its maximum concurrent job limit and cannot start a new job. This does not necessarily indicate a problem because Threat Investigator retries the AITK API calls up to 5 times. Any other response code indicates a different problem with AITK. See the AITK runbooks for more information.
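
The three worker log messages in step 7 can be triaged with a small filter. The following is a minimal sketch, not a supported tool: the function name and output layout are illustrative, and it assumes that the response code is the text after the last colon in the log line.

```shell
# Classify Threat Investigator worker log lines read from stdin.
# The three message texts come from step 7; the function name and the
# output layout are illustrative only.
classify_atk_errors() {
  while IFS= read -r line; do
    case "$line" in
      *"Failure starting ATK workflow advisor-atk-investigation:"*) stage="start job" ;;
      *"Failure getting results for ATK workflow advisor-atk-investigation:"*) stage="get results" ;;
      *"Failure in ATK workflow advisor-atk-investigation:"*) stage="get status" ;;
      *) continue ;;  # not one of the three known messages
    esac
    # Assumption: the response code is everything after the last ": ".
    code="${line##*: }"
    if [ "$code" = "429" ]; then
      note="concurrent job limit reached; call is retried up to 5 times"
    else
      note="see the AITK runbooks"
    fi
    printf '%s | %s | %s\n' "$stage" "$code" "$note"
  done
}
```

For example, you might pipe the worker container logs into the filter with `oc logs <threat-inv-api_pod> -c worker | classify_atk_errors`, where <threat-inv-api_pod> is a placeholder for the actual pod name in your deployment.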

Resolving the automatic investigations problem

  • If the user ID that configured the app is no longer valid, a valid user with a Threat Investigator admin role can reconfigure the app.
    1. Log in to QRadar Suite Software as a valid user.
    2. From Settings -> Threat Investigator, disable automatic investigations, save, and then re-enable automatic investigations and resave.

      Now the identity of the new user is registered for use with the automatic investigations.

  • If there are no configured data sources, disable automatic investigations until data sources are added, because Threat Investigator will not function without valid data sources.

  • If the user that configured Threat Investigator does not have access to the data sources needed for investigation, assign the Data sources user role to the user. Then go to the Settings > Connections > Data sources page and add at least Viewer access to each data source for that user.

  • If the health check results show that there is a problem with any of the app dependencies, contact support with the health check results.

  • If the health check results show that there is a problem with the app components, try restarting the following app pods: threat-inv-api for API, worker, and scheduler, and threat-inv-ui for the user interface.
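
As a rough sketch of the pod restart in the last item, the following helper selects the Threat Investigator pods by name prefix and deletes them. It assumes that the pod names start with threat-inv-api and threat-inv-ui and that the pods are managed by a controller that re-creates them after deletion. The function names and the namespace argument are hypothetical.

```shell
# Keep only Threat Investigator pods from `oc get pods -o name` output.
# Assumption: pod names begin with threat-inv-api or threat-inv-ui.
threat_inv_pods() {
  grep -E '^pod/threat-inv-(api|ui)'
}

# Delete the selected pods so that their controller re-creates them.
# $1 = namespace where QRadar Suite Software is installed (hypothetical argument).
restart_threat_inv_pods() {
  oc get pods -n "$1" -o name | threat_inv_pods |
    while IFS= read -r pod; do
      oc delete -n "$1" "$pod"
    done
}
```

Run `oc get pods -n <namespace> -o name | threat_inv_pods` first to review which pods would be deleted before you call restart_threat_inv_pods.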

User response to test automatic investigation

Reinvestigate one of the failing cases by using the Threat Investigator API:

curl -u <username>:<password> -H "Content-Type: application/json" -X POST "https://<openshift_url>/api/advisor/v1/investigation/case/<case_ID>?org_id=<org_ID>" -d '{"start_time": 0, "end_time": 0}'

Wait until the investigation successfully completes.

Replace <case_ID> with the ID of the case that's failing and replace <org_ID> with the Cases org ID that the failing case belongs to. You can obtain the case ID from Threat Investigator by locating the failing case in the investigations table (The Case ID column of the row with a status set to Failed). You can then view the case with that ID in the Case Management app to obtain its org ID (org ID is shown in the browser URL path when the case is viewed in Case Management app). You can also use the following Case Management API to locate the org ID by checking the value of the preferred_org_id field in the JSON response:

curl -u <username>:<password> https://<openshift_url>/api/respond/rest/session
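
The reinvestigation request can be wrapped in a small function so that the URL is always quoted and built the same way. This is a minimal sketch: the function names and the TI_USER and TI_PASSWORD environment variables are hypothetical, and the endpoint and request body are copied from the command in this section.

```shell
# Build the reinvestigation URL.
# $1 = application domain, $2 = case ID, $3 = org ID
reinvestigate_url() {
  printf 'https://%s/api/advisor/v1/investigation/case/%s?org_id=%s\n' "$1" "$2" "$3"
}

# Send the reinvestigation request for one case.
# TI_USER and TI_PASSWORD are hypothetical environment variables that hold
# the user ID and password, or the API key and API key secret.
reinvestigate_case() {
  curl -sS -u "${TI_USER}:${TI_PASSWORD}" \
    -H "Content-Type: application/json" \
    -X POST "$(reinvestigate_url "$1" "$2" "$3")" \
    -d '{"start_time": 0, "end_time": 0}'
}
```

Quoting the URL keeps the shell from treating the question mark in the query string as a glob character.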

Threat Investigator investigations are not running

When automatic investigations are enabled, Threat Investigator investigates a new case every 6 minutes automatically.

Investigations not running symptoms

  • Threat Investigator doesn't investigate new cases after automatic investigations are enabled.
  • New cases are created but Threat Investigator is not investigating them as expected.
  • When automatic investigations are first enabled, they start with the first case that was created within the preceding 24 hours. Older cases are not investigated, and this does not indicate a problem with the app.
  • Only cases with supported artifacts qualify for automatic investigation. Cases that have no supported artifacts attached are not investigated, and this does not indicate a problem with the app. For more information, see supported artifacts.
  • Automatic investigations only investigate three cases at a time per account. If three investigations are still in progress for the account, it is expected that a new investigation will not be started until a previous one completes and this does not indicate a problem with the app.
  • The automatic investigation does not have permissions to access the case. If Threat Investigator does not have access to the case as member or owner, it cannot detect the case.
  • Registration of the investigation workflow encountered a problem and cannot run automatic investigation workflow.
  • Threat Investigator Scheduler is not scheduling the periodic automatic investigation task.
  • Threat Investigator Worker is not running the automatic investigation tasks.

Diagnosing the investigations not running problem

In the following commands, replace <openshift_url> with the QRadar Suite Software application domain. Replace <username> with a QRadar Suite Software application user ID that has the user or admin role for the Threat Investigator app, and replace <password> with the login password of that user. Alternatively, use an API key for the user by replacing <api_user> with the API key and <api_password> with the API key secret that the user generated from the API keys page of QRadar Suite Software.
Tip: You should install the `jq` utility for easy readability. The utility is available for your platform at https://stedolan.github.io/jq/download/. Otherwise, omit the `| jq ...` from the commands.
  • Install the command-line interface (CLI) utility cpctl from the cp-serviceability pod. For more information, see Installing the cpctl utility.

  • Run the health checks for Threat Investigator to determine if the app components and their dependencies are healthy.

    cpctl diagnostics check_deployment --only threatinv --token "$(oc whoami -t)"

    This will test the health of all Threat Investigator components and dependencies.

  • Run the QRadar Suite Software tool for Threat Investigator for details on the configuration. Use the <api_user> and <api_password> API key created in the QRadar Suite Software application.

    cpctl diagnostics threatinv_check_config --apiuser <api_user> --apipassword <api_password> --token "$(oc whoami -t)"
  • Obtain the case ID and org ID of a case that was not investigated as expected. You can locate the case in the Case Management app to obtain its org ID (the org ID is shown in the browser URL path when the case is viewed in the Case Management app). You can also use the following Case Management API to locate the org ID by checking the value of the preferred_org_id field in the JSON response:

    curl -u <api_user>:<api_password> https://<openshift_url>/api/respond/rest/session | jq .preferred_org_id
  • Run the QRadar Suite Software case tool for Threat Investigator for details on a completed or failed investigation. Use the <api_user> and <api_password> API key created in the QRadar Suite Software application.

    cpctl diagnostics threatinv_check_case --apiuser <api_user> --apipassword <api_password> --token "$(oc whoami -t)" --case_id <case_ID> --org_id <org_ID>
  • Find out when the case was created using the Case Management REST APIs or User Interface. If it is older than 24 hours, then it is expected that it will not be investigated and no remediation is needed. You can get the case's details including its create date using the following REST API and look for the create_date field.

    curl -u <api_user>:<api_password> https://<openshift_url>/api/respond/rest/orgs/<org_ID>/incidents/<case_ID> | jq .create_date

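    The create_date value in the following examples is an epoch time in milliseconds. To convert it to a readable date, drop the last three digits to get seconds, and then enter one of the following commands:

    • For macOS:

      date -r value

      Example: date -r 1455086371

    • For Linux:

      date -d @value

      Example: date -d @1455086371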

  • From the Case Management app or by using the Case Management REST APIs, find out what artifact types are attached to the case. If no artifacts are attached or attached artifact types are not in the list of supported artifacts, then the case does not qualify for automatic investigation. This is expected and no remediation is needed. You can use the following REST API to get the artifacts attached to the case:

    curl -u <username>:<password> "https://<openshift_url>/api/respond/rest/orgs/<org_ID>/incidents/<case_ID>/artifacts?handle_format=names"

    Verify there is at least 1 artifact attached with the type matching the supported artifacts list. For the list of supported artifacts, see Configuring Threat Investigator.

  • From the Threat Investigator app, check whether any investigations are still in progress. They are shown under the In-Progress section at the top of the investigations table. If three investigations are already in progress for the account, the app does not start a new investigation until one of them completes, and this does not indicate a problem. You can also use the following REST API to see how many investigations are in progress:

    curl -u <username>:<password> "https://<openshift_url>/api/advisor/v1/investigation/metrics?org_id=<org_ID>"

    If the in_progress field is greater than 0, at least one investigation is already running. Threat Investigator can run up to three automatic investigations at a time, so a new investigation starts only while this value is below three.

  • Threat Investigator uses the identity (permissions, roles, entitlements) of the user that configured the app. If this user does not have access to the case as member or owner, then Threat Investigator cannot see the case to investigate. To confirm, use the Threat Investigator account API to find out which user configured the app, and then check whether this user has access to the case as owner, as member, or through a group that is a member of the case. If the user has no such access, it is expected that this case is not investigated. You can use the following REST API to find out the user ID that configured Threat Investigator:

    curl -u <username>:<password> https://<openshift_url>/api/advisor/v1/account

    Check the config_user field for the user id.

  • Threat Investigator registers a workflow with the AI Toolkit (AITK) component of QRadar Suite Software to execute the investigation. If the registration fails, then the investigation cannot run. To find out if there was a problem with registration, collect the logs from the threat-inv-api pod's worker container and look for the following log message in the worker log:

    Failed to initialize ATK

    This log message indicates that there is a problem with registering the AITK workflow and that you should see the AITK runbooks to diagnose issues with AITK.

  • Threat Investigator Scheduler schedules the periodic automatic investigation tasks. If there is a problem with the scheduler, the tasks might not be scheduled. To check whether there is a problem with the scheduler, collect the logs from the threat-inv-api pod's scheduler container and look for any errors and warnings.

  • Threat Investigator Worker runs the scheduled tasks. If there is a problem with the worker, the automatic investigation cannot run. To find out whether there is a problem with the worker, collect the logs from the threat-inv-api pod's worker container and look for any errors and warnings.
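
Two of the checks in this list, the 24-hour age limit and the limit of three concurrent investigations, are simple arithmetic and can be sketched as shell helpers. The function names are illustrative, and the sketch assumes that create_date is an epoch time in milliseconds, as the 13-digit example value suggests.

```shell
# Convert an epoch time in milliseconds to whole seconds.
ms_to_seconds() {
  echo $(( $1 / 1000 ))
}

# Print an epoch time in milliseconds as a UTC timestamp.
# Tries the Linux (GNU) date syntax first, then the macOS (BSD) syntax.
epoch_ms_to_utc() {
  secs="$(ms_to_seconds "$1")"
  date -u -d "@$secs" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "$secs" +%Y-%m-%dT%H:%M:%SZ
}

# Succeed (exit 0) when the case is older than 24 hours.
# $1 = create_date in milliseconds, $2 = current time in seconds (default: now)
case_too_old() {
  now="${2:-$(date +%s)}"
  [ $(( now - $(ms_to_seconds "$1") )) -gt 86400 ]
}

# Succeed when another automatic investigation can start.
# $1 = in_progress value from the metrics API; the limit of 3 is from this page.
has_free_slot() {
  [ "$1" -lt 3 ]
}
```

For example, `case_too_old 1455086371603 && echo "not eligible: older than 24 hours"` reports a case that automatic investigation is expected to skip.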

Resolving the investigations not running problem

If the user that configured the app does not have access to the cases, try the following fixes:

  • Add the user as a member to all new cases being created. If there is a group that has access to all the cases, then add the user to the group instead. If not, consider creating a new group that includes the user and always add the group as case member to any new cases being created.
  • If there is already a user ID that has access to all the cases, reconfigure the app as that user so that future automatic investigations use the identity of that user. To reconfigure the app:
    1. Log in to QRadar Suite Software as that user.
    2. Go to Settings -> Threat Investigator.
    3. Disable Automatic Investigations, and click Save.
    4. Re-enable Automatic Investigations and click Save again.
  • If the issue is confirmed to be a problem with workflow registration, you can try to re-register by using the Threat Investigator analytics API:
    curl -u <username>:<password> -X POST https://<openshift_url>/api/advisor/v1/analytics
    If the API returns 200 with status field set to registered, the registration is successful. Otherwise, check AITK runbooks or contact support.
  • If the health check indicates a problem with the worker or scheduler, restart the pods/containers.
  • If the health check indicates a problem with dependencies, middleware or both, contact support with the findings.
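
The success condition for workflow re-registration (HTTP 200 and a status field set to registered) can be checked in a script. The following is a minimal sketch: the function names and the TI_USER and TI_PASSWORD environment variables are hypothetical, and it assumes that status is a string field in the JSON response body.

```shell
# Succeed when the analytics API response indicates a successful registration.
# $1 = HTTP status code, $2 = response body
# Assumption: the body contains "status": "registered" as a JSON string field.
registration_ok() {
  [ "$1" = "200" ] &&
    printf '%s' "$2" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"registered"'
}

# Call the analytics API and evaluate the response.
# $1 = application domain; TI_USER and TI_PASSWORD are hypothetical
# environment variables that hold the credentials.
reregister_workflow() {
  body_file="$(mktemp)"
  code="$(curl -sS -u "${TI_USER}:${TI_PASSWORD}" -X POST \
    -o "$body_file" -w '%{http_code}' \
    "https://$1/api/advisor/v1/analytics")"
  body="$(cat "$body_file")"
  rm -f "$body_file"
  registration_ok "$code" "$body"
}
```

If reregister_workflow returns a nonzero exit status, check the AITK runbooks or contact support, as described in the list.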

User response to test investigations running

Wait for 6 minutes to see whether the automatic investigation successfully starts a new case investigation (a qualifying case must exist). You can use the Threat Investigator API to check whether the investigation started.
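
The wait can be scripted by polling the investigation metrics API for the in_progress field. The following is a rough sketch under stated assumptions: the function names are illustrative, in_progress is assumed to be an integer field in the JSON response, and the command that fetches the metrics is passed in by the caller. A nonzero in_progress value shows only that some investigation is running, not which case it belongs to.

```shell
# Extract the in_progress value from metrics JSON on stdin.
# Assumption: in_progress is an integer field.
in_progress_count() {
  sed -n 's/.*"in_progress"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Poll until at least one investigation is in progress.
# $1 = name of a command or function that prints the metrics JSON
# $2 = seconds to wait between attempts, $3 = maximum number of attempts
wait_for_investigation() {
  attempt=0
  while [ "$attempt" -lt "$3" ]; do
    count="$("$1" | in_progress_count)"
    [ "${count:-0}" -gt 0 ] && return 0
    attempt=$(( attempt + 1 ))
    sleep "$2"
  done
  return 1
}
```

For example, define a function that runs the metrics curl command from the diagnosis section, and then call `wait_for_investigation <function_name> 60 7` to check once a minute for slightly longer than one 6-minute cycle.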

Note that cases that the automatic investigation already skipped are checked again for up to 24 hours to determine whether they can be investigated. They are not checked again after 24 hours have passed since they were created.