IBM Cloud Pak for Watson AIOps workflows
You decided to install IBM Cloud Pak® for Watson AIOps. Now it's time to explore the key workflows of your users and how they interact with IBM Cloud Pak for Watson AIOps.
Understanding how you can accomplish your tasks, who's responsible, and how each workflow fits together gives you a holistic picture of your entire AIOps-infused IT infrastructure management experience. You can bring IBM Cloud Pak for Watson AIOps to life by deploying more operators, customizing your configurations, and optimizing IBM Cloud Pak for Watson AIOps performance. For more information about how to get the most out of your team, see Archetypes.
Installing IBM Cloud Pak for Watson AIOps
Platform engineers prepare all required hardware and software, and then install the Red Hat® OpenShift® Container Platform. They prepare OpenShift clusters and install IBM Cloud Pak for Watson AIOps into those clusters.
For more information about installing IBM Cloud Pak for Watson AIOps, see the following sections:
To help first-time users get started with some workflows, guided tours are included in the UI console across the different dashboards. These tours introduce users to the various UI options and components, and demonstrate how to complete key tasks across the platform. Users can access the tours at any time by clicking the Tours icon in the console toolbar and then selecting the tour they want.
For more information about the available guided tours, see Guided tours.
Setting up data and tool connections
You can create log connections, which establish a baseline of normal behavior so that anomalies can be identified. Other connections provide event, application, and infrastructure data to IBM Cloud Pak for Watson AIOps.
Systems engineers create connections to ticketing and ChatOps tools. Connections to ticketing tools serve a dual purpose: they enable ticket data to be collected for change request and similar ticket training, and they enable synchronization between IBM Cloud Pak for Watson AIOps incidents and the corresponding ServiceNow tickets. Connections to the Microsoft Teams and Slack ChatOps tools enable incident notifications to be pushed to those tools, where they reach the site reliability engineers who are tasked with resolving the incidents.
For more information about setting up tool connections, see the following sections:
Loading data and training AI
As a prerequisite to training your AI to provide insights on your monitored environment, systems engineers load event, log, metric, ticket, and other data from various data sources. They then run different types of AI training to generate AI models, such as log anomaly models, which identify and report on anomalies in your log data. Insights that are generated by these AI models are used to enrich incidents with related alert, similar ticket, and event grouping information.
For more information about loading data and training AI, see the following sections:
Building applications
Systems engineers are responsible for building the applications that site reliability engineers monitor. They build applications from the ground up: first they define observer connections to import resource topology data, and then they define templates that automatically generate resource groups from the incoming observer streams. When a dynamic stream of resource groups is coming in, they build business applications that are made up of multiple resource groups.
For more information about building applications, see the following sections:
Creating automations
System administrators set up policies, runbooks, and actions to detect and remediate incoming alerts. Policies are rules that contain a condition and a set of outcomes, which can be manual or automated. Runbooks help to solve common operational problems by automating procedures that do not require human interaction. Procedure-based documentation can be transferred into manual runbooks. An action combines several manual steps into a single automated entity, and actions can be added to runbooks to make them semi-automated or fully automated.
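To make the idea of a policy as "a condition plus a set of outcomes" more concrete, the following is a minimal conceptual sketch in Python. It is not the IBM Cloud Pak for Watson AIOps policy format or API: the `Policy` and `Outcome` classes, the alert fields `type` and `severity`, and the outcome names are all hypothetical, chosen only to show how a matching condition selects a mix of manual and automated outcomes.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model for illustration only; not the IBM Cloud Pak for Watson AIOps API.
Alert = dict  # an alert is represented here as a plain dictionary of fields


@dataclass
class Outcome:
    name: str
    automated: bool  # True: runs without human interaction; False: a manual step for an operator


@dataclass
class Policy:
    name: str
    condition: Callable[[Alert], bool]  # decides whether the policy applies to an alert
    outcomes: list = field(default_factory=list)

    def evaluate(self, alert: Alert) -> list:
        """Return the policy's outcomes if the alert matches the condition, otherwise an empty list."""
        return list(self.outcomes) if self.condition(alert) else []


# A hypothetical policy for high-severity disk alerts (field names and threshold are assumptions).
disk_policy = Policy(
    name="disk-full",
    condition=lambda a: a.get("type") == "disk" and a.get("severity", 0) >= 5,
    outcomes=[
        Outcome("run cleanup runbook", automated=True),
        Outcome("notify on-call engineer", automated=False),
    ],
)

alert = {"type": "disk", "severity": 6}
for outcome in disk_policy.evaluate(alert):
    mode = "automated" if outcome.automated else "manual"
    print(f"{outcome.name}: {mode}")
```

Keeping the condition separate from the outcomes mirrors the description above: the same condition can trigger a fully automated remediation, a manual step, or both.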
For more information about creating automations, see the following sections:
Resolving incidents
Site reliability engineers are responsible for investigating and resolving incidents that are associated with applications and infrastructure. They use ChatOps tools such as Microsoft Teams and Slack to monitor incident channels, and they investigate incidents by using the application, topology, ticketing, alert, and similar ticket information that is provided in the incident notification. They then act to resolve the incidents, involving application developers where necessary.
For more information about resolving incidents, see the following sections:
Administering the cluster
On an ongoing basis, platform engineers administer the cluster on which IBM Cloud Pak for Watson AIOps is installed. They ensure that all nodes, pods, and other elements within the cluster are performing as expected. Using the IBM Cloud Pak Administration panel, they monitor trends, display workload summaries, and display system utility status. They also support systems engineers with complex cluster administration tasks such as multitenancy management, project (namespace) management, and authorization and authentication of users within the cluster.
For more information about administering the cluster, see the following sections:
Administering IBM Cloud Pak for Watson AIOps
Systems engineers (or, in some organizations, operations engineers) are responsible for authenticating and managing users. They can also customize the UI experience by setting console branding and customizing the content that users see on the home page. For more complex cluster-based tasks such as multitenancy management, project (namespace) management, and authorization and authentication of users within the cluster, they collaborate with platform engineers.
For more information about administering IBM Cloud Pak for Watson AIOps, see the following sections: