2015 was an exciting year for me. New approaches to service management have seen the light of day, and clients are increasingly interested in applying them to drive innovation, improve quality, or simply reduce costs. In this blog post, I would like to discuss a few of these approaches.
We have discussed the need for more scalable approaches to service management for a while. Environments are getting larger (I am engaged with clients running well over 60,000 servers), and at the same time applications are becoming more dynamic due to trends like Agile and DevOps. Manually setting up monitoring thresholds or codifying rules does not scale in such environments, rendering service management ineffective and inefficient.
With the rise of Big Data, analytical approaches are becoming mainstream – also in the domain of service management. Application performance management can observe the typical state of resources and derive their normal behavior. It automatically creates an envelope around these thresholds and reacts as soon as a resource metric breaches this envelope.
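To make the idea concrete, here is a minimal sketch of dynamic baselining. It assumes a simple mean-plus-deviation envelope; real products use far more sophisticated statistics, and all names and thresholds here are illustrative, not from any specific product.

```python
# Sketch of dynamic baselining: derive a "normal behavior" envelope
# from historical samples and flag metrics that breach it.
# The mean +/- k*stdev rule is an illustrative assumption.
import statistics

def build_envelope(history, k=3.0):
    """Return (lower, upper) bounds: mean +/- k standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return mean - k * stdev, mean + k * stdev

def breaches(envelope, sample):
    lower, upper = envelope
    return sample < lower or sample > upper

cpu_history = [41, 44, 39, 42, 45, 40, 43, 42, 44, 41]  # % utilization
env = build_envelope(cpu_history)
print(breaches(env, 43))   # within the learned envelope
print(breaches(env, 95))   # breach -> raise an alert
```

No operator had to set the 95% threshold: the envelope is derived entirely from observed history, which is what makes the approach scale.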
Another application of analytics is monitoring the relationships between resources (such as the number of active users on a website, CPU utilization, and memory consumption of the web server). Again, instead of manually telling the system which resources may correlate, the analytics work through thousands of combinations, find relevant patterns, and then monitor for violations.
Just as we do for metric values, we also correlate related events. Looking at the history of events (typically one to three months), the system identifies patterns of related events: every time it sees one event, it has also observed these other events. An automatically generated rule groups these events as related and presents only one event to the operator, or creates a single incident record instead of multiple.
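A toy version of that mining step might look as follows. The event IDs, windowing, and support threshold are assumptions for illustration; they are not how any particular product implements it.

```python
# Sketch: mine historical event windows for events that consistently
# occur together; each surviving pair becomes a grouping rule.
from collections import Counter
from itertools import combinations

# Each inner list is one time window of observed event IDs (made up).
history = [
    ["DB_DOWN", "APP_TIMEOUT", "QUEUE_BACKLOG"],
    ["DB_DOWN", "APP_TIMEOUT", "DISK_FULL"],
    ["DB_DOWN", "APP_TIMEOUT", "QUEUE_BACKLOG"],
    ["LINK_FLAP"],
]

pair_counts = Counter()
event_counts = Counter()
for window in history:
    event_counts.update(set(window))
    pair_counts.update(combinations(sorted(set(window)), 2))

# Keep pairs seen at least twice and in >= 75% of the windows
# containing the rarer event of the pair.
rules = []
for (a, b), n in pair_counts.items():
    support = n / min(event_counts[a], event_counts[b])
    if n >= 2 and support >= 0.75:
        rules.append((a, b))

print(sorted(rules))
```

With rules like these in place, a new window containing DB_DOWN, APP_TIMEOUT, and QUEUE_BACKLOG would surface as one grouped event rather than three separate tickets.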
This is only the start of the adoption of analytics. I am in conversations with clients and technical leaders in the lab about more sophisticated applications of analytics in service management.
Analytics leads to another theme that we have started to engage in: Cognitive Computing. By now, many people have read about IBM Watson and what it can do for industries like healthcare and banking. Watson can equally be applied to IT and call centers.
We started using Watson in our Support organization to help expedite the analysis and resolution of PMRs. After some learning experience (for us and for Watson, I may add), we are starting to see positive results from our new colleague.
Likewise, Lab Services is engaged in early projects to use Watson in IT service management. At one client, we are piloting Watson technologies to match new incidents against existing or historical incidents. The aim is to identify root causes and the right subject matter expertise to resolve the incident. At another client, we use Watson to build a graph of the IT landscape, federating repositories such as CMDBs, asset databases, log files, and so on. Building an ontology and leveraging natural language classification allows this federation across structured and unstructured data.
Automating repetitive tasks is a well-understood topic in ITSM. We have clients that schedule tens of thousands of jobs every day. We have clients that automate their IT processes using a process automation platform, frequently based on the ITIL framework. We have clients that automate the provisioning of IaaS, PaaS, and even entire application landscapes – triggered by a single click in a self-service portal. And these are only three examples of automation.
However, in 2015 a new flavor of automation started gaining enormous traction: Runbook Automation. In the past, clients were not ready for automatic mitigation of problems – partly because of a lack of standardization, partly because appropriate automation technology didn't exist, or simply because clients didn't consider it something to focus on. With global labor arbitrage reaching its limits, clients now need to look for new opportunities to cut costs while assuring availability.
In Lab Services, we have had an offering for quite some time that automatically responds to incoming events with the appropriate action. For instance, on a "Filesystem Full" alert, we would remove temporary files, prune log files, and potentially extend the volume group. The ITSM development team has taken this over and started developing a Runbook Automation solution. This solution acknowledges the varying degrees of maturity in our client base: in addition to fully automated runbooks, the offering will also support semi-automatic as well as manual runbooks. These levels also let a client grow into runbook automation: first, they document their runbooks and make them accessible in situ with an event. From there, frequent or error-prone manual operations can be automated and enabled as semi-automated runbooks, eventually leading to full automation for a given event.

Feedback from clients so far is extremely encouraging. In particular, clients highlight the ease of use of the solution and the ability to plug in additional triggers (the "IF" part) and execution methods (the "THEN" part). The offering has been in beta for several months, and we expect it to be released in 1H2016.
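The "IF event THEN runbook" pattern, with its three maturity levels, can be sketched as follows. This is a hypothetical illustration of the concept, not the product's API; event names, fields, and modes are invented for the example.

```python
# Hypothetical sketch of the "IF event THEN runbook" pattern with
# manual, semi-automatic, and automatic maturity levels.

RUNBOOKS = {}

def runbook(event_type, mode="manual"):
    """Register a runbook for an event type; mode is 'manual',
    'semi-automatic' (requires approval), or 'automatic'."""
    def register(fn):
        RUNBOOKS[event_type] = (fn, mode)
        return fn
    return register

@runbook("FILESYSTEM_FULL", mode="automatic")
def clean_filesystem(event):
    # In a real runbook: remove temporary files, prune log files,
    # and potentially extend the volume group.
    return f"cleaned {event['filesystem']} on {event['host']}"

def handle(event, approved=False):
    fn, mode = RUNBOOKS[event["type"]]
    if mode == "automatic" or (mode == "semi-automatic" and approved):
        return fn(event)
    # Manual level: surface the documented runbook in situ with the event.
    return f"manual runbook displayed for operator: {fn.__name__}"

print(handle({"type": "FILESYSTEM_FULL",
              "host": "web01", "filesystem": "/var"}))
```

Growing into automation then amounts to changing a runbook's registered mode from manual to semi-automatic to automatic, without rewriting the runbook itself.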
ITSM as a Service
The last theme I have observed is a push towards as-a-service consumption models – not (yet) in the classic sense of consuming everything from the cloud, but rather in a hybrid model: the classic IT service management functions are provided on-premise, while new functions are provisioned from the cloud. A key reason is that an as-a-service model delivers the functions much more quickly than a traditional on-premise release: no installation, no hardware or network acquisition, no complex system architecture. Another reason is that the function is often requested not by a central IT department, but by a line of business. Application performance management is a good example here, and it also shows the need to integrate these two worlds – on-premise and cloud.
I am certain that 2016 will be an important year in the journey towards cognitive operations. The relevant technology components exist – for instance, the Watson ecosystem on BlueMix – and innovative clients are starting to partner with us on these projects. Some of these clients start with IT in order to be prepared for applying the technology in their lines of business. The role of IT is changing, and I am excited to work with clients in shaping their future role.
At InterConnect 2016, Richard Wilkins and I will review some of these approaches in our CTO Panel (session #4737). We look forward to an interactive session and to brainstorming what 2016 may bring.