As an executive, are you struggling to see the business impact of the new tool you have invested in? As a Service Desk team member, do you find yourself always falling behind in resolving incidents? As a user, have you complained about how long it took to resolve your IT issue the last time you submitted a ticket?

With the fast-growing technological innovations that have enabled digital transformation across enterprises, Infrastructure and Operations (I&O) leaders and business leaders alike face the challenge of prioritizing the IT budget to maximize the benefit of new technology while minimizing the troubleshooting pain that reduces user satisfaction.

How can AI transform IT Operations and what are the key success factors to maximize your return on investment?

Context

Business leaders ranked IT-related topics as one of their top 3 priorities in 2019, and 82% of the survey respondents are managing a digital transformation initiative or program [1]. As business leaders strove to invest in digital businesses by adopting new technologies and processes, the demand for support from IT Operations increased along with the complexity of IT issues.

To fully seize the value of IT investment, I&O leaders are seeking new solutions to tackle the workload and improve efficiency while under pressure to cut costs. Many are considering Artificial Intelligence (AI) for its power to transform IT operations, but they also need to be prepared for the key challenges that arise during implementation.

How AI can transform IT Operations

IT operations teams face increasingly complex issues while demand from the business and end users grows. AI can help improve application monitoring by centralizing data and producing more accurate predictions and causality analysis. In addition, AI can help streamline the Service Desk ticketing process, freeing the Service Desk team to work on more complex issues.

AIOps enables centralization of application monitoring data

A company can adopt new technologies to meet new business needs or upgrade its existing tools. Adopting new technologies also means that there are more applications to monitor. To add to the monitoring complexity, IT investments are made not only across various IT departments/domains, but also across business functions: in 2018, 24% of global services leaders (managers and above) reported having made purchase decisions without IT [2].

Currently, most applications are paired with their own Application Performance Monitoring (APM) tools and/or Digital Experience Monitoring (DEM) tools, so data is collected in silos. As a result, I&O leaders face the challenge of efficiently monitoring all the applications across the organization and creating a comprehensive view of overall performance for incident prediction, causality identification, and decision making.

One solution is Artificial Intelligence for IT Operations (AIOps) platforms, with their data processing power.

Figure 1 – AIOps enables event correlation across applications and business/IT domains

Without replacing any existing APM tools, AIOps platforms allow all the monitoring logs to be centralized and processed using Natural Language Processing (NLP), so that data of different sorts can be aggregated and analyzed together. Instead of looking at applications one at a time, AIOps enables event correlation across applications and business/IT domains, and Machine Learning (ML) models can then be applied to detect anomalies and identify causality within the IT ecosystem.
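To make the idea of cross-application event correlation more concrete, here is a minimal sketch in Python. It is not how any particular AIOps platform works: the `Event` type, the source names, and the two-minute window are all assumptions chosen for illustration. It simply pools events from several monitoring tools into one timeline and keeps the bursts that involve more than one source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # hypothetical name of the monitoring tool that emitted the event
    timestamp: datetime
    message: str


def correlate(events, window=timedelta(minutes=2)):
    """Cluster events whose neighbours are at most `window` apart,
    then keep only the clusters that span more than one source."""
    clusters, current = [], []
    for event in sorted(events, key=lambda e: e.timestamp):
        # A gap larger than the window closes the current cluster.
        if current and event.timestamp - current[-1].timestamp > window:
            clusters.append(current)
            current = []
        current.append(event)
    if current:
        clusters.append(current)
    return [c for c in clusters if len({e.source for e in c}) > 1]


# Illustrative events from three siloed tools (made-up data).
events = [
    Event("web-apm", datetime(2019, 6, 3, 10, 0, 45), "HTTP 500 rate spike"),
    Event("db-apm", datetime(2019, 6, 3, 10, 0, 0), "connection pool exhausted"),
    Event("dem", datetime(2019, 6, 3, 10, 1, 30), "page load time degraded"),
    Event("mail-apm", datetime(2019, 6, 3, 14, 30, 0), "queue length warning"),
]

correlated = correlate(events)
for cluster in correlated:
    print([f"{e.source}: {e.message}" for e in cluster])
```

Looking at each tool in isolation would show three unrelated alerts; on the pooled timeline they appear as one burst that starts at the database. A real platform would replace the fixed time window with learned models, but the centralization step is the same.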

Monitoring powered by AIOps improves incident prediction and causality identification

I&O leaders can also leverage AI for more accurate incident prediction and alerts. Traditional monitoring tools have models that are either hardcoded or predetermined by vendors. Hence, they are modeled on patterns that are common and can be easily discovered by humans. With machine learning algorithms, models are built directly from data and take into consideration new patterns that the algorithms discover. As APM tools collect more input data, these models continue to evolve, making them more precise and up to date than preprogrammed models. Periodically, data scientists validate the models to ensure that they are still understandable, and help improve prediction by either identifying new correlations or correcting inaccurate results.
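The difference between a preset rule and a model built from data can be illustrated with a small sketch. The rolling z-score used here is a deliberately simple statistical stand-in for a machine learning model, and the threshold, window size, and response times are assumptions made up for the example.

```python
from statistics import mean, stdev


def static_alerts(values, threshold):
    """Preset rule: alert whenever a value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def adaptive_alerts(values, window=5, z_limit=3.0):
    """Data-driven rule: alert when a value deviates strongly from the
    baseline computed over the previous `window` observations."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# Illustrative response times in milliseconds (made-up data).
response_ms = [100, 102, 98, 101, 99, 100, 180, 101, 99, 100]

print(static_alerts(response_ms, threshold=500))  # vendor-style fixed limit
print(adaptive_alerts(response_ms))               # baseline learned from recent data
```

With a fixed limit of 500 ms the jump to 180 ms goes unnoticed, while the baseline computed from recent data flags it at index 6. Because the baseline is recomputed as new data arrives, it follows the application's normal behavior instead of a value chosen once at installation.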

Identifying the root cause is key in incident response and future prediction, and AI can help speed up the process. As mentioned above, AIOps platforms aggregate data from different departments/domains and generate event correlations that may not be obvious if the investigation only focuses on one application. While AI enhances pattern recognition, it will also help uncover cause and effect relationships among application performance data. Once the root cause has been identified, IT operations can respond to incidents better and faster with more targeted resolution. Moreover, understanding the cause will help prevent similar future incidents.

AI can enhance end user experience by increasing the availability of Service Desk teams

If an issue occurs with any IT product or service, business users escalate the incident to the Service Desk. Without AI, each stage of ticket management requires human involvement to understand the context and then take action based on individual experience and expertise. It can take days, if not weeks, to close a single ticket given the volume of requests and the limited capacity of the IT support team.

Figure 2 – AI can help increase the availability of experts for more complex tasks

With an AI-powered Virtual Support Agent (VSA), business users can reach the Service Desk on demand, whether a human agent is available or not. Using Natural Language Processing (NLP), VSAs can identify common questions and commands in the conversation with business users. They can then provide answers by searching through the knowledge center and other resources, as well as perform basic tasks such as resetting a password or restoring settings.
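The flow of a VSA (recognize the request, look up an answer, hand over to a human when unsure) can be sketched as follows. Keyword overlap stands in for real NLP here, and the knowledge base entries, intent names, and `min_overlap` parameter are hypothetical.

```python
import re

# Hypothetical knowledge base: intent -> trigger keywords and a canned answer.
KNOWLEDGE_BASE = {
    "reset_password": {
        "keywords": {"password", "reset", "forgot", "locked"},
        "answer": "A password reset link has been sent to your registered email.",
    },
    "vpn_setup": {
        "keywords": {"vpn", "remote", "connect", "tunnel"},
        "answer": "See the VPN setup guide in the knowledge center.",
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(message, min_overlap=2):
    """Return (intent, reply); intent is None when the agent should escalate."""
    tokens = tokenize(message)
    best_intent, best_score = None, 0
    for intent, entry in KNOWLEDGE_BASE.items():
        score = len(tokens & entry["keywords"])
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score < min_overlap:
        return None, "Escalating to a human agent."
    return best_intent, KNOWLEDGE_BASE[best_intent]["answer"]


print(answer("I forgot my password and I am locked out"))
print(answer("My monitor is flickering"))
```

The first message matches three password-related keywords and is answered immediately; the second matches nothing and is escalated. A production VSA would use a trained language model rather than keyword sets, but the escalation rule is what keeps unrecognized requests in front of a human.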

Another way to use Machine Learning (ML) and Natural Language Processing (NLP) to improve the ticket lifecycle is to automate ticket categorization and optimize the availability of experts. Starting from an analysis of the ticketing process and an understanding of the key operational metrics (performance metrics, confidence thresholds, etc.), a predictive model can be built from the existing backlog and closed ticket history. Then, when a ticket is generated, the request is qualified, automatically sorted into an incident category, and assigned to one of the available technical experts responsible for that type of request. This helps the ticket reach the right level of support faster while giving time back to Level 1 support to handle more complex requests.
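As an illustration of categorization with a confidence threshold, the sketch below trains a small naive Bayes text classifier on closed tickets and routes new ones. The choice of naive Bayes, the categories, the sample tickets, and the 0.7 threshold are all assumptions made for the example; a real project would select the model and threshold from its own ticket history and metrics.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class TicketClassifier:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.class_counts = Counter()            # category -> number of tickets
        self.vocabulary = set()

    def fit(self, tickets):
        for text, category in tickets:
            self.class_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for category, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[category].values()) + len(self.vocabulary)
            for word in tokenize(text):
                if word in self.vocabulary:  # words never seen in training are ignored
                    score += math.log((self.word_counts[category][word] + 1) / denom)
            log_scores[category] = score
        # Convert log scores to probabilities that sum to 1.
        peak = max(log_scores.values())
        weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


def route(classifier, text, threshold=0.7):
    """Assign the ticket to a category queue, or to manual triage when
    the model's confidence is below the threshold."""
    probs = classifier.predict_proba(text)
    category, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return "manual_triage", confidence
    return category, confidence


# Illustrative closed-ticket history (made-up data).
history = [
    ("cannot connect to vpn from home", "network"),
    ("wifi keeps dropping in the office", "network"),
    ("vpn connection times out", "network"),
    ("outlook will not open", "software"),
    ("excel crashes when opening file", "software"),
    ("need license for software install", "software"),
]

clf = TicketClassifier()
clf.fit(history)
print(route(clf, "vpn connection dropping"))
print(route(clf, "printer is broken"))
```

The first ticket is sent to the network queue with high confidence. The second contains no words the model has seen, so the confidence stays at 0.5 and the ticket falls back to manual triage, which is the role the confidence threshold plays in the process described above.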

AI can help transform IT operations while improving data management and facilitating the centralization of databases. This in turn enhances incident prediction and causality identification. In addition, AI can increase the availability of the Service Desk and thereby improve the end user experience with quicker turnarounds. To maximize the benefit of AI, some key success factors need to be addressed to facilitate a successful roll-out.

Key success factors and how to implement them

To achieve success in AI implementation, decision makers need to make sure that, first, the needs and the targets are agreed upon by both business and I&O leaders. Second, the organization should be adequately informed through a comprehensive communication plan. Third, data governance and infrastructure must be sufficient and properly implemented. Fourth, the organization’s core values should be incorporated in the design of the models, so that bias can be minimized.

To drive business value using AI, business leaders need to share their priorities and objectives with I&O leaders. I&O leaders then introduce these priorities into the guidelines and specifications for machine learning models so that the models generate useful business insights. Decision makers should choose technologies that fit the requirements, as the most advanced technology may not be the best fit for the specific business needs.

Preparing the organization for transformation is critical to successful adoption. As early as the proof-of-concept (POC) phase, business and I&O leaders need to showcase the functionality of the AI technology and explain the changes in day-to-day work (where they come from, their objectives, and their impacts) to onboard business users and the IT Ops team. This helps demystify the new technology, establish interest and excitement, and build users’ confidence in the tool.

Data governance (the organization, processes, and controls that span data collection, storage, and recovery) should be clearly defined in order to implement a sustainable data strategy that ensures good data quality and efficient value creation from data. In addition, before introducing new technologies, it is essential to know the data heritage in order to have secure foundations to build on. The infrastructure needs to be able to handle a large volume of data and process data of different types, especially when data are pooled together in centralized data lakes instead of being spread out across multiple siloed databases.

Predictive models need to incorporate the core values of the organization to avoid unwanted bias in pattern identification and event correlation. When selecting datasets for training, data scientists need to make sure that the population does not include any systemic or institutionalized bias. Hence, they need to clean up data for accuracy, ensure the population is truly random and inclusive, and constantly monitor and audit the models. For instance, Local Interpretable Model-agnostic Explanations (LIME) help determine how predictions change when inputs and/or parameters are modified.
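The kind of audit described here can be approximated with a very small experiment: change one input at a time and observe how the prediction moves. The sketch below is a simplified perturbation check, not the LIME algorithm itself (LIME fits a local surrogate model on many perturbed samples), and the `incident_risk` model and its feature names are hypothetical.

```python
def incident_risk(features):
    """Hypothetical black-box model returning an incident risk score in [0, 1]."""
    score = (
        0.5 * features["cpu_load"]
        + 0.4 * features["error_rate"] * features["cpu_load"]
        + 0.05 * features["queue_depth"]
    )
    return min(1.0, score)


def sensitivity(model, instance, delta=0.1):
    """Change in the prediction when each input is increased by `delta`,
    one input at a time, with the others held fixed."""
    baseline = model(instance)
    effects = {}
    for name, value in instance.items():
        perturbed = dict(instance)
        perturbed[name] = value + delta
        effects[name] = model(perturbed) - baseline
    return effects


# One observation to explain (made-up, normalized values).
instance = {"cpu_load": 0.5, "error_rate": 0.2, "queue_depth": 0.3}

effects = sensitivity(incident_risk, instance)
for name, change in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {change:+.3f}")
```

For this observation, CPU load moves the prediction most and queue depth least. The same check can be run on an attribute that should not influence the outcome: if perturbing it shifts the prediction noticeably, that is a signal to review the training data and the model.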

Conclusion

To keep up with the high demand for new technology, I&O leaders can consider the benefits of implementing AI in ITOps while keeping in mind the challenges related to business objectives, change communication, data management, and model bias. For executives, AI helps generate quick business insights and monitor application performance to track the return on investment of new and legacy tools. For the Service Desk team, AI streamlines simple tasks and optimizes the workload based on individual expertise. For users, AI-powered VSAs execute simple requests or suggest solutions right away.

In this paper, we focused on the benefits of AI and AIOps platforms. In addition to incorporating AI, companies can also leverage blockchain technology to drive data analytics and combat some of the challenges mentioned above. For instance, data heritage is easy to trace because every transaction or activity is recorded and verified by a distributed network of computers, which helps ensure the integrity of data. Moreover, companies can leverage the computational power of blockchain to perform analysis on large sets of data, and make trustworthy predictions given the accuracy of data stored on a blockchain network.