AIOps 2.0: Making Actionable Intelligence Actually Actionable

An automation-first, data-driven, self-service approach delivers on the promise of AIOps, enabling teams to fix problems rather than just identifying their causes and moving on.

Companies today are under immense pressure to drive digital transformation while meeting the growing demands of complex systems, services, and infrastructure, all with minimal downtime. IT teams are inundated with data, which means constant alert noise, more time spent sifting through information from disparate systems, and hours lost to manual tasks. These challenges aren’t likely to go away; it’s up to businesses to figure out how to handle them. AIOps gives organizations the opportunity to automate the resolution of IT issues, so teams can spend less time chasing down answers and get back to building creative, competitive solutions.

The concept of AIOps – where the capabilities of machine learning and people are combined to deliver technical outcomes for IT operations – isn’t new. We’re starting to see legacy players in the event management, observability, and APM spaces claim ownership of AIOps, each positioning itself as the tool of choice. But when you take a step back, you realize AIOps is not simply a tool. It’s a set of capabilities, delivered through either a monitoring-centric approach or an event-management-led approach. Carlos Casanova of Forrester has referred to this as a “superstructure of interoperability,” built on pillars of capabilities to achieve this end.

First, we have application monitoring tools, which take a monitoring-centric approach that combines metrics, KPIs, logs, and other data sources with machine learning and trend analysis to make predictions sooner. On the one hand, this approach enables teams to monitor everything and identify root causes more easily. On the other, it requires businesses to either replicate their monitoring systems or rip and replace large portions of existing toolsets, which can be extremely costly.

Second, there’s an event-management-led approach. It integrates disparate monitoring tools, giving teams a single pane of glass that centralizes information to inform decision-making. This can create capability bottlenecks, and since vendors charge by different metrics, sizing solutions can be difficult.

Both approaches, no matter how accurate at identifying a root cause, fail to provide the information needed to actually fix the problem and restore service. It’s time for a new approach to AIOps.

AIOps 2.0

It’s time we look at AIOps through a new lens, one that takes an automation-first approach to help teams transform their work. That means not only greatly improving mean time to resolution (MTTR) but also freeing teams from noise fatigue and providing context, so they can stay focused on what really matters: resolving affected services and preventing customer impact. With less time spent chasing problems, development teams can drive down technical debt, improve availability, deliver new features, and ultimately improve customer satisfaction and experience.

Rather than rip and replace, teams should be able to leverage the tools, teams, and capabilities they already have to achieve operational wins while supporting broader strategic digital transformation goals. Teams need a simple, low-friction way to solve tactical problems while building strategic solutions that impact the rest of the business. AIOps can’t simply identify the root cause. It must also help answer the “what’s next” question. How do we better drive the “action” in “actionable intelligence”?

AIOps must take an automation-first, data-driven, and self-service approach to truly provide value.

In practice, this means incorporating automation as a key component of managing today’s complex, modern IT systems. Applied in the right ways, automation helps teams avoid mistakes, increase reliability, and reduce toil in day-to-day tasks – ideally, without the need to mobilize a team.

Automated resolution is the key to improving MTTR, and it also provides the added benefit of allowing teams to save time on handling manual tasks so they can concentrate on delivering value and not chasing problems. For example, say we just focus on noise reduction. Machine learning can recognize patterns over time, across content and payload, within certain activities and past incidents, and narrow down the focus to what makes the most sense. So, if automation becomes the first virtual responder, teams can drive down their response time and reduce alert fatigue. With automated diagnostics, teams are armed appropriately with the data and context they need to not only respond accurately and in a timely manner but actually fix the issue.
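The noise-reduction idea can be illustrated with a small sketch. It is deliberately simpler than what the paragraph describes: instead of machine learning, it uses a fixed rule that folds alerts with the same service and a similar message into one group when they arrive close together. The Alert and AlertGroup types, the field names, and the five-minute window are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Alert:
    service: str      # hypothetical field: name of the affected service
    message: str      # hypothetical field: raw alert text
    timestamp: float  # seconds since epoch


@dataclass
class AlertGroup:
    fingerprint: tuple
    alerts: list = field(default_factory=list)


def fingerprint(alert):
    """Build a grouping key that ignores numbers in the message."""
    # "Disk 91% full" and "Disk 97% full" both become "disk #% full".
    normalized = re.sub(r"\d+", "#", alert.message.lower())
    return (alert.service, normalized)


def group_alerts(alerts, window_seconds=300.0):
    """Group alerts that share a fingerprint and arrive close together.

    An alert joins the latest group with its fingerprint if it arrives
    within window_seconds of that group's most recent alert; otherwise
    it starts a new group.
    """
    groups = []
    latest = {}  # fingerprint -> most recent group with that fingerprint
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = fingerprint(alert)
        current = latest.get(key)
        if (current is not None
                and alert.timestamp - current.alerts[-1].timestamp <= window_seconds):
            current.alerts.append(alert)
        else:
            current = AlertGroup(fingerprint=key, alerts=[alert])
            groups.append(current)
            latest[key] = current
    return groups
```

A responder would then be paged once per group rather than once per alert. A learned model could replace the fingerprint function without changing the grouping loop.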

Being data-driven means creating situational awareness that allows teams to readily make operational and strategic decisions at all levels. AIOps solutions create added context for IT teams, so they can understand what was affected, how customers are impacted, and what the broader business implications are. They also create a comprehensive audit trail to improve problem management and prevent issues from recurring.

Self-service puts the power back in the hands of humans, especially when large organizations with multiple teams need to manage through a single solution. Rather than being dependent on one centralized team to update rules or manage configurations, administrators can leverage repositories to get their teams the updates they need quickly.

See also: Improved Service Delivery Drives AIOps Use in Government

How teams can enable their first responders to fix problems instead of chasing incidents

An automation-first, data-driven, self-service approach delivers on the promise of AIOps, enabling teams to fix problems rather than just identifying their causes and moving on – it gets to the root of the issue so teams can stop chasing incidents and spend their valuable time moving business goals forward.

As organizations look to balance accelerating feature delivery with meeting high customer expectations, AIOps can provide critical capabilities for the business. Here are a few actionable next steps.

Identify

Start with the end in mind. The “easy” metric is MTTR. Everyone wants to go faster and save money. But it’s important to think more broadly about the capabilities you truly want to deliver. Many successful IT projects end up abandoned because they aren’t seen as valuable to the business. Identify the key stakeholders inside and outside the organization, so you can benchmark and baseline business metrics that align with the organization’s goals. Then, ensure the services you create and manage are strongly aligned to meet those goals.
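As a starting point for a baseline, MTTR is the average of resolution time minus open time across resolved incidents. The sketch below assumes incidents are available as (opened, resolved) timestamp pairs, which is a hypothetical format; real incident tools export this data in their own shapes.

```python
from datetime import datetime, timedelta
from statistics import mean


def mttr(incidents):
    """Return mean time to resolution as a timedelta, or None.

    incidents is an iterable of (opened, resolved) datetime pairs.
    Incidents that are still open (resolved is None) are skipped.
    """
    durations = [
        (resolved - opened).total_seconds()
        for opened, resolved in incidents
        if resolved is not None
    ]
    if not durations:
        return None
    return timedelta(seconds=mean(durations))
```

Tracking this number per service over time, alongside the business metrics agreed with stakeholders, gives a baseline to measure later improvements against.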

Iterate

AIOps doesn’t always have to be a grand vision. While you are building towards strategic goals, you can accomplish short-term wins through getting better answers, creating better context, and leveraging fewer people to solve problems. The great can’t be the enemy of the good in building capabilities. You have the foundation for good services. You can iterate on monitoring, analytics, response plays, and other areas that will build towards your two-year plan while solving your Tuesday problem.

Automate

Teams often jump to complex multi-step automations and quickly get stuck. Design a foundation that allows for self-service, secure, and orchestrated response. Then work on a simple reuse of existing automation: how do you get it in the hands of more responders to free up subject matter experts? Add automated diagnostics to expedite response when minutes count, creating context for responders and making it easier to resolve issues. Build on these capabilities towards orchestrating multi-step responses for complex problem signatures.
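One way to picture reusable, self-service diagnostics is a small registry that each team adds its own checks to, and that a responder or an automated first step runs when an incident opens. This is only a sketch under stated assumptions: the DiagnosticRegistry class, its methods, and the service and check names are hypothetical and not taken from any particular product.

```python
class DiagnosticRegistry:
    """Hypothetical registry mapping a service name to its diagnostic checks."""

    def __init__(self):
        self._checks = {}  # service name -> list of (check name, function)

    def register(self, service, name):
        """Decorator that lets a team add a check for one of its services."""
        def decorator(func):
            self._checks.setdefault(service, []).append((name, func))
            return func
        return decorator

    def run(self, service):
        """Run every check for a service and collect the results.

        A failing check is recorded rather than raised, so one broken
        diagnostic does not hide the output of the others.
        """
        results = {}
        for name, check in self._checks.get(service, []):
            try:
                results[name] = check()
            except Exception as exc:
                results[name] = f"diagnostic failed: {exc}"
        return results


registry = DiagnosticRegistry()


@registry.register("checkout", "queue_depth")
def queue_depth():
    # Placeholder value; a real check would query the queue.
    return "queue depth: 42"


context = registry.run("checkout")
```

The resulting dictionary can be attached to the incident so the responder starts with context instead of gathering it by hand. Multi-step orchestration can be layered on later by chaining checks and remediation steps.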

AIOps can be a journey; you don’t have to have all the answers. Start with the needs of your teams. This will give you the momentum and confidence to solve much larger business problems as you iteratively deliver.

Heath Newburn

About Heath Newburn

Heath Newburn is a distinguished field engineer at PagerDuty. He is responsible for helping teams take their existing strategic capabilities and leverage automation, AIOps, CSOps, and automated incident response to create new business outcomes. He has a long background in monitoring, event management, and operations in many organizations and is focused on enabling the personal success of individuals and teams across IT.
