Configuring AI Ops

This section describes how to configure AI Ops for Problem Management. You must have AI Ops in your IPK Security Role to configure AI Ops.

AI Ops was originally designed to enhance Problem Management, and in an ITIL context that remains what it does best. Its strength is that it monitors call and request activity against any parameter or set of parameters you define, with a threshold expressed either as a distinct count or as a percentage change from the normal operating rhythm over a given period. Beyond Problem Management, AI Ops has been used extensively for event management, for collating event tickets, and for correlating and monitoring many other kinds of activity.

For example, an organization may have a high-profile user or location that it needs to watch closely. AI Ops can monitor the tickets logged by a person, group, or location and can be configured so that, if the number of tickets logged in a given time frame changes by 5% or more, a new AI Ops ticket is created and the appropriate alert is sent to the owning agent or support group. In this scenario, it doesn't matter what kind of tickets are being logged, only that there is an uptick in activity.
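The percentage-change threshold in this example can be illustrated with a short sketch. This is not ASM code and does not reflect how AI Ops calculates its baseline internally; the function names, and the idea of comparing the current period's ticket count with a single baseline count standing in for the "normal operating rhythm", are assumptions made for illustration. Only an increase is treated as a trigger, because the scenario is about an uptick in activity.

```python
# Hypothetical sketch: NOT the ASM/AI Ops implementation.
# Compares the ticket count for the current period against an assumed
# baseline count representing the normal operating rhythm.


def percent_change(baseline: int, current: int) -> float:
    """Return the change from baseline to current as a percentage."""
    if baseline == 0:
        # No normal activity to compare with: any new ticket is an unbounded increase.
        return float("inf") if current > 0 else 0.0
    # Multiply before dividing so whole-number percentages stay exact.
    return (current - baseline) * 100.0 / baseline


def uptick_detected(baseline: int, current: int, threshold_pct: float = 5.0) -> bool:
    """True when tickets rose by threshold_pct percent or more (drops are ignored)."""
    return percent_change(baseline, current) >= threshold_pct


# A location that normally logs 40 tickets per week:
print(uptick_detected(40, 42))  # 5% increase -> True
print(uptick_detected(40, 41))  # 2.5% increase -> False
```

In a real rule the baseline and time frame would come from the rule's configuration; here they are plain arguments so the arithmetic is easy to follow.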

In another example, AI Ops can be used during a technology refresh to alert support when specific impacted groups log tickets of a certain type. This helps identify issues caused by the refresh much more quickly.

Typically, a Problem Manager or someone in a similar role is notified by email that an AI Ops rule has been triggered. If you are using AI Ops for other purposes, the call template linked to the rule can be configured to automatically assign and notify the correct groups and people, associate the correct SLA, and even profile certain fields. In either scenario, the person assigned the AI Ops ticket is responsible for assessing the triggers and linking the calls that are truly part of the larger issue that AI Ops is watching.

When you view the list of tickets (calls) that the AI Ops rule found, it is important to understand that they are not yet linked; they are merely associated as triggers that fulfill the parameters of the AI Ops rule. You still need to review these triggers and then link the calls that are actually relevant.

AI Ops for Problem Management

Problem management is a key process of IT service management. Its purpose is to prevent problems and the incidents that result from them. Ideally, a good problem management strategy aims to solve problems before incidents occur.

There are two approaches to problem management:

  • Reactive problem management, through the logging of calls and requests

  • Proactive problem management, through

    • trend analysis of call and request data (for example, by performing simple searches)

    • the integration of event management tools that enable you to identify events (defined as any deviation from normal or expected operation of a piece of infrastructure) before incidents are logged

    • the automated logging of calls and requests based on a set of user-defined criteria. This is supported by the AI Ops functionality, as described below

AI Ops enhances your Problem Management process by allowing you to set up rules to automatically log calls or requests based on events in your call and request activity. In this way, problems may be identified before Incidents occur.

You can schedule “AI Ops rules” to run and analyze your call and request activity. Each rule has a “threshold”, that is, a particular number of events within a running period, and a set of conditions which, when met, automatically trigger a new call or request in ASM.
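A rule's running period, conditions, and threshold can be pictured as follows. This Python sketch is illustrative only: the `Call` fields, the `evaluate_rule` function, and the choice to treat the threshold as met when the number of matching calls reaches it are assumptions, not ASM's data model or behavior. The matching calls it returns correspond to the triggers a rule finds, which still have to be reviewed and linked by a person.

```python
# Hypothetical sketch of an AI Ops-style rule evaluation; not ASM code.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass
class Call:
    ref: str
    logged_at: datetime
    priority: str


def evaluate_rule(
    calls: List[Call],
    now: datetime,
    running_period: timedelta,
    threshold: int,
    condition: Callable[[Call], bool],
) -> Tuple[bool, List[Call]]:
    """Return (threshold met?, matching calls) for the running period ending at now."""
    start = now - running_period
    # Keep only calls logged inside the running period that satisfy the rule's conditions.
    triggers = [c for c in calls if start < c.logged_at <= now and condition(c)]
    # Assumption: the threshold is met once the number of matching calls reaches it.
    return len(triggers) >= threshold, triggers


now = datetime(2024, 6, 30, 12, 0)
calls = [Call(f"C{i}", now - timedelta(hours=i), "High") for i in range(6)]
calls.append(Call("OLD", now - timedelta(days=3), "High"))  # outside the running period
met, triggers = evaluate_rule(calls, now, timedelta(days=1), 5, lambda c: c.priority == "High")
print(met, [c.ref for c in triggers])  # True ['C0', 'C1', 'C2', 'C3', 'C4', 'C5']
```

Passing the conditions as a function keeps the window and threshold logic separate from what counts as a matching call, which mirrors how a rule pairs a threshold with a set of conditions.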

Some scenarios for using AI Ops

Example #1: A Problem Manager suspects instability in the network environment. She can configure an automated AI Ops rule to log a Problem call whenever more than 5 high priority outage calls are logged against critical servers. She can configure the rule to exclude any servers that have a Physical Status of “In Test” or “Training Dedicated”. Finally, she can link a Call Template to the rule so that the call is directed to the Problem Management team when the system logs it. When this AI Ops rule runs and finds more than 5 high priority outage calls against critical servers, a new call is automatically logged by the system and forwarded to the Problem Management team.
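The conditions in Example #1 can be expressed as a filter plus a count. In this hypothetical sketch, the field names (`priority`, `call_type`, `critical`, `physical_status`), the value strings "High", "Outage", and "In Production", and the function names are assumptions for illustration, not ASM field names; only the two excluded Physical Status values and the "more than 5" limit come from the example.

```python
# Hypothetical sketch of Example #1; field names and values are illustrative, not ASM's.
from dataclasses import dataclass
from typing import List

EXCLUDED_STATUSES = {"In Test", "Training Dedicated"}


@dataclass
class Server:
    name: str
    critical: bool
    physical_status: str


@dataclass
class Call:
    priority: str
    call_type: str
    server: Server


def is_trigger(call: Call) -> bool:
    """High priority outage call against a critical server that is not excluded."""
    return (
        call.priority == "High"
        and call.call_type == "Outage"
        and call.server.critical
        and call.server.physical_status not in EXCLUDED_STATUSES
    )


def should_log_problem_call(calls: List[Call], limit: int = 5) -> bool:
    """Log a Problem call when MORE than `limit` calls match the conditions."""
    return sum(1 for c in calls if is_trigger(c)) > limit


prod = Server("db01", critical=True, physical_status="In Production")
lab = Server("db99", critical=True, physical_status="In Test")
calls = [Call("High", "Outage", prod) for _ in range(5)]
calls += [Call("High", "Outage", lab) for _ in range(3)]  # excluded by Physical Status
print(should_log_problem_call(calls))  # False: only 5 calls match
calls.append(Call("High", "Outage", prod))
print(should_log_problem_call(calls))  # True: 6 calls match
```

Routing the resulting call to the Problem Management team is left to the linked Call Template in the example, so the sketch stops at deciding whether a call should be logged.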

Example #2: The Problem Manager is concerned that a high level of redundancy in the network is making it difficult to identify unreliable hardware. She is most concerned when outages occur in multiple redundant items supporting a parent hardware item.

She configures an automated AI Ops rule to log a call whenever there are more than 5 occurrences in a 3-month period where more than 60% of the redundant items linked to the parent CMDB item are out at the same time.
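Example #2 counts occurrences rather than individual calls. The sketch below is one possible reading, not ASM's algorithm: it assumes outage data is available as time-stamped snapshots of which redundant items are out, approximates the 3-month period as 90 days, and counts an occurrence each time the share of items out rises above 60%, so that one long outage is counted once. The snapshot format and function names are hypothetical.

```python
# Hypothetical sketch of Example #2; the snapshot format and names are assumptions.
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

Snapshot = Tuple[datetime, Set[str]]  # (time, names of redundant items that are out)


def count_occurrences(
    snapshots: Iterable[Snapshot],
    redundant_items: Set[str],
    now: datetime,
    period: timedelta = timedelta(days=90),  # "3 months" approximated as 90 days
    ratio: float = 0.6,
) -> int:
    """Count separate episodes in the period where MORE than `ratio` of the items were out."""
    if not redundant_items:
        return 0
    start = now - period
    occurrences = 0
    above = False  # whether the previous snapshot in the period was over the ratio
    for ts, out in sorted(snapshots, key=lambda s: s[0]):
        if not (start < ts <= now):
            continue
        is_above = len(out & redundant_items) / len(redundant_items) > ratio
        if is_above and not above:
            occurrences += 1  # a new episode starts when the ratio is first exceeded
        above = is_above
    return occurrences


def should_log_call(snapshots, redundant_items, now, limit: int = 5) -> bool:
    """Log a call when there are MORE than `limit` occurrences in the period."""
    return count_occurrences(snapshots, redundant_items, now) > limit


items = {"sw1", "sw2", "sw3", "sw4", "sw5"}  # redundant items under one parent CMDB item
now = datetime(2024, 6, 30)
snaps = []
for i in range(6):
    snaps.append((now - timedelta(days=80 - 10 * i), {"sw1", "sw2", "sw3", "sw4"}))  # 80% out
    snaps.append((now - timedelta(days=75 - 10 * i), set()))  # all recovered
print(count_occurrences(snaps, items, now))  # 6
print(should_log_call(snaps, items, now))  # True
```

Counting only the transition above 60% is a design choice; counting every snapshot instead would make one prolonged outage look like many occurrences.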

You can use the IPK Workflow Rules Builder to automate the routing of calls and call notifications through IPK Rules.

An AI Ops rule considers open or existing calls and requests for analysis, including calls that have been created but not yet forwarded to anyone. It can also be set up to consider only New calls.