Recently, Gartner announced its top 10 strategic technology trends for 2019. It is a nice list, touching on digital transformation trends that range from empowered edge computing to artificial intelligence-driven autonomous things. But while Gartner’s trends sound great in annual reports and Forbes articles, most enterprises aren’t operationally (or digitally) prepared to adopt them. The reason? Today’s pace of business and the disorderly data needed to make sense of it all.
In the past, IT environments were simpler and more accessible for humans. But with the advent of cloud, containers, multi-modal delivery and other new technologies resulting in inordinately massive and complex environments, IT is being forced to move at machine speed, rendering manual processes too slow and inefficient.
To keep up with the rapid pace and scale of today’s digital environments, enterprises are turning to AIOps, which is powered by machine learning (ML) and artificial intelligence (AI). Unfortunately, implementing ML-based algorithms and AI-based automation, key elements of unlocking digital transformation, is easier said than done. The underlying reason is that ML-based algorithms, by themselves, aren’t sophisticated enough to deal with today’s ephemeral, containerized, cloud-based world. ML needs to evolve into AI, and to do that, it needs cleaner, actionable data to automate processes.
But attaining high-quality data presents its own unique challenges, and enterprises that do not have the right strategy in place will encounter cascading problems when trying to implement digital transformation initiatives in the future.
How To Build A High-Quality Data Strategy — Two Types Of Data
Imagine cooking a meal from scratch only to realize you forgot to chop an onion. You might be able to add it in later, but it won’t add the same texture and flavor. Too often, enterprises embark on an AI/ML transformation only to realize mid-development that they are missing key performance indicator (KPI) data that they did not foresee needing. Such mid-process realizations can have deleterious effects on a digital transformation initiative, stalling or even crippling its progress. Simply put, AI/ML doesn’t function without the right data.
The first step to building a high-quality data strategy is realizing that you need two separate data strategies: one for historical data and the other for real-time data or continuous learning.
Historical data is crucial for AI/ML strategies and serves as the fundamental building block for any effective anomaly detection, prediction or pattern analysis implementation. However, getting the right historical training data is more difficult than many might assume.
There are several key questions to consider:
• What do your end goals and use cases for automation look like?
• What data do those use cases demand?
• How much of that data do you need?
• At what fidelity do you need that data?
Next, realize that training AI/ML on historical data is not enough. The platform also needs to ingest real-time data in order to respond to events and automate processes. Real-time data is the fuel that allows the ML algorithms to learn and adapt to new situations and environments. Unfortunately, real-time data presents its own set of challenges: its volume, velocity, variety and veracity can be overwhelming and expensive to manage.
Finally, enterprises must ensure the ML algorithms don’t acquire bad habits as a consequence of using poor data. As with bad human habits, it is hard to get an AI to unlearn one once it has formed. Specifically, poor data could include outliers that are erroneously deemed normal, or data gaps that skew newly learned behavior. Fundamentally, an AI/ML platform that learns from bad data can generate false alerts and negatively affect IT operations. There are multiple ways to avoid going down this path, but they all boil down to one important thing: data quality.
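To make the two failure modes above concrete, here is a minimal Python sketch of screening historical samples before they are used for training. It is an illustration under stated assumptions, not a prescribed method: the function name, the (timestamp, value) sample shape and the thresholds are all hypothetical, and the median-absolute-deviation test is just one common way to flag outliers.

```python
from statistics import median


def screen_training_samples(samples, expected_interval_s, mad_threshold=3.5):
    """Split (timestamp_s, value) samples into clean points, suspected
    outliers and collection gaps.

    All names and thresholds here are illustrative assumptions.
    """
    if not samples:
        return [], [], []

    samples = sorted(samples)  # order by timestamp
    values = [value for _, value in samples]

    # Median and MAD are used instead of mean and standard deviation
    # because a single extreme value barely moves them.
    med = median(values)
    mad = median([abs(value - med) for value in values])

    clean, outliers = [], []
    for ts, value in samples:
        # Modified z-score. If MAD is zero (most values identical), this
        # simple sketch flags nothing; a real pipeline needs a fallback.
        score = 0.0 if mad == 0 else 0.6745 * abs(value - med) / mad
        if score > mad_threshold:
            outliers.append((ts, value))
        else:
            clean.append((ts, value))

    # A gap is any pair of consecutive samples spaced well beyond the
    # expected collection interval (1.5x is an arbitrary tolerance).
    gaps = [
        (earlier, later)
        for (earlier, _), (later, _) in zip(samples, samples[1:])
        if later - earlier > 1.5 * expected_interval_s
    ]
    return clean, outliers, gaps
```

Called on CPU readings collected every 60 seconds, a function like this would set aside a sudden spike for review rather than letting the model learn it as normal, and would report any stretch where collection stopped so it can be backfilled or excluded.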
The Two Most Important Ingredients For Data Quality
Historical and real-time training data are foundational to AI, ML and automation. However, data quality remains a major sore point for enterprises that underestimate the complexity of that challenge. Fortunately, data quality issues don’t have to be a terminal problem if approached strategically.
The most important step is to have full visibility both horizontally across operational silos and vertically, deep into infrastructure layers. You won’t know in advance which KPIs are going to be important, so an ideal solution is one that allows you to ingest as much data as possible from as many places as possible right from the start.
It is also crucial that data be stored and normalized in a way that connects it to other data. Data that rests in silos will never be able to power automation; it has to have context. An ideal solution is one that can ingest data and contextualize it simultaneously. Stitching data together, normalizing it and correlating it after ingestion is time-consuming and difficult.
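As a rough illustration of ingesting and contextualizing in one step, the Python sketch below normalizes a raw metric record and attaches service and infrastructure-layer context at the moment it arrives. Everything in it is hypothetical: the TOPOLOGY lookup, the field names and the unit table are assumptions made for the example, not the interface of any particular product.

```python
from dataclasses import dataclass

# Hypothetical inventory mapping hosts to the service they support and the
# infrastructure layer they sit in. In practice this might come from a CMDB
# or a discovery tool.
TOPOLOGY = {
    "web-01": {"service": "checkout", "layer": "application"},
    "db-01": {"service": "checkout", "layer": "database"},
}

# Conversion factors to a single unit (seconds) for duration metrics.
UNIT_TO_SECONDS = {"ms": 0.001, "s": 1.0}


@dataclass(frozen=True)
class Metric:
    host: str
    name: str
    value_s: float
    service: str
    layer: str


def ingest(raw):
    """Normalize one raw record and attach context in the same step."""
    host = raw["host"].strip().lower()
    # Unknown hosts are still ingested, but marked so they can be reviewed.
    context = TOPOLOGY.get(host, {"service": "unknown", "layer": "unknown"})
    return Metric(
        host=host,
        name=raw["metric"].strip().lower(),
        value_s=float(raw["value"]) * UNIT_TO_SECONDS[raw["unit"]],
        service=context["service"],
        layer=context["layer"],
    )
```

Because every stored record already carries its service and layer, a latency reading from a web server and one from its database can be correlated by service without a separate stitching pass later.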