
The 3 Dimensions of AI Data Preparedness

This past year has been exciting, representing the dawning of a new age for artificial intelligence (AI) and machine learning (ML)—with large language models (LLMs) and generative-AI tools like ChatGPT leading the way. IDC anticipates that by the end of 2023, global spending on AI will reach $154 billion—representing a 26.9% year-over-year increase.

And this only makes sense. AI and ML offer powerful data-automation capabilities that can transform a business and give it a competitive edge. In a recent study by Vanson Bourne and Workday surveying senior decision-makers at large enterprises around the world, 80% reported that their organizations need AI to stay competitive.

Despite all of this, most organizations are not yet prepared for these emerging technologies.

A Vicious Circle

Recently, members of the Crux team spent some time at Google Next ’23 (we recapped our observations from the event here). And as attendees mixed and mingled with us at the Crux booth, the comment we heard most often was, more or less, this:

The critical insights that drive business value come from fast, reliable data.

The observation is not merely anecdotal. In the Vanson Bourne/Workday study, 77% of respondents indicated that they “are concerned that their organization’s data is neither timely nor reliable enough to use with AI and ML.” (Illustrating the point, only 4% of respondents reported that their AI/ML initiatives met expectations.)

This is a problem. As IDC puts it, “Companies that are slow to adopt AI will be left behind—large and small.”

At scale, the problem becomes an outright disaster, and it extends beyond AI. Gartner forecasts that, through 2025, 80% of organizations seeking to scale digital business will fail because they do not take a modern approach to data governance.

In short, companies need the speed and reliability that AI and ML promise, but their data is too slow and unreliable to support AI and ML adoption. It is a vicious circle.

Get Ready for AI in 3 Dimensions

To address the problem, companies need to understand the problem.

AI/ML preparedness necessarily means data preparedness. (After all, AI and ML systems need to be based on something.) And AI data exists in three dimensions. At the outset of any AI/ML initiative, an organization should assess its data in terms of those three dimensions:

  1. Quality
  2. Accessibility
  3. Quantity

AI Data Quality

An AI model is only as good as the data on which it relies. Moreover, in terms of data quality, there is arguably more at stake with AI than with any other data model or use case. It is critical, then, that AI and ML initiatives leverage only data that is completely trustworthy.

The two most fundamental elements of data trustworthiness are (1) completeness and (2) correctness. But there are nuances to these elements, beyond the accuracy of the values themselves. Is all the formatting correct? Are the values consistently represented? Do the values conform to expected data types? Is the data still “fresh” and relevant? Is the data, in every way, validated and analytics-ready?

Ensuring data completeness often requires varied sourcing from internal and external datasets (see below), while ensuring data correctness demands strong data governance and data management.
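To make those checks concrete, here is a minimal sketch in Python (with pandas) of the kinds of completeness, correctness, and freshness validations described above. The column names, expected types, and freshness threshold are all hypothetical; a production pipeline would draw them from a data contract or governance catalog rather than hard-coding them.

```python
import pandas as pd

# Hypothetical expectations for one incoming dataset. In practice these
# would come from a data contract or governance catalog.
EXPECTED_DTYPES = {"ticker": "object", "close_price": "float64"}
MAX_STALENESS_DAYS = 2  # assumed freshness threshold

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality issues found in df (empty list = clean)."""
    issues = []

    # Completeness: required columns exist and contain no nulls.
    for col in EXPECTED_DTYPES:
        if col not in df.columns:
            issues.append(f"missing column: {col}")
        elif df[col].isna().any():
            issues.append(f"null values in column: {col}")

    # Correctness: values conform to the expected data types.
    for col, expected in EXPECTED_DTYPES.items():
        if col in df.columns and str(df[col].dtype) != expected:
            issues.append(f"{col}: expected {expected}, got {df[col].dtype}")

    # Freshness: the newest record is recent enough to still be relevant.
    if "as_of_date" in df.columns:
        age = pd.Timestamp.now() - pd.to_datetime(df["as_of_date"]).max()
        if age.days > MAX_STALENESS_DAYS:
            issues.append(f"stale data: newest record is {age.days} days old")

    return issues

clean = pd.DataFrame({"ticker": ["ACME"], "close_price": [101.5],
                      "as_of_date": [pd.Timestamp.now().normalize()]})
print(validate(clean))  # -> []
```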

AI Data Accessibility

Because AI systems need data, they need reliable access to data. In a perfect AI world, all the data that an AI model could ever need exists in a single centralized system, stored in a single standardized format.

Few if any organizations live in that world (yet). It’s common for necessary data to be stored in a variety of formats across numerous disparate on-prem and cloud infrastructures, each managed and secured differently.

The problem is further compounded in larger organizations that have had their share of mergers and acquisitions. There, one may find a variety of legacy systems, each built and managed differently, and each siloed under stakeholder “owners” who may not even be known to the parts of the organization that need the data.

This makes the perfect AI world all the more difficult to achieve. To prepare for AI, therefore, an organization must identify its data stores and assess all the places and formats in which its data may live. Eventually, some measure of streamlining, centralization, and standardization will have to take place—but the problem has to be properly identified before it can be solved. Anything less means slow, unreliable data—anathema to AI success.
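One lightweight way to start that identification is a machine-readable inventory of data stores. Below is a minimal sketch in Python; every source name, location, format, and owner in it is illustrative, not a prescription.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """One entry in a hypothetical inventory of an organization's data stores."""
    name: str
    location: str  # e.g., an on-prem path, a cloud bucket, a database DSN
    format: str    # e.g., "csv", "parquet", "sql"
    owner: str     # the team accountable for this data

# Illustrative entries spanning on-prem and cloud systems (all names assumed).
INVENTORY = [
    DataSource("trades_legacy", "/mnt/onprem/trades.csv", "csv", "operations"),
    DataSource("positions", "s3://example-bucket/positions/", "parquet", "risk"),
    DataSource("customers", "postgresql://warehouse/crm", "sql", "sales"),
]

for src in INVENTORY:
    print(f"{src.name}: {src.format} at {src.location} (owner: {src.owner})")
```

Even a report this simple surfaces the fragmentation, and the question of ownership, that has to be streamlined before AI/ML systems can rely on the data.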

All of this, incidentally, is to say nothing of external data.

AI Data Quantity

Organizations, especially in data-intensive industries like financial services, often cannot subsist on their internal data alone; they must augment their internal data stores with external data.

The problems of external data mirror those of internal data. While third-party data suppliers at least typically know what data they have to offer, where it’s stored, and how it’s stored, an outsourced data supply is still not a perfect world.

For starters, standardization issues are still a problem; data suppliers often store and deliver data in formats that don’t always match their customers’ needs. Similarly, external data may undergo unexpected changes as it is updated, leading to schema breaks and other data-quality issues. Additionally, external-data updates may not always happen on schedule, leaving downstream organizations guessing as to when they should look for fresh data.
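To illustrate just one of these failure modes, the sketch below (Python with pandas) compares an incoming delivery against the last known-good schema to flag schema breaks. The column names and the example delivery are hypothetical.

```python
import pandas as pd

def schema_of(df: pd.DataFrame) -> dict[str, str]:
    """Capture a DataFrame's schema as {column: dtype}."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}

def detect_schema_break(previous: dict[str, str], incoming: pd.DataFrame) -> list[str]:
    """Compare an incoming delivery against the last known-good schema."""
    current = schema_of(incoming)
    breaks = []
    for col, dtype in previous.items():
        if col not in current:
            breaks.append(f"column dropped: {col}")
        elif current[col] != dtype:
            breaks.append(f"type changed: {col} ({dtype} -> {current[col]})")
    for col in current:
        if col not in previous:
            breaks.append(f"column added: {col}")
    return breaks

# Hypothetical example: a supplier silently renames a column between deliveries.
last_good = {"ticker": "object", "close_price": "float64"}
delivery = pd.DataFrame({"ticker": ["ACME"], "closing_price": [101.5]})
print(detect_schema_break(last_good, delivery))
# ['column dropped: close_price', 'column added: closing_price']
```

In practice, a check like this would run automatically on every delivery, so that a silently renamed or re-typed column is caught before it reaches an AI/ML pipeline.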

Organizations face countless other complications in onboarding external data, all of which hinder speed and reliability. And because there is effectively no limit to the amount of external data available to be collected, external data presents a scalability problem as well, making it cost-prohibitive to build a sustainable in-house solution.

Quantity breeds complexity, but to reduce data quantity would be to reject the benefits of external-data supplementation. Therefore, at the outset of any AI/ML initiative, organizations should instead reduce their external-data complexity. The best and most proven way to do this is to choose “buy” over “build” and outsource the external-data onboarding function to a managed-services provider. This sustainably ensures that the third-party data your AI/ML systems rely on is exactly where you need it, when you need it, and how you need it.

Preparing for AI isn’t easy, and there are many potential pitfalls along the way. If you’d like to learn more about preparing your organization for an AI initiative, follow up with us here.
