Crux Makes Dealing With Schema Changes Easy

Getting external data ready for analytics and data science is complicated. You can spend hours sifting through reference specs trying to find the details you need to prepare the data to fit your use case. Then the next day your pipeline comes crashing down thanks to a renamed column.

Building and maintaining a pipeline requires the right technology and the right people. And engineering resources are scarce when you’re fixing schema breaks across hundreds or thousands of datasets. Not to mention a mounting backlog of data products waiting to be onboarded. 

What if there was a way to make data easy? What if all information could be analytics-ready? Well, that’s what we do at Crux. It’s our mission to be the reason the world forgets about the complexity of the data ecosystem. One of the ways we do that is by making schema changes easy.

Prepare for schema changes

We mean it when we say we commit to delivering your data usable and analytics-ready, regardless of what happens upstream.

Schema changes happen all the time as datasets get updated. But if the data user isn’t prepared for them, the result is a schema break.

Even the smallest and simplest schema break can have far-reaching consequences for the data user. A broken pipeline can lead the downstream data user to draw inaccurate insights and make costly decisions. In the worst case, downstream business processes fail and reports become inaccessible because the data can't be loaded into production systems at all. Engineering resources get diverted yet again.
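
To make that concrete, here is a minimal, invented example of how one renamed column can break a pipeline that hard-codes the old name (the column names and values are hypothetical):

```python
import pandas as pd

# The supplier's feed used to ship a "close_price" column; today's file renames
# it to "closing_price". A pipeline that hard-codes the old name fails outright.
todays_feed = pd.DataFrame({"ticker": ["ABC"], "closing_price": [101.25]})

daily_close = todays_feed["close_price"]  # raises KeyError: 'close_price'
```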

Crux makes it easy and worry-free. We handle schema changes in three ways:

Proactive schema-change management

Despite having hundreds of partnerships with data suppliers, we don’t let our relationships slip through the cracks—because we take the word “partnership” seriously. We keep in touch with our partners, and we keep track of what they’re up to.

When a data supplier plans a schema change, the supplier will carefully document the new data structure, along with the effective date, issuing a minimum of 30 days’ notice. Crux proactively keeps track of these notices and changes. From there, we do our work, updating the impacted datasets so that—once the effective date arrives—everything is seamless.

Reactive schema-change management

Data products from vendors are rarely static. They're constantly changing, and, naturally, surprise schema changes happen. Suppliers don't always send a notification when a data type changes or a column is removed.

We’ve got a fix for that, too. Crux products have built-in “circuit breakers” and default behaviors that detect schema changes and protect data consumers and their data pipelines. When a schema change is detected, Crux temporarily halts the flow of data to prevent a pipeline break and sends alerts.
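
A minimal sketch of the circuit-breaker pattern looks something like the following. This is illustrative only, not Crux's actual implementation; the schemas and the alert hook are invented for the example.

```python
from typing import Dict

class SchemaBreak(Exception):
    """Raised when an incoming batch no longer matches the expected schema."""

def check_schema(expected: Dict[str, str], observed: Dict[str, str]) -> None:
    # Compare the expected column-to-type mapping against what actually arrived.
    missing = set(expected) - set(observed)
    added = set(observed) - set(expected)
    retyped = {col for col in set(expected) & set(observed) if expected[col] != observed[col]}
    if missing or added or retyped:
        # Trip the breaker: stop delivery and alert instead of pushing broken data downstream.
        raise SchemaBreak(
            f"missing={sorted(missing)}, added={sorted(added)}, retyped={sorted(retyped)}"
        )

expected = {"ticker": "string", "close_price": "float64"}
observed = {"ticker": "string", "closing_price": "float64"}  # the supplier renamed a column

try:
    check_schema(expected, observed)
except SchemaBreak as exc:
    print(f"Delivery paused, alerts sent: {exc}")  # stand-in for a real alerting hook
```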

If that were the end of the story, we wouldn’t have a lot to boast about. After all, alerts don’t make things easy. They just remind you that hard things are hard. That’s not what we’re about.

Crux makes external data easy with proactive reactivity

Crux doesn’t just send alerts to data consumers and call it a day. We fix the break.

With Crux, schema-break alerts don’t go only to the data user. We at Crux get alerted too.

Our response team works with the data-supplier partner to determine exactly what caused the schema break and whether it was a one-off aberration. From there, our data engineers make the changes needed to accommodate the immediate break and any future ones.

Even that is not the end of the story, however. If the schema change represents the new status quo, Crux separates the historical schema from the new schema and delivers each independently, giving data users the flexibility to consume data the way they like it. Data à la carte.
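
As a rough sketch of what that can look like in practice (the file names, column names, and cutover date below are all hypothetical, not Crux's delivery format), the historical-schema data and the new-schema data end up as separate deliverables:

```python
import pandas as pd

# Hypothetical cutover date on which the supplier's new schema took effect.
CUTOVER = pd.Timestamp("2024-01-01")

feed = pd.DataFrame({
    "as_of": pd.to_datetime(["2023-12-29", "2024-01-02"]),
    "ticker": ["ABC", "ABC"],
    "close_price": [100.0, 101.5],
})

# Split the data at the cutover and keep each shape as its own deliverable,
# so consumers can choose the historical schema, the new one, or both.
historical = feed[feed["as_of"] < CUTOVER]
current = feed[feed["as_of"] >= CUTOVER].rename(columns={"close_price": "closing_price"})

historical.to_csv("dataset_historical_schema.csv", index=False)
current.to_csv("dataset_new_schema.csv", index=False)
```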

Why it’s easy

When Crux gets a dataset, we dig into it. Thoroughly. We want to understand everything about it—both as it presently exists and historically.

From there, we create a blueprint for the data, and test our resulting code extensively. Only when our data engineers are fully satisfied with it do we deploy it to run continuously. That’s why customers turn to Crux for external-data integration, and that’s why we work with more than 300 data sources to deliver their datasets. We take what’s hard and make it easy for our customers to take their external data from raw to analytics-ready—with no compromises.

We get it. External data can be hard. But it doesn’t have to be.

Are you completely satisfied with how easy it is to integrate your external data? If not, let’s talk.
