Bad Data’s Lasting Impact on Your Analytics Runs Deeper Than You Think

Data monitoring can’t be neglected–the net effect of erroneous data can be catastrophic for your business.

Imagine this: it’s around 7 pm, and you’re just now getting home after work, shuttling your kids around to their after-school activities, and picking up some dinner. You walk into your dark house, flip on the lights, and notice that your kitchen floor is covered in water, and you can hear it coming from under your sink. You have no idea what happened, but it’s after business hours, so you shut off the water, and have no choice but to call an emergency plumber and pay the fee that comes along with it. 

The plumber arrives an hour later, informs you that a pipe burst–likely earlier that day–and that he’s fixed it. When you weren’t looking, someone put Legos down the sink, and they blocked up the pipe and the pressure caused it to burst. You had no idea because you don’t regularly inspect the plumbing in your house–who does? You thank him, pay him, and wonder how on Earth you’re going to keep it from happening again while still maintaining your house, kids, and sanity. 

The plumbing behind external data is no different. Much like parents, data engineers are expected to play their primary role on top of acting as data “plumbers”: monitoring, maintaining, and correcting issues in the flow of external data into your company. The work is tedious, human error is rife, and it’s hard to know what to look for until the problem has flooded your house–or, in the case of external data, ruined a mission-critical application. Just like an unseen plumbing issue, the effects of one mistake can wreak havoc on everything downstream. 

Considering the Consequences of Bad Data 

Much like our fictional parent, most organizations don’t hire data engineers purely to maintain the plumbing of their external data flow, keeping them on the bench until an issue needs to be resolved. Data teams are made up of talented individuals who are expected to drop everything to diagnose the problem, trace where it is coming from, and then fix issues as they arise. Because this type of monitoring is often manual and isn’t a first priority, bad data can slip through the cracks and create a constant series of fire drills. 

Bad data isn’t as nefarious as it sounds, but its consequences are. Something as small as a one-letter typo, a schema change, missing information, or an incorrect classification for a stock can set off a ripple effect. For example, an error in a pricing file can materially impact performance and attribution reports for internal and external clients. With so many manual processes involved in producing dashboards, portfolio reconciliations, and client performance and attribution reports, the risk of error is serious and can result in monetary or job loss, regulatory fines, and significant reputational damage to institutional client relationships. 
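To make the failure modes above concrete, here is a minimal sketch of the kind of automated checks that can catch them before a pricing file reaches downstream reports. The column names and rules are hypothetical examples, not Crux’s actual implementation:

```python
# Hypothetical data-quality checks for an incoming pricing file.
# Catches a schema change (missing/renamed column), missing
# information (empty fields), and obviously bad values.

EXPECTED_COLUMNS = {"ticker", "date", "close_price"}

def validate_pricing_rows(columns, rows):
    """Return a list of human-readable issues found in the file."""
    issues = []

    # Schema change: a renamed or dropped column breaks downstream joins.
    missing = EXPECTED_COLUMNS - set(columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")

    col_index = {name: i for i, name in enumerate(columns)}
    price_col = col_index.get("close_price")
    for n, row in enumerate(rows, start=1):
        # Missing information: empty fields propagate as nulls downstream.
        if any(value in ("", None) for value in row):
            issues.append(f"row {n}: empty field")
        # Basic sanity check: prices must parse and be positive.
        if price_col is not None:
            try:
                if float(row[price_col]) <= 0:
                    issues.append(f"row {n}: non-positive price")
            except (ValueError, IndexError):
                issues.append(f"row {n}: unparseable price")
    return issues
```

Run against each delivery before it enters the pipeline, a check like this turns a silent pricing error into an alert, rather than a surprise in a client’s attribution report.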

Time for a Change: Reducing Bad Data's Reach

In today’s market, the risk of bad data rises exponentially as the volume flowing through the metaphorical pipes grows. One bad ‘leak’ or broken pipe can cause significant damage to your business, and the risk increases every time someone new touches a dataset. This is a problem that cannot be ignored, as this data drives strategic business plans, investment decisions, and more. Yet the risk is accepted, because this is simply the status quo for data operations. 

While it’s easy to fear technology malfunctioning and to trust a real person to complete quality work instead, in practice the opposite often holds. A systematic, technology-based approach to data operations is far safer, as it reduces the risk of introducing bad data into your plumbing. Businesses that take the leap and remove the manual, error-prone steps from their data management processes will maintain a competitive edge over those that choose not to. It’s like having an emergency plumber on call 24/7/365 and never dealing with a flooded kitchen, because the Lego was stopped before it ever went down the sink. This transition also lets your data engineers and scientists refocus their daily energy from tedious tasks to the value-add analytical work you hired them to do in the first place. 

Crux is that on-call emergency plumber. Our technology and external data expertise allow us to take over that maintenance and monitoring, and to correct those common bad data issues before you even know they’re there. If you’re ready to optimize your data operations and let us do the heavy lifting for you, reach out at charles.ashwanden@cruxinformatics.com today. 
