Data issues continue to plague businesses and communicators—it seems the more data becomes available, the more confounding the quality issues. Data reliability firm Monte Carlo recently announced the initial results of its 2022 data quality survey, which found that data professionals are spending 40 percent of their time evaluating or checking data quality and that poor data quality impacts 26 percent of their companies’ revenue.
The report, based on a survey conducted by Wakefield Research, reveals that 75 percent of the 300 data professionals surveyed take four or more hours to detect a data quality incident and about half said it takes an average of nine hours to resolve the issue once identified. Worse, 58 percent said the total number of incidents has increased somewhat or greatly over the past year, often as a result of more complex pipelines, bigger data teams, greater volumes of data, and other factors.
Today, the average organization experiences about 61 data-related incidents per month, each of which takes an average of 13 hours to identify and resolve. This adds up to an average of about 793 hours per month, per company.
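The arithmetic behind that figure can be reproduced in a few lines. The sketch below is illustrative only: the function name and the annualized figure are assumptions added here, not part of the survey, while the inputs (61 incidents, 13 hours each) are the averages quoted above.

```python
# Averages reported in the survey.
INCIDENTS_PER_MONTH = 61
HOURS_PER_INCIDENT = 13  # time to identify and resolve one incident


def monthly_downtime_hours(incidents: int, hours_per_incident: float) -> float:
    """Total hours per month spent identifying and resolving incidents."""
    return incidents * hours_per_incident


monthly = monthly_downtime_hours(INCIDENTS_PER_MONTH, HOURS_PER_INCIDENT)
print(monthly)       # 793
print(monthly * 12)  # 9516, a simple annualization (assumes every month is average)
```

Swapping in an organization's own incident count and resolution time gives a rough estimate of its own data downtime.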
However, the figure of 61 incidents represents only those known to respondents. Proprietary data from the Monte Carlo platform suggests the average organization experiences about 70 data incidents per year for every thousand tables in its environment.
“In the mid-2010s, organizations were shocked to learn that their data scientists were spending about 60 percent of their time just getting data ready for analysis,” said Barr Moses, Monte Carlo CEO and co-founder, in a news release. “Now, even with more mature data organizations and advanced stacks, data teams are still wasting 40 percent of their time troubleshooting data downtime. Not only is this wasting valuable engineering time, but it’s also costing precious revenue and diverting attention away from initiatives that move the needle for the business. These results validate that data reliability is one of the biggest and most urgent problems facing today’s data and analytics leaders.”
Nearly half of respondent organizations most often measure data quality by the number of customer complaints their company receives, highlighting the ad hoc, reputation-damaging nature of this important element of modern data strategy.
The business cost of data downtime
“Garbage in, garbage out” aptly describes the impact data quality has on data analytics and machine learning. If the data is unreliable, so are the insights derived from it.
In fact, on average, respondents said bad data impacts 26 percent of their revenue. This validates and supplements other industry studies that have uncovered the high cost of bad data. For example, Gartner estimates poor data quality costs organizations an average of $12.9 million every year.
Nearly half said that, most or all of the time, business stakeholders are impacted by issues the data team doesn’t catch.
According to the survey, respondents who conducted at least three different types of data tests (for distribution, schema, volume, null or freshness anomalies) at least once a week suffered fewer data incidents (46) on average than respondents with a less rigorous testing regime (61). However, testing alone was insufficient: more rigorous testing did not correlate significantly with a reduced impact on revenue or stakeholders.
“Testing helps reduce data incidents, but no human being is capable of anticipating and writing a test for every way data pipelines can break. And if they could, it wouldn’t be possible to scale across their always changing environment,” said Lior Gavish, Monte Carlo CTO and co-founder, in the release. “Machine learning-powered anomaly monitoring and alerting through data observability can help teams close these coverage gaps and save data engineers’ time.”
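To make the five test types named in the survey (schema, volume, null, freshness and distribution) concrete, here is a minimal sketch in plain Python. It is not Monte Carlo's method or any particular tool's API; the function names, thresholds and the sample `orders` table are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def check_schema(rows, expected):
    """Each row has exactly the expected columns, with the expected types (None allowed)."""
    for row in rows:
        if set(row) != set(expected):
            return False
        for col, typ in expected.items():
            if row[col] is not None and not isinstance(row[col], typ):
                return False
    return True


def check_volume(rows, min_rows, max_rows):
    """Row count falls inside the expected range."""
    return min_rows <= len(rows) <= max_rows


def check_nulls(rows, column, max_null_fraction):
    """Share of missing values in a column stays under a threshold."""
    if not rows:
        return False
    nulls = sum(1 for r in rows if r[column] is None)
    return nulls / len(rows) <= max_null_fraction


def check_freshness(rows, ts_column, max_age, now):
    """The most recent timestamp is no older than max_age."""
    stamps = [r[ts_column] for r in rows if r[ts_column] is not None]
    return bool(stamps) and now - max(stamps) <= max_age


def check_distribution(rows, column, expected_mean, tolerance):
    """The column mean stays within a tolerance of its expected value."""
    values = [r[column] for r in rows if r[column] is not None]
    return bool(values) and abs(mean(values) - expected_mean) <= tolerance


# Hypothetical sample table and thresholds.
now = datetime(2022, 5, 1, 12, 0)
orders = [
    {"order_id": 1, "amount": 20.0, "updated_at": datetime(2022, 5, 1, 11, 0)},
    {"order_id": 2, "amount": 30.0, "updated_at": datetime(2022, 5, 1, 10, 0)},
    {"order_id": 3, "amount": None, "updated_at": datetime(2022, 5, 1, 9, 0)},
]
results = {
    "schema": check_schema(
        orders, {"order_id": int, "amount": float, "updated_at": datetime}
    ),
    "volume": check_volume(orders, 1, 1000),
    "null": check_nulls(orders, "amount", 0.5),
    "freshness": check_freshness(orders, "updated_at", timedelta(hours=6), now),
    "distribution": check_distribution(orders, "amount", 25.0, 5.0),
}
print(results)
```

Every threshold here has to be chosen and maintained by hand for each table, which illustrates the scaling problem Gavish describes.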
Within six months, 88 percent of organizations will invest or plan to invest in data quality
Last year, organizations spent $39.2 billion on cloud databases such as Snowflake, Databricks and Google BigQuery. This year, 88 percent of respondent organizations are already investing or planning to invest in data quality solutions within six months.