With all the buzz around ‘big data’ and the proliferation of new data sources – social media, event tracking tools – combined with the residual issues relating to legacy system data, it is important to remember that the value of data needs to be measured by quality as well as quantity. The old adage ‘garbage in, garbage out’ has never been more relevant. And the issue is the domain of business management, not IT.
So what’s the problem? Simply put, different systems collect data in different ways. Some systems rigorously edit and filter data during input. However, modern social media sites, as well as older legacy systems, leave lots of room for free-style data input. Compounding this, there are compelling business reasons to bring diverse data sets together into a common data source, e.g., converting legacy systems, mergers and acquisitions, providing a single client view for marketing and sales. As the number of data sets increases, the likelihood of compromised data quality rises.
How important is data quality? That depends on its business use. If data are used to accent reports with anecdotal references (sample tweets, for example), quality may not be as important. However, if the data are being used to inform regulators or to support due diligence, quality takes on far greater importance.
There is a wide variety of technical tools and techniques for data matching and consolidation. There is also an emerging discipline in data testing (see Syed Haider’s recent blog post in Insurance Networking News, for example). However, data quality remains a business issue, not a technical issue. This has two implications.
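To make the matching-and-consolidation idea concrete, here is a minimal sketch in Python using hypothetical client records. Real tools apply fuzzy matching, probabilistic scoring, and survivorship rules; this illustrates only the basic mechanics: normalize free-style input fields, then merge records that share a normalized key. The field names and sample data are invented for illustration.

```python
def normalize(name: str, email: str) -> tuple:
    """Build a simple match key from noisy, free-style input fields."""
    # Collapse whitespace and case differences in the name; trim the email.
    return (" ".join(name.lower().split()), email.strip().lower())

def consolidate(records: list) -> dict:
    """Group records by normalized key; earlier non-empty values win."""
    merged = {}
    for rec in records:
        key = normalize(rec["name"], rec["email"])
        entry = merged.setdefault(key, {})
        for field, value in rec.items():
            if value:  # keep the first non-empty value seen for each field
                entry.setdefault(field, value)
    return merged

# Hypothetical records from two source systems with inconsistent entry styles.
records = [
    {"name": "Jane  Smith", "email": "JSMITH@example.com", "phone": ""},
    {"name": "jane smith", "email": "jsmith@example.com ", "phone": "555-0100"},
    {"name": "Bob Lee", "email": "blee@example.com", "phone": ""},
]

clients = consolidate(records)
print(len(clients))  # the two Jane Smith records collapse into one client
```

Even a toy example like this shows why the effort is a business decision: someone on the business side has to decide which fields define a match and which source wins when values conflict.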
First, there needs to be a business structure to address data quality issues. In mid- to large-size organizations, there may be value in establishing a data governance structure. This could be a committee that includes representation from business units, Finance, and IT that meets as required to address data issues, approve data remediation projects, set data quality standards, and establish responsibility and accountability. In smaller organizations, this might be set out as a responsibility within a specific business position (e.g., COO, CFO).
Second, any organization examining a data cleansing effort – as a project in itself or as part of a larger project – needs to make a business decision about the amount of effort to devote to the issues. We were recently reminded of a 2006 white paper from Informatica, The Data Quality Business Case: Projecting Return on Investment. It is reasonably short and written in business terms. Its purpose is straightforward:
Even though everyone fundamentally understands the need for high quality data, technologists are often left to their own devices when it comes to ensuring the high levels of data quality. However, at some point an investment must be made in the infrastructure necessary to provide measurably acceptable levels of data quality. In order to justify that investment, we must be able to articulate the business value of data quality in a way that will show a return on the investment.
To paraphrase French Prime Minister Georges Clemenceau, “Data are too important to be left to technologists.”
What do you think? Is data quality an issue for you? What are the business impacts? How are you addressing the challenges?