With all the buzz around ‘big data’ and the proliferation of new data sources – social media, event tracking tools – combined with the residual issues relating to legacy system data, it is important to remember that the value of data needs to be measured by quality as well as quantity. The old adage ‘garbage in, garbage out’ has never been more relevant. And the issue is the domain of business management, not IT.
So what’s the problem? Simply put, different systems collect data in different ways. Some systems rigorously edit and filter data during input. However, modern social media sites, as well as older legacy systems, leave lots of room for free-style data input. Compounding this, there are compelling business reasons to bring diverse data sets together into a common data source, e.g., converting legacy systems, mergers and acquisitions, or providing a single client view for marketing and sales. As the number of data sets increases, so does the risk of compromised data quality.
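As a rough illustration of why consolidation strains quality, here is a minimal sketch (in Python, using only the standard library) of the kind of record matching such efforts depend on: the same client captured differently by two systems, reconciled by a fuzzy comparison. The field names, weights and 0.8 threshold are invented for the example, not a recommendation of any particular tool.

from difflib import SequenceMatcher

def similarity(a, b):
    # Rough string similarity in [0, 1] after trivial normalization.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_clients(legacy_records, crm_records, threshold=0.8):
    # Pair up records from two systems whose name and postal code look alike.
    matches = []
    for lrec in legacy_records:
        for crec in crm_records:
            score = (0.7 * similarity(lrec["name"], crec["name"])
                     + 0.3 * similarity(lrec["postal_code"], crec["postal_code"]))
            if score >= threshold:
                matches.append((lrec["id"], crec["id"], round(score, 2)))
    return matches

# The same client, captured differently by two systems.
legacy = [{"id": "L-001", "name": "J. Smith & Sons Ltd", "postal_code": "M5V 2T6"}]
crm = [{"id": "C-944", "name": "J Smith and Sons Limited", "postal_code": "M5V2T6"}]
print(match_clients(legacy, crm))   # reports one candidate match above the threshold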
How important is data quality? That depends on its business use. If data are used to help accent reports with anecdotal references (sample tweets, etc.), quality may not be as important. However, if the data are being used to provide information to regulators or during due diligence, quality carries a far higher degree of import.
There is a wide variety of technical tools and techniques for data matching and consolidation. There is also an emerging discipline of data testing (see Syed Haider’s recent blog post in Insurance Networking News, for example). However, data quality remains a business issue, not a technical issue. This has two implications.
First, there needs to be a business structure to address data quality issues. In mid- to large-size organizations, there may be value in establishing a data governance structure. This could be a committee, with representation from business units, Finance, and IT, that meets as required to address data issues, approve data remediation projects, set data quality standards and establish responsibility and accountability. In smaller organizations, this might be set out as a responsibility within a specific business position (e.g., COO, CFO).
Second, any organization examining a data cleansing effort – as a project in itself or as part of a larger project – needs to make a business decision about the amount of effort to devote to the issues. We were recently reminded of a 2006 white paper from Informatica, The Data Quality Business Case: Projecting Return on Investment. It is reasonably short, and written in business terms. Its purpose is straightforward:
Even though everyone fundamentally understands the need for high quality data, technologists are often left to their own devices when it comes to ensuring the high levels of data quality. However, at some point an investment must be made in the infrastructure necessary to provide measurably acceptable levels of data quality. In order to justify that investment, we must be able to articulate the business value of data quality in a way that will show a return on the investment made.
To paraphrase French Prime Minister Georges Clemenceau, “Data are too important to be left to technologists.”
What do you think? Is data quality an issue for you? What are the business impacts? How are you addressing the challenges?
Data quality is a big issue at every level of insurance, because it directly affects the decision-making ability of anyone viewing the insights and intelligence derived from the data.
For us (IMS – http://www.intellimec.com) as a Telematics Solution Provider, the quality of data that our customers require determines the level of infrastructure and engineering we need to put in place to deliver that quality. If we start with the assumption that the reason for acquiring data is to make better decisions, then it stands to reason that better decisions require better intelligence, which in turn is derived from better, higher-quality data. The “math” would therefore be based on the following “equation”:
Better decisions > Better intelligence > Better data > Better engineering > Better device.
Insofar as this argument holds, you need a better foundation for gathering data, starting with the business fundamentals but also including the device, the software (both the firmware on the device and the back end that collects, analyzes, visualizes and interprets the data the device generates) and the communications link.
For example, we know that the alignment of an accelerometer in a device, relative to the vehicle it is plugged into, can greatly affect the output of data in terms of braking, cornering and, indeed, the direction of the vehicle. Unless the “X”, “Y” and “Z” axes of the device and its accelerometer are aligned with the chassis of the vehicle, the data will be skewed. To compensate for any misalignment, we introduce correctional algorithms to ensure the highest-quality data. This requires additional engineering support but is necessary to achieve a high level of data quality.
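To make the idea concrete, here is a minimal Python/NumPy sketch of rotating raw accelerometer readings from the device’s axes into the vehicle’s axes, assuming the mounting angles have already been estimated during calibration. The 15-degree offset and the readings are invented for the example; this is not IMS’s actual algorithm.

import numpy as np

def device_to_vehicle(roll, pitch, yaw):
    # Rotation matrix taking a vector from the device frame into the
    # vehicle (chassis) frame, given the estimated mounting angles.
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Device mounted roughly 15 degrees off around the vertical axis.
R = device_to_vehicle(roll=0.0, pitch=0.0, yaw=np.radians(15))

# Raw reading during hard braking: the misalignment smears the braking force
# across the device's X and Y axes (units m/s^2; Z includes gravity).
raw = np.array([-7.7, 2.1, 9.8])

corrected = R @ raw
print(corrected)   # braking now shows up almost entirely on the vehicle's X axis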
Further, our experience shows that a GPS device works best when you fuse the GPS and accelerometer sensors in a process called “sensor fusion”. Combining the data streams from the GPS with the accelerometer (data fusion?) results in a more complete and accurate picture of the route a particular driver takes, and hence of the relative risks he or she takes while driving. Again, this requires more engineering, but it results in a clearer picture of actual driving behavior.
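As a deliberately simplified, one-dimensional illustration of the idea, the Python sketch below blends speed dead-reckoned from the accelerometer (high rate, but it drifts) with periodic GPS speed fixes (lower rate, noisy but drift-free) using a complementary filter. The blend factor, sample rates and data are assumptions for illustration only and say nothing about IMS’s actual fusion pipeline.

def fuse_speed(accel, gps_speeds, dt=0.1, gps_every=10, alpha=0.98):
    # accel      : longitudinal acceleration samples (m/s^2), one per dt seconds
    # gps_speeds : GPS speed fixes (m/s), one per (gps_every * dt) seconds
    # alpha      : weight kept by the dead-reckoned estimate at each GPS fix
    fused = []
    speed = gps_speeds[0]              # initialize from the first GPS fix
    for i, a in enumerate(accel):
        speed += a * dt                # integrate acceleration (drifts over time)
        if i % gps_every == 0:         # a GPS fix is available at this sample
            speed = alpha * speed + (1 - alpha) * gps_speeds[i // gps_every]
        fused.append(speed)
    return fused

# Ten seconds of gentle braking at -0.5 m/s^2, with a GPS speed fix every second.
accel = [-0.5] * 100
gps = [20.0 - 0.5 * t for t in range(10)]
print(round(fuse_speed(accel, gps)[-1], 1))   # close to the true final speed of 15 m/s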
The key challenge we see is in physically demonstrating our equation that better decision making requires better data from a better data delivery system. I think people can see the argument for high data quality, but with the costs of telematics programs still considered high, trade-offs are being made between data quality and cost. We are working on side-by-side demonstrations and the like to try to quantify and communicate the difference that data quality makes and the value it delivers, because in the end we know that the prices of telematics programs will drop and quality will rise in importance.