As a long-time data professional, I am pleased to see that companies have put a much greater focus on the quality of their corporate data assets in recent years. As everyone rushes to herald data as their #1 corporate asset, it is important to realize that the collection, transformation, and publication of flawed data can have far-reaching negative impacts.
IBM estimates that poor-quality data cost the United States $3.1 trillion in 2016 alone. Do I have your attention now? Decisions based on inaccurate or mischaracterized data can negatively impact your corporate operations, profitability, and other key processes.
Laws and regulations protecting PHI and PII are now commonplace, and companies go to great lengths to mask and encrypt this type of data. But how do you know if one of your source systems is embedding a credit card number or a Social Security number in a text field, or is using it as an unencrypted primary key? The answer is, you probably don’t. So, what can you do about it?
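On the narrow technical question of spotting such values, here is a minimal Python sketch of how a free-text field might be scanned. The patterns and the function names (`luhn_valid`, `find_sensitive_values`) are illustrative assumptions, not part of any particular data quality tool. It looks only for Social Security numbers written as 123-45-6789 and for 13 to 19 digit runs that pass the Luhn checksum used by payment cards, so it will miss other formats and can still produce false positives.

```python
import re

# Assumed formats: SSNs written as NNN-NN-NNNN, and card candidates as
# 13-19 digits optionally separated by single spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sensitive_values(text: str) -> dict:
    """Return SSN-like strings and Luhn-valid card-like numbers found in text."""
    ssns = SSN_PATTERN.findall(text)
    cards = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"ssn_like": ssns, "card_like": cards}


if __name__ == "__main__":
    note = "Customer called, card 4111 1111 1111 1111, SSN 123-45-6789"
    print(find_sensitive_values(note))
```

The Luhn check is there to cut down on noise: most long digit strings, such as order numbers, fail the checksum, so only plausible card numbers are flagged for review.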
Start the Conversation
A great starting point is socializing the concept of a data quality program with your co-workers in both the IT and business organizations. Begin asking questions about your data by reaching out to those who deal with production issues on a daily basis. Talk with your chief information security officer and ask what concerns they have about the cleanliness of your corporate data. Talk to peers in your industry and ask what successes and failures they are experiencing with their corporate data. Do some research and get people talking.
Commit to Quality
It is imperative that a data quality initiative have the full support of key stakeholders who are committed to the long-term results. While an initiative of this type may start out as a project, it is important to establish some key wins early that show the positive impact a full program will have on your organization. An overarching set of objectives should be agreed upon before a common set of processes and standards is implemented. If you are unsure how to jump-start this new initiative, partner with a consulting organization that has had success working with other clients in your industry.
Partner with the Business
Knowing technology better than their counterparts in the business organization, information technologists might be emboldened to “go it alone” with a high-revving toolset at their disposal. That would be a huge mistake. IT should work in concert with the data owners, and a solid workflow should be established so the right resources are assigned to the right tasks. Business analysts, quality control resources, and end users will know better than anyone where potential data quality issues exist.
Tools You Can Use
A good data quality tool will provide the following functionality:
- Data Profiling
- Standardization
- Cleansing
- Monitoring
- Matching
A quick glance at the latest Gartner Magic Quadrant for Data Quality tools illustrates how much this segment has grown over the past several years. Informatica offers a best-in-class suite of data quality tools, and IBM’s QualityStage is also a great option for those who are already invested in IBM products. These are enterprise-level tools that integrate nicely with each vendor's traditional ETL tools. The Informatica Data Quality tool provides workflow functionality, quality scorecard functionality, and the ability to schedule data profiling runs and compare the results to see if the quality of your data is improving.
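To make the idea of comparing profiling runs concrete, here is a small Python sketch. It is not how Informatica or IBM implement their scorecards; the function name `compare_profile_runs` and the choice of metric (percentage of missing values per column) are assumptions made for illustration.

```python
def compare_profile_runs(previous, current):
    """Compare missing-value percentages per column between two profiling runs.

    Both arguments map a column name to the percentage of missing values
    measured in that run. Lower is better.
    """
    report = {}
    for column in sorted(set(previous) | set(current)):
        before = previous.get(column)
        after = current.get(column)
        if before is None or after is None:
            status = "not comparable"  # column appears in only one run
        elif after < before:
            status = "improved"
        elif after > before:
            status = "worsened"
        else:
            status = "unchanged"
        report[column] = status
    return report


if __name__ == "__main__":
    last_month = {"email": 12.5, "zip": 4.0, "phone": 7.0}
    this_month = {"email": 9.0, "zip": 4.0, "phone": 8.5, "state": 1.0}
    for column, status in compare_profile_runs(last_month, this_month).items():
        print(f"{column}: {status}")
```

Tracking even one simple metric like this over time gives stakeholders a trend line, which is usually more persuasive than a single snapshot.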
Profiling for Success
An essential component of a successful data quality program is the profiling of key data sets. Profiling can provide insights into the quality of your data and serves as a key milestone as you lay out a roadmap to improving the overall quality of your data assets. No matter the size of your organization, getting started with the profiling process can be difficult. Here are a few ideas on how to get started:
- Target a few of your key systems that are core to the management of your business.
- Gather anecdotal evidence from a variety of resources (e.g. operations, business analysts, IT developers and testers, and even end users).
- Begin by sampling your data to get a better idea of how best to use the toolset at your disposal.
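As a rough illustration of what profiling a sample involves, the Python sketch below computes a few basic statistics per column: row count, missing values, distinct values, and the most frequent values. The function names and the definition of "missing" (None or a blank string) are assumptions; a commercial profiling tool will report far more than this.

```python
from collections import Counter


def is_missing(value) -> bool:
    """Assumed definition of missing: None or a string that is blank."""
    return value is None or str(value).strip() == ""


def profile_column(values):
    """Compute basic profile statistics for one column of sampled values."""
    total = len(values)
    missing = sum(1 for v in values if is_missing(v))
    counts = Counter(v for v in values if not is_missing(v))
    return {
        "rows": total,
        "missing": missing,
        "missing_pct": round(100 * missing / total, 1) if total else 0.0,
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


def profile_rows(rows):
    """Profile every column in a list of dict rows (e.g. from csv.DictReader)."""
    columns = {key for row in rows for key in row}
    return {
        col: profile_column([row.get(col) for row in rows])
        for col in sorted(columns)
    }


if __name__ == "__main__":
    sample = [
        {"state": "TX", "zip": "75001"},
        {"state": "tx", "zip": ""},
        {"state": "TX", "zip": None},
        {"state": "", "zip": "75001"},
    ]
    for column, stats in profile_rows(sample).items():
        print(column, stats)
```

Even on a four-row sample, the output surfaces two typical findings: inconsistent casing ("TX" versus "tx" counted as distinct values) and a high share of missing ZIP codes.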
Turn Insight into Action
The results of a data profiling exercise will yield a variety of insights into the health of your data. It is important to act on these insights to maximize your ROI. While not always possible, it is a best practice to correct data issues at the source by working with the data owners. Fixing data at the source ensures that downstream systems and processes consume clean data.
If fixing the flawed data at the source is not possible, many data quality tools will allow your team to build rules that not only identify data issues but also fix them. These rules can often be shared with your ETL tools, where they act as a transformation that corrects the issue as the data moves downstream.
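A rule of this kind has two parts: a check that identifies a problem and a transformation that fixes it. The Python sketch below shows the pattern for US phone numbers. The target format (NNN-NNN-NNNN) and the function names are assumptions chosen for illustration; in a real tool the rule would be defined in that tool's own rule builder.

```python
import re


def is_standard_phone(value: str) -> bool:
    """Identify: True if value is already in the assumed NNN-NNN-NNNN form."""
    return re.fullmatch(r"\d{3}-\d{3}-\d{4}", value) is not None


def standardize_phone(value: str):
    """Fix: reformat a 10-digit US number, or return None if it can't be fixed."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    if len(digits) != 10:
        return None  # not repairable by this rule
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def apply_rule(values):
    """Return (cleaned, rejected) after applying the identify-then-fix rule."""
    cleaned, rejected = [], []
    for value in values:
        if is_standard_phone(value):
            cleaned.append(value)
            continue
        fixed = standardize_phone(value)
        if fixed is None:
            rejected.append(value)
        else:
            cleaned.append(fixed)
    return cleaned, rejected


if __name__ == "__main__":
    raw = ["(214) 555-0100", "214-555-0101", "1 214 555 0102", "555-0103"]
    cleaned, rejected = apply_rule(raw)
    print("cleaned:", cleaned)
    print("rejected:", rejected)
```

Values the rule cannot repair are set aside rather than guessed at, so they can be routed back to the data owner for correction at the source.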
Tune in Next Time
This is the first installment in a series of blog articles on data quality. In my next post, I will break down three common types of data quality issues and provide a detailed methodology for correcting each.