Poor Data Quality Case Study

With big data only expected to get bigger, IT professionals and other organizational leaders admit that they lack complete confidence in their company's data quality management (DQM) practices, according to a recent survey from Blazent. The accompanying report, titled "The State of Enterprise Data Quality: 2016," reveals that the vast majority of survey respondents believe their organization's perception of its data quality is better than the reality. Problems arise when employees manually enter flawed data, or when issues emerge during data migration or conversion. Such problems translate directly into lost business value in the form of additional costs, lost revenue, bad decision-making and delays in deploying new systems.

"As foundational as data quality is to an organization's success, a majority of IT execs are not confident in their data quality management practices," said Charlie Piper, CEO at Blazent. "The pace of change and seemingly never ending increase in the amount of data and data sources are significant drivers of this lack of confidence. And, critical business decisions are made without a complete and accurate picture. These findings further validate how crucial it is for IT and the C-suite to continue to prioritize data quality, employing an organization-wide streamlined process for data management."

An estimated 200 C-level execs, senior IT pros and key business decision-makers took part in the research.

Dennis McCafferty is a freelance writer for Baseline Magazine.







