Computing Reviews

Schema matching and embedded value mapping for databases with opaque column names and mixed continuous and discrete-valued data fields
Jaiswal A., Miller D., Mitra P. ACM Transactions on Database Systems 38(1): 1-34, 2013. Type: Article
Date Reviewed: 07/30/13

Schema matching is a key enabler of data access and knowledge acquisition in the current era of data deluge. Big data streams and heterogeneous databases create heavy demand for analytics and information discovery. In that sense, this paper represents important progress: it provides a schema matching algorithm built on two key ideas, matching continuous-valued attributes across databases and using value mappings to improve the schema match. The authors address these challenges with two components: a global objective function minimization algorithm that matches columns with continuous-valued attributes, which are modeled with a Gaussian mixture model, and an iterative descent algorithm that embeds value mappings to enhance schema matching accuracy.
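As a rough illustration of the continuous-attribute idea, the sketch below fits a one-dimensional Gaussian mixture to each column of one database and matches it to the column of another database whose values receive the highest average log-likelihood. It is a simplified stand-in, not the authors' algorithm: the function names, the plain EM fit, the brute-force search over assignments, and the synthetic data are all assumptions made for this example, and the embedded value mapping step is omitted.

```python
import itertools
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    srt = sorted(values)
    n = len(srt)
    # Initialise means at evenly spaced quantiles, with a shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    overall = sum(srt) / n
    variances = [sum((x - overall) ** 2 for x in srt) / n + 1e-6] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj + 1e-6
    return weights, means, variances


def avg_log_likelihood(values, model):
    """Average log-likelihood of the values under a fitted mixture."""
    weights, means, variances = model
    total = 0.0
    for x in values:
        density = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(density, 1e-300))  # guard against underflow
    return total / len(values)


def match_columns(source, target):
    """Pick the one-to-one column assignment with the highest total score.

    Brute force over permutations, so only suitable for a handful of columns.
    """
    models = {name: fit_gmm_1d(vals) for name, vals in source.items()}
    src_names = list(source)
    best_score, best_map = -math.inf, None
    for perm in itertools.permutations(target, len(src_names)):
        score = sum(avg_log_likelihood(target[t], models[s]) for s, t in zip(src_names, perm))
        if score > best_score:
            best_score, best_map = score, dict(zip(src_names, perm))
    return best_map


def make_demo_columns(seed=0, n=300):
    """Synthetic columns with opaque names (hypothetical data, not from the paper)."""
    rng = random.Random(seed)
    unimodal = lambda mu, sd: [rng.gauss(mu, sd) for _ in range(n)]
    bimodal = lambda: [rng.gauss(rng.choice([10.0, 20.0]), 1.0) for _ in range(n)]
    source = {"a1": unimodal(0, 1), "a2": bimodal(), "a3": unimodal(50, 5)}
    target = {"b1": unimodal(50, 5), "b2": unimodal(0, 1), "b3": bimodal()}
    return source, target


source, target = make_demo_columns()
mapping = match_columns(source, target)
print(mapping)
```

Scoring by log-likelihood compares whole distributions, so the bimodal column can be told apart from a unimodal one even when summary statistics such as the mean are similar. For more than a few columns, the permutation search would need to be replaced by a proper assignment solver.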

To provide context for their proposal, the authors present a thorough history of schema matching research for heterogeneous databases and contrast their ideas with alternatives throughout the paper. For example, they propose the use of log-likelihood rather than Euclidean distance metrics and demonstrate the effectiveness of their proposed methods with experimental tests. The result is a dense but straightforward resource for the researcher eager to learn about state-of-the-art schema matching.

To strengthen the paper's proposal, it would be interesting to advance the research on two fronts: a) develop more test cases beyond the familiar 1990 US Census Bureau dataset, and b) extend the study beyond the traditional relational model that obeys first and second normal forms. The former is mandatory to position the method as a real breakthrough in the field of schema matching, and the latter is required to handle schema matching in a world where big data grows exponentially.

Reviewer:  Jair Merlo Review #: CR141410 (1310-0927)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™