Università degli Studi dell'Insubria - InsubriaSPACE
 


Please use this identifier to cite or link to this item: http://hdl.handle.net/10277/676

Authors: Zanzi, Antonella
Internal Tutor: TROMBETTA, ALBERTO
Title: Data quality evaluation through data quality rules and data provenance.
Abstract: The application and exploitation of large amounts of data play an ever-increasing role in today's research, government, and economy. Data understanding and decision making rely heavily on high-quality data; therefore, in many contexts it is important to assess the quality of a dataset in order to determine whether it is suitable for a specific purpose. Moreover, as access to and exchange of datasets have become easier and more frequent, and as scientists increasingly use the World Wide Web to share scientific data, there is a growing need to know the provenance of a dataset (i.e., information about the processes and data sources that led to its creation) in order to evaluate its trustworthiness. In this work, data quality rules and data provenance are used to evaluate the quality of datasets. Concerning the first topic, the adopted solution consists of identifying types of data constraints that can serve as data quality rules and developing a software tool that evaluates a dataset against a set of rules expressed in XML. We selected some of the data constraints and dependencies already considered in the data quality field, and we also used order dependencies and existence constraints as quality rules. In addition, we developed algorithms to discover the types of dependencies used in the tool. Concerning data provenance, the Open Provenance Model (OPM) was adopted, an experimental query language for querying OPM graphs stored in a relational database was implemented, and an approach to designing OPM graphs was proposed.
Keywords: missing
Subject MIUR: INF/01 INFORMATICA (Computer Science)
Issue Date: 2013
Language: eng
Doctoral course: Informatica (Computer Science)
Academic cycle: 24
Publisher: Università degli Studi dell'Insubria
Citation: Zanzi, A. Data quality evaluation through data quality rules and data provenance (Doctoral Thesis, Università degli Studi dell'Insubria, 2013).
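The abstract does not spell out how a dataset is checked against a rule. As a rough, hypothetical illustration only (this is not the tool developed in the thesis, and its XML rule format is not described on this page), the Python sketch below checks a small in-memory table against two rule types: a functional dependency and a single-attribute order dependency. The function names `fd_violations` and `od_violations`, the sample columns, and the pointwise order-dependency definition used here are assumptions made for this example.

```python
from typing import Any, Dict, List, Sequence, Tuple

# A row is a mapping from column name to value (hypothetical in-memory format,
# not the thesis's XML-based representation).
Row = Dict[str, Any]


def fd_violations(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> List[Tuple[int, int]]:
    """Check the functional dependency lhs -> rhs.

    Returns (i, j) pairs where row j has the same lhs values as the first
    row i seen with those values, but different rhs values.
    """
    first_seen: Dict[Tuple[Any, ...], Tuple[int, Tuple[Any, ...]]] = {}
    violations: List[Tuple[int, int]] = []
    for j, row in enumerate(rows):
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if key not in first_seen:
            first_seen[key] = (j, value)
        elif first_seen[key][1] != value:
            violations.append((first_seen[key][0], j))
    return violations


def od_violations(rows: List[Row], a: str, b: str) -> List[Tuple[int, int]]:
    """Check the single-attribute order dependency a |-> b (assumed definition).

    For every ordered pair of distinct rows (s, t), s[a] <= t[a] must imply
    s[b] <= t[b]. Returns the (i, j) index pairs where this fails. The check is
    quadratic, which is acceptable only for small illustrative tables.
    """
    violations: List[Tuple[int, int]] = []
    for i, s in enumerate(rows):
        for j, t in enumerate(rows):
            if i != j and s[a] <= t[a] and not s[b] <= t[b]:
                violations.append((i, j))
    return violations


# Hypothetical sample table: row 2 misspells the city for zip 21100 and has
# a higher grade but a lower salary than row 1.
SAMPLE: List[Row] = [
    {"zip": "21100", "city": "Varese", "grade": 1, "salary": 1500},
    {"zip": "22100", "city": "Como", "grade": 2, "salary": 1800},
    {"zip": "21100", "city": "Varrese", "grade": 3, "salary": 1700},
]

if __name__ == "__main__":
    print("FD zip -> city:", fd_violations(SAMPLE, ["zip"], ["city"]))
    print("OD grade |-> salary:", od_violations(SAMPLE, "grade", "salary"))
```

Running it prints `[(0, 2)]` for the functional dependency and `[(1, 2)]` for the order dependency. Under this pointwise definition, two rows with equal `a` values must also have equal `b` values, so an order dependency implies the corresponding functional dependency.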

Files in This Item:

File: Phd_thesis_antonellazanzi_completa.pdf
Description: full text of the thesis
Size: 851.63 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License


Items in InsubriaSPACE are protected by copyright, with all rights reserved, unless otherwise indicated.

