Towards a framework for making applications provenance-aware

  1. SÁENZ ADÁN, CARLOS
Supervisors:
  1. Beatriz Pérez Valle Supervisor
  2. Francisco José García Izquierdo Supervisor

Defence university: Universidad de La Rioja

Defence date: 30 October 2019

Examination committee:
  1. Paul Thomas Groth Chair
  2. Ángel Luis Rubio García Secretary
  3. Maria Antonia Zapata Abad Member
This doctoral thesis has
  1. International mention
Department:
  1. Matemáticas y Computación

Type: Thesis

Institutional repository: Open access (publisher)

Abstract

Provenance has emerged as a way of shedding light on the data that systems produce: it refers to all the information that contributes to the existence of a piece of data. Capturing provenance brings a number of benefits, from reproducibility to accountability, including the assessment of data quality and validity. Given such tangible benefits, it has become critically important to consider provenance from the early stages of the software development cycle, such as the design phase, in order to support software designers in making applications provenance-aware; that is, applications with the functionality to answer questions regarding the provenance they produce. However, current approaches that consider provenance during the design phase do not integrate with existing software engineering methodologies, which makes them challenging to use in practice. UML2PROV is a novel framework intended to bridge the gap between application design and provenance design, minimising software engineers’ intervention and without requiring them to have provenance skills. With UML2PROV, designers can follow their preferred software engineering methodology to create the UML diagrams representing an application’s design; UML2PROV then automatically generates: (1) the design of the provenance to be generated (expressed as PROV templates); and (2) a software module for collecting values of interest while the application is running (encoded as variable-value associations referred to as bindings), which can be deployed in the application with minimal developer intervention. Combining the PROV templates with the bindings generates high-quality provenance ready to be exploited. Hence, UML2PROV ultimately helps software engineers make applications provenance-aware. Around UML2PROV, this thesis presents three main contributions.
First, a systematic review of provenance systems, which, among other results, provides a six-dimensional taxonomy of provenance characteristics that can help researchers analyse provenance systems. Second, the conceptual definition of UML2PROV, consisting of a rigorously defined set of 17 patterns mapping UML diagrams to PROV templates, along with the requirements that any generated software module deployed in the application to collect bindings must meet. This approach aims to minimise disruption to software designers’ and developers’ modus operandi, and to facilitate the maintenance of provenance-aware applications. Third, a reference implementation of UML2PROV based on Model Driven Development techniques. Starting from the UML diagrams of an application, this implementation automatically generates both the PROV templates and the module to collect bindings. Additionally, it provides potential users with several mechanisms for managing the collection of bindings, so that they may choose the one that best suits their needs with regard to the persistence system, run-time overhead, and storage needs, among other factors. UML2PROV has also been systematically evaluated. We analysed the quality and efficiency of the provenance generated by our reference implementation to show the benefits and trade-offs of applying UML2PROV, yielding relevant conclusions for the software engineering community. In particular, as the UML design drives both the design and capture of provenance, we study how different strategies followed during the UML design phase can affect aspects such as provenance design generation, application instrumentation, provenance capability maintenance, run-time overhead and storage needs, and the quality of the generated provenance.
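
The idea of combining a PROV template with bindings, as described in the abstract, can be illustrated with a toy sketch. This is not the UML2PROV implementation, nor the full PROV template expansion procedure; it is a simplification that assumes a template is a list of PROV-like statements whose variables are written as "var:<name>", and that bindings are plain variable-value associations. The function name, the variable names, and the "ex:" identifiers are all hypothetical.

```python
# Toy illustration only: NOT the UML2PROV implementation and NOT the full
# PROV template expansion procedure. All names below are hypothetical.

# A "template" is modelled as a list of PROV-like statements whose arguments
# may be placeholders of the form "var:<name>".
TEMPLATE = [
    ("activity", "var:operation"),
    ("entity", "var:input"),
    ("entity", "var:output"),
    ("used", "var:operation", "var:input"),
    ("wasGeneratedBy", "var:output", "var:operation"),
]


def expand(template, bindings):
    """Replace each "var:<name>" placeholder with its bound value."""
    expanded = []
    for statement in template:
        relation, *args = statement
        resolved = []
        for arg in args:
            if arg.startswith("var:"):
                name = arg[len("var:"):]
                if name not in bindings:
                    raise KeyError(f"no binding for variable {name!r}")
                resolved.append(bindings[name])
            else:
                # Constant arguments are copied through unchanged.
                resolved.append(arg)
        expanded.append((relation, *resolved))
    return expanded


# Bindings as they might be collected while an application runs
# (assumed values, for illustration).
bindings = {
    "operation": "ex:sort_call_1",
    "input": "ex:unsorted_list",
    "output": "ex:sorted_list",
}

provenance = expand(TEMPLATE, bindings)
for statement in provenance:
    print(statement)
```

In this sketch the template is fixed at design time and only the bindings change from one execution to the next, which mirrors the separation between provenance design and provenance capture that the abstract describes.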

Research data