PROV-IDEA: Supporting Interoperable Schema and Data Provenance within Database Evolution

  1. Pérez, Beatriz (1)
  2. Rubio, Ángel Luis (1)
  3. Zapata, María A. (2)

  (1) Universidad de La Rioja, Logroño, Spain. ROR https://ror.org/0553yr311
  (2) Universidad de Zaragoza, Zaragoza, Spain. ROR https://ror.org/012a91z28

Journal: ACM Transactions on Software Engineering and Methodology

ISSN: 1049-331X (print), 1557-7392 (online)

Year of publication: 2024

Type: Article

DOI: 10.1145/3697008 (open access)


Abstract

Database evolution and data provenance are two closely related research fields. On the one hand, recording schema evolution through provenance makes it possible to maintain the schema's version history. On the other hand, the origin of the data (i.e., its provenance) is always affected by modifications (i.e., the evolution) of the schema on which the data are based. Despite these interrelationships, few works in the literature have proposed advances in this direction. In particular, to the best of our knowledge, no research has yet produced a general and interoperable solution to the problem of managing database evolution using provenance. In this paper we present PROV-IDEA: a PROV-Interoperable Database Evolution Approach. This proposal allows the provenance of schemas (of relational databases) and of data to be managed simultaneously, using the PROV standard as a way to guarantee interoperability. Furthermore, it is an adaptable and extensible approach (through the use of PROV templates), which allows non-intrusive and seamless integration with existing applications, as well as with different aspects of provenance information generation. These properties are demonstrated in the article by presenting a proof of concept built on top of a third-party relational database evolution tool.
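The abstract attributes the approach's adaptability to PROV templates. In the PROV-Template style, a template is a provenance document whose identifiers include placeholder variables (conventionally in a `var:` namespace), and an expansion step replaces them with concrete values bound at run time. The following is a minimal, simplified sketch of that expansion step, not PROV-IDEA's implementation. It assumes that a template is a plain list of (relation, subject, object) triples and that each variable is bound to exactly one value. The names `SCHEMA_CHANGE_TEMPLATE` and `expand` and the `ex:` identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# A statement is (relation, subject, object). Terms starting with "var:"
# are template variables (simplified: exactly one value per variable).
Statement = Tuple[str, str, str]

# Hypothetical template: a schema-change activity uses the old schema
# version and generates the new one, which is derived from the old one.
SCHEMA_CHANGE_TEMPLATE: List[Statement] = [
    ("wasGeneratedBy", "var:newSchema", "var:migration"),
    ("used", "var:migration", "var:oldSchema"),
    ("wasDerivedFrom", "var:newSchema", "var:oldSchema"),
]


def expand(template: List[Statement], bindings: Dict[str, str]) -> List[Statement]:
    """Replace every var: placeholder with its bound value.

    Raises KeyError if a template variable has no binding.
    """
    def resolve(term: str) -> str:
        return bindings[term] if term.startswith("var:") else term

    return [(rel, resolve(subj), resolve(obj)) for rel, subj, obj in template]


if __name__ == "__main__":
    # Hypothetical bindings captured when a migration is applied.
    bindings = {
        "var:oldSchema": "ex:schema_v1",
        "var:newSchema": "ex:schema_v2",
        "var:migration": "ex:migration_1",
    }
    for statement in expand(SCHEMA_CHANGE_TEMPLATE, bindings):
        print(statement)
```

Separating the template from its bindings is what lets the recorded provenance be changed by editing a template rather than the instrumented application. Full PROV-Template expansion also handles multi-valued variables and identifiers generated during expansion (the `vargen:` namespace). This sketch omits both.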
