Extending GelJ for interoperability: Filling the gap in the bioinformatics resources for population genetics analysis with dominant markers

  1. Domínguez, C. 1
  2. Heras, J. 1
  3. Mata, E. 1
  4. Pascual, V. 1
  5. Vázquez-Garcidueñas, M.S. 2
  6. Vázquez-Marrufo, G. 2
  1. 1 Universidad de La Rioja

    Logroño, Spain

    ROR https://ror.org/0553yr311

  2. 2 Universidad Michoacana de San Nicolás de Hidalgo

    Morelia, Mexico

    ROR https://ror.org/00z0kq074

Journal:
Computer Methods and Programs in Biomedicine

ISSN: 0169-2607

Year of publication: 2017

Volume: 140

Pages: 69-76

Type: Article

DOI: 10.1016/J.CMPB.2016.12.001 SCOPUS: 2-s2.0-85007385521 WoS: WOS:000397074300009

Abstract

Background and objective: Manually transforming DNA fingerprints of dominant markers into the input of tools for population genetics analysis is time-consuming and error-prone, especially when the researcher deals with a large number of samples. The situation worsens when the researcher needs several such tools, because their data formats are incompatible. The goal of this work is to automate, starting from the banding patterns of gel images, the generation of input for the wide variety of tools devoted to population genetics analysis.

Methods: We thoroughly analysed tools for population genetics analysis with dominant markers, and tools for working with phylogenetic trees, to determine the input requirements of those systems. Programs devoted to phylogenetic trees widely employ the Newick and Nexus formats, whereas each population genetics analysis tool uses its own specific format. To handle the diversity of formats in the latter case, we developed a new XML format, called PopXML, that takes into account the variety of information required by each population genetics analysis tool. The acquired knowledge has been incorporated into the pipeline of the GelJ system, a tool for analysing DNA fingerprint gel images, to reach our automation goal.

Results: We have implemented, in the GelJ system, a pipeline that automatically generates, from gel banding patterns, the input of tools for population genetics analysis and phylogenetic trees. The pipeline has successfully generated, from thousands of banding patterns, the input of 29 population genetics analysis tools and 32 tools for managing phylogenetic trees.

Conclusions: GelJ has become the first tool that fills the gap between gel image processing software and software for population genetics analysis with dominant markers, phylogenetic reconstruction, and tree editing. This has been achieved by automating the generation of input for the latter software from gel banding patterns processed by GelJ. © 2016 Elsevier Ireland Ltd
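
As an illustration of the kind of conversion the abstract describes, the following minimal sketch writes presence/absence banding patterns as a Nexus DATA block. It is not GelJ's implementation (GelJ's own code is not shown in this record); the function name `bands_to_nexus`, the dictionary input layout, and the sample names are assumptions made only for this example.

```python
def bands_to_nexus(patterns):
    """Render presence/absence band patterns as a Nexus DATA block.

    `patterns` is a hypothetical input layout: a dict mapping a sample
    name (no whitespace) to a list of 0/1 values, one per band position.
    """
    if not patterns:
        raise ValueError("at least one sample is required")
    lengths = {len(row) for row in patterns.values()}
    if len(lengths) != 1:
        raise ValueError("all samples need the same number of band positions")
    for name, row in patterns.items():
        if any(ch.isspace() for ch in name):
            raise ValueError(f"sample name contains whitespace: {name!r}")
        if any(b not in (0, 1) for b in row):
            raise ValueError(f"non-binary value in sample {name!r}")

    nchar = lengths.pop()
    width = max(len(name) for name in patterns)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(patterns)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for name, row in patterns.items():
        # Dominant markers are scored as band present (1) or absent (0).
        states = "".join(str(b) for b in row)
        lines.append(f"    {name.ljust(width)}  {states}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical banding patterns for three gel lanes, five band positions each.
example = {
    "lane_A": [1, 0, 1, 1, 0],
    "lane_B": [1, 1, 0, 1, 0],
    "lane_C": [0, 0, 1, 1, 1],
}
nexus_text = bands_to_nexus(example)
print(nexus_text)
```

Each population genetics tool would need its own writer of this kind, which is the duplication the abstract says an intermediate format such as PopXML is meant to absorb.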