New balance indices and metrics for phylogenetic trees

  1. Rotger García, Lucía
Supervised by:
  1. Arnau Mir Torres Director
  2. Francesc Andreu Rosselló Llompart Co-director

Defence university: Universitat de les Illes Balears

Defence date: 13 July 2020

Committee:
  1. Josep Maria Miret Biosca Chair
  2. Maria de la Mercè Llabrés Segura Secretary
  3. Anna Rio Committee member

Type: Thesis

Abstract

Introduction

The belief that the shape of a phylogenetic tree reflects the properties of the evolutionary processes underlying it has motivated the study of indices that quantify the graph-theoretical properties of phylogenetic trees and of metrics that allow their comparison. The main contribution of this PhD thesis is the addition of three tools to the set of techniques available for the analysis and comparison of phylogenetic trees: the total cophenetic balance index, the family of Colless-like balance indices, and the family of cophenetic metrics.

Research content

The total cophenetic index turns out to be a good alternative to other popular balance indices such as the Sackin and Colless indices. It is defined for multifurcating trees, and it achieves its maximum value exactly at the combs and its minimum value among the multifurcating trees exactly at the star trees and, among the bifurcating trees, at the maximally balanced trees. It is the first balance index published in the literature that satisfies this last property.

The Colless-like indices provide the first sound extension to multifurcating trees of the Colless index for bifurcating trees, in the following sense: when restricted to bifurcating trees, they give the classical Colless index up to a constant factor, and, for any given number of leaves, the multifurcating trees that yield their minimum value are exactly the fully symmetric ones. These Colless-like indices depend on the choice of a dissimilarity function and of a notion of size for rooted trees, and we show that this choice may affect how they measure the balance of a tree.

Finally, we have defined the family of cophenetic metrics d_(φ,p), with p ∈ {0} ∪ [1,∞[, for phylogenetic trees with possibly nested taxa and weights on the arcs.

Conclusion

We have computed closed formulas for the expected value of the total cophenetic index under the Yule and the uniform models of bifurcating phylogenetic tree growth, and a simple recurrence for its variance under the uniform model. As a by-product of this study, we have obtained a closed formula for the expected value of the Sackin index under the uniform model, a problem that had remained open until now.

In connection with the Colless-like indices, we introduce in this thesis our R package "CollessLike", available on CRAN, which makes it possible to perform goodness-of-fit tests of a phylogenetic tree against any α-γ model as the null model.

For the cophenetic metrics, on different types of spaces of unweighted trees, we have computed their least non-zero value, the order of their diameter, and the neighborhood of any given tree. Moreover, we have obtained closed formulas for the expected value, under the Yule and the uniform models, of the square of the metric d_(φ,2).
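
The abstract names the balance indices without restating their definitions. The following Python sketch may help make them concrete. It assumes the usual definitions: the Sackin index as the sum of the depths of the leaves, the Colless index as the sum over internal nodes of the absolute difference between the numbers of leaves below the two children, and the total cophenetic index as the sum, over all pairs of leaves, of the depth of their lowest common ancestor. The nested-tuple tree encoding and all function names are hypothetical choices made for this illustration; they are not taken from the thesis or from the "CollessLike" package.

```python
from itertools import combinations

# Assumed encoding (illustrative only): a rooted tree is a nested tuple of
# children, and a leaf is any non-tuple label.

def is_leaf(t):
    return not isinstance(t, tuple)

def n_leaves(t):
    """Number of leaves below (and including) node t."""
    return 1 if is_leaf(t) else sum(n_leaves(c) for c in t)

def sackin(t, depth=0):
    """Sum of the depths of all leaves; the root has depth 0."""
    if is_leaf(t):
        return depth
    return sum(sackin(c, depth + 1) for c in t)

def colless(t):
    """Classical Colless index; only meaningful for bifurcating trees."""
    if is_leaf(t):
        return 0
    left, right = t  # raises ValueError if the node is not bifurcating
    return abs(n_leaves(left) - n_leaves(right)) + colless(left) + colless(right)

def total_cophenetic(t, depth=0):
    """Sum over pairs of leaves of the depth of their lowest common ancestor."""
    if is_leaf(t):
        return 0
    sizes = [n_leaves(c) for c in t]
    # Two leaves under different children of t have t as lowest common ancestor.
    cross_pairs = sum(a * b for a, b in combinations(sizes, 2))
    return depth * cross_pairs + sum(total_cophenetic(c, depth + 1) for c in t)

# Three trees on four leaves.
comb = (((1, 2), 3), 4)       # caterpillar-shaped bifurcating tree
balanced = ((1, 2), (3, 4))   # maximally balanced bifurcating tree
star = (1, 2, 3, 4)           # all leaves attached to the root

for name, tree in [("comb", comb), ("balanced", balanced), ("star", star)]:
    print(name, sackin(tree), total_cophenetic(tree))
```

On these four-leaf trees the total cophenetic index is largest for the comb, smaller for the balanced tree, and zero for the star, which matches the ordering of extremes described in the abstract. The `colless` function is deliberately not applied to the star tree, since the classical Colless index is defined only for bifurcating trees.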