DISTANCE-BASED PHYLOGENETIC ALGORITHMS: NEW INSIGHTS AND APPLICATIONS
Abstract
Phylogenetic methods have recently been rediscovered in several areas, including immunodynamics, epidemiology, and many branches of evolutionary dynamics. In many cases of interest, the reconstruction of a correct phylogeny is blurred by high mutation rates and/or horizontal transfer events. As a consequence, a divergence arises between the true evolutionary distances and the distances between pairs of taxa as inferred from the available data, making phylogenetic reconstruction a challenging problem. Mathematically, this divergence translates into the non-additivity of the observed distances between taxa, and the quest for new algorithms able to cope efficiently with these effects is wide open. In distance-based reconstruction methods, two properties of additive distances have been extensively exploited as antagonistic criteria to drive phylogeny reconstruction: on the one hand, a local property of quartets (sets of four taxa in a tree), the four-point condition; on the other hand, a recently proposed formula that expresses the tree length as a function of the distances between taxa, Pauplin's formula. A deeper understanding of how non-additivity affects the principles underlying the existing reconstruction algorithms is thus of paramount importance. In this paper we present a comparative analysis of the performance of the most important distance-based phylogenetic algorithms. We focus in particular on how their performance depends on two main sources of non-additivity: back-mutation processes and horizontal transfer processes. The comparison is carried out in the framework of a set of generative algorithms for phylogenies that incorporate non-additivity in a tunable way.
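
As an illustration of the four-point condition, recall its standard statement: for any four taxa i, j, k, l, a distance d is additive on a tree only if, among the three sums d(i,j) + d(k,l), d(i,k) + d(j,l) and d(i,l) + d(j,k), the two largest are equal. The Python sketch below checks this on a small distance matrix. It is not one of the algorithms analysed in this paper; the function names, the tolerance `tol` and the two toy matrices are assumptions introduced only for the example.

```python
from itertools import combinations


def four_point_violation(d, i, j, k, l):
    """Gap between the two largest of the three pairwise sums of a quartet.

    The gap is zero when the quartet satisfies the four-point condition.
    """
    sums = sorted([
        d[i][j] + d[k][l],
        d[i][k] + d[j][l],
        d[i][l] + d[j][k],
    ])
    return sums[2] - sums[1]


def is_additive(d, tol=1e-9):
    """True if every quartet of taxa satisfies the condition within tol."""
    n = len(d)
    return all(
        four_point_violation(d, *quartet) <= tol
        for quartet in combinations(range(n), 4)
    )


# Hypothetical toy tree ((A,B),(C,D)): pendant edges of length 1,
# internal edge of length 2. Taxa are indexed A=0, B=1, C=2, D=3.
tree_d = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Same matrix with d(A,C) underestimated by 1, a hand-made stand-in
# for the kind of distortion a back-mutation could produce.
noisy_d = [row[:] for row in tree_d]
noisy_d[0][2] = noisy_d[2][0] = 3

print(is_additive(tree_d))
print(is_additive(noisy_d))
print(four_point_violation(noisy_d, 0, 1, 2, 3))
```

For the tree-derived matrix the three sums are 4, 8 and 8, so the gap vanishes. In the perturbed matrix they become 4, 7 and 8, and the nonzero gap gives a simple per-quartet measure of non-additivity.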