Cographs, orthologs, and the inference of species trees from paralogs
Peter F. Stadler
U East Anglia
Minisymposium: GENERAL SESSION TALKS
Content: Phylogenomics relies heavily on well-curated sequence data sets that consist, for each gene, exclusively of 1:1 orthologs, i.e., of genes that arose through speciation events. Paralogs, which arose through duplication events, are treated as a dangerous nuisance that must be detected and removed. Building on recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided that orthologs and paralogs can be distinguished with a high degree of certainty. Starting from tree-free estimates of orthology, cograph editing can reduce the noise sufficiently for the data to be translated into constraints on the species tree. While the resolution is very poor for individual gene families, we show that genome-wide data sets suffice to generate fully resolved phylogenetic trees. The mathematical content of this work comprises (1) the characterization of graphs of orthologous genes as cographs, (2) an analysis of cograph editing that allows empirical data to be reliably corrected into genuine cographs, (3) the identification of a triple set in the corresponding cotrees that constrains the species tree, and (4) results on the decomposition of cographs suggesting that paralogous gene pairs can in many cases be safely included in classical phylogenetic reconstruction pipelines. The presentation will summarize results obtained by a larger group of authors in several publications, as well as recent unpublished results. Authors are listed in random order.
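The link between items (1) and (3) can be illustrated with a small sketch. It assumes a plain Python representation chosen for illustration (hypothetical names `build_cotree` and `species_triples`, genes as strings, the orthology graph as a set of two-element frozensets, and a gene-to-species dictionary). It tests the cograph property using the fact that every induced subgraph of a cograph with at least two vertices is either disconnected or has a disconnected complement, building the cotree along the way (label 1 = join, read as speciation; label 0 = disjoint union, read as duplication). It then collects species triples xy|z from genes a, b below one child of a speciation node and a gene c below another child, keeping only triples whose three genes come from pairwise distinct species. This is a deliberately naive illustration, not the authors' pipeline, and it performs no cograph editing: a graph that is not a cograph simply yields `None`.

```python
from itertools import combinations


def components(vertices, adjacent):
    """Connected components of the graph induced on `vertices`,
    where adjacent(u, v) decides whether u and v are joined."""
    remaining = set(vertices)
    comps = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            nbrs = {v for v in remaining if adjacent(u, v)}
            remaining -= nbrs
            comp |= nbrs
            stack.extend(nbrs)
        comps.append(frozenset(comp))
    return comps


def build_cotree(vertices, edges):
    """Return the cotree as nested tuples (label, [children]), leaves being
    gene names; label 0 = disjoint union (duplication), 1 = join (speciation).
    Returns None if the graph is not a cograph (it contains an induced P4)."""
    def adj(u, v):
        return frozenset((u, v)) in edges

    def build(vs):
        if len(vs) == 1:
            return next(iter(vs))
        comps = components(vs, adj)
        if len(comps) > 1:
            label = 0  # graph disconnected: disjoint-union node
        else:
            comps = components(vs, lambda u, v: not adj(u, v))
            if len(comps) == 1:
                return None  # graph and complement both connected: not a cograph
            label = 1  # complement disconnected: join node
        children = [build(c) for c in comps]
        if any(ch is None for ch in children):
            return None
        return (label, children)

    return build(frozenset(vertices))


def leaves(node):
    """All gene names below a cotree node."""
    if isinstance(node, str):
        return [node]
    return [x for ch in node[1] for x in leaves(ch)]


def species_triples(cotree, species):
    """Species triples ({x, y}, z), meaning xy|z: genes a, b lie below one child
    of a speciation node and gene c below another child, with the species of
    a, b, c pairwise distinct."""
    triples = set()

    def visit(node):
        if isinstance(node, str):
            return
        label, children = node
        if label == 1:
            leaf_sets = [leaves(ch) for ch in children]
            for i, inner in enumerate(leaf_sets):
                for j, outer in enumerate(leaf_sets):
                    if i == j:
                        continue
                    for a, b in combinations(inner, 2):
                        for c in outer:
                            sa, sb, sc = species[a], species[b], species[c]
                            if len({sa, sb, sc}) == 3:
                                triples.add((frozenset((sa, sb)), sc))
        for ch in children:
            visit(ch)

    visit(cotree)
    return triples
```

As a usage example, take a duplication in the lineage leading to species A and B: genes a1, a2 (from A), b1, b2 (from B), and c1 (from C), with orthology edges c1 to every other gene plus a1-b1 and a2-b2. The sketch builds a cotree with a speciation root and a duplication node below it, and `species_triples` returns the single triple AB|C, i.e., the species tree ((A,B),C), even though the data contain paralogs. A path on four genes (an induced P4) is rejected with `None`.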