From pairs of most similar sequences to phylogenetic best matches

P. Stadler, M. Geiß, D. Schaller, A. López Sánchez, M. González Laffitte, D. Valdivia, M. Hellmuth, M. Hernández Rosales. From pairs of most similar sequences to phylogenetic best matches. Algorithms for Molecular Biology, volume 15, 4, 2020.

  • Peter F. Stadler
  • Manuela Geiß
  • David Schaller
  • Alitzel López Sánchez
  • Marcos González Laffitte
  • Dulce I. Valdivia
  • Marc Hellmuth
  • Maribel Hernández Rosales
Journal: Algorithms for Molecular Biology

Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods.
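The distinction between best hits and best matches can be illustrated with a small sketch. The code below is not the authors' method; it is a minimal illustration with hypothetical function names (`best_hits`, `reciprocal_best_hits`) and a made-up toy data set. It assumes a gene `a1` in species A and two paralogs `b1`, `b2` in species B, where `b1` is the true best match of `a1` (they are separated by the speciation event) but has evolved much faster than `b2`. The dissimilarities are additive on that assumed tree but not ultrametric.

```python
def best_hits(dissim, species_of):
    """Map (gene, target species) to the set of least dissimilar genes there."""
    hits = {}
    genes = list(species_of)
    all_species = set(species_of.values())
    for x in genes:
        for s in all_species - {species_of[x]}:
            candidates = [y for y in genes if species_of[y] == s]
            d_min = min(dissim[x][y] for y in candidates)
            hits[(x, s)] = {y for y in candidates if dissim[x][y] == d_min}
    return hits


def reciprocal_best_hits(dissim, species_of):
    """Return unordered gene pairs that are best hits of each other."""
    hits = best_hits(dissim, species_of)
    pairs = set()
    for (x, _target), ys in hits.items():
        for y in ys:
            # y must also pick x as a best hit in x's species
            if x in hits[(y, species_of[x])]:
                pairs.add(frozenset((x, y)))
    return pairs


# Toy data (assumed for illustration): b1 is the ortholog of a1 but sits on
# a long branch, so the slowly evolving paralog b2 looks more similar to a1.
species_of = {"a1": "A", "b1": "B", "b2": "B"}
dissim = {
    "a1": {"b1": 1.0, "b2": 0.6},
    "b1": {"a1": 1.0, "b2": 1.4},
    "b2": {"a1": 0.6, "b1": 1.4},
}

rbh = reciprocal_best_hits(dissim, species_of)
# rbh contains only the pair (a1, b2), although the reciprocal best match
# on the assumed tree is (a1, b1).
```

In this toy case the reciprocal best hit pairs `a1` with the paralog `b2`, which is the kind of error the abstract attributes to lineage-specific rate variation. With ultrametric dissimilarities the two notions would coincide.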