Preprint open access publication

Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era

bioRxiv, Cold Spring Harbor Laboratory, 2019

DOI: 10.1101/730531, Dimensions: pub.1120248067

Affiliations

Organisations

  1. University of California, Santa Cruz, grid.205975.c
  2. Beijing Genomics Institute, grid.21155.32
  3. University of Chinese Academy of Sciences, grid.410726.6
  4. Kunming Institute of Zoology, grid.419010.d
  5. University of Copenhagen, grid.5254.6, KU
  6. Broad Institute, grid.66859.34
  7. Uppsala University, grid.8993.b
  8. University of Massachusetts Medical School, grid.168645.8
  9. Howard Hughes Medical Institute, grid.413575.1
  10. Rockefeller University, grid.134907.8
  11. Chinese Academy of Sciences, grid.9227.e

Description

Abstract: Cactus, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes and struggles in regions of highly duplicated sequence. We describe progressive extensions to Cactus that enable reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We show that Cactus can scale to hundreds of genomes and beyond by describing results from an alignment of over 600 amniote genomes, which is, to our knowledge, the largest multiple vertebrate genome alignment yet created. Further, we show improvements in orthology resolution that lead to downstream improvements in annotation.

Links & Metrics

NORA University Profiles

University of Copenhagen

Dimensions Citation Indicators

Times Cited: 13

Open Access Info

Green, Published