Preprint open access publication

Assembly-free and alignment-free sample identification using genome skims

bioRxiv, Cold Spring Harbor Laboratory, 2017

DOI: 10.1101/230409, Dimensions: pub.1099635934

Authors

Affiliations

Organisations

  1. University of California, San Diego, grid.266100.3
  2. University of East Anglia, grid.8273.e
  3. University of Copenhagen, grid.5254.6, KU
  4. Norwegian University of Science and Technology, grid.5947.f

Description

Abstract: The ability to quickly and inexpensively describe taxonomic diversity is critical in this era of rapid climate and biodiversity changes. The currently preferred molecular technique, barcoding, has been very successful but is based on short organelle markers. Recently, an alternative genome-skimming approach has been proposed: low-pass sequencing (100 Mb to several Gb per sample) is applied to voucher and/or query samples, and marker genes and/or organelle genomes are recovered computationally. Current genome-skimming practice discards the vast majority of the data because the low coverage of genome-skims prevents assembly of the nuclear genomes. In contrast, we suggest using all unassembled reads directly, but existing methods poorly support this goal. We introduce a new alignment-free tool, Skmer, to estimate genomic distances between the query and each reference genome-skim using the k-mer decomposition of reads. We test Skmer on a large set of insect and bird genomes, sub-sampled to create genome-skims. Skmer shows great accuracy in estimating genomic distances, identifying the closest match in a reference dataset, and inferring the phylogeny. The software is publicly available at https://github.com/shahab-sarmashghi/Skmer.git.
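As a rough illustration of what "k-mer decomposition of reads" can mean in practice, the sketch below breaks two sets of reads into canonical k-mers, computes their Jaccard index, and converts it to a distance with a Mash-style transform. This is not Skmer's implementation: the function names, the toy value k = 5, and the choice of transform are assumptions made for illustration, and the sketch does not correct for low coverage or sequencing error, both of which bias a naive Jaccard index on real genome-skims.

```python
import math

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers of one sequence.

    A canonical k-mer is the lexicographically smaller of a k-mer and
    its reverse complement, so both strands map to the same key.
    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, reverse_complement))
    return kmers


def kmer_set(reads, k):
    """Union of canonical k-mers over all reads (no assembly needed)."""
    result = set()
    for read in reads:
        result |= canonical_kmers(read, k)
    return result


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_to_distance(j, k):
    """Mash-style transform d = -(1/k) * ln(2j / (1 + j)), clamped to [0, 1].

    Assumption for illustration only; it ignores coverage and error.
    """
    if j <= 0.0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2.0 * j / (1.0 + j)) / k))


def skim_distance(reads_a, reads_b, k=5):
    """Naive alignment-free distance between two sets of reads."""
    j = jaccard(kmer_set(reads_a, k), kmer_set(reads_b, k))
    return jaccard_to_distance(j, k)
```

Calling `skim_distance(query_reads, reference_reads)` for each reference and taking the minimum gives a naive closest-match search. Real skims would need a much larger k than the toy default used here.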

Links & Metrics

NORA University Profiles

University of Copenhagen

Dimensions Citation Indicators

Times Cited: 2

Field Citation Ratio (FCR): 0.42

Open Access Info

Green, Published