Preprint open access publication

SPIKEPIPE: A metagenomic pipeline for the accurate quantification of eukaryotic species occurrences and abundances using DNA barcodes or mitogenomes

bioRxiv, Cold Spring Harbor Laboratory


DOI: 10.1101/533737, Dimensions: pub.1111768825


Roslin, Tomas (2) (3)
Yu, Douglas W. * (1) (5) (6)
Ovaskainen, Otso * (2) (7)

* Corresponding author



  (1) Kunming Institute of Zoology, grid.419010.d
  (2) University of Helsinki, grid.7737.4
  (3) Swedish University of Agricultural Sciences, grid.6341.0
  (4) Aarhus University, grid.7048.b, AU
  (5) Chinese Academy of Sciences, grid.9227.e
  (6) University of East Anglia, grid.8273.e
  (7) Norwegian University of Science and Technology, grid.5947.f


Abstract

The accurate quantification of eukaryotic species abundances from bulk samples remains a key challenge for community description and environmental biomonitoring. We resolve this challenge by combining shotgun sequencing, mapping to reference DNA barcodes or to mitogenomes, and three correction factors: (1) a percent-coverage threshold to filter out false positives, (2) an internal-standard DNA spike-in to correct for stochasticity during sequencing, and (3) technical replicates to correct for stochasticity across sequencing runs. This pipeline achieves a strikingly high accuracy of intraspecific abundance estimates from samples of known composition (mapping to barcodes R² = 0.93, mitogenomes R² = 0.95) and a high repeatability across environmental-sample replicates (barcodes R² = 0.94, mitogenomes R² = 0.93). As proof of concept, we sequence arthropod samples from the High Arctic systematically collected over 17 years, detecting changes in species richness, abundance, and phenology using either barcodes or mitogenomes. SPIKEPIPE provides cost-efficient and reliable quantification of eukaryotic communities, with direct application to environmental biomonitoring.

Statement of authorship

NMS has been involved in running the BioBasis sampling program for more than twenty years. TR, NMS, DWY, and OO conceived the study and its design. TH led the work in generating all the DNA samples and YJ led the work in assembling and annotating the mitogenomes for the mitochondrial genome reference database. TH led the work in generating the mock communities and bulk samples, with contributions from YJ and JW. YJ and DWY developed the molecular and bioinformatic methods. OO led the modelling of the data. TR and OO wrote the first draft of the manuscript, and all authors contributed substantially to its further improvement.

Data accessibility statement

Should the manuscript be accepted, the data supporting the results will be archived in an appropriate public repository (Dryad), and the data DOI will be included at the end of the article. The bioinformatic and R scripts and associated data tables will also be made available on .


Links & Metrics

NORA University Profiles

Aarhus University

Dimensions Citation Indicators

Times Cited: 2

Open Access Info

Green, Published