Qualitative comparison of four DNA metabarcode markers for species recovery sensitivity based on Hodophilus (fungi) sequence dataset
Vasilii Shapkin 1
Miroslav Caboň 1
Slavomir Adamčík 1
1 Institute of Botany, Plant Science and Biodiversity Centre, Slovak Academy of Sciences, Bratislava, Slovakia
High-throughput next-generation sequencing technologies allow individual organisms to be identified in mixed environmental DNA (eDNA). For prokaryotes, the 16S rDNA region is well established and widely used to delimit species-level molecular operational taxonomic units (MOTUs). Achieving results of the same quality for fungi is a major challenge, because the ITS nrDNA barcode region varies widely in length, secondary structure, nucleotide variation and representation. To avoid the difficulties connected with ITS MOTU clustering, some studies propose alternative DNA barcode regions. This study tests the barcoding performance of the most frequently used ITS2 region alongside the less popular D1 and D2 regions of 28S nrDNA and the rpb2 region. The data come from a multilocus dataset generated by Sanger sequencing in a previous phylogenetic study. A total of 476 sequences, trimmed to amplicon length and representing 29 Hodophilus species, are used: 163 ITS, 113 D1, 113 D2 and 87 rpb2 amplicon sequences. MOTUs produced by the widely used VSEARCH clustering algorithm at four similarity thresholds (99.5%, 99%, 98% and 97%) are compared with the results of a RAxML phylogenetic analysis. To evaluate the ability of the DNA metabarcode markers to recover MOTUs corresponding to the phylogenetically defined Hodophilus species, two parameters were selected: MOTU performance (MP) is the proportion of MOTUs that correspond to one species, and species performance (SP) is the proportion of species that correspond to one or more MOTUs. The F1 score is defined as the harmonic mean of the species and MOTU performances: F1 = 2 × (MP × SP) / (MP + SP). The best species performance, with all individual species clustered in one or more MOTUs, was shown by ITS2 at the 99.5% and 99% similarity thresholds and by rpb2 at 97.5%. The best MOTU performance, 77% positive identifications, was shown by D2 and rpb2, both at the 99% similarity threshold.
The best overall result (F1 = 0.85) was achieved with rpb2 marker gene sequences clustered at the 99% similarity threshold.
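The three scores defined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `marker_scores`, the input format (one `(motu_id, species)` pair per sequence) and the toy data are hypothetical. It also assumes one particular reading of the definitions: a MOTU corresponds to one species when all of its sequences belong to that species, and a species is counted as recovered when all of its sequences fall in such single-species MOTUs.

```python
from collections import defaultdict


def marker_scores(assignments):
    """Return (MP, SP, F1) for an iterable of (motu_id, species) pairs.

    Assumed reading of the definitions (not confirmed by the source):
    - a MOTU "corresponds to one species" if it contains one species only;
    - a species is recovered if every MOTU holding its sequences is such
      a single-species MOTU (the species may be split, but not merged).
    """
    species_in_motu = defaultdict(set)
    motus_of_species = defaultdict(set)
    for motu, species in assignments:
        species_in_motu[motu].add(species)
        motus_of_species[species].add(motu)

    # MOTUs whose sequences all belong to a single species
    pure = {m for m, members in species_in_motu.items() if len(members) == 1}

    mp = len(pure) / len(species_in_motu)
    recovered = [s for s, motus in motus_of_species.items() if motus <= pure]
    sp = len(recovered) / len(motus_of_species)

    # Harmonic mean of MP and SP, as in F1 = 2 x (MP x SP) / (MP + SP)
    f1 = 2 * mp * sp / (mp + sp) if (mp + sp) > 0 else 0.0
    return mp, sp, f1


# Hypothetical toy data: species A is split into two MOTUs,
# species B and C are merged into one MOTU, species D has its own MOTU.
toy = [
    ("m1", "A"), ("m1", "A"), ("m2", "A"),
    ("m3", "B"), ("m3", "C"),
    ("m4", "D"),
]
mp, sp, f1 = marker_scores(toy)
print(f"MP={mp:.2f} SP={sp:.2f} F1={f1:.2f}")
```

In this toy case three of the four MOTUs contain a single species (MP = 0.75), two of the four species are recovered (SP = 0.50), and the harmonic mean is F1 = 0.60. Splitting a species lowers neither score under this reading, whereas merging two species lowers both.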