Abstract |
Environmental DNA (eDNA), i.e. genetic material obtained directly from environmental
samples (soil, sediment, water, etc.) without any obvious signs of biological
source material [1], and metabarcoding, a DNA barcoding method that allows the identification
of a mixture of organisms using universal PCR primers, are changing
the way biodiversity is perceived and monitored. Their combination is considered
to be a rapid method of biodiversity assessment. Furthermore, eDNA metabarcoding
is a holistic approach that, once standardized, offers higher detection capacity
at a lower cost than conventional methods of biodiversity assessment.
Even though metabarcoding is gaining ground as a fast, non-invasive biodiversity
assessment technique, numerous issues with the method need to be addressed. The lack
of a standardized protocol for this kind of analysis, the primer affinity bias
due to the use of universal primers and the differing relative abundance of each sequence in
the sample, as well as the difficulty in determining species abundances, are all hurdles
of crucial importance for biodiversity studies.
The main focus of this MSc thesis is to address some of these drawbacks
of eDNA metabarcoding, by designing blocking primers to address "the primer issue"
(defined here as the biases due to the use of generic primers) in the case of Fungi, and
by building a standardized bioinformatic pipeline for the analysis of metabarcoding data.
Due to PCR biases, sequences that are of low abundance in a sample relative to the
sequences of other OTUs are not amplified efficiently. Fungi usually prevail in environmental
samples and are thus responsible for considerable, undesirable
noise in the PCR product. In order to overcome this problem, i.e. to prevent
PCR amplification of Fungi, blocking primers for the two marker genes (16S rRNA and
COI) were designed in silico. The predicted blocking primers were also evaluated in silico.
In the case of the 16S marker gene, the results were promising, as the blocking primer
pair did not prevent amplification of Bacteria. In the case of the COI marker gene, the results
show that a small percentage of eukaryotic sequences is blocked
along with the desired blockage of fungal sequences. However, the designed blocking
primer pairs for both marker genes have the potential to work as intended and should be
further tested in the laboratory.
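
As an illustration of what an in silico evaluation of a blocking primer can look like, the following is a minimal Python sketch and not the procedure used in this thesis. It assumes that a primer "binds" a sequence when some window of the sequence differs from the primer by at most a given number of mismatches; the function names, the mismatch threshold and the toy sequences are all hypothetical.

```python
def mismatches(primer: str, site: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(1 for p, s in zip(primer, site) if p != s)


def binds(primer: str, sequence: str, max_mismatches: int = 2) -> bool:
    """Return True if any window of `sequence` matches `primer`
    with at most `max_mismatches` mismatches (simplifying assumption:
    no indels, no degenerate bases, forward strand only)."""
    k = len(primer)
    return any(
        mismatches(primer, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def blocked_fraction(primer: str, sequences: list[str],
                     max_mismatches: int = 2) -> float:
    """Fraction of `sequences` that the blocking primer would bind."""
    if not sequences:
        return 0.0
    hits = sum(binds(primer, s, max_mismatches) for s in sequences)
    return hits / len(sequences)


# Toy data (hypothetical): one sequence contains the primer site, one does not.
toy_sequences = ["TTACGTTT", "GGGGGGGG"]
print(blocked_fraction("ACGT", toy_sequences, max_mismatches=0))
```

Computed separately for target (fungal) and non-target reference sequences, such a fraction gives a rough view of how specific a candidate blocking primer is; a real evaluation would also need to account for degenerate bases, both strands and binding thermodynamics.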
Moreover, no standardized pipeline for the analysis of the millions of amplicon
reads produced per experiment has been developed. Numerous tools are available for each step of the
analysis, but there is no set of tools that has been evaluated and
benchmarked a priori and can thus be used as the “gold standard” for metabarcoding
analysis. To this end, the goal of this study was to build a complete and efficient
pipeline (entitled “P.E.M.A.”) for both the 16S and COI marker genes. A programming language
for data processing pipelines, called Big Data Script (BDS), was used for the
design of P.E.M.A., which takes raw sequence read files (.fastq format) as input. Different
(Molecular) Operational Taxonomic Unit ((M)OTU) clustering algorithms and taxonomic
assignment approaches are provided for the user to choose from, depending on the chosen
marker gene and the particularities of the dataset. P.E.M.A. was evaluated using two
datasets from published studies, and the results it produced were similar to those of the
original studies. It is suggested that P.E.M.A. can be used for accurate eDNA metabarcoding
analysis and, hence, that it can enhance the applicability of next-generation biodiversity
assessment studies.
|