16S rRNA gene metagenomics - Coggle Diagram
16S rRNA gene metagenomics
Metagenomics - the study of genetic material recovered directly from environmental samples.
Sequencing a single gene - e.g. the 16S rRNA gene (acts like a fingerprint)
From the recovered sample, extract total DNA - gives a snapshot of all the bacteria present
Preserve and handle the sample properly - avoid contamination and overgrowth
PCR amplification
Sequencing of the resulting amplicons
Taxonomic classification of the resulting reads using a database
16S gene databases - Greengenes, RDP, SILVA (good for environmental samples)
Give a taxonomic classification for each ASV/ASV bin using a database.
Visualize using phylogenetic trees
PCR needs a thermostable Taq polymerase, primers (binding conserved regions), nucleotides (dNTPs) and the target DNA (the variable region of interest within the gene, lying between the primers)
If the primers bind purely conserved regions, ideally the DNA between the primers is amplified from every organism in the sample.
The region of interest, i.e. the hypervariable region(s) chosen, depends on which organisms we expect to find in the sample and intend to distinguish between.
Lecture 18, slide 20
Lecture 18, slide 8-9
Examples of analysis systems/workflows - QIIME 2, mothur, DADA2
DADA2 workflow
First, demultiplex the reads = assign each read to its sample using its index (a unique index/barcode is attached to each sample's sequences)
Prepare (merge) the paired reads - each forward/reverse read pair is merged into one consensus sequence (possible when the region of interest is short enough for the two reads to overlap, i.e. no longer than the combined length of a read pair)
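Merging a read pair can be sketched as follows: the reverse read is reverse-complemented so both reads run in the same direction, and the longest exact overlap between the end of the forward read and the start of the reverse read is used to join them. This is a simplified assumption (exact overlap only, hypothetical `min_overlap` default); real mergers such as DADA2's `mergePairs` also tolerate mismatches.

```python
# Minimal sketch of merging a forward/reverse read pair into one consensus sequence.
# Assumptions: exact overlap matching only; min_overlap is a hypothetical default.
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 12) -> Optional[str]:
    """Merge a read pair whose ends overlap; return None if no overlap is found."""
    rc = reverse_complement(reverse)  # reverse read now runs in the forward direction
    # Try the longest overlap first: the suffix of `forward` must equal the prefix of `rc`.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None  # reads do not overlap: amplicon too long for this read pair
```

If the amplicon is longer than the two reads combined, no overlap exists and the pair cannot be merged, which is why the region of interest must fit within a read pair.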
Remove adaptors and primers - match the reads against the respective primer/adaptor sequences and truncate the matched portions, removing the primer/adaptor part of each read
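The primer-removal step can be sketched as a prefix match that tolerates a few mismatches, after which the matched part is cut off. The `max_mismatches` parameter and the example primer are hypothetical; dedicated tools (e.g. cutadapt) also handle indels, degenerate bases and adaptors found anywhere in the read.

```python
# Minimal sketch of primer removal: look for the primer at the start of the read,
# allowing a few mismatches, and cut it off. max_mismatches is a hypothetical parameter.
from typing import Optional


def trim_primer(read: str, primer: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the read with the primer removed, or None if the primer is not found."""
    if len(read) < len(primer):
        return None
    prefix = read[: len(primer)]
    mismatches = sum(1 for a, b in zip(prefix, primer) if a != b)
    if mismatches > max_mismatches:
        return None  # primer not found: such reads are usually discarded
    return read[len(primer):]  # keep only the part after the primer
```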
Filter on quality and remove incomplete reads, then collapse identical copies of reads (dereplication) to generate a count for each unique sequence
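Quality filtering and dereplication can be sketched together. The filter uses "expected errors" (the sum of per-base error probabilities 10^(-Q/10)), the idea behind DADA2's `maxEE` option; the `max_ee` and `min_length` values, Phred+33 encoding, and (sequence, quality) input pairs are assumptions for illustration.

```python
# Minimal sketch of quality filtering by expected errors plus dereplication.
# Assumptions: Phred+33 quality strings; hypothetical max_ee and min_length values.
from collections import Counter


def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def filter_and_dereplicate(reads, max_ee=2.0, min_length=100):
    """Drop short, ambiguous or low-quality reads, then count each unique sequence."""
    kept = [
        seq
        for seq, qual in reads
        if len(seq) >= min_length and "N" not in seq and expected_errors(qual) <= max_ee
    ]
    return Counter(kept)  # dereplication: unique sequence -> number of copies
```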
Each unique (denoised) sequence is an Amplicon Sequence Variant (ASV) - ASVs differ from one another by at least one base (because we sequenced the variable regions of the 16S gene)
ASV binning - in 16S metagenomics approaches, OTUs (operational taxonomic units) are clusters of similar sequence variants of the 16S rDNA marker gene sequence.
Each of these clusters is intended to represent a taxonomic unit of a bacterial species or genus, depending on the sequence similarity threshold.
Typically, OTU clusters are defined by a 97% identity threshold of the 16S gene sequences to distinguish bacteria at the genus level.
Species separation requires a higher threshold of 98% or 99% sequence identity, or better still, the use of exact sequence variants (ASVs) instead of OTU clusters.
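Threshold-based OTU clustering can be sketched as greedy, abundance-sorted centroid clustering (in the spirit of tools such as VSEARCH). The simplification here is an assumption: all sequences have equal length and identity is counted position by position without alignment, whereas real tools align first.

```python
# Minimal sketch of greedy OTU clustering at an identity threshold.
# Assumption: equal-length sequences, identity computed without alignment.


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(counts: dict[str, int], threshold: float = 0.97) -> dict[str, list[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold identity."""
    otus: dict[str, list[str]] = {}
    # Most abundant sequences are visited first, so they become centroids.
    for seq in sorted(counts, key=counts.get, reverse=True):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]  # no centroid close enough: start a new OTU
    return otus
```

Raising `threshold` to 0.99 splits more variants into separate OTUs, which is the finer (species-level) resolution described above.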
Differentiate between read (sequencing) errors and genuine low-abundance sequences (ASVs)
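One way to separate errors from real low-abundance variants is an abundance-skew rule in the spirit of UNOISE: a sequence a few bases away from a much more abundant one is treated as an error of it. This is a swapped-in heuristic, not DADA2's own Poisson-based error model; equal-length sequences (Hamming distance) and `alpha = 2` are assumptions.

```python
# Minimal sketch of denoising by abundance skew (UNOISE-style heuristic, not DADA2's model).
# Assumptions: equal-length sequences compared by Hamming distance; alpha = 2.


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def denoise(counts: dict[str, int], alpha: float = 2.0) -> dict[str, int]:
    """Fold likely error sequences into a more abundant parent; keep the rest as ASVs."""
    asvs: dict[str, int] = {}
    for seq in sorted(counts, key=counts.get, reverse=True):
        for parent in asvs:
            d = hamming(seq, parent)
            # More differences -> an error is expected to be much rarer than its parent.
            max_skew = 1 / 2 ** (alpha * d + 1)
            if counts[seq] / counts[parent] <= max_skew:
                asvs[parent] += counts[seq]  # treat as a read error of `parent`
                break
        else:
            asvs[seq] = counts[seq]  # too abundant to be an error: real variant
    return asvs
```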
Remove chimeric sequences - amplification (PCR) artefacts made of segments from two (or more) different parent sequences, e.g. when a partially extended strand from one organism primes on the template of another
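Chimera detection can be sketched as a "bimera" check: a sequence is flagged if its left part matches one more abundant parent and its right part matches a different one. This follows the idea behind DADA2's `removeBimeraDenovo`, but exact matching, equal-length sequences and no alignment are simplifying assumptions; `parents` should hold only sequences more abundant than the query.

```python
# Minimal sketch of bimera detection with exact matches (no alignment).
# Assumption: `parents` are more abundant sequences, not including the query itself.


def is_bimera(query: str, parents: list[str]) -> bool:
    """True if query = prefix of one parent + suffix of a different parent."""
    for i in range(1, len(query)):
        left, right = query[:i], query[i:]
        left_parents = [p for p in parents if p[:i] == left]
        right_parents = [p for p in parents if p[i:] == right]
        if any(a != b for a in left_parents for b in right_parents):
            return True
    return False
```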
Multiplexed - all the reads, along with their indices, are in a single FASTQ file without any grouping by sample.
Diagram in metagenomics guest zoom lect., slide 15
Demultiplexed - indices are removed and each sample gets a separate FASTQ file.
For paired-end reads, each sample has a forward file and a reverse file.
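Demultiplexing can be sketched as sorting reads into per-sample groups by their index. The assumptions here are loud: the index is taken as the first `index_len` bases of the read (inline barcodes; on many Illumina runs the index is read separately), the sample sheet is hypothetical, and unmatched reads go to an "undetermined" group.

```python
# Minimal sketch of demultiplexing with inline indices (hypothetical sample sheet).
from collections import defaultdict


def demultiplex(reads, sample_sheet, index_len):
    """Return {sample_name: [reads with their index removed]}."""
    per_sample = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        sample = sample_sheet.get(index, "undetermined")
        per_sample[sample].append(insert)  # index is stripped once the sample is known
    return dict(per_sample)
```

Writing each group to its own FASTQ file gives the demultiplexed layout described above.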
QIIME 2 workflow in the guest lecture - similar to DADA2 (QIIME 2 can run DADA2 for its denoising step)
Cleaning and manipulating ASVs using DADA2
Sequencing the whole genome - whole metagenome sequencing
Fragment the DNA, followed by whole-genome shotgun (WGS) sequencing
Examples of analysis systems/workflows - Kraken, MetaPhlAn
Kraken - tool used to assign taxonomic labels to short reads
Main limitation - gives either only a high-level classification or no classification for unknown sequences not in the database
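A simplified sketch of Kraken-style classification shows where the limitation comes from. Assumptions: a toy taxonomy (child to parent links) and a toy database mapping each k-mer to the lowest common ancestor (LCA) of the genomes containing it. Each hit taxon is scored by the k-mer hits along its root-to-leaf path and the best one is reported; unlike Kraken, ties are not resolved by taking an LCA.

```python
# Simplified sketch of Kraken-style k-mer classification (toy taxonomy and database).
from collections import Counter
from typing import Optional


def lineage(taxon: str, parent: dict[str, str]) -> list[str]:
    """Taxon followed by its ancestors up to the root."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def classify(read: str, kmer_db: dict[str, str], parent: dict[str, str], k: int) -> Optional[str]:
    """Return the best-supported taxon for a read, or None if no k-mer is known."""
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    hits = Counter(kmer_db[km] for km in kmers if km in kmer_db)
    if not hits:
        return None  # sequence absent from the database: read stays unclassified
    # Score each hit taxon by all hits on its path to the root; report the best.
    scores = {t: sum(hits[a] for a in lineage(t, parent)) for t in hits}
    return max(scores, key=scores.get)
```

A read whose k-mers only map to high-level (shared) LCA taxa is labelled at that high level, and a read with no known k-mers gets no label at all.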
Example workflow - last answer of assignment 4
Limitations of 16S metagenomics methods
The 16S gene is highly conserved at the species level - very few distinguishing features, so classification at that level is hard and usually possible only at higher levels (e.g. genus)
Database 'noise'
Evolving taxonomic classification not regularly updated in databases
Politics about naming organisms
Poorly curated entries
Novel sequences not present in the database
Outdated representative (reference) sequences of species
No sequencing technology is good enough to produce both long enough reads and high throughput at low cost
Need a trade-off between extent of gene coverage vs extent of bacterial community coverage
Avoid rarefaction - if more taxa are present within a community sample, greater read depth is needed to capture the greater diversity
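The dependence of observed diversity on read depth can be sketched by randomly subsampling reads (without replacement) to increasing depths and counting the distinct taxa seen. The community below is hypothetical; rare taxa only appear once the depth is large enough.

```python
# Minimal sketch: observed richness at different read depths (hypothetical community).
import random


def observed_richness(reads: list[str], depth: int, seed: int = 0) -> int:
    """Number of distinct taxa among `depth` reads drawn without replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return len(set(rng.sample(reads, depth)))


# Hypothetical community: one dominant taxon and two rarer ones.
community = ["A"] * 900 + ["B"] * 90 + ["C"] * 10
for depth in (10, 100, 1000):
    print(depth, observed_richness(community, depth))
```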
Lecture 18, slide 18
Lecture 18, slide 17
It is hard to design primers that are absolutely conserved across all organisms in the sample - this leads to amplification bias
E.g. Bifidobacterium has SNPs in one of its conserved regions, leading to 4 possible primer variants
Lecture 18, slide 22,23,24
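Primer variants like these are often written compactly as one degenerate primer using IUPAC ambiguity codes. The sketch below expands such a primer into every concrete sequence it covers; the example primer is hypothetical (two two-way ambiguous positions give 2 x 2 = 4 variants) and is not the Bifidobacterium primer from the lecture.

```python
# Minimal sketch: expand a degenerate primer (IUPAC codes) into concrete sequences.
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list[str]:
    """All concrete sequences matched by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[ch] for ch in primer))]


variants = expand_primer("AGRGTTYG")  # hypothetical primer with R (A/G) and Y (C/T)
```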
Tough bacteria that cannot be lysed - failure to extract their DNA
The sample can be sensitive to contamination by reagents, the workspace, etc.
Not every bacterium within the sample has the same number of 16S gene copies - this in turn affects amplification and read depth while sequencing, biasing abundance estimates
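A common mitigation is copy-number correction: divide each taxon's read count by its 16S copy number before computing relative abundance. The sketch assumes the copy numbers are known; the counts and copy numbers used with it are hypothetical.

```python
# Minimal sketch of 16S copy-number correction (copy numbers assumed known).


def copy_number_corrected_abundance(
    read_counts: dict[str, int], copies: dict[str, int]
) -> dict[str, float]:
    """Relative abundance of cells rather than of 16S gene copies."""
    cells = {taxon: n / copies[taxon] for taxon, n in read_counts.items()}
    total = sum(cells.values())
    return {taxon: c / total for taxon, c in cells.items()}
```

For example, 700 reads from a taxon with 7 copies and 100 reads from a taxon with 1 copy look like 87.5% vs 12.5% from raw reads, but correspond to equal cell numbers after correction.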
Low resolving (classification) power at lower taxonomic levels - good for higher levels
Quality metric to analyse MAGs (metagenome-assembled genomes) - N50 (Workshop 8, end)
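N50 is the contig length L such that contigs of length L or longer together cover at least half of the total assembly length. A minimal sketch, assuming the input is simply a list of contig lengths:

```python
# Minimal sketch of the N50 assembly statistic from a list of contig lengths.


def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):  # longest contigs first
        running += length
        if 2 * running >= total:  # reached half of the total length
            return length
    return 0  # empty input
```

For contig lengths 2 to 10 (total 54), the three longest contigs (10 + 9 + 8 = 27) reach half the assembly, so N50 = 8.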