Sequencing
DNA sequencing - Reading the order of nucleotides in a DNA molecule
Whole Genome Shotgun (WGS)
The genome is fragmented at random positions (so fragments start from different random points) and amplified to produce many overlapping fragments.
Reads are the sequences produced when SEQUENCING machines read these fragments.
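A minimal Python sketch of the shotgun idea, assuming an error-free, single-stranded toy genome. The sequence and the helper names `shotgun_reads` and `mean_depth` are hypothetical. Reads are cut from random starting points, and the average read depth is total sequenced bases divided by genome length.

```python
import random

def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Sample fixed-length reads from random start positions (hypothetical helper)."""
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    max_start = len(genome) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, max_start)  # inclusive: any start where the read fits
        reads.append(genome[start:start + read_len])
    return reads

def mean_depth(n_reads: int, read_len: int, genome_len: int) -> float:
    # Average read depth = total sequenced bases / genome length
    return n_reads * read_len / genome_len

genome = "ATGCGTACGTTAGCCGATCGATCGGCTAACGT"  # toy 32 bp sequence, not real data
reads = shotgun_reads(genome, read_len=8, n_reads=20)
print(reads[:3])
print(mean_depth(20, 8, len(genome)))  # 5.0
```

Because start points are random, some positions are covered many times and others not at all, even at a mean depth of 5.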
Long read sequencing
Oxford Nanopore - 40,000+ bp read length
DNA passes through a nanopore set in an electrically resistant membrane, disturbing the current flowing through the pore. Each nucleotide causes a characteristic disturbance.
Can sequence both strands back to back as one contiguous read
kmer approach
Lecture 3, slide 23
No amplification needed (so no amplification bias); quality does not depend on read length.
Lower per base quality
Long reads can be around 40,000 bases or more
PacBio Single Molecule Real-Time (SMRT) sequencing - 40,000 bp read length
A DNA polymerase sits in each of the 20-zeptolitre reaction chambers/wells (Zero-Mode Waveguides [ZMWs])
Free phospholinked nucleotides carry fluorophores. As each base is incorporated by the polymerase, the phosphate-linked fluorophore chain is cleaved off, so normal (unlabelled) DNA is synthesised.
Each base has a colour, and the timing/duration of the signal can indicate base modifications such as methylation. The colours are read and their order reveals the DNA sequence.
Lecture 3, slides 21, 22
Can get 1 contig per chromosome
Celera Assembler Nu (CANU)
Short read sequencing
Illumina - 350 bp read length
Components: fluorescently tagged nucleotides, DNA polymerase, primers attached to the flow cell, adaptors attached to the DNA fragments, ssDNA templates attached to the flow cell
LIBRARY PREPARATION
- Fragment the DNA and attach adaptors to the fragments
CLUSTER AMPLIFICATION (NO conventional PCR)
- DNA fragments attach to the flow cell's primers through their adaptors to initiate the cycle. Through cycles of isothermal denaturation and bridge formation, each fragment is cloned into a cluster (cluster amplification).
SEQUENCING
- After amplification, fluorescently labelled free nucleotides flow through the cell and are incorporated, one base per cycle. The flow cell's colour pattern is imaged after each cycle, with each colour representing a base.
High throughput, low cost, low error rates and consistency
Errors
Phasing - over many cycles, some strands in a cluster fail to incorporate a base (or incorporate an extra one) and fall out of sync with the rest, so the cluster's signal becomes mixed and base calls become less accurate.
Crosstalk - overlap between the fluorescence signals of different bases, giving diffuse outputs
Poor detection of GC-rich DNA, because it can form secondary structures that hinder the polymerase.
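To see why phasing limits read length, a simplified model (an assumption, not Illumina's actual error model) lets each strand in a cluster independently lag with a small probability `p_lag` per cycle, so the fraction still in sync after n cycles is (1 - p_lag)^n. The function name and the 0.002 value are hypothetical, and pre-phasing (strands running ahead) is ignored.

```python
def in_phase_fraction(cycles: int, p_lag: float) -> float:
    """Fraction of strands in a cluster still in sync after `cycles` cycles,
    assuming each strand independently lags with probability p_lag per cycle
    (hypothetical model; pre-phasing is ignored)."""
    return (1.0 - p_lag) ** cycles

# Hypothetical per-cycle lag probability, chosen only for illustration
P_LAG = 0.002
for n in (50, 150, 300):
    # Lagging strands emit the colour of an earlier base, blurring the call
    print(f"cycle {n}: {in_phase_fraction(n, P_LAG):.1%} of strands in phase")
```

Even a tiny per-cycle error compounds geometrically, which is one reason quality falls towards the end of Illumina reads.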
Error Management
Astronomy software used to locate the randomly arrayed clusters on the surface
Image alignment after each cycle
Better image processing to correct for phasing and increase output resolution
Removal of adaptors
Types
HiSeq X Ten
Human genome - 150 bp read length
MiSeq
Bacterial genomes - 350 bp read length
Lecture 4, slide 21,22
High per base quality
Short reads can be around 100-1000 bp (the shortest was 18 bp)
Sanger Sequencing - >500 bp read length
Low throughput - low number of sequences read at a time
Reaction chamber containing the ssDNA template, oligonucleotide primers, fluorescently labelled dideoxynucleotides (ddNTPs), regular deoxynucleotides (dNTPs) and DNA polymerase.
Primers anneal to the template and the polymerase incorporates free nucleotides until a ddNTP is incorporated. A ddNTP lacks the 3'-OH group needed to add the next base, so it TERMINATES synthesis.
The resulting fragments of different sizes are separated by gel electrophoresis, in which an electric field pulls the fragments through a gel. Smaller fragments travel farthest, so the fragments are separated by size.
As they pass a fluorescence detector, the DNA sequence is read from the labelled end base of each fragment, in order of increasing fragment size.
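The chain-termination logic can be simulated in a few lines. This is a sketch assuming an idealised reaction in which a ddNTP stops synthesis at every possible position exactly once; the helper names and the template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template_3to5: str) -> list[tuple[int, str]]:
    """All chain-terminated fragments as (length, labelled ddNTP at the 3' end).
    Assumes the template is written 3'->5', so the new strand is its complement."""
    synthesized = "".join(COMPLEMENT[base] for base in template_3to5)
    # A ddNTP can be incorporated at any position, so every length 1..n is produced
    return [(i + 1, synthesized[i]) for i in range(len(synthesized))]

def read_by_electrophoresis(fragments: list[tuple[int, str]]) -> str:
    # Smaller fragments travel farthest, so they reach the detector first
    return "".join(base for _, base in sorted(fragments))

frags = sanger_fragments("TACGGATC")
frags.reverse()  # the reaction mix is unordered; electrophoresis does the sorting
print(read_by_electrophoresis(frags))  # ATGCCTAG
```

The detector reads the newly synthesised strand 5'->3'; its complement gives the template.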
Human Genome Project
Lecture 4, slides 17-20
Lecture 4
Overlapping reads are assembled -
ASSEMBLY
(2 methods)
Consensus method
- This process yields unbroken consensus sequences, built from many overlapping reads (read depth), known as contigs.
Ideally, we expect 1 contig per chromosome. But repeat regions longer than the read length, low sequencing depth and read errors can break the assembly into multiple contigs for a single chromosome/isolate.
Output of the assembly process is hence a set of contigs.
Assemblies are produced by tools such as SPAdes, Velvet, MEGAHIT, SKESA and Unicycler.
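As a toy illustration of building contigs from overlapping reads, the sketch below uses greedy overlap merging. This is a swapped-in, simplified technique: it assumes error-free reads, so no consensus voting across read depth is needed, and `overlap` and `greedy_assemble` are hypothetical helpers, not how the tools above work internally.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, -1, -1)  # (overlap length, index of left seq, index of right seq)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no overlaps left: the remaining sequences are contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

reads = ["ATGCGTAC", "GTACGTTA", "GTTAGCCG", "TTTTAAAA"]
print(greedy_assemble(reads))  # ['TTTTAAAA', 'ATGCGTACGTTAGCCG']
```

The last read shares no overlap with the others, so the output is two contigs rather than one, mirroring how gaps in coverage split an assembly.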
FASTA format, multi FASTA files
Lecture 3, slide 30, 31
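A multi-FASTA file is a series of records, each a `>` header line followed by one or more sequence lines. A minimal parser sketch follows; the function name is hypothetical, and it assumes the record ID is the first word of the header.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text into {record_id: sequence}."""
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # ID = first word after '>'; the rest is a free-text description
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records

example = """>contig_1 length=12
ATGCGTAC
GTTA
>contig_2
GGCCTTAA
"""
print(parse_fasta(example))  # {'contig_1': 'ATGCGTACGTTA', 'contig_2': 'GGCCTTAA'}
```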
FASTQ format, multi FASTQ files
Lecture 3, slide 33,34
Lecture 4, slide 4
Quality is encoded as one letter or symbol (ASCII character) per base
For paired-end reads, forward and reverse reads are stored in separate files
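Each FASTQ record is four lines: `@` header, sequence, `+` separator, quality string. The sketch below parses records and walks the forward (R1) and reverse (R2) files in step. It assumes unwrapped 4-line records and mate names ending in `/1` and `/2` (one common convention, not universal); the function names and reads are hypothetical.

```python
from typing import Iterator

def parse_fastq(text: str) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from FASTQ text with 4 lines per record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield header[1:].split()[0], seq, qual

def pair_reads(r1_text: str, r2_text: str):
    """Yield (name, (fwd_seq, fwd_qual), (rev_seq, rev_qual)) for each mate pair.
    Mates are assumed to be in the same order in both files; zip stops at the
    shorter file, so a real tool should also flag unequal record counts."""
    for (id1, s1, q1), (id2, s2, q2) in zip(parse_fastq(r1_text), parse_fastq(r2_text)):
        name = id1.removesuffix("/1")
        if name != id2.removesuffix("/2"):
            raise ValueError(f"mates out of sync: {id1} vs {id2}")
        yield name, (s1, q1), (s2, q2)

r1 = "@read1/1\nACGTAC\n+\nIIIII#\n@read2/1\nGGCCTA\n+\nIII5+#\n"
r2 = "@read1/2\nTTGACA\n+\nIIIIII\n@read2/2\nCCAATG\n+\nIIII55\n"
for name, fwd, rev in pair_reads(r1, r2):
    print(name, fwd, rev)
```

Parsing by position rather than by looking for `@` matters, because `@` is also a valid quality character.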
Quality Control
Phred Quality scores
- measure of confidence that a base was called (assigned) correctly during base calling
Lecture 4, slide 9
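A Phred score Q relates to the probability P that a base call is wrong by Q = -10 log10(P), so Q20 means a 1 in 100 error chance and Q30 a 1 in 1000 chance. Modern FASTQ files store Q as the ASCII character with code Q + 33 (Phred+33). A small sketch of the conversions (function names are illustrative):

```python
import math

def phred_to_error_prob(q: float) -> float:
    # Invert Q = -10 * log10(P)
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    return -10 * math.log10(p)

def decode_quality(qual: str, offset: int = 33) -> list[int]:
    # Phred+33 encoding: 'I' (ASCII 73) -> 40, '#' (ASCII 35) -> 2
    return [ord(c) - offset for c in qual]

print(decode_quality("II#5"))   # [40, 40, 2, 20]
print(phred_to_error_prob(30))  # 0.001
```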
FastQC
- tool that displays data quality in a read set (averaged across reads at each position)
Quality drops towards the end of the read, hence read length matters
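The per-position average that FastQC reports can be approximated as below. This is a simplified sketch (FastQC itself draws box plots with quantiles, not just a mean); the function name and the quality strings are hypothetical, and Phred+33 encoding is assumed.

```python
def per_position_mean_quality(quals: list[str], offset: int = 33) -> list[float]:
    """Mean Phred score at each read position, over the reads that reach it."""
    max_len = max(len(q) for q in quals)
    totals = [0] * max_len
    counts = [0] * max_len
    for q in quals:
        for pos, ch in enumerate(q):
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

# Toy quality strings that degrade towards the 3' end, as real reads often do
quals = ["IIIII5+", "IIII5+#", "IIIIII5"]
print([round(m, 1) for m in per_position_mean_quality(quals)])
```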
When dealing with multiple read sets, MultiQC provides a summary of all FastQC reports
Sequences in the read set that are unrelated to the sampled organism
ADAPTORS
- attached to the ends of DNA fragments so that primers can bind and initiate synthesis (e.g. during PCR)
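When a fragment is shorter than the read length, the read runs into the adaptor, which must be removed before assembly. A minimal exact-match sketch of 3' adaptor trimming follows. The adaptor string is assumed here to be the commonly used Illumina adaptor prefix, and the function name is hypothetical. Real trimmers also tolerate mismatches.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed Illumina adaptor prefix, for illustration

def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove an adaptor (or a partial adaptor at the very end) from a read's 3' end.
    Exact matching only."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]  # full adaptor found: cut it and everything after it
    # Otherwise look for a read suffix equal to the start of the adaptor, longest first
    for length in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:-length]
    return read

print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATC", ADAPTER))             # ACGTACGT
print(trim_3prime_adapter("ACGTACGTTTTT", ADAPTER))              # ACGTACGTTTTT
```

The `min_overlap` threshold is a trade-off: very short partial matches occur by chance in genuine sequence, so setting it too low trims real bases.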
ADDITIVE DNA
- samples can also be spiked with known control DNA (e.g. PhiX on Illumina) to test the sequencing machine
Filter by:
Tool for visualizing assemblies - Bandage
By Alignment
Kmer approach
- Reads are broken into overlapping k-mers, which are assembled using the de Bruijn graph method
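A compact sketch of the de Bruijn approach, assuming error-free reads. Nodes are (k-1)-mers, each k-mer adds an edge from its prefix to its suffix, and a sequence is spelled by walking nodes with a single outgoing edge. All helper names are hypothetical, and edge counts are dropped for simplicity, unlike real assemblers.

```python
from collections import defaultdict

def kmers(read: str, k: int) -> list[str]:
    # Overlapping k-mers: one starting at every position of the read
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

def start_nodes(graph: dict[str, set[str]]) -> list[str]:
    # Nodes with no incoming edge are natural places to start a walk
    targets = set().union(*graph.values())
    return [node for node in graph if node not in targets]

def walk_unbranched(graph: dict[str, set[str]], start: str) -> str:
    """Spell a sequence by following nodes with exactly one outgoing edge."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop if a repeat loops back on itself
            break
        seq += nxt[-1]   # each step adds one new base
        seen.add(nxt)
        node = nxt
    return seq

reads = ["ATGCGTACGT"[:8], "GCGTACGT", "TACGTTAG"]
g5 = de_bruijn(reads, k=5)
print([walk_unbranched(g5, s) for s in start_nodes(g5)])  # ['ATGCGTACGTTAG']
g4 = de_bruijn(reads, k=4)
print(sorted(g4["CGT"]))  # ['GTA', 'GTT']: the repeated 3-mer CGT branches
```

With k = 5 the path is unbranched and spells the whole sequence; with k = 4 the repeated 3-mer `CGT` creates a branch, which is how repeats longer than k - 1 break contigs.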
Isolate Assemblies
Reference Alignment of reads - Read Mapping
Read Alignment - Semi-global
OR
To find the position of sequenced reads in a reference - Read mapping
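Semi-global alignment forces the whole read to align but lets it start and end anywhere in the reference, so reference overhangs cost nothing. A dynamic-programming sketch with assumed scores (match +1, mismatch -1, gap -1) and a hypothetical function name follows. It is quadratic in read and reference length; practical mappers first narrow the search with an index (e.g. the FM-index used by BWA and Bowtie2).

```python
def semiglobal_align(read: str, ref: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Align all of `read` to part of `ref`; end gaps in `ref` are free.
    Returns (score, ref_start, ref_end) with ref_end exclusive."""
    m, n = len(read), len(ref)

    def sub(i: int, j: int) -> int:
        return match if read[i - 1] == ref[j - 1] else mismatch

    # score[i][j]: best score aligning read[:i] with ref ending at position j.
    # Row 0 is all zeros: the read may start anywhere in the reference for free.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap  # every read base must be aligned, so read gaps cost
        for j in range(1, n + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Free trailing reference gap: take the best cell anywhere in the last row
    end = max(range(n + 1), key=lambda j: score[m][j])
    # Trace back to where the read starts in the reference
    i, j = m, end
    while i > 0:
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[m][end], j, end

ref = "TTTTACGTACGGTTTT"
print(semiglobal_align("ACGTACGG", ref))  # (8, 4, 12)
```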
De Novo Genome assembly of reads
Required for genome assembly of non-model organisms, and for finding novel sequences and novel (e.g. disease-causing) splice variants