Next Generation Sequencing
Why sequence entire genome?
Biodiversity and speciation
Diversity within a species
Biology of an organism
Molecular Biology Principles
4 deoxynucleotide triphosphates (dNTPs) for DNA
Pentose
Base
Triphosphate
Purines and pyrimidines
DNA extension: the 3'OH group of the pentose sugar at the end of the growing strand attacks the 5' (alpha) phosphate of the incoming free nucleotide
Phosphodiester bond formed + diphosphate (pyrophosphate) released
PCR
Thermostable DNA polymerase
3 stage process
1. Denaturation
2. Anneal primers
3. Extend new strand by incorporating dNTPs
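A rough sketch of why repeating the 3 stages amplifies DNA so strongly. It assumes an idealised reaction in which a fixed fraction of templates is copied in every cycle; the function name and the `efficiency` parameter are illustrative, not taken from any protocol.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised copy number after a number of PCR cycles.

    efficiency = 1.0 means every template is copied in every cycle
    (perfect doubling); real reactions fall below this and plateau.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the copy number by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, "cycles:", pcr_copies(1, n))
```

With perfect doubling, a single template gives 2^30 (about a billion) copies after 30 cycles.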
Sanger sequencing
Library preparation
Labour intensive
Up to 700bp per read
Reactions
For given template, similar to PCR except
Uses a single primer + polymerase to make ssDNA pieces
Includes regular nucleotides for extension but also dideoxynucleotides
Lack 3'OH - stop DNApol
Each reaction gives a chromatogram
~600-1000bp
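The chain-termination idea above can be sketched as a toy model. It assumes an ideal reaction in which the pool contains one ddNTP-terminated fragment for every position, ignores the primer, and uses hypothetical function names; it only shows how sorting fragments by length recovers the sequence.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> dict[int, str]:
    """Map fragment length -> terminating (dideoxy) base.

    `template` is written 3'->5', so the newly made strand is its
    complement read left to right (5'->3').
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    # A fragment of length n ends in the n-th base of the new strand.
    return {n: new_strand[n - 1] for n in range(1, len(new_strand) + 1)}


def read_sequence(fragments: dict[int, str]) -> str:
    """Order fragments by size (as electrophoresis does) and read the end bases."""
    return "".join(fragments[n] for n in sorted(fragments))


if __name__ == "__main__":
    frags = terminated_fragments("TACGGA")
    print(read_sequence(frags))
```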
Limitations
Expensive
Low throughput
Labour intensive
Low sensitivity
Detecting a mutation in cancer requires it to be present in >30% of cells
What is NGS?
Technologies enabling you to sequence hundreds of millions of short sequences in a single run
Parallel sequencing or single molecule
454 Technology
DNA is sheared into 300-800bp fragments, ends 'polished' by removing any unpaired bases at the end
Adapters added to each end. DNA made ss at this point.
One adapter contains biotin, which binds a streptavidin-coated bead
Ratio of beads to DNA molecules is controlled so most beads only attach a single DNA molecule
Oil is added to beads + an emulsion created
PCR, each aqueous droplet forming its own micro-reactor
Each bead ends up coated with ~a million identical copies of original DNA
After emulsion PCR, oil is removed and beads put into a picotiter plate
Left behind in terms of cost + throughput
Roche no longer markets 454
Homopolymer (e.g. AAAA) is a big problem
Must distinguish the light flash from one nucleotide added from the flash produced when several identical nucleotides are added at once
A vs AA = 100% difference
AAAAA vs AAAAAA = 20% difference
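The percentages above follow from simple arithmetic, sketched here under the assumption that the light signal is proportional to the number of bases added in one flow. The function name is illustrative.

```python
def relative_signal_step(n: int) -> float:
    """Percentage increase in signal when a homopolymer grows from n to n + 1 bases.

    Assumes signal is directly proportional to the number of bases incorporated.
    """
    if n < 1:
        raise ValueError("homopolymer length must be at least 1")
    return 100.0 * ((n + 1) - n) / n


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(f"{n} vs {n + 1} bases: {relative_signal_step(n):.0f}% difference")
```

The step shrinks as the run gets longer, which is why long homopolymers are hard to call accurately.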
Illumina
Massively parallel system
Attach different adapters on each end of fragmented DNA
Bind it to slide coated with complementary sequences for each primer
Slide contains millions of individual DNA spots
Spots visualised during sequencing run, using fluorescence of nucleotides added
Allows 'bridge PCR', producing a small cluster of identical DNA copies at each spot on the slide
Sequencing
P5 and P7 sequences inserted by amplification
Up to 2x300bp reads
From 1 million to 10 billion clusters (sequences)
Chemistry
Basic Sanger idea of dye termination of second strand of a DNA molecule
Starting with primer, new bases are added one at a time
Fluorescent tags to determine which base was added
Fluorescent tag blocks 3'OH of new nucleotide, so next base can only be added when tag is removed
Unlike pyrosequencing, never have to worry about how many adjacent bases of the same type are present
Cycle is repeated 50-100 times
Paired end indexing sequencing
Absolutely required for discovery of genome variation
Enables better coverage uniformity by allowing highly repetitive sequence to be anchored by unique paired read
Insertion + deletion events can be detected by searching for reads that have unusual distance between their pairs
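A minimal sketch of the paired-distance check described above. The expected insert size, the tolerance and the function name are made-up illustrative values; real variant callers estimate them from the library and use statistical models.

```python
def classify_pair(mapped_distance: int, expected: int = 300, tolerance: int = 50) -> str:
    """Classify a read pair by the distance between its mates on the reference.

    Mates mapping too far apart suggest the sample lacks sequence that the
    reference has (deletion); mates mapping too close suggest extra sequence
    in the sample (insertion).
    """
    if mapped_distance > expected + tolerance:
        return "possible deletion"
    if mapped_distance < expected - tolerance:
        return "possible insertion"
    return "concordant"


if __name__ == "__main__":
    for distance in (310, 800, 120):
        print(distance, "->", classify_pair(distance))
```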
Pacific Biosciences (PacBio)
SMRT sequencing sample prep workflow
Fragment input DNA sample
Ends are repaired + hairpin structures ligated to each end
Size selection + purification to select fragments with adapters on both ends
SMRTbell templates go through sequencing reaction
Strand displacing DNApol opens SMRTbell into circular template + generates independent reads both reverse and forward of the same DNA molecule
Performance score increases linearly with no. times molecule is sequenced
2 sequencing modes
LS - Long sequencing reads
Large insert sizes (20kb - >100kb)
Generates one pass on each molecule sequenced
CCS - high quality sequencing reads
Circular consensus
Small insert sizes (<10kb)
Generates multiple passes on each molecule sequenced
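Why multiple passes give higher quality can be illustrated with a simple majority vote. This is a toy sketch, not PacBio's consensus algorithm: it assumes the passes are already aligned, have equal length and contain only substitution errors.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Majority-vote consensus over pre-aligned passes of the same molecule."""
    if not passes:
        raise ValueError("at least one pass is required")
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be aligned to the same length")
    # zip(*passes) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


if __name__ == "__main__":
    # Each pass has a different random error; the vote removes them.
    print(consensus(["ACGTAC", "ACGTTC", "ACCTAC"]))
```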
Chemistry
Uses triphosphate linked fluorophore to reduce steric hindrance
Allows sequencing to happen in 'real time'
Zero mode waveguides (ZMWs) hold fluorescent signal
Can detect base incorporated despite background of other nucleotides
Sequencing
Diffusion loading onto ZMWs
Single polymerase + DNA molecule per ZMW
Incorporated fluorescent signal is held
Laser used to excite fluorophore + emitted fluorescence is measured
10bp/sec incorporated
Single molecule resolution in real time
Short waiting time for result + simple workflow
Generate basecalls in <1day
Polymerase speed ≥1 base per second
No amplification required
Bias not introduced
More uniform coverage
Direct observation
Distinguish heterogeneous samples
Simultaneous kinetic measurements
Long reads
Identify repeats + structural variants
Less coverage required
Polymerase kinetics
Methylation can affect gene expression
Altered gene expression may be associated with malignant cellular transformation
Polymerase kinetics: the duration between 2 successive base incorporations
Altered in presence of modified bases
Can be detected as increased intervals between fluorescent pulses (interpulse duration/IPD)
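One way to picture IPD-based detection is to compare each position's interpulse duration with an unmodified control. The sketch below assumes per-position mean IPDs are already available; the 2.0 threshold and the function names are hypothetical, and real tools use statistical tests rather than a fixed cut-off.

```python
def ipd_ratios(sample_ipd: list[float], control_ipd: list[float]) -> list[float]:
    """Per-position ratio of sample IPD to control IPD (both in seconds)."""
    if len(sample_ipd) != len(control_ipd):
        raise ValueError("sample and control must cover the same positions")
    return [s / c for s, c in zip(sample_ipd, control_ipd)]


def flag_modified(sample_ipd: list[float], control_ipd: list[float],
                  threshold: float = 2.0) -> list[int]:
    """Return 0-based positions where the polymerase paused unusually long."""
    ratios = ipd_ratios(sample_ipd, control_ipd)
    return [i for i, ratio in enumerate(ratios) if ratio >= threshold]


if __name__ == "__main__":
    sample = [0.20, 0.21, 1.00, 0.19]
    control = [0.20, 0.20, 0.20, 0.20]
    print(flag_modified(sample, control))
```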
Oxford Nanopore
Library prep
PCR barcoding
~4hrs
PCR-free barcoding
~1.5hrs
Rapid barcoding
~10 mins
Sequencing
Engineered CsgG pore from E. coli
Strand sequencing by passing DNA libraries through protein nanopores set in a synthetic polymer membrane
DNA fragments form a complex with a processive enzyme that forces ssDNA through the nanopore
1 nucleotide at a time
Potential is applied across the membrane; the disruption of the current by the passing molecule is detected + decoded by software
Longest read reported to date is >1Mb
MinION
800 reusable pores/flow cell
Up to 12 million reads
30Gb per run
Tested 48Kb read length
£1k per instrument
GridION
Run 5 flow cells simultaneously
150Gb per run
£100k per instrument
DNA shearing
Mechanical
Sonication
Highly controllable
Shears DNA to desired lengths (150bp-75Kb)
Multi-sample parallel-processing (96 samples)
G-Tube
Centrifugal force
Fragment sizes range from 6-20kb
Low throughput (12 samples)
Enzymatic
Rapid prep
90 min prep, only 15 mins hands-on time
Optimised for small genomes, PCR amplicons + plasmids
Innovative sample normalisation
No library quantification needed
Fastest time to results
DNA to analysed data in <8hrs on MiSeq
Ultra low input
Only a single nanogram of DNA needed
Step 1: Tagmentation of template DNA
Step 2: PCR to add adapters + indices
Step 3: Cleanup + sequence
Transposomes + genomic DNA
Transposomes tag at a spacing of ~300bp
Sample pooling
Locus specific primer F/R
Bind to target DNA to allow specific amplification
Index 1+2
8bp DNA sequence, unique for each sample
Allows reads to be assigned to samples after sequencing
P5/7 tail
Bind product to flow cell
Common practice to multiplex multiple samples together once barcoded
Then demultiplexed computationally
Advantages
Reduces reagent cost
Quicker turnover time per sample
Disadvantages
Reduced read no. per sample
Introduces normalisation step to minimise variation in read no. per sample
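Computational demultiplexing of pooled samples can be sketched as a lookup on the 8bp index. This is a simplified illustration: the index sequences and sample names are invented, reads are plain (index, sequence) tuples, and only exact index matches are accepted, whereas real demultiplexers usually tolerate some mismatches.

```python
def demultiplex(reads: list[tuple[str, str]],
                sample_indices: dict[str, str]) -> dict[str, list[str]]:
    """Assign each read to a sample by exact match on its index sequence.

    reads: (index_sequence, read_sequence) pairs.
    sample_indices: index sequence -> sample name.
    """
    bins: dict[str, list[str]] = {name: [] for name in sample_indices.values()}
    bins["undetermined"] = []
    for index_seq, read_seq in reads:
        sample = sample_indices.get(index_seq, "undetermined")
        bins[sample].append(read_seq)
    return bins


if __name__ == "__main__":
    indices = {"ATCACGTT": "sample_1", "CGATGTAA": "sample_2"}  # made-up indices
    pooled = [
        ("ATCACGTT", "GGCTA"),
        ("CGATGTAA", "TTAGC"),
        ("ATCACGTT", "CCGAT"),
        ("NNNNNNNN", "ACGTA"),
    ]
    for sample, seqs in demultiplex(pooled, indices).items():
        print(sample, len(seqs))
```

Counting reads per bin, as in the usage above, also shows how uneven read numbers per sample would appear after pooling.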