Next Generation Sequencing
Why sequence an entire genome?
Biodiversity and speciation
Diversity within a species
Biology of an organism
Molecular Biology Principles
4 nucleotide triphosphates for DNA
DNA extension through attack by the 3'OH group of the pentose sugar on the 5' (alpha) phosphate of the free nucleotide
Pentose
Base
Triphosphate
Phosphodiester bond formed + diphosphate (pyrophosphate) released
Purines and pyrimidines
PCR
Thermostable DNA polymerase
3 stage process
1. Denaturation
2. Anneal primer
3. Extend new strand by incorporating dNTPs
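To show why the three stages are cycled many times, the sketch below estimates copy number after a given number of cycles. The function name `pcr_copies` and the `efficiency` parameter are assumptions for illustration; perfect doubling each cycle is an idealisation, and real reactions plateau.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate copies after repeated denature / anneal / extend cycles.

    efficiency is the fraction of templates copied in each cycle
    (1.0 = perfect doubling, an idealised assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must not be negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# Usage: copies produced from a single template molecule.
for n in (10, 20, 30):
    print(f"{n} cycles: {pcr_copies(1, n):.3g} copies")
```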
Sanger sequencing
Library preparation
Labour intensive
Up to 700bp per read
Reactions
For given template, similar to PCR except
Each reaction gives a chromatogram
Uses a single primer + polymerase to make ssDNA pieces
Includes regular nucleotides for extension but also dideoxynucleotides
Lack 3'OH - stop DNApol
~600-1000bp
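The chain-termination idea above can be sketched in a few lines: a dideoxynucleotide can stop extension at any position, so the reaction yields one fragment per length, and sorting the fragments by size recovers the base order. This is a toy model; the function names are hypothetical and it ignores signal quality and the real read-length limit.

```python
import random


def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Return (length, terminal_base) for every possible ddNTP stop.

    new_strand is the sequence synthesised from the primer (5'->3').
    A ddNTP lacks a 3'OH, so the fragment ends where it is incorporated
    and carries the label of that final base.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_from_fragments(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length (as size separation does) and read the labels."""
    return "".join(base for _, base in sorted(fragments))


# Usage: fragments come out of the reaction in no particular order.
fragments = terminated_fragments("GATTACA")
random.shuffle(fragments)
print(read_from_fragments(fragments))
```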
Limitations
Expensive
Low throughput
Labour intensive
Low sensitivity
Mutations in cancer need to be present in >30% of cells to be detected
What is NGS?
Technologies enabling you to sequence hundreds of millions of short sequences in a single run
Parallel sequencing or single molecule
454 Technology
DNA is sheared into 300-800bp fragments, ends 'polished' by removing any unpaired bases at the end
Adapters added to each end. DNA made ss at this point.
One adapter contains biotin, which binds a streptavidin-coated bead
Ratio of beads to DNA molecules is controlled so most beads only attach a single DNA molecule
Oil is added to beads + an emulsion created
PCR, each aqueous droplet forming its own micro-reactor
Each bead ends up coated with ~a million identical copies of original DNA
After emulsion PCR, oil is removed and beads put into a picotiter plate
Each well is just big enough to hold a single bead
Pyrosequencing enzymes attached to much smaller beads which are then added to the plate
Plate is repeatedly washed with each of the 4 dNTPs (+ other reagents) in a repeating cycle
Plate is coupled to a fibre optic chip, CCD camera records light flashes from each well
Read lengths are typically 500bp, up to 1kb is possible
Left behind in terms of cost + throughput
Roche no longer markets 454
Homopolymer (e.g. AAAA) is a big problem
Must distinguish the light flash from one nucleotide added vs the brighter flash from several identical nucleotides added in the same wash
A vs AA = 100% difference
AAAAA vs AAAAAA = 20% difference
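The two percentages above follow from one piece of arithmetic, sketched here under the assumption that light output is proportional to the number of bases incorporated in a wash. The function name is hypothetical.

```python
def relative_signal_increase(n: int) -> float:
    """Percent increase in light signal going from n to n + 1 identical bases.

    Assumes signal is proportional to the number of bases incorporated.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return 100.0 * ((n + 1) - n) / n


# Usage: the step between adjacent homopolymer lengths shrinks as n grows.
for n in range(1, 9):
    print(f"{n} vs {n + 1} bases: {relative_signal_increase(n):.1f}% difference")
```

The shrinking step is why long homopolymer runs are hard to call: each extra base changes the signal by a smaller fraction.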
Illumina
Massively parallel system
Attach different adapters on each end of fragmented DNA
Bind it to slide coated with complementary sequences for each primer
Slide contains millions of individual DNA spots
Allows 'bridge PCR', producing a small amount of amplified DNA at each spot on the slide
Spots visualised during sequencing run, using fluorescence of nucleotides added
DNA shearing
Mechanical
Sonication
Highly controllable
Shears DNA to desired lengths (150bp-75kb)
Multi-sample parallel-processing (96 samples)
G-Tube
Centrifugal force
Fragment sizes range from 6-20kb
Low throughput (12 samples)
Enzymatic
Rapid prep
90 min prep, only 15 mins hands-on time
Optimised for small genomes, PCR amplicons + plasmids
Innovative sample normalisation
No library quantification needed
Fastest time to results
DNA to analysed data in <8hrs on MiSeq
Ultra low input
Only a single nanogram of DNA needed
Step 1: Tagmentation of template DNA
Transposomes + genomic DNA
Transposomes tag at a spacing of ~300bp
Step 2: PCR to add adapters + indices
Step 3: Cleanup + sequence
Sequencing
P5 and P7 sequences inserted by amplification
Up to 2x300bp reads
From 1 million to 10 billion clusters (sequences)
Sample pooling
Locus specific primer F/R
Bind to target DNA to allow specific amplification
Index 1+2
8bp DNA sequence, unique for each sample
Allows reads to be assigned to samples after sequencing
P5/7 tail
Bind product to flow cell
Common practice to multiplex multiple samples together once barcoded
Then demultiplexed computationally
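Computational demultiplexing amounts to a lookup on the index pair carried by each read. The sketch below uses exact matching only; the index sequences, sample names and function name are made up for illustration, and production demultiplexers usually also tolerate a limited number of index mismatches.

```python
from collections import defaultdict


def demultiplex(reads, sample_by_index):
    """Assign reads to samples using their (index 1, index 2) pair.

    reads: iterable of (index1, index2, sequence) tuples.
    sample_by_index: dict mapping (index1, index2) to a sample name.
    Reads whose index pair is not listed go to "undetermined".
    """
    bins = defaultdict(list)
    for index1, index2, sequence in reads:
        sample = sample_by_index.get((index1, index2), "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


# Usage with hypothetical 8bp indices.
sample_sheet = {
    ("ATCACGTT", "CGATGTCC"): "sample_A",
    ("TTAGGCAA", "TGACCAGG"): "sample_B",
}
pooled_reads = [
    ("ATCACGTT", "CGATGTCC", "ACGT"),
    ("TTAGGCAA", "TGACCAGG", "GGCC"),
    ("NNNNNNNN", "CGATGTCC", "TTTT"),  # unreadable index 1
]
print(demultiplex(pooled_reads, sample_sheet))
```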
Advantages
Reduces reagent cost
Quicker turnover time per sample
Disadvantages
Reduced read no. per sample
Introduces normalisation step to minimise variation in read no. per sample
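The trade-off in read number, and one common way to normalise, can be put into numbers. The sketch assumes reads are shared evenly across samples and that normalisation is done by equimolar pooling from measured library concentrations; both are assumptions for illustration, as are the function names and example values.

```python
def reads_per_sample(total_clusters: int, n_samples: int) -> float:
    """Expected reads per sample if clusters are shared evenly across the pool."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return total_clusters / n_samples


def equimolar_volumes(conc_nM: dict[str, float], fmol_per_sample: float) -> dict[str, float]:
    """Volume (in microlitres) of each library that contributes the same amount.

    Uses 1 nM = 1 fmol per microlitre, so volume = amount / concentration.
    """
    return {name: fmol_per_sample / conc for name, conc in conc_nM.items()}


# Usage: 96 samples on a run giving 10 million clusters (illustrative numbers).
print(round(reads_per_sample(10_000_000, 96)))
print(equimolar_volumes({"lib1": 4.0, "lib2": 2.0}, fmol_per_sample=8.0))
```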
Chemistry
Basic Sanger idea of dye termination of second strand of a DNA molecule
Starting with primer, new bases are added one at a time
Fluorescent tags to determine which base was added
Unlike pyrosequencing, never have to worry about how many adjacent bases of the same type are present
Fluorescent tag blocks 3'OH of new nucleotide, so next base can only be added when tag is removed
Cycle is repeated 50-100 times
Paired end indexing sequencing
Absolutely required for discovery of genome variation
Enables better coverage uniformity by allowing highly repetitive sequence to be anchored by unique paired read
Insertion + deletion events can be detected by searching for reads that have unusual distance between their pairs
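Searching for unusual distances between pairs can be sketched as an outlier test on mapped pair distances. The version below compares each distance with the library median using the median absolute deviation; the threshold, labels and function name are assumptions, and real variant callers use more evidence than distance alone.

```python
from statistics import median


def classify_pairs(distances, tolerance=5.0):
    """Label each read pair by how its mapped distance compares with the median.

    Pairs mapping further apart than expected suggest a deletion in the
    sample relative to the reference; pairs mapping closer together
    suggest an insertion.
    """
    distances = list(distances)
    mid = median(distances)
    # Median absolute deviation; fall back to 1.0 if all distances are equal.
    mad = median(abs(d - mid) for d in distances) or 1.0
    calls = []
    for d in distances:
        if d > mid + tolerance * mad:
            calls.append("possible deletion")
        elif d < mid - tolerance * mad:
            calls.append("possible insertion")
        else:
            calls.append("normal")
    return calls


# Usage with made-up mapped distances (bp) for a ~300bp library.
print(classify_pairs([300, 310, 295, 305, 298, 302, 900, 120]))
```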
Pacific Biosciences (PacBio)
SMRT sequencing sample prep workflow
Fragment input DNA sample
Ends are repaired + hairpin structures ligated to each end
Size selection + purification to select fragments with adapters on both ends
SMRTbell templates go through sequencing reaction
Strand displacing DNApol opens SMRTbell into circular template + generates independent reads both reverse and forward of the same DNA molecule
Performance score increases linearly with no. times molecule is sequenced
2 sequencing modes
LS - Long sequencing reads
Large insert sizes (20kb - >100kb)
Generates one pass on each molecule sequenced
CCS - Circular consensus, high quality sequencing reads
Small insert sizes (<10kb)
Generates multiple passes on each molecule sequenced
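Why multiple passes give higher quality reads can be illustrated with a per-position majority vote. This is a simplified sketch, not PacBio's actual consensus method: it assumes the passes are already aligned and of equal length, which sidesteps the insertion and deletion errors a real consensus caller has to handle. The function name is hypothetical.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Majority-vote consensus across pre-aligned passes of one molecule."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to the same length")
    # zip(*passes) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Usage: each pass carries a different random error, the vote removes them.
print(consensus(["ACGTAC", "ACGTTC", "ACCTAC"]))
```

An odd number of passes avoids ties in this simple vote.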
Chemistry
Uses triphosphate linked fluorophore to reduce steric hindrance
Allows sequencing to happen in 'real time'
Zero mode waveguides (ZMWs) hold fluorescent signal
Can detect base incorporated despite background of other nucleotides
Sequencing
Diffusion loading onto ZMWs
Single polymerase + DNA molecule per ZMW
Incorporated fluorescent signal is held
Laser used to excite fluorophore + emitted fluorescence is measured
10bp/sec incorporated
Polymerase kinetics
Methylation can affect gene expression
Altered gene expression may be associated with malignant cellular transformation
Polymerase kinetics: the duration between 2 successive base incorporations
Altered in presence of modified bases
Can be detected as increased intervals between fluorescent pulses (interpulse duration/IPD)
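Detecting a modified base from IPD can be sketched as a ratio test between the sample and an unmodified control at each position. Comparing against a control, the threshold of 2.0, the function name and the example values are all assumptions for illustration; real analyses use many observations per position and statistical testing.

```python
def modified_positions(ipd_sample, ipd_control, min_ratio=2.0):
    """Return positions where sample IPD is at least min_ratio times control IPD.

    ipd_sample / ipd_control: mean interpulse durations (seconds) per
    template position, in the same order.
    """
    if len(ipd_sample) != len(ipd_control):
        raise ValueError("sample and control must cover the same positions")
    return [
        position
        for position, (sample, control) in enumerate(zip(ipd_sample, ipd_control))
        if control > 0 and sample / control >= min_ratio
    ]


# Usage with made-up values: the polymerase pauses at position 2.
print(modified_positions([0.20, 0.25, 1.10, 0.20], [0.20, 0.20, 0.25, 0.22]))
```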
Single molecule resolution in real time
Short waiting time for result + simple workflow
Generate basecalls in <1day
Polymerase speed ≥1 base per second
No amplification required
Bias not introduced
More uniform coverage
Direct observation
Distinguish heterogeneous samples
Simultaneous kinetic measurements
Long reads
Identify repeats + structural variants
Less coverage required
Oxford Nanopore
Library prep
PCR barcoding
PCR-free barcoding
Rapid barcoding
~10 mins
~4hrs
~1.5hrs
Sequencing
Engineered CsgG pore from E. coli
Strand sequencing by passing DNA libraries through protein nanopores set in a synthetic polymer membrane
DNA fragments form a complex with a processive enzyme that forces ssDNA through the nanopore
1 nucleotide at a time
Potential is applied to membrane and disruption by the passing molecule is detected + decoded by software
Longest read reported to date is >1Mb
MinION
800 reusable pores/flow cell
Up to 12 million reads
30Gb per run
Tested 48kb read length
£1k per instrument
GridION
Run 5 flow cells simultaneously
150Gb per run
£100k per instrument