Next Generation Sequencing

Why sequence an entire genome?

Biodiversity and speciation

Diversity within a species

Biology of an organism

Molecular Biology Principles

4 nucleotide triphosphates for DNA

DNA extension: the 3'OH group of the pentose sugar on the growing strand attacks the 5' (alpha) phosphate of the incoming free nucleotide

Pentose

Base

Triphosphate

Phosphodiester bond formed + diphosphate (pyrophosphate) released

Purines and pyrimidines

PCR

Thermostable DNA polymerase

3 stage process

1. Denaturation
2. Anneal primer
3. Extend new strand by incorporating dNTPs
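
As a rough illustration of why repeating the three stages is so effective, the sketch below models copy number over successive cycles. It assumes each cycle multiplies the template by (1 + efficiency); the function name `pcr_copies` and the `efficiency` parameter are illustrative and not part of these notes.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of denature/anneal/extend.

    Assumption: every cycle multiplies the copy number by (1 + efficiency),
    so efficiency = 1.0 means perfect doubling in each cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Print the idealised copy number from a single template molecule.
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n))
```

With perfect doubling, 10 cycles give about a thousand copies per starting molecule and 30 cycles give about a billion; real reactions plateau as reagents run out.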

Sanger sequencing

Library preparation

Labour intensive

Up to 700bp per read

Reactions

For given template, similar to PCR except

Each reaction gives a chromatogram

Uses a single primer + polymerase to make ssDNA pieces

Includes regular nucleotides for extension but also dideoxynucleotides

Lack 3'OH - stop DNApol

~600-1000bp
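
The chain-termination idea above can be sketched as a toy model: every position in the newly made strand produces one fragment ending in a dideoxynucleotide, and ordering the fragments by length recovers the sequence. This assumes termination happens at every position at least once; the function names are illustrative only.

```python
def terminated_fragments(new_strand: str) -> dict[str, list[int]]:
    """For each terminating base, list the lengths of fragments ending in it.

    Assumption: a ddNTP is incorporated at every position in at least one
    molecule, so each position yields exactly one fragment length.
    """
    fragments: dict[str, list[int]] = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        fragments[base].append(position)
    return fragments


def read_sequence(fragments: dict[str, list[int]]) -> str:
    """Recover the sequence by ordering fragments from shortest to longest."""
    by_length = sorted(
        (length, base)
        for base, lengths in fragments.items()
        for length in lengths
    )
    return "".join(base for _, base in by_length)


if __name__ == "__main__":
    frags = terminated_fragments("GATTACA")
    print(frags)
    print(read_sequence(frags))
```

Sorting by length plays the role of electrophoresis here: the terminating base of each successively longer fragment gives the next peak on the chromatogram.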

Limitations

Expensive

Low throughput

Labour intensive

Low sensitivity

To be detected, a cancer mutation needs to be present in >30% of cells

What is NGS?

Technologies enabling you to sequence hundreds of millions of short sequences in a single run

Parallel sequencing or single molecule

454 Technology

DNA is sheared into 300-800bp fragments, ends 'polished' by removing any unpaired bases at the end

Adapters added to each end. DNA made ss at this point.

One adapter contains biotin, which binds to a streptavidin-coated bead

Ratio of beads to DNA molecules is controlled so most beads only attach a single DNA molecule

Oil is added to beads + an emulsion created

PCR, each aqueous droplet forming its own micro-reactor

Each bead ends up coated with ~a million identical copies of original DNA

After emulsion PCR, oil is removed and beads put into a picotiter plate

Each well is just big enough to hold a single bead

Pyrosequencing enzymes attached to much smaller beads which are then added to the plate

Plate is repeatedly washed with each of the 4 dNTPs (+ other reagents) in a repeating cycle

Plate is coupled to a fibre optic chip; a CCD camera records light flashes from each well
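
The repeating dNTP wash cycle can be sketched as a toy flowgram: one nucleotide is flowed at a time and the light signal is taken to equal the number of bases incorporated in that flow. The flow order `"TACG"`, the function names and the one-unit-per-base signal are assumptions made for illustration.

```python
def flowgram(new_strand: str, flow_order: str = "TACG", max_cycles: int = 100):
    """Return a list of (nucleotide, signal) pairs, one per flow.

    Assumption: signal is exactly the number of identical bases incorporated
    while that nucleotide is flowed over the well (no noise).
    """
    flows = []
    pos = 0
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            count = 0
            # Incorporate as many consecutive matching bases as are available.
            while pos < len(new_strand) and new_strand[pos] == nucleotide:
                count += 1
                pos += 1
            flows.append((nucleotide, count))
        if pos >= len(new_strand):
            break
    return flows


def call_bases(flows) -> str:
    """Turn a flowgram back into a sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


if __name__ == "__main__":
    flows = flowgram("TTAGC")
    print(flows)
    print(call_bases(flows))
```

A flow with signal 0 means that nucleotide was not the next base, while a signal of 2 or more means a run of identical bases was added in a single flash.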

Read lengths are typically 500bp, up to 1kb is possible

Left behind in terms of cost + throughput

Roche no longer markets 454

Homopolymer (e.g. AAAA) is a big problem

The light flash from one nucleotide added must be distinguished from the flash from many nucleotides added

A vs AA = 100% difference

AAAAA vs AAAAAA = 20% difference
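
The percentages above follow from simple arithmetic, sketched below under the assumption that light signal is directly proportional to the number of bases incorporated. The function name is illustrative.

```python
def relative_signal_increase(n: int) -> float:
    """Percent increase in signal when a homopolymer grows from n to n + 1 bases.

    Assumption: signal is directly proportional to the number of bases added.
    """
    if n < 1:
        raise ValueError("homopolymer length must be at least 1")
    return 100.0 * ((n + 1) - n) / n


if __name__ == "__main__":
    # The step between adjacent lengths shrinks as the run gets longer.
    for n in range(1, 9):
        print(f"{n} vs {n + 1} bases: {relative_signal_increase(n):.1f}% difference")
```

The longer the run, the smaller the relative step between adjacent lengths, which is why long homopolymers are miscounted.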

Illumina

Massively parallel system

Attach different adapters on each end of fragmented DNA

Bind it to slide coated with complementary sequences for each primer

Slide contains millions of individual DNA spots

Allows 'bridge PCR', producing a small cluster of identical DNA copies at each spot on the slide

Spots visualised during sequencing run, using fluorescence of nucleotides added

DNA shearing

Mechanical

Sonication

Highly controllable

Shears DNA to desired lengths (150bp-75kb)

Multi-sample parallel-processing (96 samples)

G-Tube

Centrifugal force

Fragment sizes range from 6-20kb

Low throughput (12 samples)

Enzymatic

Rapid prep

90 min prep, only 15 mins hands-on time

Optimised for small genomes, PCR amplicons + plasmids

Innovative sample normalisation

No library quantification needed

Fastest time to results

DNA to analysed data in <8hrs on MiSeq

Ultra low input

Only a single nanogram of DNA needed

Step 1: Tagmentation of template DNA

Transposomes + genomic DNA

Transposomes tag at a spacing of ~300bp

Step 2: PCR to add adapters + indices

Step 3: Cleanup + sequence

Sequencing

P5 and P7 sequences inserted by amplification

Up to 2x300bp reads

From 1 million to 10 billion clusters (sequences)

Sample pooling

Locus specific primer F/R

Bind to target DNA to allow specific amplification

Index 1+2

8bp DNA sequence, unique for each sample

Allows reads to be assigned to samples after sequencing

P5/7 tail

Bind product to flow cell

Common practice to multiplex multiple samples together once barcoded

Then demultiplexed computationally
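
Computational demultiplexing can be sketched as a lookup from each read's index pair to its sample. This is a minimal illustration assuming exact index matches only; the index sequences, sample names and function name are made up for the example, and real tools usually tolerate a mismatch or two.

```python
from collections import defaultdict


def demultiplex(reads, sample_indices):
    """Group read sequences by sample using their (index1, index2) pair.

    reads: iterable of (index1, index2, sequence) tuples.
    sample_indices: dict mapping (index1, index2) to a sample name.
    Reads whose index pair is not recognised go to "undetermined".
    """
    by_sample = defaultdict(list)
    for index1, index2, sequence in reads:
        sample = sample_indices.get((index1, index2), "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


if __name__ == "__main__":
    # Hypothetical 8bp index pairs for two pooled samples.
    sample_indices = {
        ("ATCACGTT", "CGATGTAA"): "sample_A",
        ("TTAGGCAA", "TGACCATT"): "sample_B",
    }
    reads = [
        ("ATCACGTT", "CGATGTAA", "ACGTACGT"),
        ("TTAGGCAA", "TGACCATT", "GGGTTTAA"),
        ("ATCACGTT", "CGATGTAA", "TTTTCCCC"),
        ("NNNNNNNN", "CGATGTAA", "ACACACAC"),
    ]
    print(demultiplex(reads, sample_indices))
```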

Advantages

Reduces reagent cost

Quicker turnover time per sample

Disadvantages

Reduced read no. per sample

Introduces normalisation step to minimise variation in read no. per sample

Chemistry

Basic Sanger idea of dye termination of second strand of a DNA molecule

Starting with primer, new bases are added one at a time

Fluorescent tags to determine which base was added

Unlike pyrosequencing, never have to worry about how many adjacent bases of the same type are present

Fluorescent tag blocks 3'OH of new nucleotide, so next base can only be added when tag is removed

Cycle is repeated 50-100 times

Paired end indexing sequencing

Absolutely required for discovery of genome variation

Enables better coverage uniformity by allowing highly repetitive sequence to be anchored by unique paired read

Insertion + deletion events can be detected by searching for reads that have unusual distance between their pairs
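
The insert-size idea can be sketched as a simple threshold check on the mapped distance between read pairs. This assumes the expected library insert size and a tolerance are already known; the function name, labels and numbers are illustrative, and real callers use statistical models across many pairs.

```python
def flag_unusual_pairs(mapped_distances, expected, tolerance):
    """Label each read pair by comparing its mapped distance with the expectation.

    Pairs mapping further apart than expected suggest sequence missing from the
    sample relative to the reference (deletion); pairs mapping closer together
    suggest extra sequence in the sample (insertion).
    """
    calls = []
    for distance in mapped_distances:
        if distance > expected + tolerance:
            calls.append("possible deletion")
        elif distance < expected - tolerance:
            calls.append("possible insertion")
        else:
            calls.append("normal")
    return calls


if __name__ == "__main__":
    # Hypothetical library with a 300bp expected insert size.
    print(flag_unusual_pairs([310, 295, 820, 120], expected=300, tolerance=50))
```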

Pacific Biosciences (PacBio)

SMRT sequencing sample prep workflow

Fragment input DNA sample

Ends are repaired + hairpin structures ligated to each end

Size selection + purification to select fragments with adapters on both ends

SMRTbell templates go through sequencing reaction

Strand displacing DNApol opens SMRTbell into circular template + generates independent reads both reverse and forward of the same DNA molecule

Performance score increases linearly with no. times molecule is sequenced

2 sequencing modes

LS - Long sequencing reads

Large insert sizes (20kb - >100kb)

Generates one pass on each molecule sequenced

CCS - Circular consensus, high quality sequencing reads

Small insert sizes (<10kb)

Generates multiple passes on each molecule sequenced
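
Why multiple passes improve quality can be sketched with a per-position majority vote across passes of the same molecule. This is a simplification that assumes the passes are already aligned and of equal length with substitution errors only; it is not PacBio's actual consensus algorithm, and the function name is illustrative.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Build a consensus by taking the most common base at each position.

    Assumption: all passes are pre-aligned and the same length, so column i
    of every pass refers to the same base of the original molecule.
    """
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


if __name__ == "__main__":
    # Five noisy passes of the same insert, each with at most one error.
    passes = ["ACGTAC", "ACCTAC", "ACGTAC", "AGGTAC", "ACGTTC"]
    print(majority_consensus(passes))
```

Random errors rarely land on the same position in most passes, so they are outvoted; a single pass over a long insert has no such correction.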

Chemistry

Uses triphosphate linked fluorophore to reduce steric hindrance

Allows sequencing to happen in 'real time'

Zero mode waveguides (ZMWs) hold fluorescent signal

Can detect base incorporated despite background of other nucleotides

Sequencing

Diffusion loading onto ZMWs

Single polymerase + DNA molecule per ZMW

Incorporated fluorescent signal is held

Laser used to excite fluorophore + emitted fluorescence is measured

10bp/sec incorporated

Polymerase kinetics

Methylation can affect gene expression

Altered gene expression may be associated with malignant cellular transformation

Polymerase kinetics are measured as the duration between 2 successive base incorporations

Altered in presence of modified bases

Can be detected as increased intervals between fluorescent pulses (interpulse duration/IPD)
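
Detection from interpulse duration can be sketched as a ratio test: compare the IPD at each position with the IPD for the same position on unmodified control DNA and flag large increases. The threshold of 2.0, the example values and the function name are assumptions for illustration, not values from these notes.

```python
def flag_modified_positions(sample_ipds, control_ipds, ratio_threshold=2.0):
    """Return positions where the sample IPD is much longer than the control.

    sample_ipds / control_ipds: per-position mean interpulse durations (seconds).
    ratio_threshold: hypothetical cut-off for calling a base as modified.
    """
    if len(sample_ipds) != len(control_ipds):
        raise ValueError("sample and control must cover the same positions")
    flagged = []
    for position, (sample, control) in enumerate(zip(sample_ipds, control_ipds)):
        # Skip positions with no usable control measurement.
        if control > 0 and sample / control >= ratio_threshold:
            flagged.append(position)
    return flagged


if __name__ == "__main__":
    sample = [0.20, 0.25, 0.90, 0.20]
    control = [0.20, 0.20, 0.30, 0.25]
    print(flag_modified_positions(sample, control))
```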

Single molecule resolution in real time

Short waiting time for result + simple workflow

Generate basecalls in <1day

Polymerase speed 1 base per second

No amplification required

Bias not introduced

More uniform coverage

Direct observation

Distinguish heterogeneous samples

Simultaneous kinetic measurements

Long reads

Identify repeats + structural variants

Less coverage required

Oxford Nanopore

Library prep

PCR barcoding

PCR-free barcoding

Rapid barcoding

~10 mins

~4hrs

~1.5hrs

Sequencing

Engineered CsgG pore from E. coli

Strand sequencing by passing DNA libraries through protein nanopores embedded in a synthetic polymer membrane

DNA fragments form a complex with a processive enzyme that forces ssDNA through the nanopore

1 nucleotide at a time

Potential is applied across the membrane; disruption of the current by the passing molecule is detected + decoded by software

Longest read reported to date is >1Mb

MinION

800 reusable pores/flow cell

Up to 12 million reads

30Gb per run

Tested 48kb read length

£1k per instrument

GridION

Run 5 flow cells simultaneously

150Gb per run

£100k per instrument