Expression of Recombinant Proteins in E. coli

Studying proteins

Why?

Important part of cell

Structure

Function

Transport, catalysis etc...

Drug design

Engineered enzymes

Biocatalysis

Antiretroviral drug therapies

HIV-1, HCV

Polyprotein cleavage by proteases essential step of virus maturation

Protease inhibitor

Inhibit virus maturation/replication

Development based on knowledge of proteins

Protein-based therapeutics

Hormones

Cytokines

Vaccines

Monoclonal Ab's

140 therapies approved

1/3 produced in E. coli

How?

Purification

NMR, X-ray, cryo-EM

In vivo studies

Labelling

Cloning + overexpression

High yield of purified protein

Possibility to mutate amino acid residues specifically

Engineering

Studying function

Cell expression systems

Bacterial

Most well developed

High protein expression

Reproducibility is robust

Low cost

Short production cycles

Ab's + enzymes produced

No complex proteins

Protein-tag needed for post production recovery

Mammalian

Medium protein expression

High cost

Long production cycles

Some complex proteins

Human glycoprotein - vaccines produced

Post-production recovery by concentration of proteins from culture medium

Reproducible

Dominates bio-therapeutic market

Plants

High protein expression

Low-high cost

Seasonal production cycles - slow

Complex proteins produced

Ab's, vaccines, enzymes produced

Post-production recovery by co-extraction with commercial by-product or seed endosperm

Reproducible

Protein purification

Why?

Purification of a single protein from a mixture

DNA technologies

Genes of interest amplified by PCR

Cloned into expression vector

Study structure/function of a particular protein individually

Comparison of mutant proteins

Structural studies

X-ray, NMR etc.

General method

  1. Identify target gene
  1. Create PCR product with RE sites at either end
  1. Digest product + plasmid vector
  1. Ligate digested product + vector
  1. Insert plasmid vector with gene into E. coli for expression

Purification tags allow detection

High throughput genome sequencing

Protein can be produced in liquid culture e.g. E. coli

Plasmid vectors for protein production in E. coli

Requirements

Transcription

RNApol binds promoter + mRNA produced

Translation

How?

Insert target gene in between upstream regions for starting transcription/translation and terminating transcription

MCS at correct location

Repressor binding site for control of transcription

araBAD promoter (P\(_{BAD}\))

Vector contains

Origin of replication (ori)

Selection marker

Repressor gene (araC)

ATG start codon

MCS containing ATG start codon e.g. NcoI (CCATGG) or NdeI (CATATG)
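
A minimal sketch of how the ATG-containing cloning sites could be checked in silico. Only the NcoI and NdeI recognition sequences come from the notes; the function names and the example insert are hypothetical and serve purely as illustration.

```python
# Recognition sequences from the notes; each contains an ATG start codon.
SITES = {"NcoI": "CCATGG", "NdeI": "CATATG"}


def atg_offset(site: str) -> int:
    """Return the index of the ATG start codon inside a recognition site."""
    return site.index("ATG")


def find_sites(seq: str, site: str) -> list[int]:
    """Return every 0-based position where `site` occurs in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


# Hypothetical insert sequence, used only to illustrate the check.
example_insert = "ATGGCTAGCCATATGAAAGGT"
internal = {name: find_sites(example_insert, s) for name, s in SITES.items()}
```

An insert that carries an internal copy of the chosen site would itself be cut during digestion, so in this hypothetical case NcoI would be the safer choice of the two.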

Other features

C-terminal myc

His-tags

Ab recognition

Purification

Promoter systems

LacI

Based on lac-operon

Anhydrotetracycline

Based on tet-repressor

L-arabinose induction

araBAD promoter system

All 3 based on repressor proteins

Block RNApol binding

Inducible by small molecule

Small molecule binds repressor + it dissociates from DNA binding site

RNApol can bind + transcribe

Autoinduction

lac + araBAD

Actively repressed by glucose

Even in presence of inducer

Glucose metabolised as cells grow

Relieving repression

Lac operon

Genes

3 structural

lacZ

Encodes \(\beta\)-galactosidase

lacY

Encodes lactose permease

lacA

Encodes thiogalactoside transacetylase

3 functional

lacP

Promoter

lacI

Repressor

lacO

Operator

Only produced when lactose is present

Metabolism of lactose

In recombinant gene expression

Replace lacZ/Y/A cassette with gene of interest

Induce expression by addition of lactose or synthetic analogue (isopropyl \(\beta\)-D-1-thiogalactopyranoside, IPTG)

pET-series vectors

2-step expression system

T7 promoter binding site

Not standard RNApol binding site from E. coli

From phage artificially inserted into vector

More efficient

More mRNA

Expression

lacO sits after T7 promoter, lacI blocks

Gene cloned in plasmid MCS

T7 RNApol expression under control of lacI

Transcribed by T7 RNApol

Protein only expressed in E. coli strains with T7 RNApol

DE3

Genetics carried out in strains not expressing endonucleases - prevent DNA degradation in storage

lacI repressor blocks at 2 points

lacO site on host genome repressing transcription of T7 RNApol gene

lacO site on pET-vector repressing transcription of recombinant gene

Leaky, incomplete repression

Problem if target gene is toxic

Some T7 RNApol still produced

Expression of T7 lysozyme from pLysS plasmid blocks T7 RNApol

IPTG induction produces excess T7 RNApol, overcoming block from T7 lysozyme
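
The 2-step control described above can be sketched as a qualitative truth table. This is a toy model under stated assumptions, not a quantitative description: the function names and the three-level "none / leaky / high" scale are hypothetical.

```python
def t7_polymerase_level(iptg: bool, de3: bool) -> str:
    """Qualitative T7 RNA polymerase level in the host strain."""
    if not de3:
        return "none"  # strain carries no T7 RNA polymerase gene
    # lacI repression of the polymerase gene is incomplete without IPTG.
    return "high" if iptg else "leaky"


def target_expressed(iptg: bool, de3: bool = True, plyss: bool = False) -> bool:
    """Return True if the recombinant gene is expected to be transcribed."""
    level = t7_polymerase_level(iptg, de3)
    if level == "none":
        return False
    if level == "leaky":
        # Background polymerase drives leaky expression unless
        # T7 lysozyme from pLysS inhibits it.
        return not plyss
    # Induced: excess polymerase overcomes T7 lysozyme, and lacI is
    # released from the lacO site on the pET vector.
    return True
```

For a toxic target, the relevant row is `target_expressed(False, plyss=True)`, which is `False` in this model: no expression until IPTG is added.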

Primer design

Typical MCS for cloning genes for overexpression

General rules

  1. Length \(\geq\)18nt
  1. Finish at 3' end in G or C
  1. T\(_m\) values of primer pair must be within 5\(^\circ\)C of each other

\(T_m = 69.3 + (0.41 \times \%GC) - \frac{650}{\text{primer length}}\)

  1. Annealing temp. of PCR is 5\(^\circ\)C lower than lowest T\(_m\) value
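
The T\(_m\) formula and the pairing rules above can be sketched in a few lines of Python. The function names and the returned dictionary layout are hypothetical; the arithmetic follows the formula as given in these notes.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def melting_temp(primer: str) -> float:
    """T_m = 69.3 + 0.41 * %GC - 650 / primer length (formula from the notes)."""
    return 69.3 + 0.41 * gc_percent(primer) - 650.0 / len(primer)


def check_pair(forward: str, reverse: str, max_diff: float = 5.0) -> dict:
    """Apply the pairing rules: T_m values within 5 C, anneal 5 C below the lowest."""
    tm_f, tm_r = melting_temp(forward), melting_temp(reverse)
    return {
        "tm_forward": tm_f,
        "tm_reverse": tm_r,
        "compatible": abs(tm_f - tm_r) <= max_diff,
        "annealing_temp": min(tm_f, tm_r) - 5.0,
    }
```

As a worked case, a 20 nt primer with 50% GC gives 69.3 + 20.5 - 32.5 = 57.3, so a matched pair would be annealed at about 52.3.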

Forward primer

Exactly the same as start of gene from ATG

Once at 18nt, if it does not end in G or C keep going until it does
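
The forward primer rule can be sketched as follows. The function name and the example gene are hypothetical; the sketch assumes the supplied sequence begins at the ATG start codon.

```python
def design_forward_primer(gene: str, min_len: int = 18) -> str:
    """Take the start of the gene and extend until the 3' base is G or C."""
    gene = gene.upper()
    if not gene.startswith("ATG"):
        raise ValueError("sequence is expected to begin at the ATG start codon")
    for end in range(min_len, len(gene) + 1):
        if gene[end - 1] in "GC":
            return gene[:end]
    raise ValueError("sequence too short, or no G/C at or beyond the minimum length")


# Hypothetical gene used only for illustration.
example_gene = "ATGAAAGCTTTAGATCCATTAGCTTGA"
forward = design_forward_primer(example_gene)
```

Here base 18 is an A, so the primer is extended to the next G, giving a 22 nt primer.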

Reverse primer

Reverse complement of original DNA strand

Once at 18nt, if it does not end in G or C keep going until it does

e.g. original strand is AATGGCTA

Complement is TTACCGAT

Then reverse TAGCCATT
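
The complement-then-reverse step, and the reverse primer rule built on it, can be sketched like this. The function names are hypothetical, and the sketch assumes the input is the coding strand written 5' to 3' using only A, C, G and T.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, same orientation as the input."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement, then reverse, so the result reads 5' to 3'."""
    return complement(seq)[::-1]


def design_reverse_primer(gene: str, min_len: int = 18) -> str:
    """Start from the reverse complement of the gene end; extend to a G or C."""
    rc = reverse_complement(gene)
    for end in range(min_len, len(rc) + 1):
        if rc[end - 1] in "GC":
            return rc[:end]
    raise ValueError("sequence too short, or no G/C at or beyond the minimum length")
```

Calling `reverse_complement("AATGGCTA")` reproduces the worked example above.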

After ~5 cycles the majority of template molecules include originally non-annealing basepairs

T\(_m\) changes

If T\(_m\)'s are too far apart, add bases to shorter primer

T\(_m\) is the temperature at which half of the primer is bound to the template molecule
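
A minimal sketch of balancing a primer pair by extension. It assumes the primer to extend is the one with the lower T\(_m\) (usually, but not necessarily, the shorter one), and it does not re-check the G/C 3' end rule. The function names and example sequences are hypothetical.

```python
def melting_temp(primer: str) -> float:
    """T_m = 69.3 + 0.41 * %GC - 650 / primer length (formula from the notes)."""
    primer = primer.upper()
    gc = 100.0 * sum(base in "GC" for base in primer) / len(primer)
    return 69.3 + 0.41 * gc - 650.0 / len(primer)


def balance_tm(fwd_source: str, rev_source: str,
               fwd_len: int = 18, rev_len: int = 18,
               max_diff: float = 5.0) -> tuple[str, str]:
    """Extend the lower-T_m primer one base at a time until the pair matches.

    fwd_source: gene sequence from the ATG (forward primer is a prefix).
    rev_source: reverse complement of the gene (reverse primer is a prefix).
    """
    while True:
        fwd, rev = fwd_source[:fwd_len], rev_source[:rev_len]
        tm_f, tm_r = melting_temp(fwd), melting_temp(rev)
        if abs(tm_f - tm_r) <= max_diff:
            return fwd, rev
        if tm_f < tm_r:
            if fwd_len >= len(fwd_source):
                raise ValueError("forward primer cannot be extended further")
            fwd_len += 1
        else:
            if rev_len >= len(rev_source):
                raise ValueError("reverse primer cannot be extended further")
            rev_len += 1


# Hypothetical sequences chosen so that the forward primer starts AT-rich.
fwd, rev = balance_tm("ATGAAATTTAAATTTAAA" + "GC" * 10,
                      "GCGCATGCATGCATGCGC" + "AT" * 5)
```

In this example the AT-rich forward primer grows from 18 to 25 nt before its T\(_m\) comes within 5\(^\circ\)C of the reverse primer.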

Non-pET series vectors

pASKIBA63b+-NdeI

Resistance marker

Ampicillin

Based on Tet repressor

Inducer

Anhydrotetracycline

Expression strain

Any

Challenges of expressing recombinant proteins

Insolubility

Doesn't fold into proper conformation inside E. coli

mRNA stability

Too stable

Can't be translated

Too unstable

Degraded

Codon utilisation

Insoluble aggregates formed

Inclusion bodies

Possible causes

High % protein production

~50% rather than 1%

Combined with slow and/or incorrect folding

Large hydrophobic patches produced

Protein molecules aggregate via these patches (inclusion bodies)

Possible remedies

Reduce temp. of E. coli propagation during overexpression

Test range of temps (20, 28 + 37\(^\circ\)C)

Slows all processes including transcription + translation

Allowing folding to 'catch up'

Link recombinant protein to soluble fusion protein e.g. maltose binding protein