Protein Analysis - Coggle Diagram
Protein Analysis
Protein Function Prediction
Generally, the only way to determine function from sequence is to ask whether the expressed protein is similar to any other proteins for which functional info is already available
This can be done in 2 ways
1) Determine which known proteins have a sequence similar to that of the expressed protein
2) Determine whether the expressed protein contains any subsequences or patterns of conserved residues that are associated w/ particular protein families or functions
2 ways in more detail
Comparing a protein sequence against a sequence database to determine function
Proteins w/ similar sequences should have similar functions
The most reliable method for determining protein function is to do a database search (as discussed in the previous section)
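To illustrate the idea behind a database search, here is a minimal Python sketch that scores a query against a small set of sequences with Smith-Waterman local alignment and ranks the hits. The match/mismatch/gap scores, the function names and the toy sequences are assumptions for illustration only; real search tools such as BLAST use substitution matrices (e.g. BLOSUM62) and speed heuristics rather than this plain scoring.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score using a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap in a
            # Local alignment: a cell score never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every database entry, best hit first."""
    hits = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy "database"; the sequences are made up.
    toy_db = {
        "protein_A": "MKTAYIAKQRQISFVKSHFSRQ",
        "protein_B": "GGGGSSSSPPPPWWWW",
    }
    print(search("KTAYIAKQRQ", toy_db))  # → [('protein_A', 20), ('protein_B', 0)]
```

The top hit is the entry whose sequence is most similar to the query, which is the entry whose functional annotation would be borrowed.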
Some functional features can be predicted directly from a protein seq, e.g. hydrophobicity profiles can be used to predict transmembrane helices
Hydrophobicity profiles can be generated and displayed graphically using ProtScale at ExPASy
Allows the user to calculate over 50 different properties of proteins, where a numerical value (e.g. hydrophobicity) is assigned to each amino acid
Input to the program can be either a sequence pasted into the sequence window or a SWISS-PROT accession code (the main other parameter is the window size)
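A hydrophobicity profile of the kind ProtScale draws can be sketched as a sliding-window average over a per-residue scale. This minimal Python version assumes the Kyte-Doolittle hydropathy scale (one of the scales ProtScale offers), an unweighted window of odd size, and a crude text "graph" in place of a real plot; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq: str, window: int = 9) -> list[tuple[int, float]]:
    """Return (1-based window centre, mean scale value) for each full window."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    profile = []
    for start in range(len(values) - window + 1):
        centre = start + window // 2 + 1
        profile.append((centre, sum(values[start:start + window]) / window))
    return profile  # empty if the sequence is shorter than the window


def text_plot(profile: list[tuple[int, float]]) -> str:
    """One row per window; bar length is proportional to the score."""
    rows = []
    for centre, score in profile:
        bar = ("+" if score > 0 else "-") * round(abs(score) * 2)
        rows.append(f"{centre:4d} {score:6.2f} {bar}")
    return "\n".join(rows)


print(text_plot(hydrophobicity_profile("LLLLLKKKKK", window=5)))
```

A larger window smooths the curve but blurs short features, which is why the window size is the key parameter to choose.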
Predicting transmembrane domains
The easiest way to predict transmembrane helices in a sequence is to look for regions of the protein containing a run of about 20 hydrophobic residues
Program TMPRED, based on statistical analysis of TMbase (a database of naturally occurring transmembrane proteins); 80-95% accurate in predicting the location and orientation of helices
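The "run of ~20 hydrophobic residues" rule can be sketched as a regular-expression scan. The set of residues treated as hydrophobic and the default minimum run length are assumptions, and this is a crude illustration of the rule, not how TMPRED scores helices.

```python
import re

# Assumed set of hydrophobic residues; real predictors use weighted scales.
HYDROPHOBIC = "ACFILMVW"


def hydrophobic_runs(seq: str, min_length: int = 20) -> list[tuple[int, int]]:
    """Return 1-based (start, end) positions of uninterrupted hydrophobic runs."""
    # Builds e.g. "[ACFILMVW]{20,}" for min_length=20.
    pattern = re.compile(f"[{HYDROPHOBIC}]{{{min_length},}}")
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence: a 22-residue Leu stretch and a too-short Ala stretch.
print(hydrophobic_runs("MKR" + "L" * 22 + "KDE" + "A" * 10 + "RR"))  # → [(4, 25)]
```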
Leader Sequences and Protein Localisation
Proteins contain signals in their sequence that help their processing within the cell, e.g. leader sequences, or signals which target proteins to specific compartments in the cell
Program name: SignalP (predicts leader sequences and their cleavage sites in both prokaryotes and eukaryotes)
Program name: PSORT (analyses prokaryotic or eukaryotic sequences and searches for protein sorting signals; the program reports back the probability of the protein being localised to different compartments within the cell). Accuracy 60%
Comparing protein sequence against motif databases to determine function
Often a protein seq is too distantly related to any in the databases to allow a reliable ID to be made by sequence alignment
Alternatively, seq alignment might find a match, but to a protein of no known function; in this case there is still a lot that can be done to predict function using bioinformatics tools
Different regions of proteins evolve at different rates: some parts must retain certain patterns of residues for the protein to function
If these conserved regions can be identified, it's possible to make predictions about the protein's function
e.g. many short seq's are diagnostic of the active site or binding region of a protein, such as the metal binding domains (MBDs) in the Cu uptake system
If the protein seq contains an MBD motif, it's possible to predict that one of its functions might be binding to metals. The presence of an MBD motif doesn't prove that the protein binds metal ions, but it provides an experimentally testable hypothesis as to protein function
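Finding conserved regions can be illustrated by scoring each column of a multiple sequence alignment by the fraction of sequences that share its most common residue. This is a minimal sketch with a hypothetical toy alignment and a simple identity-based score; the function names are made up, and dedicated tools use more refined conservation measures.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common residue in each column."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for col in range(width):
        residues = [row[col] for row in alignment if row[col] != "-"]  # ignore gaps
        top = Counter(residues).most_common(1)
        # Divide by the number of sequences, so gapped columns score lower.
        scores.append(top[0][1] / len(alignment) if top else 0.0)
    return scores


def conserved_positions(alignment: list[str], threshold: float = 1.0) -> list[int]:
    """1-based columns whose conservation score reaches the threshold."""
    return [i + 1 for i, s in enumerate(column_conservation(alignment)) if s >= threshold]


# Hypothetical toy alignment of three short sequences.
toy_alignment = ["MKCAYC", "MRCSYC", "MKCTFC"]
print(conserved_positions(toy_alignment))  # → [1, 3, 6]
```

Fully conserved columns (here M and the two C positions) are the candidates for a functional motif worth testing experimentally.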
Several bioinformatics resources have been built to hold databases of conserved motifs and to search for instances of such motifs in seq's
Pattern databases
Best known = PROSITE, which contains ~2,000 different families
PROSITE uses highly conserved regions to create a signature (a pattern or profile) for each domain family, similar to a fingerprint
Typical entry in PROSITE: [ST]-x(2)-[DE]
i.e. a Serine or Threonine, followed by any 2 residues, followed by a D (Aspartate) or E (Glutamate); this is the consensus sequence of a Casein kinase II phosphorylation site
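A PROSITE pattern such as [ST]-x(2)-[DE] maps naturally onto a regular expression. The sketch below translates a subset of the syntax (x, [..], {..}, (n) and (n,m) repeats, and the < and > terminal anchors) into a Python regex and scans a sequence, reporting overlapping matches. The helper names and the test sequence are hypothetical, and it does not handle every PROSITE feature (e.g. '>' inside brackets).

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden}, plus optional (n) / (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    regex = ""
    if pattern.startswith("<"):  # pattern anchored at the N-terminus
        regex += "^"
        pattern = pattern[1:]
    end_anchor = pattern.endswith(">")  # pattern anchored at the C-terminus
    if end_anchor:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            piece = "."
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"
        else:
            piece = residue
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        regex += piece
    return regex + ("$" if end_anchor else "")


def find_sites(pattern: str, seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues); the lookahead allows overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]


print(prosite_to_regex("[ST]-x(2)-[DE]"))             # → [ST].{2}[DE]
print(find_sites("[ST]-x(2)-[DE]", "MSAAEKTPLDE"))   # → [(2, 'SAAE'), (7, 'TPLD')]
```

Short, permissive patterns like this one match often by chance, so a hit is a hypothesis to test rather than proof of a phosphorylation site.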
Profile databases
More sensitive tools (PROSITE, PRINTS, BLOCKS, Pfam, InterPro)
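Profiles are more sensitive than patterns because each position gets a score for every residue instead of a yes/no match. A minimal sketch of that idea is a log-odds position-specific scoring matrix (PSSM) built from aligned motif instances. The uniform background, the pseudocount, the function names and the toy instances are assumptions; this is not the format or scoring used by any particular database.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM from equal-length, ungapped motif instances (uniform background)."""
    width = len(instances[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        column = [seq[pos] for seq in instances]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        # Pseudocounts keep unseen residues from scoring minus infinity.
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm: list[dict[str, float]], seq: str) -> tuple[int, float]:
    """Return (1-based start, score) of the highest-scoring window in seq."""
    width = len(pssm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the profile")
    scores = [
        sum(pssm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]


# Hypothetical motif instances and query sequence.
profile = build_pssm(["CAYC", "CSYC", "CTFC"])
print(best_window(profile, "GGCAYCGG"))
```

Unlike a pattern, the profile still gives a graded score to windows that differ from every training instance, which is where the extra sensitivity comes from.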
InterPro Protein Archive
Central collection of family and domain descriptions linking different resources
Provides access to a range of diagnostic resources for a given query through a single interface, i.e. provides a unified front end to the signature databases