Functional Consequences of genomic variations
Splice variants - portions of protein sequence can go missing due to mRNA splicing errors.
Protein composition of membranes and organelles establishes their function
The primary structure of a protein (its amino acid sequence) contains stretches of amino acids towards the N-terminus that encode the subcellular location the protein must reach in order to function
SNPs in these regions can cause mislocalised protein and hence functional disturbance
Variations within conserved sequence motifs are more likely to have a functional impact
Motifs are short sequences/domains shared by proteins across multiple species, etc.
Searching methods - regular expression matching, Scansite, PROSITE, MotifScan
Examples
Nuclear import signal and export signal - to import and export proteins to and from the nucleus
Lecture 12, slide 11
ER targeting motifs - result in ER localisation of protein - 2 distinct families of motifs for the signal
Lecture 12, slide 12
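The regular expression approach to motif searching can be sketched as below. This is a minimal illustration, not a curated motif database: the three patterns are simplified assumptions (a short basic nuclear-localisation-like stretch, and, assuming the two ER motif families meant here are the C-terminal KDEL and dilysine KKXX signals, one pattern for each). The function names `scan_motifs` and `snp_disrupts` are hypothetical.

```python
import re

# Simplified, illustrative patterns (assumptions, not curated database entries).
MOTIFS = {
    # Basic NLS-like stretch: K, then K/R, any residue, then K/R.
    "NLS_like": re.compile(r"K[KR].[KR]"),
    # Soluble ER retention signal: KDEL at the very C-terminus.
    "ER_KDEL": re.compile(r"KDEL$"),
    # Dilysine ER membrane retrieval signal: KKXX at the very C-terminus.
    "ER_KKXX": re.compile(r"KK..$"),
}

def scan_motifs(sequence):
    """Return (motif name, 1-based start, matched text) for every hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1, match.group()))
    return hits

def snp_disrupts(sequence, position, new_residue, motif_name):
    """True if substituting one residue (1-based position) removes the motif."""
    mutated = sequence[:position - 1] + new_residue + sequence[position:]
    before = MOTIFS[motif_name].search(sequence) is not None
    after = MOTIFS[motif_name].search(mutated) is not None
    return before and not after
```

For a made-up sequence such as "MPKKKRKVAAAKDEL", `scan_motifs` reports one NLS-like hit and one KDEL hit, and `snp_disrupts(seq, 15, "V", "ER_KDEL")` shows how a single substitution in the last residue removes the ER signal. Real tools such as PROSITE use richer pattern syntax and profiles, so hits from patterns this short should be treated as candidates only.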
Predicting protein function based on:
simple sequence features - motifs, signal sequences
secondary structure elements - alpha helix, beta sheets, etc.
3D protein domains
protein structure - can help group together proteins with no recognizable sequence similarity
multi protein interactions
Disturbances to pathway (leading to overexpression/inactivation)
Can sometimes determine localization - subcellular location of protein
Sequence mostly determines localization, but the process is complex
through amino acid sequences at the N-terminus that act as signals
SNPs in these signals lead to mislocalized protein
Lecture 12, slide 8
Consider broader range of all organisms containing a motif, to understand the diversity available within it.
Protein translation pathway
First decision - protein translated in cytoplasm OR transported to ER through secretory pathway?
In the ER - is the protein inserted into the ER membrane OR released into the ER lumen?
If membrane - internal (organelle) membrane OR plasma membrane?
Proteins go into other cellular organelles via cytoplasmic translation, NOT secretory pathway
Proteins are targeted to the secretory pathway by an N-terminal signal sequence, which is picked up by the signal recognition particle (SRP) floating around in the cytoplasm
If there is a particular signal (a cleavage site) at the end of the signal peptide, the protein is translocated into the lumen of the ER instead of remaining in the membrane
Lecture 12, slide 14
Second decision - Membrane residence or secretion in the ER
ER signal sequence can be cleavable signal peptide
Signal peptidase removes cleavable signal peptides; proteins are then secreted, or retained in the lumen or organelles in the secretory pathway
Lack of cleavable signal - signal anchor
signal anchors lack cleavage signal - so protein remains in the ER membrane during translation and becomes transmembrane protein
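The two sorting decisions above can be written as a small decision function. This is a toy sketch only: the three boolean inputs are hypothetical, pre-computed features (in practice they would come from a predictor), and the function name `predict_route` is made up for illustration.

```python
def predict_route(has_er_signal, signal_is_cleaved, has_later_tm_helix=False):
    """Toy version of the hierarchical sorting decisions in these notes.

    All three inputs are hypothetical, pre-computed boolean features.
    """
    # First decision: no ER signal means translation is completed in the cytoplasm.
    if not has_er_signal:
        return "cytoplasmic translation (non-secretory)"
    # Second decision: an uncleaved signal acts as a signal anchor.
    if not signal_is_cleaved:
        return "secretory pathway: membrane protein (signal anchor)"
    # Cleaved signal peptide, but a later hydrophobic helix stops transfer.
    if has_later_tm_helix:
        return "secretory pathway: membrane protein (stop-transfer helix)"
    return "secretory pathway: soluble (lumenal or secreted)"
```

Because the decisions are nested, a wrong answer to the first question (ER signal or not) makes every later answer wrong as well, which is why the first decision matters most for prediction.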
Protein associates with a membrane through:
interactions with other proteins
Post-translational modifications
Beta barrels - secondary structures that cause proteins to remain in the membrane
alpha-helical transmembrane domains
Signal Anchors (or stop transfer sequences)
form helical membrane domains (~20 AAs) in the lipid bilayer
Incorporated into the lipid bilayer of the membrane as part of the translation process; and pass through the secretory pathway as a membrane protein
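A rough way to look for candidate helical membrane segments of about 20 residues is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean threshold of 1.6; these settings are a commonly quoted convention and are an assumption here, not something taken from the lecture. The function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return 1-based starts of windows whose mean hydropathy exceeds threshold.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    starts = []
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean > threshold:
            starts.append(i + 1)
    return starts
```

A run of consecutive window starts marks one candidate segment. This scan cannot tell a cleavable signal peptide from a signal anchor, since both are hydrophobic, which is one reason dedicated HMM and neural network predictors are used instead.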
Why predict membrane organization of proteins?
subcellular location and membrane organization provide clues about protein function
Experiments to directly determine these can be time consuming and expensive
hierarchical protein sorting - essential to correctly predict the first decision
Use of Ensemble predictors
Hidden Markov Model
Artificial Neural Network
Physicochemical rules
Outputs are normalized and combined using bootstrap aggregator
Prediction servers - SignalP (ANN and HMM), TMHMM (HMM)
Prediction of subcellular location varies in accuracy depending on the compartment
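The normalise-then-combine step of an ensemble predictor can be illustrated as follows. This is a simplified sketch: it min-max normalises each predictor's raw scores and averages them, which covers only the aggregation part. It does not resample training data as a full bootstrap aggregator would, and it is not the method of any particular server. The predictor names and scores in the usage note are invented.

```python
def min_max_normalise(scores):
    """Rescale a list of raw scores to the range [0, 1]."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.5 for _ in scores]  # no spread: treat scores as uninformative
    return [(s - low) / (high - low) for s in scores]

def combine_predictors(raw_scores_by_predictor, threshold=0.5):
    """Average normalised scores per protein and call positives above threshold."""
    normalised = [min_max_normalise(s) for s in raw_scores_by_predictor.values()]
    n_proteins = len(normalised[0])
    averaged = [
        sum(pred[i] for pred in normalised) / len(normalised)
        for i in range(n_proteins)
    ]
    calls = [score > threshold for score in averaged]
    return averaged, calls
```

For example, `combine_predictors({"hmm": [0.0, 10.0, 5.0], "ann": [0.2, 0.8, 0.2], "rules": [1, 3, 3]})` puts three differently scaled outputs on a common 0 to 1 scale before averaging, so no single predictor dominates just because its raw scores are larger.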
Protein Domains
MemO - Predicting membrane organization using combinations of features
Topology is hence established during translation by presence of signal peptides and signal anchors
Domain databases/models - Pfam, InterProScan, PROSITE, PRINTS, HAMAP
A domain is associated with a particular function, due to sequence/structural conservation
Can contain phylogenetically diverse data
Domain curation - collection of proteins that do and don't contain the function (including mutated versions)
Refinement of models - wet-lab mutation studies (e.g. knockouts), and studies that characterize function in a broad range of organisms
Domain architecture - collection of domains found in a protein sequence
Protein families with similar domain architecture tend to have similar functions
Differences in domain architecture indicate differences in protein function
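Comparing domain architectures can be sketched by treating each protein as an ordered list of domain names. This is a minimal illustration: the domain labels in the usage note are examples, the function names are hypothetical, and real architectures would come from a database search such as Pfam or InterProScan.

```python
def same_architecture(domains_a, domains_b):
    """True when two proteins have the same domains in the same order."""
    return list(domains_a) == list(domains_b)

def architecture_difference(domains_a, domains_b):
    """Return domains unique to each protein, ignoring order and copy number."""
    set_a, set_b = set(domains_a), set(domains_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)
```

With `protein_x = ["SH3", "SH2", "kinase"]` and `protein_y = ["SH2", "kinase"]`, `same_architecture` returns False and `architecture_difference` returns `(["SH3"], [])`, pointing to the missing domain as the likely source of any functional difference.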
Functional impact of variation in non-coding regions
When non-coding regions are conserved:
Indicator of evolutionary constraint
Used in discovery of protein binding sites
SNPs in intergenic regions can have a phenotypic impact - e.g. brown/blue eye colour
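Conservation in non-coding regions can be scored directly from a multiple alignment. The sketch below assumes the sequences are already aligned to equal length, scores each column by the fraction of sequences sharing the most common character, and reports runs of fully conserved columns as candidate constrained elements. The thresholds and function names are illustrative assumptions, and gap characters are counted like any other character.

```python
from collections import Counter

def column_conservation(aligned_sequences):
    """Fraction of sequences sharing the most common character in each column."""
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned_sequences):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

def conserved_blocks(scores, min_score=1.0, min_length=4):
    """Return (start, end) 1-based inclusive ranges of consecutive conserved columns."""
    blocks, start = [], None
    for i, score in enumerate(scores):
        if score >= min_score and start is None:
            start = i
        elif score < min_score and start is not None:
            if i - start >= min_length:
                blocks.append((start + 1, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        blocks.append((start + 1, len(scores)))
    return blocks
```

A SNP that falls inside one of the reported blocks is a better candidate for functional impact than one in a poorly conserved column, although conservation alone does not show what the region binds or regulates.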