Functional Consequences of genomic variations

Splice variants - portions of protein sequence can go missing due to mRNA splicing errors.

Protein composition of membranes and organelles establishes their function

Primary protein structure (the amino acid sequence) contains stretches of amino acids, often near the N- or C-terminus, that encode the subcellular location the protein needs to reach in order to function

SNPs in these regions can cause protein mislocalisation and hence functional disturbance

Variations in conserved sequence motifs are more likely to have a functional impact

Motifs - common protein sequences/domains shared by multiple species

Searching methods - regular expression matching, Scansite, PROSITE, MotifScan
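Regular expression matching is the simplest of these methods. PROSITE writes motifs in its own pattern syntax (for example its N-glycosylation pattern, PS00001, is N-{P}-[ST]-{P}.), which maps directly onto ordinary regular expressions. A minimal Python sketch, assuming only the core pattern syntax; prosite_to_regex and find_motif are hypothetical helper names, not part of any PROSITE tool.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'N-{P}-[ST]-{P}.' into a Python regex.

    Supports x (any residue), [..] (allowed residues), {..} (forbidden residues),
    (n) or (n,m) repeats, and the < (N-terminus) / > (C-terminus) anchors.
    Does not handle '>' inside brackets, e.g. '[G>]'.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        core, _, repeat = element.partition("(")
        if core == "x":
            token = "."
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"
        else:
            token = core  # a single residue or a [..] set is already valid regex
        if repeat:
            token += "{" + repeat.rstrip(")") + "}"
        parts.append(("^" if anchor_start else "") + token + ("$" if anchor_end else ""))
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return the 1-based start of every match, including overlapping ones."""
    # Wrapping the pattern in a lookahead makes each match zero-width,
    # so finditer can report motifs that overlap each other.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [match.start() + 1 for match in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))          # N[^P][ST][^P]
print(find_motif("ANKTAGNPSA", "N-{P}-[ST]-{P}."))  # [2]
```

The second N in the made-up sequence is not reported because it is followed by P, which the {P} element forbids.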

Examples

Nuclear localisation signal (NLS) and nuclear export signal (NES) - direct the import of proteins into, and their export from, the nucleus

Lecture 12, slide 11

ER targeting motifs - result in ER localisation of the protein; there are 2 distinct families of motifs for this signal

Lecture 12, slide 12
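The examples differ in where the signal sits: an NLS can occur anywhere in the chain, whereas ER retention signals are anchored at the C-terminus. A minimal sketch using commonly cited textbook consensus forms, namely the classical monopartite NLS K-(K/R)-x-(K/R), C-terminal KDEL/HDEL for soluble ER proteins, and the di-lysine KKxx/KxKxx signal for ER membrane proteins. These patterns are assumptions, not taken from the slides, and whether KDEL and di-lysine are the "2 distinct families" referred to above is also an assumption; SIGNALS and scan_signals are hypothetical names.

```python
import re

# Illustrative consensus patterns (assumed textbook forms, not from the slides).
SIGNALS = {
    # Classical monopartite NLS core K-(K/R)-x-(K/R); can sit anywhere in the
    # chain, so a lookahead is used to report overlapping hits.
    "classical NLS": re.compile(r"(?=K[KR].[KR])"),
    # Retention signal for soluble ER-lumen proteins: KDEL/HDEL at the C-terminus.
    "ER retention (K/HDEL)": re.compile(r"[KH]DEL$"),
    # Di-lysine retrieval signal for ER membrane proteins: KKxx or KxKxx at the C-terminus.
    "ER retrieval (di-lysine)": re.compile(r"K.?K..$"),
}


def scan_signals(sequence: str) -> dict[str, list[int]]:
    """Return 1-based start positions for each signal type found in the sequence."""
    hits = {}
    for name, regex in SIGNALS.items():
        positions = [match.start() + 1 for match in regex.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


# Made-up sequence: an SV40-like NLS stretch (PKKKRKV) plus a C-terminal KDEL.
print(scan_signals("MPKKKRKVEDAKDEL"))
# {'classical NLS': [3, 4], 'ER retention (K/HDEL)': [12]}
```

Short consensus patterns like these match many proteins by chance, which is one reason dedicated predictors combine them with other features.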

Predicting protein function based on:

simple sequence features - motifs, signal sequences

secondary structure elements - alpha helix, beta sheets, etc.

3D protein domains

protein structure - can help group together proteins with no recognizable sequence similarity

multi-protein interactions

Disturbances to pathways (leading to overexpression/inactivation)

Sequence largely determines localization (the subcellular location of the protein), although the process is complex

Localization signals are amino acid sequences, typically at the N- or C-terminus, that act as sorting signals

SNPs in these signals lead to mislocalized protein

Lecture 12, slide 8

Consider the broad range of organisms containing a motif to understand the sequence diversity that occurs within it.
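One simple way to quantify the diversity within a motif across organisms is the Shannon entropy of each aligned position: 0 bits means every organism has the same residue there, and higher values mean more variation is tolerated. A minimal sketch, assuming the motif instances are already aligned without gaps; column_entropy is a hypothetical name and the instances are made up.

```python
import math
from collections import Counter


def column_entropy(aligned: list[str]) -> list[float]:
    """Shannon entropy (bits) of each column of an ungapped alignment."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    total = len(aligned)
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        # p * log2(1/p) summed over residues; a fully conserved column gives 0.0
        entropies.append(sum((n / total) * math.log2(total / n) for n in counts.values()))
    return entropies


# Hypothetical instances of one 4-residue motif from four organisms.
instances = ["KKKR", "KRKR", "KKRK", "KRAR"]
print([round(h, 2) for h in column_entropy(instances)])  # [0.0, 1.0, 1.5, 0.81]
```

Low-entropy positions are the conserved ones where, following the point above about conserved motifs, a variant is more likely to matter.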

Protein translation pathway

First decision - is the protein translated entirely in the cytoplasm OR targeted to the ER through the secretory pathway?

In the ER, is the protein inserted into the ER membrane OR released into the ER lumen?

If membrane, is it an internal (organelle) membrane OR the plasma membrane?

Proteins destined for other organelles (e.g. mitochondria, the nucleus) are translated in the cytoplasm, NOT via the secretory pathway

Proteins are targeted to the secretory pathway by an N-terminal signal sequence, which is picked up by the signal recognition particle (SRP) in the cytoplasm

If there is a particular signal (a cleavage site) at the end of the signal peptide, the protein is translocated into the ER lumen instead of remaining in the membrane

Lecture 12, slide 14

Second decision - Membrane residence or secretion in the ER

The ER signal sequence can be a cleavable signal peptide

Signal peptidase removes cleavable signal peptides; proteins are then secreted, or retained in the lumen or organelles in the secretory pathway

No cleavable signal - signal anchor

Signal anchors lack a cleavage site, so the protein remains in the ER membrane during translation and becomes a transmembrane protein

Protein associates with a membrane through:

interactions with other proteins

Post-translational modifications

Beta barrels - membrane-spanning structures built from beta strands that keep proteins in the membrane

alpha-helical transmembrane domains

Signal Anchors (or stop transfer sequences)

form hydrophobic alpha-helical transmembrane domains (~20 AAs) in the lipid bilayer

Incorporated into the lipid bilayer of the membrane during translation; the protein then passes through the secretory pathway as a membrane protein
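Because signal anchors form roughly 20-residue hydrophobic helices, a classic first-pass heuristic for spotting them is a sliding-window hydropathy scan. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale with the commonly quoted 19-residue window and mean-score threshold of 1.6; window_scores and candidate_tm_segments are hypothetical names. This is not how TMHMM works (it uses an HMM), and it only handles the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length sliding window."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(
    sequence: str, window: int = 19, threshold: float = 1.6
) -> list[tuple[int, int]]:
    """1-based (start, end) spans whose window mean exceeds the threshold."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(window_scores(sequence, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        # Merge with the previous span when the windows overlap or touch.
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Made-up sequence: a 20-residue leucine stretch between charged flanks.
protein = "DEKRS" * 3 + "L" * 20 + "DEKRS" * 3
print(candidate_tm_segments(protein))
```

The reported span is a little wider than the leucine stretch itself, because windows that only partly overlap the hydrophobic core can still pass the threshold.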

Why predict membrane organization of proteins?

subcellular location and membrane organization provide clues about protein function

Experiments to determine these directly can be time-consuming and expensive

Protein sorting is hierarchical, so correctly predicting the first decision is essential: an error there carries through to every later decision

Use of ensemble predictors combining:

Hidden Markov Model

Artificial Neural Network

rules of physicochemistry

Outputs are normalized and combined using bootstrap aggregation (bagging)
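The predictors in such an ensemble report on different scales (HMM log-odds, ANN probabilities, rule-based scores), which is why their outputs are normalized before being combined. A minimal sketch of the normalize-and-combine step, assuming min-max scaling; note that it swaps in plain averaging for the bootstrap aggregator mentioned above, so it is not bagging itself. The predictor names and scores are made up, and min_max and combine_predictors are hypothetical names.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale one predictor's scores to 0-1 so predictors on different scales are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine_predictors(outputs: dict[str, list[float]]) -> list[float]:
    """Average the normalised scores of all predictors for each protein (plain averaging, not bagging)."""
    normalised = [min_max(scores) for scores in outputs.values()]
    return [sum(column) / len(column) for column in zip(*normalised)]


# Hypothetical raw outputs for three proteins from three kinds of predictor.
outputs = {
    "hmm_log_odds": [-5.0, 0.0, 5.0],
    "ann_probability": [0.1, 0.5, 0.9],
    "physicochemical_rule": [0.0, 2.0, 1.0],
}
print([round(x, 2) for x in combine_predictors(outputs)])  # [0.0, 0.67, 0.83]
```

In a bagged ensemble, combine_predictors would instead be a meta-classifier trained many times on bootstrap resamples of labelled proteins, with the resulting models' votes aggregated.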

Prediction servers - SignalP (ANN and HMM), TMHMM (HMM)

Prediction of subcellular location varies in accuracy depending on the compartment

MemO - predicting membrane organization using combinations of features

Topology is therefore established during translation by the presence of signal peptides and signal anchors

Protein Domains

Domain databases/models - Pfam, InterProScan, PROSITE, PRINTS, HAMAP

A domain is associated with a particular function because of its sequence/structural conservation

Domain models can draw on phylogenetically diverse data

Domain curation - collecting proteins that do and do not have the function (including mutated versions)

Refinement of models - wet-lab mutation studies (e.g. knockouts) and studies that characterize function in a broad range of organisms

Domain architecture - collection of domains found in a protein sequence

Protein families with similar domain architectures tend to have similar functions

Differences in domain architecture tend to indicate differences in protein function
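A domain architecture can be represented as the ordered list of domains along the chain, which makes comparisons between proteins straightforward. A minimal sketch, assuming architectures are written N- to C-terminus; the Pfam-style domain names are illustrative rather than output from any real tool, and Jaccard similarity is just one simple overlap measure chosen here. same_architecture and domain_overlap are hypothetical names.

```python
def same_architecture(a: list[str], b: list[str]) -> bool:
    """Two proteins share an architecture only if the same domains occur in the same order."""
    return list(a) == list(b)


def domain_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of the domain sets, ignoring order and copy number."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Illustrative architectures, written N- to C-terminus.
protein_1 = ["SH3", "SH2", "Pkinase"]
protein_2 = ["SH3", "SH2", "Pkinase"]
protein_3 = ["SH2", "Pkinase"]

print(same_architecture(protein_1, protein_2))         # True
print(same_architecture(protein_1, protein_3))         # False
print(round(domain_overlap(protein_1, protein_3), 2))  # 0.67
```

Exact-order comparison captures "same architecture"; the set overlap gives a graded score for proteins that share only some domains.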

Functional impact of variation in non-coding regions

Conserved non-coding regions:

indicate evolutionary constraint

are used in the discovery of protein binding sites

SNPs in intergenic regions can have a phenotypic impact - e.g. brown/blue eye colour
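The simplest form of binding-site search in a conserved non-coding region is scanning the DNA for a degenerate consensus written in IUPAC nucleotide codes. A minimal sketch; the TATAWAW consensus (a commonly quoted TATA-box form) and the DNA sequence are illustrative assumptions, and iupac_to_regex and find_sites are hypothetical names. Real analyses typically use position weight matrices and scan both strands, whereas this only checks the forward strand.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> str:
    """Expand each IUPAC code into the set of bases it stands for."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(dna: str, consensus: str) -> list[int]:
    """1-based starts of (possibly overlapping) forward-strand matches to a consensus."""
    regex = re.compile("(?=" + iupac_to_regex(consensus) + ")")
    return [match.start() + 1 for match in regex.finditer(dna.upper())]


print(iupac_to_regex("TATAWAW"))                # TATA[AT]A[AT]
print(find_sites("GGCTATAAAAGGC", "TATAWAW"))  # [4]
```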