BIOINFORMATICS AND PROTEINS - Coggle Diagram

- - - - Speciation: evolution of a new species that is genetically independent of the ancestral species
      - Homolog: gene related to a second gene by descent from a common ancestral gene.
      - Ortholog: genes in different species that derive from a common ancestral gene through speciation and usually retain the same function.
      - Paralog: genes related by duplication of a common ancestral gene within a genome; the copies often evolve new functions.
      - Convergent evolution: similar properties that arise independently in genes of different genetic lineages.
    - - Neutral theory (Kimura): most evolutionary changes at the molecular level result from random genetic drift of selectively neutral mutant alleles. Neutral means the mutation does not affect survival or reproduction. Acts at the molecular level.
      - Natural selection (Darwin): traits that enhance survival and reproduction become more common in successive generations of a population. It drives adaptive evolution. Acts at the phenotypic level.
      - Both coexist; they are complementary theories.
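
The drift described by the neutral theory can be illustrated with a minimal Wright-Fisher sketch. The model choice, the function name `wright_fisher` and all parameter values are assumptions made for illustration; they do not come from the diagram. The allele has no fitness effect, yet it still ends up lost or fixed by chance alone.

```python
import random

def wright_fisher(pop_size, start_freq, seed=0, max_generations=100_000):
    """Track the frequency of a selectively neutral allele under pure drift.

    Assumption: a diploid population of constant size N, so 2N gene copies
    are resampled every generation with no fitness difference between alleles.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    count = round(start_freq * copies)
    trajectory = [count / copies]
    for _ in range(max_generations):
        if count in (0, copies):       # allele lost or fixed: drift stops
            break
        p = count / copies
        # Binomial sampling: each copy of the next generation carries the
        # mutant allele with probability p, regardless of fitness.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        trajectory.append(count / copies)
    return trajectory

traj = wright_fisher(pop_size=20, start_freq=0.5, seed=1)
print(len(traj) - 1, "generations simulated; final frequency:", traj[-1])
```

Smaller populations reach loss or fixation faster, which is why drift matters most when `pop_size` is small.
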
  - - - Find a template: BLAST to search for homologous protein sequences in the PDB
      - Make an alignment: align the target with the template using MSA tools to identify structurally conserved regions.
        
        Protein sequence alignments are more complicated than nucleotide alignments but more informative, because each position can hold any of 20 amino acids rather than 4 nucleotides.
        
        BLOSUM62: a substitution matrix that helps identify homologous sequences by scoring alignments according to the likelihood of one amino acid being substituted for another. It is derived from blocks of sequences clustered at 62% identity, a threshold that balances sensitivity and specificity.
        
        Multiple Sequence Alignments
        
        Local alignment: finds regions of similarity within larger sequences. Useful for finding conserved domains, motifs or functional sites.
        
        Global alignment: aligns sequences of similar length end to end, to assess their overall similarity and evolutionary relationships.
        
        Clustal Algorithm
        
        Pairwise comparison
        
        Guide tree creation: UPGMA or neighbor joining (NJ)
        
        Final alignment: an MSA that reflects the evolutionary relationships and similarities between the sequences.
      - Create a homology model
        
        Backbone modeling: build the backbone from the structurally conserved regions of the template
        
        Loop modeling: based on the template and fragment libraries, knowledge-based potentials and constraints from the aligned structure.
        
        Side chain modeling: rotamer optimization to ensure the side chains are in the most favorable conformations.
        
        Energy minimization: refine the model to minimize energy and improve accuracy. Optimize the constraints using molecular dynamics with simulated annealing.
      - Validate your structure: assess the quality of the model, compare models generated by different prediction methods, estimate reliability with model quality assessment programs (MQAPs), check the secondary structure content, and ensure the model resembles true protein structures.
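
The difference between global and local alignment in the alignment step above can be sketched with one small dynamic-programming routine. This is a simplified illustration, not the Clustal procedure: the function name `align_score` and the flat match/mismatch/gap scores are assumptions, and a real protein aligner would take the substitution score from a matrix such as BLOSUM62 instead.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score of sequences a and b.

    local=False: global alignment (Needleman-Wunsch), end to end.
    local=True:  local alignment (Smith-Waterman), best matching region.
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + sub,   # align two residues
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)                 # a local alignment may restart
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]

# Hypothetical sequences: a short domain embedded in a longer protein.
print(align_score("XXXXHEAGYYYY", "HEAG", local=True))   # finds the shared region
print(align_score("XXXXHEAGYYYY", "HEAG", local=False))  # penalized for the overhangs
```

The local score rewards only the conserved region, while the global score is dragged down by the unmatched ends, which is why local alignment suits domain and motif searches.
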
    - - The accuracy is proportional to the similarity in primary sequences (percent sequence identity)
        
        <25%: not enough homology
        
        25-50%: accuracy is the limiting factor
        
        50-75%: the problem is the quality of the model
        
        >75%: the concern is the speed of modeling
        
        The higher the homology, the higher the accuracy
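
The identity ranges above can be turned into a small lookup. In this sketch the function names, the choice to count identity over aligned non-gap columns, and the handling of the exact boundary values (25, 50, 75) are assumptions; the diagram only gives the ranges.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over columns where neither sequence has a gap ('-').

    Assumption: other denominators (e.g. the shorter sequence length) are
    also in common use and give different numbers.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_target, aligned_template)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

def modeling_regime(identity):
    """Map percent identity to the ranges listed in the notes."""
    if identity < 25:
        return "not enough homology"
    if identity < 50:
        return "accuracy is the limiting factor"
    if identity < 75:
        return "model quality is the problem"
    return "modeling speed is the concern"

# Hypothetical aligned target/template pair.
identity = percent_identity("ACDE-G", "ACDF-G")
print(identity, "->", modeling_regime(identity))
```
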
  - - - A score is needed to evaluate how well a sequence fits onto a structure (potential).
  - - - Molecular dynamics: simulate the protein in water, where it naturally folds into the native structure. Problems: large number of atoms, huge number of time steps.
      - Minimal energy: the folded form is the minimal energy conformation of a protein.
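
The idea of searching for a minimal-energy conformation can be shown on a deliberately tiny system. The sketch below assumes a two-atom Lennard-Jones potential with a single coordinate (the distance `r`) and plain steepest descent; none of this is from the diagram, and a real protein force field has many more terms and thousands of coordinates, which is exactly where the cost problem comes from.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Analytical derivative dU/dr of the pair energy above."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r

def minimize_distance(r_start, step=0.01, tol=1e-8, max_iter=10_000):
    """Steepest descent on the single coordinate r (illustrative step size)."""
    r = r_start
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:        # flat enough: treat as converged
            break
        r -= step * grad           # move downhill in energy
    return r

r_min = minimize_distance(1.5)
print(r_min, lj_energy(r_min))     # analytical minimum is at r = 2**(1/6) * sigma
```
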
  - - - Experimental data: improved accuracy, validation and refinement.
      - Co-evolutionary information: MSA + direct coupling analysis. Contact prediction and functional insights.
      - Combining both can improve the accuracy, especially for regions where experimental data are sparse.
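
How an MSA carries co-evolutionary signal can be sketched with mutual information between alignment columns. Note the swap: mutual information is a simpler stand-in used here for illustration, not direct coupling analysis itself, which additionally separates direct couplings from indirect correlations. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an MSA.

    msa is a list of equal-length aligned sequences. High values mean the
    two positions vary together, a hint that they may be in contact.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a, p_b = count_i[a] / n, count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi

# Toy alignment: columns 1 and 2 always change together (K with L, R with I),
# column 0 is fully conserved, column 3 varies independently of column 1.
toy_msa = ["AKLE", "AKLD", "ARIE", "ARID"]
print(mutual_information(toy_msa, 1, 2))  # co-varying pair
print(mutual_information(toy_msa, 1, 3))  # independent pair
print(mutual_information(toy_msa, 0, 1))  # conserved column carries no signal
```
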
- - - - Mutation impact analysis: PolyPhen-2, SIFT and PROVEAN
      - Active site prediction: residues involved in ligand binding
      - Structural annotations: secondary structure, disorder prediction, surface accessibility.
    - - Homology-based function inference
      - Gene Ontology
      - Critical assessment of function annotation (CAFA)
    - - Protein-Protein interactions (PPI)
        
        Docking
        
        Sequence-based predictions
        
        Interface prediction
        
        Critical assessment of predicted interactions (CAPRI)
      - Function prediction
        
        Sequence-based approaches: homology and motifs (BLAST)
        
        Structure-based approaches: 3D structures and docking (MODELLER)
        
        Motif-based approaches
        
        Guilt-by-association approaches: leveraging AI and large datasets
        
        Integrative approaches: combining multiple data sources
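
A motif-based function prediction can be sketched as a pattern scan over a sequence. The example assumes the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} written as a regular expression; the function name `find_motif` and the test sequence are hypothetical, and a match only flags a candidate site, not a confirmed function.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} as a regex. The lookahead makes the
# scan report overlapping matches as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical sequence; the N at position 8 is followed by P and is skipped.
hits = find_motif("MKNGSAANPTLNNTSQ")
print(hits)  # → [(3, 'NGSA'), (12, 'NNTS'), (13, 'NTSQ')]
```
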