DATABASES AND STRUCTURAL DATA FORMATS
SEQUENCE DATABASES
UniProt
Most comprehensive catalog of proteins
Includes: general data, functional annotation (GO), FASTA sequences, references...
Combines Swiss-Prot and TrEMBL. Well connected with other databases.
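Since UniProt entries are commonly downloaded as FASTA, a minimal parsing sketch may help. The record below is made up for illustration (the accession `P00000` and the entry name are hypothetical), and `parse_fasta` is a helper written for this example, not part of any UniProt tool.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are wrapped, so concatenate them.
            records[header] += line
    return records


# Hypothetical record using the UniProt-style header layout: db|accession|entry name
EXAMPLE = """>sp|P00000|EXAMPLE_HUMAN Hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(EXAMPLE.splitlines())
first_header = next(iter(records))
accession = first_header.split("|")[1]  # second pipe-separated field
```

For real work, an established parser such as Biopython's `SeqIO` is usually a better choice than a hand-written one.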
NCBI
SEQUENCE MOTIF DATABASES
HMMER
Searches for sequence homologs (including remote homologs, with high sensitivity) and performs sequence alignments. Uses profile hidden Markov models.
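To give a feel for how a profile scores a sequence, here is a deliberately simplified sketch. It keeps only per-column emission probabilities estimated from an alignment and leaves out the insert and delete states and the transition probabilities of a real profile HMM, so it is closer to a position-specific scoring matrix than to what HMMER actually computes. The function names, the uniform background and the pseudocount of 1 are all assumptions made for this example.

```python
import math
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(msa: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Estimate per-column residue probabilities from an ungapped-length MSA."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        # Pseudocounts keep unseen residues from getting probability zero.
        counts = {aa: pseudocount for aa in ALPHABET}
        for row in msa:
            residue = row[col]
            if residue in counts:  # gap characters are skipped
                counts[residue] += 1.0
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def log_odds_score(profile: List[Dict[str, float]], seq: str) -> float:
    """Sum of log2(profile probability / uniform background) over positions."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must match the profile length")
    background = 1.0 / len(ALPHABET)
    return sum(math.log2(col[aa] / background) for col, aa in zip(profile, seq))


# Toy alignment, invented for illustration.
toy_msa = ["ACDE", "ACDE", "ACDF"]
toy_profile = build_profile(toy_msa)
family_like = log_odds_score(toy_profile, "ACDE")
unrelated = log_odds_score(toy_profile, "WWWW")
```

A sequence resembling the family scores above zero and an unrelated one below zero, which is the basic idea behind profile-based homology detection.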
DOMAIN DATABASES
Pfam
Large collection of protein domain families. Each family is represented by multiple sequence alignments (MSAs) and hidden Markov models (HMMs).
Components: A) high-quality, manually curated entries and B) automatically generated entries.
InterPro
Central database built in collaboration among the curators of many protein databases.
PROSITE
Predicts protein function based on identified sequence domains.
COGs
STRUCTURE DATABASES
PDB
3D structures of proteins and nucleic acids determined by researchers using X-ray crystallography or NMR.
Main comprehensive DB for macromolecular structures (ID: 4 alphanumeric characters)
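The 4-character identifier can be checked with a short pattern. This sketch assumes the classic PDB ID layout, a digit from 1 to 9 followed by three letters or digits; `is_classic_pdb_id` is a hypothetical helper name, and newer extended identifier formats are not covered.

```python
import re

# Assumed classic layout: one digit (1-9) followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_classic_pdb_id(identifier: str) -> bool:
    """Return True if the string looks like a classic 4-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(identifier) is not None


checks = {
    "1CRN": is_classic_pdb_id("1CRN"),    # well-formed
    "4hhb": is_classic_pdb_id("4hhb"),    # lowercase is accepted by this sketch
    "ABCD": is_classic_pdb_id("ABCD"),    # does not start with a digit
    "1CRNX": is_classic_pdb_id("1CRNX"),  # too long
}
```

A match only shows that the string is well formed, not that an entry with that ID exists in the PDB.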
CATH
Classifies domains into approx. 700 fold families.
More oriented toward structural classification
SCOP
Description of structural and evolutionary relationships between all proteins of known structure.
More attention to evolutionary relationships
AlphaFold
AI system developed by DeepMind that predicts protein structures with high accuracy and speed. Free and open source.
Benefits: accuracy and speed, comprehensive DB, solves long-standing challenges, broad applications, collaboration and open science.
Process: input sequence → MSA → pair representation → neural network predictions → iterative refinement → final structure (3D model)
Uses: drug discovery, disease understanding, biotechnology.
PROTEIN-PROTEIN INTERACTION DATABASES
STRING
METABOLIC PATHWAY DATABASES
KEGG
Integrates genomic, chemical and systemic functional information.