L20 - Genetic Variation


  1. Understand the 1000 Genomes Project and what data can be accessed
  2. Understand that genetic variation may or may not have functional consequences
  3. Describe the various databases used to assess functional genetic variants
  4. Define the term population genetics and the forces that influence genetic change in populations
  5. Describe the effect of mutation on allele frequencies in populations
  6. Understand that mating patterns have a large effect on allele frequencies in populations

Building Variant Maps for Gene-Finding

1. Human Genome Project

  • Good for consensus, not good for individual differences

The discovery of SNPs prompted the establishment of the SNP Consortium

2. SNP Consortium

  • Identify genetic variants
  • Anonymous with respect to traits

Goals

  • Identify 300,000 SNPs
  • Determine the allele frequency of SNPs
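As a minimal sketch of what determining an allele frequency involves, the snippet below counts alleles from genotype counts at a single biallelic SNP. The function name and the example counts are hypothetical and chosen only for illustration; they are not data from the SNP Consortium.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q): the frequencies of alleles A and a at one biallelic SNP.

    Inputs are the numbers of individuals with each genotype.
    """
    # Each diploid individual carries two copies of the site.
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    if total_alleles == 0:
        raise ValueError("no individuals genotyped")
    p = (2 * n_AA + n_Aa) / total_alleles  # homozygotes contribute two copies of A
    q = (2 * n_aa + n_Aa) / total_alleles  # heterozygotes contribute one copy of each
    return p, q


# Hypothetical sample of 100 genotyped individuals.
p, q = allele_frequencies(n_AA=50, n_Aa=40, n_aa=10)
print(f"freq(A) = {p:.2f}, freq(a) = {q:.2f}")
```

With these made-up counts, 140 of the 200 sampled alleles are A, so the two frequencies come out as 0.70 and 0.30.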

After the discovery of SNPs, the need to characterise them gave rise to the HapMap Project

3. HapMap

  • Assay genetic variants
  • Verify polymorphisms, catalogue correlations amongst sites
  • Anonymous with respect to traits

E.g. demonstrates correlation => two alleles may be located far apart, yet they are always found together
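One common way to put a number on this kind of correlation between two sites is linkage disequilibrium, summarised by D and r². The sketch below assumes two biallelic sites and that haplotype frequencies are already known; the function name and the example frequencies are hypothetical, and this is not presented as the procedure HapMap itself used.

```python
def linkage_disequilibrium(p_AB, p_A, p_B):
    """Return (D, r2) for two biallelic sites.

    p_AB : frequency of haplotypes carrying allele A at site 1 and B at site 2
    p_A  : frequency of allele A at site 1
    p_B  : frequency of allele B at site 2
    """
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both sites must be polymorphic")
    # D is the excess of AB haplotypes over what independence would predict.
    D = p_AB - p_A * p_B
    # r2 rescales D**2 to lie between 0 (independent) and 1 (perfectly correlated).
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2


# Alleles always found together: every A haplotype also carries B.
D_linked, r2_linked = linkage_disequilibrium(p_AB=0.3, p_A=0.3, p_B=0.3)

# Alleles that assort independently: p_AB equals p_A * p_B.
D_indep, r2_indep = linkage_disequilibrium(p_AB=0.09, p_A=0.3, p_B=0.3)

print(f"linked:      D = {D_linked:.2f}, r2 = {r2_linked:.2f}")
print(f"independent: D = {D_indep:.2f}, r2 = {r2_indep:.2f}")
```

An r² close to 1 means that knowing the allele at one site predicts the allele at the other, which is the situation the example above describes.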

HapMap gave rise to the 1000 Genomes Project


  • Much variation is rare, so its effect or significance is difficult to determine
  • If an allele is rare across multiple different ethnicities, this suggests that the variant might be dysfunctional
  • Also delineates which alleles are characteristic of certain ethnicities and which may therefore be responsible for ethnic characteristics
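The two comparisons above (rare everywhere versus common in only some populations) can be sketched as a simple filter over per-population allele frequencies. Everything here is an assumption made for illustration: the 1% cutoff for "rare", the population labels, and the frequencies are hypothetical and are not taken from 1000 Genomes data.

```python
RARE_THRESHOLD = 0.01  # assumed cutoff for calling an allele "rare"


def rare_in_all(freqs_by_population, threshold=RARE_THRESHOLD):
    """True if the allele is below the threshold in every population."""
    if not freqs_by_population:
        raise ValueError("no population frequencies supplied")
    return all(f < threshold for f in freqs_by_population.values())


def populations_where_common(freqs_by_population, threshold=RARE_THRESHOLD):
    """Sorted list of populations in which the allele reaches the threshold."""
    return sorted(pop for pop, f in freqs_by_population.items() if f >= threshold)


# Hypothetical allele frequencies for two variants in three populations.
variant_x = {"pop1": 0.001, "pop2": 0.002, "pop3": 0.0005}
variant_y = {"pop1": 0.35, "pop2": 0.004, "pop3": 0.002}

print(rare_in_all(variant_x))               # rare everywhere
print(populations_where_common(variant_y))  # common in one population only
```

A variant like the first is a candidate for closer functional study, while one like the second is characteristic of a particular population. Neither result says anything about function on its own.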

Phase 1

  • Genotyping 1092 individuals
  • 14 populations – Europe, East Asia, sub-Saharan Africa, Americas
  • Whole-genome sequenced
    • Whole genomes at low coverage (2-6x); exomes at deep coverage (50-100x)
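To make the coverage figures concrete, here is a small sketch of how fold coverage is usually computed and why low coverage leaves gaps in any one genome. It assumes an idealised model in which reads land uniformly at random, so per-base depth is roughly Poisson distributed; the read count, read length and genome size are hypothetical round numbers.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average number of reads covering each base (the 'x' in '4x')."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def fraction_uncovered(coverage):
    """Expected fraction of bases with no reads at all.

    Assumes reads fall uniformly at random (Poisson approximation).
    """
    return math.exp(-coverage)


# Hypothetical run: 120 million reads of 100 bp over a 3 billion bp genome.
low = mean_coverage(n_reads=120_000_000, read_length=100, genome_size=3_000_000_000)
print(f"coverage: {low:.1f}x")
print(f"fraction of bases unseen at {low:.0f}x: {fraction_uncovered(low):.4f}")
print(f"fraction of bases unseen at 50x: {fraction_uncovered(50):.2e}")
```

Under this model roughly 2% of bases receive no reads at 4x, whereas at 50x the unseen fraction is negligible. That contrast is one reason deep coverage is reserved for the much smaller exome.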

Phase II

  • 1721 individuals

Exome Aggregation Consortium (ExAC)

The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonise exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.

Genome Aggregation Database (gnomAD)

The goal was to aggregate and harmonise both **exome and genome** sequencing data. Like ExAC, it is a resource developed by an international coalition of investigators.


*125,748 exome sequences and 15,708 whole genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies.*

Functional Genetic Variation & Protein Polymorphism


  • Most genetic variation has a neutral effect on the phenotype, but a small fraction is harmful or beneficial
  • The functional variants primarily studied are those that affect gene function
  • Estimating how much of the genome is functionally important is not straightforward
  • Even within the small target of sequences that are important for gene function (coding, regulatory, noncoding RNA), many small DNA changes may still have no effect

Functional Change and Mutation


There are two extremes with regard to mutation and attendant functional change

  1. Virtually all the amino acids encoded by the allele can be replaced while the original function is maintained
  2. A single mutation may give rise to a new function

OMIM (Online Mendelian Inheritance in Man)

Concerned with mapping the mutations in protein-coding regions that result in monogenic disorders

Taking Stock of Human Genetic Variation

  • 75% of DNA variations are due to SNPs

After studying 1092 individuals from 14 populations, 38 million SNPs were detected (roughly 1 per 100 nucleotides)
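The "1 per 100" figure can be checked with quick arithmetic. The sketch below assumes a haploid human genome length of about 3.1 billion base pairs, a value that is not stated in these notes.

```python
def mean_spacing(n_variants, genome_size):
    """Average number of base pairs per variant site."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return genome_size / n_variants


GENOME_SIZE = 3_100_000_000  # assumed haploid genome length in bp
N_SNPS = 38_000_000          # SNPs detected in the 1092 individuals

spacing = mean_spacing(N_SNPS, GENOME_SIZE)
print(f"about one SNP every {spacing:.0f} bp")
```

The result is on the order of one SNP per 100 nucleotides, consistent with the rounded figure above.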

  • Structural variation accounts for 1/4 of mutational events, dominated by copy number variation (CNV)

Goals

  1. Ultimately produce a fine-scale genetic map (HapMap) to serve as a common resource for all biomedical research
    • Genotype 600,000 – 1,000,000 SNPs genome-wide
    • Four populations: CEPH (Europe), Yoruban (Africa), Japanese and Chinese (Asia)

Phase I

Genotype 1 million SNPs

Phase II

Genotype an additional 4.6 million SNPs

Phase 3 (Completed in 2015)

  • 2504 individuals
  • 26 populations
  • Both exome and whole-genome data
  • Genomes analysed on multiple different types of instruments
  • Allows evolutionary dynamics to be inferred from the DNA data

SNPs can easily be searched in the 1000 Genomes browser

ExAC: 60,706 unrelated individuals sequenced. Released in 2014 and updated until 2016

Challenges: the research was conducted by different labs, so quality control was difficult to ensure, as was presenting the data in a consistent form

Mutation Nomenclature...

"Encode"

Attempting to decipher the purpose of non-genic regions of DNA which are transcribed


  1. In 2007 the project was expanded to include similar assays in mice
  2. Project continues to create a comprehensive catalogue of gene elements and functional elements in the human and mouse genomes by:
    • Measuring RNA expression levels
    • Identifying proteins that interact with RNA and DNA (e.g. modified histones, transcription factors & RNA-binding proteins)
    • Measuring the levels of DNA methylation
    • Identifying regions of DNase hypersensitivity