GWAS

COMPLEX DISEASE

Multifactorial causes

  • Genetics
  • Environment

SNPs

  • Each SNP adds a small effect; complex diseases often involve lots of SNPs
  • No clear pedigree (Mendelian) pattern, but can run in families (not really predictive)
  • THIS IS WHY OBESITY IS NOT A DISEASE; IT IS A TRAIT
  • Locus explains A LOT OF VARIABILITY

This is why drug treatment/response is very complicated: many causes, many systems, many things for drugs to fix.


Model of liability: disease appears when combined genetic and environmental liability passes a threshold (set point of tolerance?)
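A minimal simulation of the liability-threshold idea. Everything here is an assumption for illustration: 50 hypothetical SNPs with equal small additive effects (risk allele frequency 0.3), a normally distributed environmental component, heritability of liability `h2 = 0.5` and prevalence `K = 0.1`. Liability is scaled to variance 1, so the threshold is the (1 - K) quantile of a standard normal.

```python
import random
from statistics import NormalDist, mean, pstdev

def simulate_liability(n=20000, n_snps=50, maf=0.3, h2=0.5, prevalence=0.1, seed=1):
    """Toy liability-threshold model: returns (genetic scores, affected flags)."""
    rng = random.Random(seed)
    # Hypothetical genotypes: number of risk alleles (0, 1 or 2) at each SNP, summed
    raw = [sum((rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps))
           for _ in range(n)]
    # Standardise the additive genetic score so its variance equals h2
    mu, sd = mean(raw), pstdev(raw)
    genetic = [(g - mu) / sd * h2 ** 0.5 for g in raw]
    # Environmental component supplies the remaining variance (1 - h2)
    liability = [g + rng.gauss(0, (1 - h2) ** 0.5) for g in genetic]
    # Threshold: only the top `prevalence` fraction of liability is affected
    threshold = NormalDist().inv_cdf(1 - prevalence)
    affected = [value > threshold for value in liability]
    return genetic, affected

genetic, affected = simulate_liability()
print(sum(affected) / len(affected))  # roughly 0.1
```

Affected individuals have a higher average genetic score, but many people with high scores stay unaffected, which fits the point that familial clustering is not very predictive.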


ALSO WHAT ARE THE CONSEQUENCES?

Penetrance: Probability that an individual with the 'risk' genotype has the 'risk' phenotype.
(✓) Genotype -> (✓) Phenotype?


Phenocopy Rate: Probability an individual without the 'risk' genotype has the 'risk' phenotype.
(✗) Genotype -> (✓) Phenotype?
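Both quantities are simple conditional proportions. A minimal sketch with hypothetical counts (80 of 200 risk-genotype carriers affected; 30 of 1,000 non-carriers affected):

```python
def penetrance(carriers_affected, carriers_total):
    """P(risk phenotype | risk genotype)."""
    return carriers_affected / carriers_total

def phenocopy_rate(noncarriers_affected, noncarriers_total):
    """P(risk phenotype | no risk genotype)."""
    return noncarriers_affected / noncarriers_total

print(penetrance(80, 200))       # 0.4
print(phenocopy_rate(30, 1000))  # 0.03
```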


Relative Risk: Risk of disease in someone with genotype A divided by the risk in someone with genotype B

  • Someone developing disease compared to someone else with a different exposure

Odds Ratio:

  • Odds of disease with one exposure (genotype) divided by the odds without it; can be estimated from case-control studies, where absolute risk cannot

(minor allele is used as the control allele)
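A minimal sketch computing both measures from a 2x2 table with hypothetical counts. It also shows a standard property: when the disease is rare the odds ratio is close to the relative risk, but for a common disease the odds ratio lies further from 1.

```python
def risk_ratio_and_odds_ratio(a, b, c, d):
    """2x2 table: a/b = affected/unaffected with genotype A,
    c/d = affected/unaffected with genotype B."""
    risk_a = a / (a + b)
    risk_b = c / (c + d)
    relative_risk = risk_a / risk_b
    odds_ratio = (a / b) / (c / d)  # equivalent to (a * d) / (b * c)
    return relative_risk, odds_ratio

print(risk_ratio_and_odds_ratio(40, 60, 20, 80))  # common disease: RR 2.0, OR about 2.67
print(risk_ratio_and_odds_ratio(4, 996, 2, 998))  # rare disease: RR 2.0, OR about 2.004
```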

Genetic Mapping

  1. Recognise the trait (or disease) (generally additive or dominant)
  2. Find genomic locus implicated in the trait
  3. Find a gene implicated in the trait
  4. Understand if the gene is predictive
    5a. IF YES -> create genetic test to use in practice
    5b. IF NO -> Learn the pathway to help understand treatment and improve drugs/absorption/environment


DONE THROUGH TESTS FOR ASSOCIATION

  • chi-square test (is there a significant difference in allele frequency between cases and controls?)
  • Difficult to test when there are arbitrary cut-offs
  • How do you determine independence? Is anything in biology independent?
  • Common variants tend to have small effects
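A minimal allelic chi-square test on a 2x2 table of allele counts (cases vs controls), using hypothetical counts. For 1 degree of freedom the upper-tail p-value equals erfc(sqrt(chi2 / 2)), so only the standard library is needed.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) on allele counts.
    Each argument is (minor allele count, major allele count)."""
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical: 30% minor allele in cases vs 20% in controls
print(allelic_chi_square((300, 700), (200, 800)))  # chi2 about 26.7, p about 2.4e-7
```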

Problems with this

  • Under-power (too small sample sizes)
  • Small effect sizes (low odds ratios)
  • Publication Bias (more likely to publish positive findings)
  • Differences in errors

GWAS?

What is a GWAS?


  • Tests SNP markers across the genome
  • Tests INDIRECT ASSOCIATION (the tested SNP is usually a marker in LD with the causal variant, not the causal variant itself)


MAYBE WE HAVEN'T FOUND ALL THE VARIANCE FROM SNPS BECAUSE WE'RE NOT TESTING ALL OF THEM?

TAGGING


  • 3 billion base pairs; a GWAS tests ~1 million SNPs (1 per ~3,000 bp)
  • The genome contains roughly 1 SNP per 100-300 bp (roughly 10 million SNPs)
  • So a GWAS directly tests roughly 1 in 10 SNPs
  • Use of Linkage Disequilibrium (LD): some SNPs are highly correlated, so find one and you find the other.
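A minimal sketch of how LD between two SNPs is measured, using r² from phased haplotypes (alleles coded 0/1, hypothetical data). An r² near 1 means a genotyped tag SNP carries almost the same information as the untyped SNP; r² near 0 means it tells you nothing about it.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.
    haplotypes: list of (allele_at_snp1, allele_at_snp2), each coded 0 or 1."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n                # freq of allele 1 at SNP1
    p_b = sum(h[1] for h in haplotypes) / n                # freq of allele 1 at SNP2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n   # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

print(ld_r2([(1, 1), (0, 0)] * 5))                 # 1.0: perfect tag
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))     # 0.0: no LD
```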

BUT reference SNP panels are mostly from people of European ancestry; there is less information on African and Asian populations.


This is a problem for African populations: more recombination has occurred, so there is less LD and each tag SNP covers less of the genome.


AS GENOTYPING BECOMES CHEAPER, TAGGING IS LESS NECESSARY


If a 'causal' SNP is in perfect LD (r² = 1) with another SNP, how can you know which one is causal?

HOW TO DO ONE


  • Sample collection
    • Ethnicity very important (population stratification)
    • Need a big sample size (power)
  • Data generation
    • DNA extraction
    • Genotyping / Imputation
  • Analysis
    • Association Testing
    • Logistic Regression
  • REPLICATION
  • Quality Assurance - Planning Experiment to minimise problems with data
  • Quality Control - Analysing the data to detect problems
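A minimal sketch of the logistic regression step for one SNP, fitted by Newton-Raphson with the genotype coded additively as the number of minor alleles (0, 1, 2); exp(b1) is then the per-allele odds ratio. Real analyses use dedicated software such as PLINK and add covariates (e.g. ancestry principal components); those are omitted here, and the data are hypothetical.

```python
import math

def logistic_regression(x, y, iterations=25):
    """Fit logit P(case) = b0 + b1 * x by Newton-Raphson (single predictor).
    x: genotype as minor-allele count; y: 1 = case, 0 = control."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score (gradient) and information matrix of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Check: with a carrier/non-carrier genotype, exp(b1) equals the 2x2 odds ratio (8/3)
x = [1] * 100 + [0] * 100
y = [1] * 40 + [0] * 60 + [1] * 20 + [0] * 80
b0, b1 = logistic_regression(x, y)
print(math.exp(b1))  # about 2.667
```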

IMPUTATION

  • When genotypes are missing (or were not on the array), haplotypes from reference data can be used to 'fill in the blanks'.
  • Algorithms can decide the most appropriate reference haplotype to use.
  • Based on LD (haplotypes are passed down together, etc.)
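A toy illustration of the 'fill in the blanks' idea: choose the reference haplotype that best matches the target at its observed SNPs and copy that haplotype's alleles into the gaps. This nearest-match rule is a simplification chosen for clarity; real imputation tools (e.g. IMPUTE2, Beagle, Minimac) use hidden Markov models that combine several reference haplotypes and output probabilities rather than hard calls. The panel below is hypothetical.

```python
def impute_haplotype(target, reference_panel):
    """Fill None entries in `target` from the best-matching reference haplotype.
    target: list of 0/1/None; reference_panel: list of complete 0/1 lists."""
    observed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref):
        # Count disagreements at the SNPs we actually genotyped
        return sum(ref[i] != target[i] for i in observed)

    best = min(reference_panel, key=mismatches)  # first best match wins ties
    return [best[i] if allele is None else allele for i, allele in enumerate(target)]

panel = [[0, 0, 1, 1, 0], [1, 1, 0, 0, 1], [0, 1, 1, 0, 0]]
print(impute_haplotype([1, None, 0, None, 1], panel))  # [1, 1, 0, 0, 1]
```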

CRITIQUE

ALWAYS THINKING ABOUT THE NULL HYPOTHESIS

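  • Bonferroni Adjustment = α / number of tests
    • Probability of rejecting at least one true null (family-wise error rate) ≤ α
    • ONLY TIGHT IF TESTS ARE INDEPENDENT (r=0); with correlated SNPs (LD) it is overly conservative
  • False Discovery Rate
    • Sets the expected proportion of false positives among significant results to be ≤ α
    • More powerful
    • More coherent interpretation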
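A minimal sketch of both corrections on a hypothetical list of p-values. Bonferroni keeps p ≤ α/m; Benjamini-Hochberg sorts the p-values and keeps every test up to the largest rank k with p(k) ≤ k·q/m. For reference, the conventional genome-wide significance threshold of 5 × 10⁻⁸ is a Bonferroni correction of 0.05 for about one million independent tests.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices significant after Bonferroni: p <= alpha / m."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]

def benjamini_hochberg(pvalues, q=0.05):
    """Indices significant under Benjamini-Hochberg FDR control at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) with p_(k) <= k * q / m; reject ranks 1..k
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(p))          # [0]
print(benjamini_hochberg(p))  # [0, 1]: FDR keeps more hits
```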

PRO

  • Lots of loci have been discovered

CON

  • Missing heritability
    • Imperfect tagging
    • Rare variants of medium effect
    • Many common variants with tiny effect
  • Not good for prediction (shows association, not causation)
    • Gives odds ratios (but they're still low)
  • Determining causality is very long-winded.
    • Some variants act within regulatory regions, which are harder to interpret

What's next?


There are now over 1000 traits which have been significantly associated with SNPs

FINE-MAPPING

  • Learning the difference between causal gene and causal variant
  • Defining the different signals in a GWAS and the number of variants within each signal.
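A minimal fine-mapping sketch, assuming a single causal variant in the region: each SNP's evidence is converted to Wakefield's approximate Bayes factor, normalised to a posterior probability of being causal, and SNPs are added from most to least likely until 95% of the probability is covered (the 'credible set'). The prior standard deviation of 0.2 on the effect (log odds ratio) and the z-scores and standard errors in the usage line are assumptions for illustration.

```python
import math

def credible_set(z_scores, se, prior_sd=0.2, coverage=0.95):
    """Credible set under a single-causal-variant assumption,
    using Wakefield's approximate Bayes factor per SNP."""
    w = prior_sd ** 2
    log_bf = []
    for z, s in zip(z_scores, se):
        r = w / (s * s + w)
        log_bf.append(0.5 * math.log(1 - r) + 0.5 * r * z * z)
    # Posterior probability of each SNP being causal (shift by max for stability)
    top = max(log_bf)
    weights = [math.exp(b - top) for b in log_bf]
    total = sum(weights)
    posterior = [wt / total for wt in weights]
    # Add SNPs from highest posterior down until the coverage is reached
    chosen, cumulative = [], 0.0
    for i in sorted(range(len(posterior)), key=lambda k: posterior[k], reverse=True):
        chosen.append(i)
        cumulative += posterior[i]
        if cumulative >= coverage:
            break
    return chosen, posterior

chosen, posterior = credible_set([10, 9.9, 2, 1], [0.05] * 4)
print(chosen)  # [0, 1]
```

Two SNPs with near-identical z-scores (as happens in strong LD) share the probability, so both stay in the set; lower LD separates their signals and shrinks the set.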

Different populations: African populations have had more recombination events and therefore have less LD


  • Harder to find significant associations because there is less LD; HOWEVER, associated SNPs are more likely to be 'true' (causal) associations.
  • Fine mapping can help
  • Bye et al 2012 - oesophageal cancer
    • High LD in Chinese populations, little to no LD in black South Africans
  • Liu et al 2017
    • Glycaemic QTL
    • 57k Eur, 20k Afr-Am
    • Credible set reduced from 40 to 2 SNPs in 4kb

SUMMARY:

  • Success depends on the extent of linkage disequilibrium and on effect size
  • Large sample size is still necessary
  • Using different ethnicities with lower LD can help narrow the search.

Low frequency, rare variants

  • GWAS screens for common variants (under the common disease, common variant hypothesis)
  • But common variants may only account for part of the heritability
  • Low-frequency (minor allele frequency, MAF, 1-5%) or rare (MAF <1%) variants may contribute to the missing heritability.
  • VERY VALUABLE FOR MECHANISM
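A small sketch of the frequency classes above: compute the minor allele frequency from genotype counts, then label it. Treating exactly 1% and exactly 5% as low-frequency is an assumption, since the notes do not say which side the boundaries fall on. Genotype counts are hypothetical.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB) at a biallelic SNP."""
    freq_a = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(freq_a, 1 - freq_a)

def frequency_class(maf):
    """Rare: MAF < 1%; low frequency: 1-5%; common: above 5%."""
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low frequency"
    return "common"

for counts in [(810, 180, 10), (940, 60, 0), (998, 2, 0)]:
    maf = minor_allele_frequency(*counts)
    print(counts, round(maf, 3), frequency_class(maf))
```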

WHAT ABOUT BENEFICIAL TRAITS

ANALYSIS:

  • Imputation
    • Use reference panels to impute rare variants
  • Custom array chips: immunochip/metabochip
  • Whole exome or genome sequencing
  • deCODE - Iceland
  • Sardinia Project
  • T2D consortium - there are some monogenic forms
  • UKIBD - rare variant but small contribution

FUNCTIONAL ANALYSIS

Causality:

  • Don't know direction of association
  • Don't know the function of what has been tagged (the majority of SNPs found through GWAS are not in coding regions)
  • Likely to affect gene function by altering gene expression in relevant tissues
  1. Need to prioritise genes and variants which may be causal
  2. Design experiment to prove causality

Variant could be in a coding region, but it could also be elsewhere:

  • May be in an intronic region and affect splicing
  • May be in the promoter region and therefore affect transcription factor binding.
  • Some may be in intergenic space and have no effect
  1. Identify all strongly associated variants by mining 1000 Genomes data
  2. High-density SNP genotyping across the region in a large case/control panel (10k or more)
  3. Use imputation from reference panels to infer more variants.
  4. Identify strongly associated SNPs and add them to the 'credible set'.
  5. Use cell and animal models to establish the effect/expression/function of the gene
  6. Assess predictive value
  • IBD: Huang et al 2017
  • BUT REMEMBER THE CAUSAL SNP MAY BE IN A REGULATORY REGION FURTHER AWAY

PATIENT CARE:

  • Disease behaviour
  • Pharmacogenetics
  • Screen high risk groups - nutrition interventions
  • New drug targets

Polygenic risk score is calculated from risk alleles, BUT THIS DOES NOT INCLUDE PROTECTIVE SNPS
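A minimal sketch of a weighted polygenic risk score, assuming (a common choice, but an assumption here) that each SNP is weighted by the natural log of its per-allele odds ratio and multiplied by the person's dosage of that allele. The `include_protective` flag is hypothetical: by default it drops SNPs with OR < 1, mirroring a risk-allele-only score; setting it to `True` shows how protective SNPs would pull the score down. Dosages and ORs are made up.

```python
import math

def polygenic_risk_score(dosages, odds_ratios, include_protective=False):
    """Weighted PRS: sum over SNPs of allele dosage * ln(OR).
    dosages: count (0-2) of the effect allele; odds_ratios: per-allele OR."""
    score = 0.0
    for dose, odds_ratio in zip(dosages, odds_ratios):
        weight = math.log(odds_ratio)
        if weight < 0 and not include_protective:
            continue  # skip protective SNPs (OR < 1), as in a risk-allele-only score
        score += dose * weight
    return score

dosages = [2, 1, 0, 2]
ors = [1.2, 1.5, 1.1, 0.8]  # last SNP is protective
print(polygenic_risk_score(dosages, ors))                           # risk alleles only
print(polygenic_risk_score(dosages, ors, include_protective=True))  # lower score
```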