GWAS

COMPLEX DISEASE

Multifactorial causes

Genetics
Environment

SNPs

Small additive contributions from SNPs - complex disease often involves lots of SNPs
No clear pedigree pattern, but can run in families (family history is not really predictive)
THIS IS WHY OBESITY IS NOT A DISEASE IT IS A TRAIT
No single locus explains A LOT OF VARIABILITY

Why drug treatment/response is very complicated - many causes - many systems - many things to fix with drugs

Liability model - disease appears once combined liability (genes + environment) passes a threshold - set point of tolerance?

ALSO WHAT ARE THE CONSEQUENCES?

Penetrance: Probability an individual with the 'risk' genotype has the 'risk' phenotype.
(/) Genotype -> (/) Phenotype ?

Phenocopy Rate: Probability an individual without the 'risk' genotype has the 'risk' phenotype.
(x) Genotype -> (/) Phenotype ?

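Relative Risk: Risk of disease in people with genotype A divided by the risk in people with genotype B

More generally: risk of developing disease in one exposure group compared with a group with a different exposure

Odds Ratio:

Odds of disease in people with the exposure (e.g. the risk allele) divided by the odds of disease in people without it. Used in case-control studies, where absolute risk cannot be estimated directly.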

(the odds ratio is usually reported for the minor allele, with the major allele as the reference)
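
The four definitions above can all be read off one 2x2 table of counts. A minimal sketch, assuming a cohort-style sample (penetrance and relative risk are not directly estimable from case-control sampling); the function name and the counts are hypothetical:

```python
def risk_measures(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Summarise a 2x2 table: 'risk' genotype carriers vs non-carriers, by disease status."""
    # Penetrance: P(risk phenotype | risk genotype)
    penetrance = carrier_cases / (carrier_cases + carrier_controls)
    # Phenocopy rate: P(risk phenotype | no risk genotype)
    phenocopy_rate = noncarrier_cases / (noncarrier_cases + noncarrier_controls)
    # Relative risk: ratio of the two risks
    relative_risk = penetrance / phenocopy_rate
    # Odds ratio: cross-product of the table
    odds_ratio = (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)
    return {
        "penetrance": penetrance,
        "phenocopy_rate": phenocopy_rate,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
    }

# Hypothetical cohort: 100 carriers (30 affected), 100 non-carriers (10 affected)
measures = risk_measures(30, 70, 10, 90)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts the odds ratio comes out larger than the relative risk; the two only agree closely when the disease is rare.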

Genetic Mapping

1. Recognise trait (/disease) (generally additive-dominant)
2. Find genomic locus implicated in the trait
3. Find a gene implicated in the trait
4. Understand if the gene is predictive
5a. IF YES -> create genetic test to use in practice
5b. IF NO -> Learn pathway to help understand treatment and improve drugs/absorption/environment

DONE THROUGH TEST FOR ASSOCIATION

Chi-square test (is there a significant difference in allele/genotype counts between cases and controls?)
Difficult to test when case/control status rests on arbitrary cut-offs
How do you determine independence? Is anything in biology independent?
Common variants tend to have small effects
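
A minimal sketch of the chi-square test for association on allele counts, using only the standard library. Assumptions: a 2x2 table (risk allele vs other allele, cases vs controls), no continuity correction, and made-up counts. For 1 degree of freedom the p-value can be written with the complementary error function.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts: cases 240 risk / 760 other, controls 200 risk / 800 other
chi2, p = chi_square_2x2(240, 760, 200, 800)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

A result like this would pass a single-test 0.05 cut-off but would be nowhere near significant after correcting for the number of SNPs tested in a GWAS.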

Problems with this

Underpowered studies (sample sizes too small)
Small effect sizes (low odds ratios)
Publication bias (positive findings are more likely to be published)
Differences in errors

GWAS?

What is a GWAS?

Tests SNP markers across the Genome
Tests INDIRECT ASSOCIATION

MAYBE WE HAVEN'T FOUND ALL THE VARIANCE FROM SNPS BECAUSE WE'RE NOT TESTING ALL OF THEM?

TAGGING

3 billion base pairs - a GWAS array tests about 1 million SNPs (1 per 3,000 bp)
The genome contains roughly 1 SNP per 100-300 bp (roughly 10 million SNPs)
So a GWAS array genotypes roughly 1 in 10 SNPs

Use of linkage disequilibrium (LD) - some SNPs are highly correlated - find one, you find the other.
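
How "highly correlated" two SNPs are is usually summarised as r squared. A minimal sketch, assuming phased haplotype counts for two biallelic SNPs (alleles A/a and B/b); the function name and counts are hypothetical:

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from counts of the four haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B  # D: departure from independence
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))

print(ld_r_squared(50, 0, 0, 50))    # alleles always travel together
print(ld_r_squared(25, 25, 25, 25))  # alleles independent
```

When r squared is 1 the tag SNP and the untyped SNP carry identical information, which is why tagging works and also why association alone cannot say which of the two is causal.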

BUT reference SNP panels are mostly built from people of European ancestry - less info on African and Asian populations.

Tagging works less well in African populations: there is less LD because there has been more recombination

AS GENOTYPING BECOMES CHEAPER - TAGGING IS LESS REQUIRED

If a 'causal' SNP has LD of 1 - how can you know it's causal?

HOW TO DO ONE

Sample collection
- Ethnicity v important
- Need a big sample size (power)
Data generation
- DNA extraction
- Genotyping / Imputation
Analysis
- Association Testing
- Logistic Regression
REPLICATION

Quality Assurance - Planning the experiment to minimise problems with the data
Quality Control - Analysing the data to detect problems
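
One example of a quality control check is the per-SNP call rate (the fraction of samples with a genotype). This is an illustration only: the notes do not say which checks were used, and the 0.95 cut-off, function names and data below are assumptions.

```python
def snp_call_rate(genotypes):
    """Fraction of samples with a non-missing genotype (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def failing_snps(genotype_table, min_call_rate=0.95):
    """Return the SNP ids whose call rate falls below the (assumed) threshold."""
    return [snp for snp, calls in genotype_table.items()
            if snp_call_rate(calls) < min_call_rate]

# Hypothetical genotypes coded as 0/1/2 copies of the minor allele
table = {
    "snp_a": [0, 1, 2, 1, 0, 0, 1, 2, 0, 1],
    "snp_b": [0, None, 2, None, 0, 0, 1, 2, 0, 1],
}
print(failing_snps(table))  # snp_b has 2 of 10 calls missing
```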

IMPUTATION

When genotype data are missing, you can use haplotypes from reference data to 'fill in the blanks'.
Algorithms can decide the most appropriate reference haplotype to use.
Based on LD theory (haplotypes are passed down together, etc.)
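
A toy sketch of the 'fill in the blanks' idea: pick the reference haplotype that agrees best with the observed alleles and copy its alleles into the gaps. Real imputation software uses probabilistic models over many reference haplotypes; this nearest-match version, its function name and its data are assumptions for illustration only.

```python
def impute(observed, reference_haplotypes):
    """Fill None entries in `observed` using the best-matching reference haplotype."""
    def matches(ref):
        # Count agreement at positions that were actually genotyped
        return sum(1 for o, r in zip(observed, ref) if o is not None and o == r)

    best = max(reference_haplotypes, key=matches)
    return [r if o is None else o for o, r in zip(observed, best)]

# Hypothetical reference panel: alleles coded 0/1 at five SNPs
reference = [
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
print(impute([1, None, 0, None, 0], reference))  # gaps filled from the second haplotype
```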

CRITIQUE

ALWAYS THINKING ABOUT THE NULL HYPOTHESIS

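Bonferroni adjustment = a / number of tests
- Probability of falsely rejecting at least one null < a
- Valid whether or not tests are independent, BUT OVERLY CONSERVATIVE WHEN TESTS ARE CORRELATED (r > 0), e.g. SNPs in LD
False Discovery Rate
- Control the expected proportion of false positives among the significant results to be < a
- More powerful
- More coherent interpretation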
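
A minimal sketch comparing the two corrections. The Benjamini-Hochberg step-up procedure is used here as one standard way of controlling the false discovery rate (the notes do not name a specific procedure); the p-values are made up.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni."""
    return alpha / n_tests

def benjamini_hochberg(p_values, alpha):
    """Indices of tests declared significant at FDR level alpha (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p < threshold]
fdr_hits = benjamini_hochberg(p_values, 0.05)
print(bonferroni_hits, fdr_hits)  # FDR keeps one more test than Bonferroni here
```

With a = 0.05 and about 1 million tests, the Bonferroni threshold is 5e-8, the usual genome-wide significance level.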

PRO

Lots of loci have been discovered

CON

Missing heritability
- Imperfect tagging
- Rare variants of medium effect
- Many common variants with tiny effect
Not good for prediction (association is not prediction)
- Can estimate odds ratios (but they're still low)
Determining causality is very long-winded.
- Causal variants can sit within regulatory regions rather than in genes

What's next?

There are now over 1000 traits which have been significantly associated with SNPs

FINE-MAPPING

Learning the difference between causal gene and causal variant
Defining different signals in GWAS and the number of variants within that signal.

Different populations - African populations have had more recombination events and therefore have less LD

Harder to find significant associations as there is less LD; HOWEVER, an associated SNP is more likely to be close to the 'true' causal variant.
Fine mapping can help

Bye et al 2012 - oesophageal cancer
- High LD in Chinese populations, little to no LD in Black South Africans
Liu et al 2017
- Glycaemic QTL
- 57k Eur, 20k Afr-Am
- Credible set reduced from 40 to 2 SNPs in 4kb
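
A minimal sketch of how a credible set is built: rank the variants in a signal by their posterior probability of being causal and keep adding them until the chosen coverage is reached. Assumptions: posteriors are already computed and sum to about 1, coverage of 95%, and the SNP ids and values are hypothetical (not taken from the studies above).

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    posteriors: dict mapping SNP id -> posterior probability of being causal.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for snp, prob in ranked:
        chosen.append(snp)
        total += prob
        if total >= coverage:
            break
    return chosen

# High LD: probability is spread evenly, so the set stays large
print(credible_set({"snp1": 0.25, "snp2": 0.25, "snp3": 0.25, "snp4": 0.25}))
# Lower LD: probability concentrates on a few variants, so the set shrinks
print(credible_set({"snp1": 0.60, "snp2": 0.36, "snp3": 0.03, "snp4": 0.01}))
```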

SUMMARY:

Success is dependent on extent of linkage disequilibrium and on effect size
Large sample size is still necessary
Using different ethnicities with lower LD can help narrow the search.

Low frequency, rare variants

GWAS screens for common variants (under the common disease, common variant hypothesis)
But common variants may only account for part of heritability
Low-frequency (minor allele frequency, MAF, 1-5%) or rare (MAF <1%) variants may contribute to the missing heritability.

VERY VALUABLE FOR MECHANISM
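
The frequency classes above as a small helper. A sketch only: the cut-offs are the ones in these notes, and treating exactly 5% as 'low frequency' is an assumption.

```python
def classify_by_maf(allele_frequency):
    """Classify a variant as rare / low frequency / common from one allele's frequency."""
    # The minor allele is whichever of the two alleles is less frequent
    maf = min(allele_frequency, 1 - allele_frequency)
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low frequency"
    return "common"

for freq in (0.005, 0.03, 0.30, 0.98):
    print(freq, classify_by_maf(freq))
```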

WHAT ABOUT BENEFICIAL TRAITS

ANALYSIS:

Imputation
- Use reference panels to impute rare variants
Custom array chips: Immunochip/Metabochip
Whole exome or genome sequencing

deCODE - Iceland
Sardinia Project
T2D consortium - there are some monogenic forms
UKIBD - rare variant but small contribution

FUNCTIONAL ANALYSIS

Causality:

Don't know direction of association
Don't know the function of what has been tagged
Majority of SNPs found through GWAS are not in coding regions
Likely to affect gene function by altering gene expression in relevant tissues

Need to prioritise genes and variants which may be causal
Design experiment to prove causality

Variant could be in coding region, but:

May be in an intronic region and affect splicing
May be in the promoter region and therefore affect binding (e.g. of transcription factors).
Some may be in intergenic space and have no effect

Identify all strongly associated variants by mining 1000 genomes data
High-density SNP genotyping across region in a large case/control panel (10k or more)
Use imputation from ref panels to infer more variants.
Identify strongly associated SNPs and add to the 'credible set'.
Use cell and animal models to establish effect/expression/function of gene
Assess predictive value

IBD Huang 2017
BUT REMEMBER CAUSAL SNP MAY BE IN A REGULATORY REGION FURTHER AWAY

PATIENT CARE:

Disease behaviour
Pharmacogenetics
Screen high risk groups - nutrition interventions
New drug targets

Polygenic risk score calculated for risk alleles BUT THIS DOES NOT INCLUDE PROTECTIVE SNPS
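
A minimal sketch of one common way to compute a polygenic risk score: sum the number of copies of each effect allele, weighted by the log of its odds ratio. This is not necessarily the formulation the note above refers to. In this weighted form an allele with an odds ratio below 1 gets a negative weight, so protective alleles can be included if they are in the SNP list. SNP ids, dosages and odds ratios are hypothetical.

```python
import math

def polygenic_risk_score(dosages, odds_ratios):
    """Weighted sum of effect-allele counts (0, 1 or 2), with weights = ln(odds ratio)."""
    return sum(dosages[snp] * math.log(odds_ratios[snp]) for snp in dosages)

# Hypothetical individual and per-allele odds ratios
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
odds_ratios = {"snp_a": 1.2, "snp_b": 0.8, "snp_c": 1.5}  # snp_b is protective (OR < 1)

score = polygenic_risk_score(dosages, odds_ratios)
print(f"PRS = {score:.3f}")
```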