GWAS: What's next?
There are now over 1000 traits which have been…
- Recognise trait (/disease) (generally additive-dominant)
- Find genomic locus implicated in the trait
- Find a gene implicated in the trait
- Understand if the gene is predictive
5a. IF YES -> create genetic test to use in practice
5b. IF NO -> learn the pathway to help understand treatment and improve drugs/absorption/environment
DONE THROUGH TEST FOR ASSOCIATION
- chi-square (is there a significant difference between cases and controls)
- Difficult to test when there are arbitrary cut-offs
- How do you determine independence? Is anything in biology independent?
- Common variants tend to have small effects
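The basic chi-square test of association above can be sketched on a hypothetical 2x2 table of allele counts (all numbers invented for illustration; real analyses would use a library routine such as scipy.stats.chi2_contingency):

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table.

    table: [[a, b], [c, d]] -- rows = cases/controls, cols = risk/other allele.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical allele counts: cases carry the risk allele more often.
cases = [300, 200]     # risk allele, other allele
controls = [250, 250]
stat = chi_square_2x2([cases, controls])
print(round(stat, 2))  # 10.1
print(stat > 3.84)     # True: exceeds the 5% critical value (1 df)
```

A statistic above 3.84 (the 5% critical value for 1 degree of freedom) suggests allele frequencies differ significantly between cases and controls.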
Problems with this
- Under-power (too small sample sizes)
- Small effect sizes (low odds ratios)
- Publication bias (more likely to publish positive findings)
- Differential error rates between cases and controls
What is a GWAS?
MAYBE WE HAVEN'T FOUND ALL THE VARIANCE FROM SNPS BECAUSE WE'RE NOT TESTING ALL OF THEM!?????
- Tests SNP markers across the Genome
- Tests INDIRECT ASSOCIATION
- ~3 billion base pairs - a GWAS tests ~1 million SNPs (1 in 3000)
- The genome has roughly 1 SNP per 100-300 bp (roughly 10 million SNPs)
- So a GWAS directly genotypes roughly 1 in 10 SNPs
- Use of Linkage Disequilibrium - some SNPs are highly correlated - find one, you find the other.
BUT reference SNP panels are usually from middle-class Caucasian samples... less info on African and Asian populations.
Especially annoying for African populations, where there is less LD because there has been more recombination.
AS GENOTYPING BECOMES CHEAPER - TAGGING IS LESS REQUIRED
If a 'causal' SNP has LD of 1 - how can you know it's causal?
- when data is missing, you can use the haplotype from reference data to 'fill in the blanks'.
- Algorithms can decide the most appropriate reference haplotype to use.
- Based on LD (haplotypes are passed down together, etc.)
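A deliberately naive sketch of the fill-in-the-blanks idea: match the typed alleles against reference haplotypes and copy the best match's alleles into the untyped positions. (Real imputation tools like IMPUTE or Beagle use probabilistic hidden Markov models over phased panels; the haplotypes below are invented.)

```python
# Hypothetical phased reference haplotypes (one allele per SNP position).
REFERENCE = [
    "AACGT",
    "ATCGA",
    "TACGT",
]

def impute(observed):
    """observed: list of alleles with None at untyped SNPs.

    Returns the observed alleles with blanks filled from the reference
    haplotype that agrees with the most typed positions.
    """
    def matches(hap):
        return sum(1 for o, h in zip(observed, hap) if o == h)

    best = max(REFERENCE, key=matches)
    return [h if o is None else o for o, h in zip(observed, best)]

print(impute(["A", None, "C", None, "T"]))  # ['A', 'A', 'C', 'G', 'T']
```

Here "AACGT" agrees at all three typed positions, so its alleles fill positions 2 and 4.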
HOW TO DO ONE
- Sample collection
- Ethnicity v important
- Need a big sample size (power)
- Data generation
- DNA extraction
- Genotyping / Imputation
- Association Testing
- Logistic Regression
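A minimal sketch of the association test: logistic regression of case/control status on genotype dosage (0, 1 or 2 copies of the risk allele), fit here by plain gradient ascent on simulated data. Real GWAS tools (e.g. PLINK) also adjust for covariates such as sex, age and ancestry principal components; everything below is invented for illustration.

```python
import math
import random

def fit_logistic(dosages, status, steps=500, lr=1.0):
    """Return (intercept, beta); exp(beta) is the per-allele odds ratio."""
    b0 = b1 = 0.0
    n = len(dosages)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(dosages, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted P(case)
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n  # average log-likelihood gradient step
        b1 += lr * g1 / n
    return b0, b1

# Simulate 2000 individuals where each risk allele multiplies the odds
# of disease by about 2 (true beta = 0.7, so true odds ratio ~ 2.0).
random.seed(0)
dosages = [random.choice([0, 1, 2]) for _ in range(2000)]
status = [int(random.random() < 1 / (1 + math.exp(1.0 - 0.7 * x)))
          for x in dosages]

b0, b1 = fit_logistic(dosages, status)
print("estimated per-allele odds ratio:", round(math.exp(b1), 2))
```

The fitted beta recovers an odds ratio close to the simulated value of ~2; with the tiny effects typical of common variants, the same machinery needs far larger samples.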
- Quality Assurance - Planning Experiment to minimise problems with data
- Quality Control - Analysing the data to detect problems
- Bonferroni Adjustment = α / number of tests
- Probability of falsely rejecting at least one true null < α
- ONLY WORKS IF TESTS ARE INDEPENDENT (r=0)
- False Discovery Rate
- Set the proportion of false positives to be < α
- More powerful
- More coherent interpretation
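The FDR idea can be sketched with the Benjamini-Hochberg procedure (a standard FDR method, assumed here to be the one the notes mean); the p-values are invented:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of rejected tests.

    Find the largest rank k with p_(k) <= (k/m) * alpha, then reject the
    k smallest p-values. Controls the expected proportion of false
    positives among the hits, not the chance of any false positive.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(pvals))  # [0, 1]
```

Bonferroni at 0.05/8 = 0.00625 would reject only index 0 here, which illustrates the extra power the notes mention.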
- Lots of loci have been discovered
- Missing heritability
- Imperfect tagging
- Rare variants of medium effect
- Many common variants with tiny effect
- Not good for prediction (association)
- Predict Odds Ratio (but they're still low)
- Determining causality is very long-winded.
- Some causal variants can lie within regulatory regions (harder to interpret)
- Small additions from SNPs - complex disease often has lots of SNPs
- No pedigree needed, but the trait can still run in families (not really predictive)
- THIS IS WHY OBESITY IS NOT A DISEASE IT IS A TRAIT
- Locus explains A LOT OF VARIABILITY
Why drug treatment/response is very complicated - many causes - many systems - many things to fix with drugs
Model of liability - set point of tolerance?
ALSO WHAT ARE THE CONSEQUENCES?
Penetrance: Probability an individual with the 'risk' genotype has the 'risk' phenotype.
  (/) Genotype -> (/) Phenotype
Phenocopy Rate: Probability an individual without the 'risk' genotype has the 'risk' phenotype.
  (x) Genotype -> (/) Phenotype
Relative Risk: Risk of someone having disease if they have genotype A compared to genotype B
- Someone developing disease compared to someone else with a different exposure
  (minor allele is used as the control allele)
- Someone developing disease based on one exposure alone
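These three quantities can be computed from a hypothetical genotype-by-phenotype table of counts (all numbers invented):

```python
#                 (affected, unaffected)
risk_genotype  = (60, 140)   # carriers of the 'risk' genotype
other_genotype = (30, 270)   # non-carriers

def risk(counts):
    """Proportion affected within a genotype group."""
    affected, unaffected = counts
    return affected / (affected + unaffected)

penetrance = risk(risk_genotype)        # P(risk phenotype | risk genotype)
phenocopy = risk(other_genotype)        # P(risk phenotype | no risk genotype)
relative_risk = penetrance / phenocopy  # risk in carriers vs non-carriers

print(penetrance)              # 0.3
print(phenocopy)               # 0.1
print(round(relative_risk, 2)) # 3.0
```

So here the risk genotype is 30% penetrant, 10% of non-carriers are phenocopies, and carriers have 3x the risk of non-carriers.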