GWAS

Sample Selection

Replication and Functional Studies

Statistics

Genotyping and Quality Control

Case Control

Cohort

Trio

Assumptions: Case and control participants are from same population; genomic and epidemiological data are collected similarly in cases and controls

Disadvantages: Prone to several biases, including population stratification; the odds ratio overestimates relative risk for common diseases; cases are usually prevalent rather than incident (may exclude fatal/short episodes or mild/silent cases)

Advantages: Short time frame, large numbers of case and control participants can be assembled, ideal for studying rare diseases

Case Control: Compare individuals with the disease phenotype (cases) against unaffected individuals (controls) who are matched on other variables as closely as possible.
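The point about overestimating relative risk for common diseases can be illustrated numerically. This is a minimal sketch, assuming the effect measure reported by the case-control study is the odds ratio; the function names and the counts are hypothetical and chosen only for illustration.

```python
def relative_risk(exp_cases, exp_total, unexp_cases, unexp_total):
    """Risk in carriers divided by risk in non-carriers."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)


def odds_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Odds of disease in carriers divided by odds in non-carriers."""
    a = exp_cases                  # carriers with disease
    b = exp_total - exp_cases      # carriers without disease
    c = unexp_cases                # non-carriers with disease
    d = unexp_total - unexp_cases  # non-carriers without disease
    return (a * d) / (b * c)


# Hypothetical rare disease: 2/1000 carriers vs 1/1000 non-carriers affected.
rare = (2, 1000, 1, 1000)
# Hypothetical common disease: 400/1000 carriers vs 200/1000 non-carriers affected.
common = (400, 1000, 200, 1000)

rr_rare, or_rare = relative_risk(*rare), odds_ratio(*rare)
rr_common, or_common = relative_risk(*common), odds_ratio(*common)
```

In both made-up scenarios the relative risk is 2, but the odds ratio stays close to 2 only when the disease is rare and drifts upward when it is common.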

Functional Studies: When a GWAS suggests a candidate disease gene or a SNP that confers disease risk, experiments are performed to determine its function, e.g. cell culture, knockout/knock-in mice

Trio: Genotype affected individuals and both of their parents to assess allele transmission.

Assumptions: Disease-related alleles are transmitted from heterozygous parents to affected offspring more than 50% of the time

Advantages: Controls for population structure (immune to population stratification), logistically simpler, does not require phenotyping of parents

Disadvantages: May be difficult to assemble both parents and offspring (e.g. late-onset diseases), highly sensitive to genotyping error (needs rigorous quality control and highly accurate genotyping)
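One common way to test the excess-transmission assumption in trios is the transmission disequilibrium test (TDT). The notes do not name it, so it is introduced here as an assumption. The sketch below uses the McNemar-style statistic (b - c)^2 / (b + c), where b and c count how often heterozygous parents did or did not transmit the allele of interest; the function name and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-style TDT for one allele.

    transmitted:   heterozygous parents who passed the allele to an affected child
    untransmitted: heterozygous parents who passed the other allele instead
    Returns (chi-square statistic with 1 df, p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60 transmissions vs 40 non-transmissions.
chi2, p_value = tdt(60, 40)
```

Under the null hypothesis each allele is transmitted 50% of the time, so b and c should be roughly equal and the statistic close to zero.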

Cohort: Individuals who are disease-free (or pre-disease) at baseline are genotyped first, monitored over time for disease, then split into groups based on whether or not they develop the phenotype.

Advantages: Direct measure of risk, fewer biases than case-control studies, cases are incident and free of survival bias

Disadvantages: Large sample size needed for genotyping if incidence is low, poorly suited for studying rare diseases, expensive and lengthy follow-up

Assumptions: Diseases and traits are ascertained similarly in individuals with and without the gene variant; participants are more representative of the population from which they are drawn

SNP Arrays: Isolate, fragment, and label DNA. Hybridize to chip containing SNP probes. Detect fluorescence to determine SNP genotypes for each locus tested.

Quality Control

SNP Call Rate: Proportion of total samples studied for which a particular SNP is reliably genotyped. Remove SNPs from analysis if call rate < 95%
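The SNP call rate filter can be sketched as follows. This is a minimal illustration, assuming genotypes are stored as a list of per-sample rows with `None` marking a failed call; that layout and the function names are hypothetical, and only the 95% cutoff comes from the notes.

```python
def snp_call_rates(genotypes):
    """Fraction of samples with a non-missing call, one value per SNP (column)."""
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(row[j] is not None for row in genotypes) / n_samples
        for j in range(n_snps)
    ]


def passing_snps(genotypes, min_call_rate=0.95):
    """Column indices of SNPs whose call rate meets the cutoff."""
    rates = snp_call_rates(genotypes)
    return [j for j, rate in enumerate(rates) if rate >= min_call_rate]


# Toy data: 4 samples (rows) x 3 SNPs (columns); None = failed genotype call.
toy_genotypes = [
    [0, 1, 2],
    [0, None, 1],
    [1, 1, 0],
    [2, 0, None],
]
toy_rates = snp_call_rates(toy_genotypes)
toy_kept = passing_snps(toy_genotypes)
```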

Minor Allele Frequency: Frequency of the less common of the 2 alleles at a SNP in a population (ranges from <1% to 50%). Very rare alleles are hard to analyze reliably; the usual GWAS cutoff is MAF > 1%.
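A minimal sketch of the minor allele frequency calculation, assuming each genotype is coded as the number of copies of one allele (0, 1, or 2) and `None` marks a missing call. The coding and the function name are assumptions for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF for one SNP from 0/1/2 allele counts; missing calls (None) are skipped."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    # Each diploid sample carries 2 alleles at the locus.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less common.
    return min(freq, 1 - freq)


maf_a = minor_allele_frequency([0, 0, 1, 2])     # counted allele is the minor one
maf_b = minor_allele_frequency([2, 2, 2, 1])     # counted allele is the major one
maf_c = minor_allele_frequency([0, None, 1, 0])  # one missing call is ignored
```

A SNP would then be kept for analysis only if its MAF is above the chosen cutoff, for example `maf > 0.01`.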

Sample Checks: Confirm that samples are not mixed up (e.g. check documentation against XY status). Within each sample, 80-90% of SNPs should be successfully genotyped. Remove samples from analysis if they do not meet these criteria.
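The per-sample genotyping check can be sketched the same way as a per-SNP filter, but computed across each row instead of each column. This assumes a list of per-sample rows with `None` for a failed call; the function names are hypothetical, and the 0.9 default is simply the upper end of the 80-90% range in the notes.

```python
def sample_call_rates(genotypes):
    """Fraction of SNPs with a non-missing call, one value per sample (row)."""
    return [sum(g is not None for g in row) / len(row) for row in genotypes]


def failing_samples(genotypes, min_call_rate=0.9):
    """Row indices of samples whose call rate falls below the cutoff."""
    rates = sample_call_rates(genotypes)
    return [i for i, rate in enumerate(rates) if rate < min_call_rate]


# Toy data: 3 samples (rows) x 4 SNPs (columns); None = failed genotype call.
toy_samples = [
    [0, 1, 2, 1],
    [None, None, 1, 0],
    [2, None, 0, 1],
]
toy_sample_rates = sample_call_rates(toy_samples)
toy_failed = failing_samples(toy_samples)
```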

Linkage Disequilibrium: The non-random association of alleles at different loci. SNPs located near each other on a chromosome tend to be inherited together more often than would be expected by chance, and alleles of SNPs in high LD are almost always inherited together. LD does not strictly require that the loci be on the same chromosome.

r^2 Statistic: r is the correlation coefficient between alleles at two loci. r^2 measures the correlation of linked SNPs in the population (the proportion of variation at one SNP explained by the other). 0 = no association; 1 = perfect association.
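As a rough illustration, the squared correlation can be computed directly from genotype vectors. This sketch assumes unphased 0/1/2 allele counts with no missing values, so it yields a genotype-based approximation rather than the haplotype-based statistic; the function name is hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts."""
    if len(x) != len(y):
        raise ValueError("both SNPs must be genotyped on the same samples")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic SNP has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


r2_perfect = genotype_r2([0, 1, 2, 1], [2, 1, 0, 1])  # alleles always co-occur (opposite coding)
r2_none = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])     # genotypes vary independently
```

Because the statistic is squared, the direction of the allele coding does not matter: perfectly opposite codings still give 1.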

Test of Association: Evaluates whether the number and strength of observed associations between SNPs and phenotype exceed those expected by chance (due to sampling variation)
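The notes do not say which test is used. One simple option, assumed here for illustration, is a chi-square test on the 2x2 table of allele counts in cases versus controls for a single SNP. The function name and counts are hypothetical.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are alternate/reference allele counts.
    Returns (chi-square statistic, p-value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: cases 60 alt / 40 ref, controls 40 alt / 60 ref.
assoc_chi2, assoc_p = allelic_chi2(60, 40, 40, 60)
```

In a genome-wide scan this test would be repeated for every SNP, which is why a multiple-testing correction is needed before calling any single p-value significant.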

Bonferroni Correction: Used to reduce the false positive rate. Testing many SNPs at the conventional significance level (p < 0.05) would inflate the number of SNPs falsely associated with disease.


α' = α / m1, where α = significance level and m1 = number of markers tested.

-log10(α') is the minimum -log10(p) a SNP must reach to be declared associated with the disease phenotype.
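A short worked example of the adjusted threshold. The marker count of 1,000,000 is an assumed round number for illustration and does not come from the notes; the function name is hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_markers):
    """Per-marker significance level after Bonferroni correction."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return alpha / n_markers


# Assumed for illustration: 1,000,000 markers tested at an overall level of 0.05.
adjusted_alpha = bonferroni_threshold(0.05, 1_000_000)
# Height a SNP must exceed on a -log10(p) scale to count as significant.
neg_log10_threshold = -math.log10(adjusted_alpha)
```

With these assumed inputs the adjusted level is 5 x 10^-8, which corresponds to roughly 7.3 on the -log10 scale.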

Replication

Confirmation of results in independent sample necessary to rule out false positive results. Can be part of multistage design.

Multistage Design

At each stage, carry forward a decreasing number of SNPs for genotyping in a larger or constant-size population.

Advantages: Reduce false positives, save money by performing fewer GWAs

Disadvantages: Carrying forward fewer SNPs at each stage could limit the study to detecting only SNPs with large effects.
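The multistage idea can be sketched as a two-stage filter. Everything specific here is an assumption for illustration: the stage 1 cutoff of 1e-4, the use of a Bonferroni-style threshold over the carried-forward SNPs in stage 2, the function name, and the SNP identifiers and p-values.

```python
def two_stage(stage1_p, stage2_p, stage1_cutoff=1e-4, alpha=0.05):
    """Carry SNPs passing a liberal stage 1 cutoff into an independent stage 2 sample.

    stage1_p / stage2_p: dicts mapping SNP id -> p-value in each sample.
    Returns (SNPs carried forward, SNPs that also pass in stage 2).
    """
    carried = [snp for snp, p in stage1_p.items() if p < stage1_cutoff]
    if not carried:
        return [], []
    # Stage 2 corrects only for the SNPs actually carried forward.
    stage2_threshold = alpha / len(carried)
    replicated = [
        snp for snp in carried
        if stage2_p.get(snp, 1.0) < stage2_threshold
    ]
    return carried, replicated


# Hypothetical SNP ids and p-values.
stage1 = {"rs1": 1e-6, "rs2": 5e-5, "rs3": 0.2}
stage2 = {"rs1": 0.001, "rs2": 0.3}
carried, replicated = two_stage(stage1, stage2)
```

The sketch also shows the trade-off noted above: a SNP with a modest effect, such as the hypothetical `rs3`, never reaches stage 2 and so cannot be detected there.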