Week 6: Disease Gene Mapping
Linkage
Measurement
Recombination fraction
Proportion of recombinants among offspring
Complete linkage
RF = 0
No linkage
RF = 0.5
Linkage analysis
Requires
informative meioses
Many of these needed to establish evidence of linkage between two loci
Difficult to obtain enough family material to test meioses for rare disease, or when RF is higher
Genome scan
Test markers that are evenly spaced across entire genome (every 7-8 cM, ~400 markers)
Purely positional or location-based approach to finding susceptibility genes, following from linkage analysis
Lod analysis
Statistic that describes strength of evidence for linkage, at any chosen value of RF, given family data available
"Log of the odds"
Calculation
Calculate two probabilities for obtaining a specific set of recombinants observed in a family
Assume firstly independent assortment
Assume secondly specific degree of linkage
Calculate ratio of probabilities
Log of value = Lod score
Compare to statistically significant ranges
Z > 3
Evidence of linkage
Linkage at the chosen RF value is at least 1000x as likely as no linkage
2 < Z < 3
Suggestive of linkage
-2 < Z < 2
Uninformative linkage analysis
Z < -2
Exclusion of linkage
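The Lod calculation above can be sketched in a few lines. This is a minimal illustration, assuming phase-known informative meioses where each one can be scored as recombinant or non-recombinant; the function names `lod_score` and `interpret` are hypothetical, and the handling of values exactly on a threshold (Z = 2, Z = 3) is a choice made here, not something the notes specify.

```python
import math

def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Log10 odds of linkage at recombination fraction theta vs. no linkage (theta = 0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        # Complete linkage is impossible once any recombinant is observed.
        return float("-inf")
    # Probability of the data assuming a specific degree of linkage.
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if non_recombinants:
        log_linked += non_recombinants * math.log10(1.0 - theta)
    # Probability of the data assuming independent assortment.
    log_unlinked = meioses * math.log10(0.5)
    # Log of the ratio of the two probabilities.
    return log_linked - log_unlinked

def interpret(z: float) -> str:
    """Map a Lod score onto the ranges listed in the notes."""
    if z > 3:
        return "evidence of linkage"
    if z > 2:
        return "suggestive of linkage"
    if z < -2:
        return "exclusion of linkage"
    return "uninformative"

# Hypothetical family data: 1 recombinant in 10 informative meioses.
z = lod_score(recombinants=1, meioses=10, theta=0.1)  # roughly 1.6
verdict = interpret(z)
```

Because Lod scores are logarithms, scores from independent pedigrees evaluated at the same RF can be added, which is how combining pedigrees accumulates evidence.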
Recombination mapping
Problems
Cannot do controlled crosses
Solution: combine results of many identical matings, i.e. combine pedigrees
Humans produce a very small number of progeny
Crosses equivalent to test-cross are extremely rare
Mapping
Multipoint linkage mapping
Use 1000s of markers to construct genetic map across whole genome
Uses several markers at once to localise disease gene relative to other markers in the map; more efficient
Genetically-complex disease
Identified chromosomal regions through linkage mapping
Diabetes
Alzheimer's disease
Breast cancer
Bipolar disorder
However, not every finding has led to convincing replication
Linkage analysis
Follows meiotic events through families for co-segregation of disease and particular genetic variants
Could be different marker in each family
Test large families, sibling pairs
Works well for Mendelian diseases
Yields broad chromosome regions harbouring many genes
Resolution comes from recombination events (meioses) in families assessed
Good as needs few markers, poor as hard to find specific variants involved (although disease gene and markers reasonably close, ~1Mb apart)
Linkage disequilibrium
LD analysis
Must be same marker across families
Case-control, cohort designs, parents-affected child trios (TDT)
Detect association between genetic variants and disease across families (populations)
May be more appropriate for complex diseases
Yields fine-scale resolution of genetic variants
Resolution comes from ancestral recombination events
Good for finding specific variants (LD is detected for markers 10-20 kb away), poor as needs many markers
Types of association studies
Population-based
Case and control study
Collect affected subjects (cases) with unaffected subjects (controls), and compare frequency of genetic components between two groups
Uses
Advantages
Quite powerful to detect relatively small genotypic effects, even in modest samples of cases and controls (e.g. 100-500 of each)
Easy to collect the cases and controls or general population samples
Disadvantages
Population stratification: if there are underlying differences in the cases and controls unrelated to disease risk, false positives are more likely
Spurious association
Differences
Solutions
Family-based
TDT: Transmission disequilibrium test
Collect affected child and their parents, and compare distribution of transmitted allele to that of non-transmitted allele from parents
TDT
Looks at the transmission of alleles from heterozygous parents to affected children, to test if there is deviation from Mendelian segregation ratios
If there is transmission distortion, this suggests an etiologic association between the allele and the disease
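A minimal sketch of the TDT statistic, assuming the usual McNemar-style form in which only heterozygous parents contribute: `b` counts transmissions of the allele of interest to affected children and `c` counts non-transmissions. The function name `tdt` and the example counts are hypothetical.

```python
import math

def tdt(transmitted: int, not_transmitted: int) -> tuple[float, float]:
    """Return (chi-squared, p-value) for transmissions from heterozygous parents."""
    b, c = transmitted, not_transmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need at least one informative (heterozygous) parent")
    # Under Mendelian segregation each allele is transmitted with probability 0.5,
    # so b and c are expected to be equal.
    chi2 = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: the allele is passed on 60 times and withheld 40 times.
chi2, p_value = tdt(60, 40)
```

Homozygous parents add nothing to `b` or `c`, which is why the requirement for heterozygous parents reduces the power of the approach.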
Uses
Advantages
Resistant to potential bias from population stratification
Disadvantages
Requires at least one parent to be heterozygous at marker being tested, hence power of this approach is significantly lower
Allelic association
Markers remain in LD with the ‘founding’ mutation over many generations, hence trait correlates with marker allele in a population even when individuals are unrelated
Over generations, the conserved segments around mutated locus become shorter, so the closer a marker gene is to the disease gene, the more likely it will stay linked over time
Founder effect
Terminology
The non-random association of alleles in the population; alleles at neighbouring loci tend to co-segregate
In LD mapping (population-based), look for variant allele in LD with disease – if most affected individuals in a population share same mutant allele, then LD used to locate chromosomal region harbouring mutant allele
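Non-random association between alleles at two loci is commonly quantified with D, D' and r². The notes do not name these measures, so the sketch below is an assumption about how the idea would be put into numbers; `ld_stats` and the example frequencies are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) from the haplotype frequency p_ab and allele frequencies p_a, p_b."""
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # D: excess of the A-B haplotype over what independent assortment predicts.
    d = p_ab - p_a * p_b
    # Largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies: alleles A and B always travel together.
d, d_prime, r2 = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
```

Values of r² close to 1 mean that genotyping one marker predicts the other almost perfectly, which is the property LD mapping relies on.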
Haplotypes
Some alleles do not assort independently
Give rise to ~10-50kb haplotypes, i.e. combination of specific SNPs (alleles) on a chromosome
All genetic markers within a haplotype are inherited together; if one is in LD with the disease allele, then all the others are in LD too
Over time, the region of disequilibrium dissipates as recombinations accumulate between the mutant and marker alleles
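The decay of disequilibrium over generations can be sketched with the standard population-genetics relation D_t = D_0 (1 - RF)^t. This formula is not stated in the notes and assumes random mating in a large population; `ld_after` and the example values are hypothetical.

```python
def ld_after(d0: float, rf: float, generations: int) -> float:
    """Disequilibrium remaining after a number of generations of recombination."""
    if not 0.0 <= rf <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Each generation, a fraction rf of haplotypes is broken up by recombination.
    return d0 * (1.0 - rf) ** generations

# Hypothetical comparison: a tightly linked marker vs. a more distant one,
# both starting from the same disequilibrium 100 generations ago.
near = ld_after(d0=0.25, rf=0.001, generations=100)
far = ld_after(d0=0.25, rf=0.05, generations=100)
```

The tightly linked marker retains most of its initial disequilibrium while the distant one has lost nearly all of it, which matches the point that closer markers stay associated with the founding mutation for longer.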
Genome-wide association studies
Enablers: Mid-2000s
International HapMap/1000 Genomes projects delivered hundreds of thousands, and then millions, of mapped SNP loci
Extension of microarray technology allowed automated genotyping of huge numbers of SNPs across the genome (SNP Chip)
Method
Extract DNA (genotype)
Calculate which of the ~300-500k SNPs and/or haplotypes are more frequent in cases than in controls
Cost: $1000 per individual by standard methods
Whole genome chips: cheaper, higher throughput
Concept
Designed to identify common variants, i.e. assume common complex diseases are caused by common variants
In case-control studies, panels of affected individuals and matched controls are genotyped at hundreds of thousands of common SNPs (minor allele frequency > 0.05)
SNPs
Distinct features
Microsatellites
(Di-, tri-, tetra-nucleotide repeats)
Multi-allelic, high heterozygosity
Informative, complex genotyping assays
1 per 50k bp
Linkage studies: 300-600 markers (~1 Mbp)
SNPs
Bi-allelic and less informative
Simplified genotyping platforms (+/- calling)
1 per 1000 bp (most common variant)
Whole genome linkage study
6k Illumina SNPs, 10k Affymetrix SNPs
Around 500k SNPs in both gene-chips
Steps in SNP association studies
Use HapMap data (map LD) and select representative SNPs that differentiate (tag) common haplotypes at each locus
Genotype tagged SNPs in disease cases/controls using microarrays
Compare allele frequencies for each SNP in the two groups
Genotype SNPs associated with disease (statistical threshold) in second independent cohort, and determine which associations are robust
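The notes mention a statistical threshold without saying how it is set. One common choice, assumed here purely for illustration, is a Bonferroni correction that divides the overall significance level by the number of SNPs tested; `bonferroni_threshold`, `select_hits` and the SNP identifiers below are hypothetical.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-SNP p-value cut-off that keeps the overall false-positive rate near alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def select_hits(p_values: dict[str, float], n_tests: int) -> list[str]:
    """Return the SNP identifiers whose p-value passes the corrected threshold."""
    cutoff = bonferroni_threshold(n_tests)
    return sorted(snp for snp, p in p_values.items() if p < cutoff)

# With ~500k SNPs on the chip the per-SNP cut-off is about 1e-7.
cutoff = bonferroni_threshold(500_000)

# Hypothetical p-values from the first cohort; only snp_b would go forward
# to genotyping in the second independent cohort.
hits = select_hits({"snp_a": 3e-4, "snp_b": 2e-9, "snp_c": 0.2}, n_tests=500_000)
```

Such a stringent cut-off is one reason variants with weak effects need very large numbers of cases and controls to be detected.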
Visualisation
Q-Q plots
Quantile-quantile; compares the distribution of observed test statistics with the distribution expected under the null hypothesis
Case-control
Chi-squared comparison of absolute genotype counts is calculated for each variant
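The case-control chi-squared comparison can be sketched for a single SNP as below. For simplicity this version compares allele counts in a 2x2 table (1 degree of freedom) rather than the three genotype counts, which is an assumption of the sketch; `allelic_chi2` and the counts are hypothetical.

```python
import math

def allelic_chi2(case_alt: int, case_ref: int,
                 control_alt: int, control_ref: int) -> tuple[float, float]:
    """Return (chi-squared, p-value) comparing allele counts in cases vs. controls."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    row_case, row_control = a + b, c + d
    col_alt, col_ref = a + c, b + d
    if min(row_case, row_control, col_alt, col_ref) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    # Pearson chi-squared for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / (row_case * row_control * col_alt * col_ref)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical SNP: 100 cases and 100 controls, i.e. 200 alleles per group.
chi2, p_value = allelic_chi2(case_alt=120, case_ref=80,
                             control_alt=100, control_ref=100)
```

Repeating this for every genotyped SNP yields the set of observed test statistics that a Q-Q plot compares with the distribution expected when no SNP is associated.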
Gene identification
Locate target chromosomal area
Screen 100s of patients to find minimal region associated (refine area)
Map all candidate genes (experimentally or in silico)
Check all candidates (try mutations, expression in cell lines, knockout in mice)
Study gene functions in tissues and tumours
Evaluation
Postulates
Common variant hypothesis
Different combinations of variants at multiple loci aggregate in specific individuals to increase disease risk, explaining steep falling-away of disease risk in relatives of probands with common disease
Common variants expected to be of ancient origin, merely susceptibility factors with weak deleterious effects
Mild missense mutation
Changes in gene expression
Rare variant hypothesis
An alternative explanation of common disease fuelled by doubts about how much common variants really contribute to disease susceptibility
Rare variants that originate by comparatively recent mutations do not appear on common haplotype blocks (which have ancient origins)
Given great mutational heterogeneity in Mendelian disorders - possible this applies in complex diseases
Large-scale DNA sequencing has been launched to seek out rare variants associated with complex disease (family studies)
Limitations
Available GWAS data explain only a small proportion of genetic variance of complex diseases (the 'missing heritability' issue)
Explanations of missing heritability
Rare variants have large effect, but GWAS restricted to identifying associations with common variants; heterogeneous disease, endophenotypes
Gene-gene and gene-environment interactions mean concept of heritability is flawed due to assumption of additive effect of loci and non-consideration of genetic interactions
Large numbers of common variants have weak effects, which are missed in GWAS using 1000s of cases/controls; need larger numbers in original study or in meta-analysis of data from multiple studies
Common disease variants have weak effects, even when cumulative
Exception: novel factors that strongly predispose, e.g. age-related macular degeneration
Susceptibility genes
Alzheimer's disease
Early-onset (rare, dominantly inherited) genes and susceptibility factors for common late-onset belong to same biological pathways
Both forms have same brain pathology
Amyloid beta (A-beta) peptides (formed by cleavage of amyloid-beta precursor protein APP; peptides considered causative agent)
Intracellular tangles of tau protein
Abundant extracellular plaques
Lupus disease
An autoimmune disease with around 60-100 susceptibility factors now identified