Lect 3: Models of Complex Disease (CD)
H. L. Mencken (paraphrased): "For every complex human problem there is a neat and simple answer that is wrong"
1) Planning stage: Gathering basic knowledge
2) Choosing a strategy to identify risk alleles of CD
a) Linkage studies
- traditionally gold standard
- used if primary aim is to study families
- identifies chromosomal regions that co-segregate with disease
- Estimates likelihood of physical linkage between an allele at a locus & the disease
- Degree of linkage -> LOD score (logarithm of the odds)
- LOD (Z) > 3 -> sig +ve linkage; Z < -2 -> linkage excluded
- Successful if penetrance high
- Now post-genome -> precise location of human genes known + HapMap + SNPMap -> mutations & polymorph known -> testing candidate genes for r/s with disease susceptibility easier
- Allele sharing method
- Used in most Mendelian diseases to identify cause
- Family studies & affected sib pairs (x many studies based on sib pairs)
- Entirely computer dep
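A minimal sketch of the LOD score, assuming the simplest case of phase-known, fully informative meioses, each scored as recombinant (R) or non-recombinant (NR): LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)], i.e. likelihood of linkage at recombination fraction θ vs no linkage (θ = 0.5). Function names are hypothetical; real linkage programs also model unknown phase, penetrance & missing genotypes.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    n = recombinants + non_recombinants
    if theta == 0.0:
        # Complete linkage: any recombinant makes the data impossible.
        log_l_linked = float("-inf") if recombinants else 0.0
    else:
        log_l_linked = (recombinants * math.log10(theta)
                        + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = n * math.log10(0.5)  # null hypothesis: theta = 0.5
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, non_recombinants: int, steps: int = 50):
    """Return (max LOD, theta) over a grid of theta values in [0, 0.5)."""
    grid = [i * 0.5 / steps for i in range(steps)]
    return max((lod_score(recombinants, non_recombinants, t), t) for t in grid)


def interpret(z: float) -> str:
    """Conventional thresholds: Z > 3 linkage, Z < -2 exclusion."""
    if z > 3:
        return "significant linkage"
    if z < -2:
        return "linkage excluded"
    return "inconclusive"


print(round(lod_score(0, 10, 0.0), 2))  # 3.01
print(max_lod(2, 8))                    # (≈0.84, 0.2)
```

10 informative meioses with no recombinants give LOD ≈ 3.01, just past the Z > 3 threshold; with 2/10 recombinants the best LOD (≈ 0.84 at θ = 0.2) is inconclusive.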
Non-parametric linkage studies
- model-free -> applied to complex diseases
- ignores unaffected individuals
- looks at sharing of alleles in affected individuals only
- method of choice in CD
- sufficient fam members available
- use extended fam
- use affected sib-pair analysis
- x case control studies
- alleles/haplotypes shared: identical by descent (IBD, inherited from same ancestor within family) or identical by state (IBS, same allele, origin not nec. shared)
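A minimal sketch of the affected sib-pair idea, assuming IBD status (0, 1 or 2 alleles shared) has already been inferred for each pair at one marker. With no linkage, sib pairs share 0/1/2 alleles IBD with probabilities 1/4, 1/2, 1/4, so excess sharing among affected pairs suggests a nearby susceptibility locus. The 2-df chi-square goodness-of-fit used here is one simple choice (a one-sided "mean test" on proportion shared is another); names & counts are hypothetical.

```python
import math


def asp_ibd_test(n0: int, n1: int, n2: int):
    """Compare observed IBD sharing in affected sib pairs with 1/4, 1/2, 1/4.

    n0, n1, n2: number of pairs sharing 0, 1 or 2 alleles IBD.
    Returns (chi-square, p-value, mean proportion of alleles shared).
    """
    n = n0 + n1 + n2
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    observed = (n0, n1, n2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2)  # exact upper tail of chi-square with 2 df
    mean_share = (0.5 * n1 + n2) / n  # 0.5 expected under no linkage
    return chi2, p_value, mean_share


# Hypothetical sample of 100 affected sib pairs
chi2, p, share = asp_ibd_test(15, 50, 35)
print(f"chi2={chi2:.1f}, p={p:.4f}, mean sharing={share:.2f}")
# chi2=8.0, p=0.0183, mean sharing=0.60
```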
Limitations
- Req informative families -> since CD sporadic -> statistical power low
- Even if suitable families available -> common diseases complex & phenotype det by environment + genetics -> x cluster in Mendelian fashion
b) Association Studies
- Sporadic cases analysed
- Family cases may be collected but introduced at end -> avoid bias
- vital tool to identify risk alleles in CD
- x physical link
- 2 grps -> from same popn -> cases & healthy controls observed
- 1st major -> MHC on chr 6p21.3
- Association with increased risk -> OR
- Used in complex diseases
- To identify risk alleles
- Commonly used -> popn-based case-control studies -> compare freq of 1 or more alleles in cases vs controls -> higher in cases: increased risk; higher in controls: reduced risk (if statistically sig)
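A minimal sketch of the allele-frequency comparison, assuming allele counts (2 per person) from unrelated cases & controls and a Pearson chi-square test with 1 df; function name & counts are hypothetical. Counting alleles treats each chromosome as independent, which assumes Hardy-Weinberg equilibrium; genotype-based tests are preferred when that is doubtful.

```python
import math


def allelic_test(case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int):
    """Allelic chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = case_risk + case_other + ctrl_risk + ctrl_other
    row_totals = [sum(row) for row in table]
    col_totals = [case_risk + ctrl_risk, case_other + ctrl_other]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    freq_cases = case_risk / row_totals[0]
    freq_controls = ctrl_risk / row_totals[1]
    if freq_cases > freq_controls:
        direction = "increased risk"
    elif freq_cases < freq_controls:
        direction = "reduced risk"
    else:
        direction = "no difference"
    return chi2, p_value, freq_cases, freq_controls, direction


# Hypothetical: 1000 cases & 1000 controls -> 2000 alleles per group
chi2, p, f_cases, f_ctrls, direction = allelic_test(700, 1300, 600, 1400)
print(f"chi2={chi2:.2f}, p={p:.1e}, cases {f_cases:.2f} vs controls {f_ctrls:.2f}: {direction}")
```

Direction only means something if the difference is statistically sig; the p-value decides that.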
Limitations
- False +ves -> proved difficult to replicate
- Small studies -> weak but impt associations missed (low power)
- OR values -> estimates & shld be considered with their confidence interval (CI)
- Strategy used in CD (case control association studies)
- Unrelated cases & healthy controls
- Compare distribution of candidate alleles
- Result -> genetic association
- RISK -> measured as odds ratio
- OR = 1 -> no association; < 1 -> reduced risk; > 1 -> increased risk
- E.g. WTCCC1 -> 1st huge study -> 7 diseases incl. T1D
- More apt for ID where transmission is horizontal
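A minimal sketch of OR with a 95% CI (Woolf's logit method), assuming a 2x2 table of risk-allele vs other-allele counts in cases & controls; names & counts are hypothetical. If the CI excludes 1, the association is sig at roughly the 5% level.

```python
import math


def odds_ratio_ci(case_exposed: int, case_unexposed: int,
                  ctrl_exposed: int, ctrl_unexposed: int, z: float = 1.96):
    """Return (OR, lower, upper) using the standard error of log(OR)."""
    cells = (case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed)
    if min(cells) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    se_log_or = math.sqrt(sum(1 / c for c in cells))  # Woolf's formula
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical risk-allele counts: cases 700/2000, controls 600/2000
or_, lower, upper = odds_ratio_ci(700, 1300, 600, 1400)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")  # OR = 1.26 (95% CI 1.10-1.43)
```

A modest OR like this can be highly sig in a large study yet still be of little diagnostic use.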
G1: Identify risk alleles as an aid to diagnosis
- Genetic test -> helps distinguish which disease is which -> aids diagnosis in clinics
- Risk -> measured as odds ratio -> low: x diagnostic use but good for knowledge of pathogenesis
G2: Identify risk alleles & pathways involved in disease pathogenesis -> better understand disease pathology
- Early pathogenesis of diseases unknown
G3: Identify risk alleles (involved in disease pathogenesis) – as an aid to patient mgmt & therapy choice -> to target for therapy more effectively & develop new therapies
Genome wide studies -> GWAS or GWLS dep on families available
- Human Genome Mapping Project + SNPmap +Hapmap -> extensive study
- Impute missing data from databases easily -> extends data
- Sufficient fully genotyped samples with complete haplotypes in database
- E.g. Impute missing genotypes based on expected linkage pattern -> exploits linkage disequilibrium (LD)
- Imputations -> PLINK software
- Greater the no. of whole genome sequences (WGS) in reference -> greater the quality of imputed data
- Success of method dep on degree of LD between tagged SNPs & target haplotypes
- x req hypothesis initially -> flexible; hypothesis generating
- Hence hypothesis-free GWAS -> data -> identifies regions around genes that contribute to biological systems -> impt in understanding disease pathology
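A toy sketch of the imputation idea, for illustration only: an untyped allele in a study haplotype is filled in from reference haplotypes that best match it at the typed SNPs, which works only because of LD. Dedicated imputation tools use probabilistic models (often hidden Markov models) on unphased genotypes; this toy does not. The function & reference panel are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def impute_haplotype(target: Sequence[Optional[int]],
                     reference: Sequence[Sequence[int]]) -> List[int]:
    """Fill None entries in `target` from the best-matching reference haplotypes."""
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref: Sequence[int]) -> int:
        # Count disagreements with the target at typed (observed) SNPs only.
        return sum(ref[i] != target[i] for i in typed)

    best = min(mismatches(ref) for ref in reference)
    donors = [ref for ref in reference if mismatches(ref) == best]

    imputed = []
    for i, allele in enumerate(target):
        if allele is not None:
            imputed.append(allele)
        else:
            # Majority allele among donors; ties go to the allele seen first.
            imputed.append(Counter(d[i] for d in donors).most_common(1)[0][0])
    return imputed


# Hypothetical reference panel: 4 haplotypes x 4 SNPs (0/1 = allele)
panel = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 1]]
print(impute_haplotype([0, None, 1, None], panel))  # [0, 0, 1, 1]
```

The more reference haplotypes (and the stronger the LD), the more reliable the donor match, which mirrors the points above.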
Each strategy -> sub-strategies -> info needed to decide plan -> genome-wide or selective approach
- if selective -> whole chromosome, chromosomal region or single candidate gene looked at
- dep on knowledge of previous & current studies & disease pathology
Studying extended haplotypes - lengthy DNA sections where multiple alleles inherited together in specific grps
- Linkage disequilibrium (alleles at 2 or more loci found together more often than expected if they segregated independently)
- Several impt genes all related to same pathway -> e.g. MHC contains key genes for T cell immunity -> widespread polymorph -> precise susceptibility allele -> difficult to identify
- But data set large -> x issue
- Combining haplotype data with GWAS data using imputation to assign HLA haplotypes -> interesting results in autoimmune liver disease primary biliary cirrhosis
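A minimal sketch of how LD between two biallelic loci (alleles A/a and B/b) is usually quantified, assuming the four haplotype frequencies are known: D = f(AB) − p(A)p(B), D' = |D|/Dmax and r² = D²/[p(A)(1−p(A))p(B)(1−p(B))]. D = 0 means the alleles combine as expected by chance; the frequencies used are hypothetical.

```python
def linkage_disequilibrium(f_AB: float, f_Ab: float, f_aB: float, f_ab: float):
    """Return (D, D', r^2) from the four two-locus haplotype frequencies."""
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A
    p_B = f_AB + f_aB  # frequency of allele B
    d = f_AB - p_A * p_B  # observed minus expected (independence) frequency
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(linkage_disequilibrium(0.5, 0.1, 0.1, 0.3))  # D ≈ 0.14, D' ≈ 0.58, r² ≈ 0.34
```

r² is the measure usually used to pick tagged SNPs, since it reflects how well one SNP predicts the other.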
Field of statistics vital - Sample size (stats confidence), case & control selection, sampling errors, publication bias
a) Prevalence & Incidence
- Incidence = no. of new cases in a popn per unit time (e.g. per 100,000 per year)
- Prevalence = total no. of cases of disease at time X (new & pre-existing cases)
- Differences (e.g. between popns or over time) may reflect environmental factors
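Worked arithmetic for the two measures, using hypothetical figures for a popn of 200,000 followed for 1 year; incidence is approximated here with the whole popn taken as the popn at risk.

```python
def incidence_per_100k_per_year(new_cases: int, population_at_risk: int, years: float) -> float:
    """New cases per 100,000 people per year."""
    return new_cases / (population_at_risk * years) * 100_000


def prevalence_per_100k(total_cases: int, population: int) -> float:
    """All existing (new + pre-existing) cases per 100,000 at one time point."""
    return total_cases / population * 100_000


# Hypothetical: 40 new cases in 1 year; 600 people living with the disease at census
print(incidence_per_100k_per_year(40, 200_000, 1))  # ≈ 20 per 100,000 per year
print(prevalence_per_100k(600, 200_000))            # ≈ 300 per 100,000
```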
b) Family, twin, adoption studies -> check signs of heritability
i) Family Studies - Informative but families in complex disease rare & x conform to Mendelian patterns
- Incidence of traits
- Inheritance pattern
- Geography vital -> consider migration
- Beware heritability -> when talking about genes
ii) Twin Studies
- MZ twins (identical)
- DZ twins (non-identical)
- Identify level of genetic contribution
- higher concordance in MZ vs DZ -> sig genetic component
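A minimal sketch of the two usual concordance measures, with hypothetical counts: pairwise concordance = C/(C + D) and probandwise concordance = 2C/(2C + D), where C = concordant affected pairs & D = discordant pairs (the probandwise formula assumes every concordant pair was ascertained through both twins).

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Proportion of affected twin pairs in which both twins are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Proportion of affected twins whose co-twin is affected (complete ascertainment)."""
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


# Hypothetical: 100 MZ pairs (30 concordant) vs 100 DZ pairs (5 concordant)
mz = pairwise_concordance(30, 70)
dz = pairwise_concordance(5, 95)
print(f"MZ {mz:.2f} vs DZ {dz:.2f}")  # MZ 0.30 vs DZ 0.05
```

MZ concordance well above DZ concordance (as here) suggests a sig genetic component; MZ concordance well below 1 also shows non-genetic factors matter.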
c) Linkage analysis -> map susceptibility loci -> needs families with several affected individuals
d) Association analysis -> narrow down region -> x need families with several affected individuals
e) Identify DNA seq variants conferring susceptibility
f) Define biochemical action
Before starting a study
- Simple measures: risk ratio; concordance in twins; freq of disease, characteristic, disorder/trait
- Indicators: Familial aggregation; geographic clustering
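One commonly used risk ratio is the sibling recurrence risk ratio, λs = risk of disease in siblings of affected individuals / popn prevalence; λs well above 1 points to familial aggregation. A minimal sketch with hypothetical figures:

```python
def sibling_recurrence_ratio(affected_sibs: int, total_sibs: int,
                             population_prevalence: float) -> float:
    """lambda_s: sibling risk divided by the popn prevalence of the disease."""
    if not 0 < population_prevalence <= 1:
        raise ValueError("prevalence must be a proportion in (0, 1]")
    sibling_risk = affected_sibs / total_sibs
    return sibling_risk / population_prevalence


# Hypothetical: 60 of 1000 siblings of probands affected; prevalence 0.4%
print(sibling_recurrence_ratio(60, 1000, 0.004))  # ≈ 15
```

Shared family environment also raises λs, so it flags familial aggregation rather than proving a genetic cause.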
Studying selected chromosomal region - 2nd phase process & undertaken if prior studies identified regions of interest
- Tagged SNPs may indicate association with a specific gene but more likely with a wider region (due to LD)
- High resolution analysis using tagged SNPs -> identify association peaks in/around genes
- However commercial SNP chips for a single chromosome often unavailable -> costly to use a GWAS chip for one chromosome
Investigating pathways
- Systems biology -> focuses on study of pathway interaction
- With prior data -> able to study CD
- E.g. Identification of NOD2 (CARD15) in Crohn's disease -> linked to pathways of immune tolerance to commensal gut bacteria -> shows value of studying processes instead of individual genes in CD
Candidate gene approach - single gene - Hypothesis driven -> so limited to observing associations at specified loci
- loci selected -> have some functional relationship between gene product & disease
- limitations:
- lack of understanding of human biology & disease pathology
- Narrow view -> miss other associations
- But if prior studies solid -> high resolution genotyping of gene to identify polymorph associated with disease susceptibility relevant
- Knowledge of early pathogenesis poor (disease onset late) -> poor hypotheses -> poor candidate gene selection -> candidate gene studies rarely successful
- Pathology changes but genome same -> so genome wide then observe pathways, haplotypes & candidate genes
- Biological defects known: target specific gene or genes in biochemical pathway
- If unknown biological defect or characterised with uncertainty
o Whole genome scanning -> microsatellites/SNPs
o Multiple candidates -> multiple pathways
Positional approach to select candidate
- Single gene (selected candidate)
- Chromosomal region (e.g. MHC)
- Whole chromosome & GWAS
Functional approach to select candidates -> hypothesis driven
- Candidate region/gene -> MHC on chromosome 6p21.3 -> strong genetic associations with disease, as MHC genes impt in immunity
- Pathway (e.g. HSCR, Hirschsprung disease)
- Both -> complementary -> e.g. NOD2 (CARD15) in Crohn's disease