L28 - Disease Mapping

  1. Be able to describe the steps in a positional cloning experiment
  2. Be able to define the term linkage analysis and how it is used to determine location of a disease gene within the genome
  3. Be able to describe how linkage analysis can be used in conjunction with next generation sequencing techniques to aid in disease gene identification
  4. Be able to describe LOD score analysis and understand the significance of various LOD scores
  5. Be able to describe association analysis including the types of study design and individuals studied (families vs populations)
  6. Be able to describe the concept of linkage disequilibrium and how it applies to association studies
  7. Be able to describe the limitations of association studies, including GWAS
  8. Be able to describe genome wide association studies (GWAS) and how the common disease common variant hypothesis underpins GWAS
  9. Define and understand the concept of haplotypes and their use in GWAS
  10. Be able to describe what the data from a GWAS plot (broadly) means

Positional Cloning - Cystic Fibrosis

Cystic fibrosis was one of the first genetic diseases for which the causative gene was identified using positional cloning

Identification of Disease


Remember:

  1. Two loci are linked if they lie close together on the same chromosome.
  2. The task of linkage analysis is to find markers that are linked to the hypothetical disease locus.

Multipoint Linkage Mapping

  • Multipoint mapping uses several markers at once to localise a disease gene relative to the other markers in the map

More efficient process than using one marker at a time

Informative Meioses


Obtaining enough family material to test multiple meioses is difficult for rare diseases


The higher the recombination fraction (RF) between two loci, the more meioses are needed to obtain evidence that they are linked
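
The relationship between RF and the number of meioses can be illustrated with a minimal sketch. It assumes fully informative, phase-known meioses and a target LOD of 3, and uses the expected LOD per meiosis, θ·log10(2θ) + (1−θ)·log10(2(1−θ)). The function names are illustrative and not from the lecture.

```python
import math

def expected_lod_per_meiosis(theta):
    """Expected LOD from one informative, phase-known meiosis
    when the true recombination fraction is theta (0 <= theta < 0.5)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    if theta == 0.0:
        # Limit of the formula below as theta -> 0
        return math.log10(2)
    return (theta * math.log10(2 * theta)
            + (1 - theta) * math.log10(2 * (1 - theta)))

def meioses_needed(theta, target_lod=3.0):
    """Approximate number of meioses needed to reach target_lod on average."""
    return math.ceil(target_lod / expected_lod_per_meiosis(theta))

for theta in (0.0, 0.1, 0.2, 0.3):
    print(theta, meioses_needed(theta))
```

With complete linkage (θ = 0) each meiosis contributes log10(2) ≈ 0.3, so about 10 meioses are needed; the requirement rises steeply as θ approaches 0.5.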



Newer Methods for Identifying Genetic Diseases

Micro-Array Disease Identification

Expression Array (no details given).

Next Generation Sequencing for Disease Identification

(Basic understanding of what's happening; not needed to know in detail)


Hundreds of thousands of DNA fragments are sequenced in parallel, simultaneously.

  1. Sanger sequencing = synthesise the same sequence 100s of times (products differing in length by 1 nt, each ending in a different fluorescent ddNTP)
  2. Next Gen sequencing = fluorescent dNTPs are also used, but synthesis of the same strand continues after each base is read
    • Many new strands are synthesised in parallel and read at once

Targeting a Disease Gene (Disease Target Panel)

Whole Exome Sequencing

Requires a comparison genotype (e.g. a family member).
Largely uninformative if used on a single individual.

Whole genome

  1. Not commonly used for diagnosis
    • Costly
    • Sensitive to splice site variants (intronic)
    • Not useful for determining effect of non-coding sequence
  2. Limitations
    • Repeats: >50% of the human genome is repetitive, and repeats do not sequence/align well
    • Difficult to identify structural variation, as reads from duplicated sequence align to more than one location
    • Data storage and analysis require huge computational power and cost
Hybridisation capture (target enrichment) workflow:

  1. Genomic DNA is fragmented
  2. Fragmented DNA is hybridised with RNA library ‘baits’
    • These are designed to anneal to the genes of interest
  3. The RNA baits contain a biotin label which makes them discernible
  4. The target DNA is purified from the total DNA by addition of magnetic streptavidin beads, which bind the biotin-labelled hybrids
    • Relies upon the biotin-streptavidin interaction and a magnet to pull out the bead-bound hybrids
  5. Target DNA can then be amplified and sequenced

Probes rely upon hybridisation selection to capture the coding portion of the genome for high throughput sequencing

Reduced cost compared to whole genome

Can pick up mutations in genes not already known to cause disease

Data Interpretation

As there are ~3 million SNPs between each person and the reference genome, it is difficult to determine which SNPs are benign and which are not


Difficult to interpret the sequencing data

How to determine the significance of:

  • Variants and deleterious mutations in unknown gene(s)
  • Synonymous variants
  • Missense variants of uncertain significance in known genes

Interpreting this enormous number of SNPs is understandably difficult => Necessary to refine regions of the genome needed to be searched


Achieved through;

Linkage

Linkage

  1. Linkage is measured by the recombination fraction, θ = the probability that, in any meiosis, there will be a recombination between the two loci.

  • θ = 0.0: complete linkage
  • θ = 0.5: no linkage
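
As a minimal sketch of how θ is estimated in practice, assuming each meiosis can be scored unambiguously as recombinant or non-recombinant (the function name and counts are hypothetical):

```python
def recombination_fraction(recombinants, total_meioses):
    """Estimate theta as the proportion of recombinant meioses."""
    if total_meioses <= 0:
        raise ValueError("total_meioses must be positive")
    if not 0 <= recombinants <= total_meioses:
        raise ValueError("recombinants must be between 0 and total_meioses")
    theta = recombinants / total_meioses
    # Unlinked loci recombine half the time, so estimates above 0.5
    # are capped at 0.5 (no linkage).
    return min(theta, 0.5)

# Hypothetical counts: 2 recombinants observed in 20 informative meioses
print(recombination_fraction(2, 20))
```

An estimate near 0 suggests tight linkage, while an estimate near 0.5 is what independent assortment would produce.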

Determining Linkage between marker and disease Loci


  1. Collect families with affected individuals
  2. Genome scan: test markers evenly spaced across the entire genome (~every 7-8 cM, ~400 markers)
  3. LOD score ("log of the odds"): what are the odds of observing the given family marker data if (1) the marker is linked to the disease (less recombination than expected), compared with if (2) the marker is not linked to the disease?

Problems with Mapping Humans

  1. Can't do controlled crosses (unethical)
  2. Ideal crosses equivalent to a "test-cross" (homozygous recessive with heterozygous) are very rare in the population
  3. Humans produce very small number of progeny

Solution
To get statistically significant evidence for linkage => combine the results from many identical matings, i.e. combine pedigrees

Lod Score

The LOD score is a statistic that describes the strength of evidence for linkage at any chosen value of the recombination fraction.


=>
How likely are the observed data if the marker is actually linked, compared with if it is unlinked?

Calculating LOD

Calculate two different probabilities for obtaining a specific set of recombinants observed in a family

1. First probability is calculated assuming independent assortment (No linkage)

2. Second probability is calculated assuming a specific degree of linkage (RF = 0.0 - 0.5)
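
The two probabilities can be sketched as follows, assuming phase-known meioses that can each be counted as recombinant or non-recombinant. The LOD is then log10 of the ratio θ^R·(1−θ)^NR / 0.5^(R+NR). The function name and the example counts are illustrative, not taken from the slides.

```python
import math

def lod_score(recombinants, non_recombinants, theta):
    """LOD = log10( P(data | linked at theta) / P(data | unlinked) )."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be in [0, 0.5]")
    n = recombinants + non_recombinants
    # Probability of the data assuming independent assortment (theta = 0.5)
    log_unlinked = n * math.log10(0.5)
    if theta == 0.0:
        # Complete linkage cannot explain any observed recombinant
        if recombinants > 0:
            return float("-inf")
        return -log_unlinked
    # Probability of the data assuming linkage at the chosen theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    return log_linked - log_unlinked

# Hypothetical family data: 1 recombinant and 9 non-recombinants
print(round(lod_score(1, 9, 0.1), 2))
```

Working in log10 throughout avoids multiplying many small probabilities and gives the LOD directly as a difference of logs.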

Lod Score (Z)

A test of whether the likelihood that two loci are linked is greater than the likelihood that the same two loci are unlinked

Equation (Slide 23-25)

Successes and Limitations of Linkage Mapping

Successes.

This technique has been widely used to identify chromosomal regions linked to

  1. Diabetes
  2. Breast Cancer
  3. Alzheimer's disease
  4. Bipolar disorder

Limitations

  1. Genetic Heterogeneity: the phenotype is affected by many loci (or even different loci in different families)
  2. Incomplete Penetrance: individuals carrying the disease allele may not show the phenotype.

Linkage Analysis narrows suspected region of DNA



Determine if disease is inherited with certain molecular markers

Identify Clones Specific to the Region of Interest


Researchers consulted the genomic library to identify clones; they isolated these by chromosome "walking" and "jumping"

Examination of Clonal Sequences

Examining the clonal sequences revealed four candidate genes in the region


Use existing knowledge to determine which candidates most plausibly contribute to diseased phenotype

Additional Studies Eliminated three genes


Further DNA sequencing of the suspect gene in unaffected family members and in patients revealed a 3-bp deletion in the gene in CF patients

CF Mechanism of Disease

The gene for CF (CFTR) encodes a membrane channel protein that controls chloride movement into and out of cells.
Mutations cause the channel to stay closed, so chloride builds up in cells and thick mucus accumulates.

SNP Genotyping - Microarray SNP-Chip

  • Probes which differ from one another at one base position will bind to different complementary sequences. The fluorescence signal identifies which base is present at that position, allowing the SNP genotype to be read

Variants in non-coding regions

The closer loci are together, the less likely it is that they will be separated through recombination

Therefore loci which are close together tend to be inherited together

In a population, alleles at tightly linked loci tend to be in linkage disequilibrium (LD)
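
Linkage disequilibrium can be quantified from haplotype frequencies. The sketch below computes the standard measures D, D' and r² for two biallelic loci; the function name and the haplotype frequencies in the example are hypothetical, chosen only for illustration.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r_squared) for two biallelic loci,
    given the four haplotype frequencies (which must sum to 1)."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at locus 1
    p_B = p_AB + p_aB          # frequency of allele B at locus 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    # D: observed AB haplotype frequency minus that expected
    # if the alleles were associated at random
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max
    r_squared = D * D / denom
    return D, d_prime, r_squared

# Hypothetical haplotype frequencies for AB, Ab, aB, ab
D, d_prime, r2 = ld_stats(0.5, 0.1, 0.1, 0.3)
print(round(D, 3), round(d_prime, 3), round(r2, 3))
```

D = 0 means the alleles at the two loci are associated at random (linkage equilibrium); larger |D'| or r² means a marker allele is a better proxy for the allele at the nearby locus, which is what association studies rely on.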

Once calculated:

  1. The log (base 10) of the ratio of the two probabilities is the LOD value
  2. Repeat the calculation for a range of different RF values (degrees of linkage)
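
Repeating the calculation over a range of RF values can be sketched as below. This assumes phase-known meioses and pools several families by adding their LOD scores at each θ, since LOD scores from independent families are additive. The family counts are hypothetical.

```python
import math

def lod(recombinants, non_recombinants, theta):
    """LOD for one family at a given theta (0 < theta <= 0.5)."""
    n = recombinants + non_recombinants
    linked = (recombinants * math.log10(theta)
              + non_recombinants * math.log10(1 - theta))
    unlinked = n * math.log10(0.5)
    return linked - unlinked

def scan_theta(families, thetas):
    """Total LOD across families at each candidate theta."""
    return [(t, sum(lod(r, nr, t) for r, nr in families)) for t in thetas]

# Hypothetical (recombinants, non_recombinants) counts for three families
families = [(1, 9), (0, 8), (2, 10)]
thetas = [i / 100 for i in range(1, 51)]   # 0.01, 0.02, ..., 0.50

curve = scan_theta(families, thetas)
best_theta, best_lod = max(curve, key=lambda point: point[1])
print(best_theta, round(best_lod, 2))
```

The θ with the highest total LOD is the best estimate of the recombination fraction between marker and disease locus; the LOD at θ = 0.5 is always 0 because both hypotheses then predict the same data.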

Analysis of LOD score derived from Multipoint Mapping

  1. LOD > 3 = statistically significant evidence for linkage (odds of 1000:1 in favour of linkage)
  2. The gene responsible for the disease is located under the LOD peak
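
A small sketch of what a LOD value means in terms of odds. The LOD > 3 threshold is the one given in these notes; the exclusion threshold of −2 is a widely used convention that is an assumption here and does not come from the notes. Function names are illustrative.

```python
def lod_to_odds(lod):
    """Convert a LOD score back to odds in favour of linkage."""
    return 10 ** lod

def interpret_lod(lod):
    if lod > 3:
        return "significant evidence for linkage"
    if lod <= -2:
        # Conventional exclusion threshold (assumption, not from the notes)
        return "linkage excluded at this theta"
    return "inconclusive; more meioses or families needed"

for z in (3.5, 1.2, -2.5):
    print(z, interpret_lod(z))
```

A LOD of 3 corresponds to odds of 1000:1 that the marker and disease locus are linked rather than unlinked.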

Strategies for enriching the information obtained through linkage analysis can be applied in either of these two scenarios

Genetic Heterogeneity

Rely on low-incidence families, as high incidence implies that more than one gene variant is affecting the disorder

Incomplete Penetrance

If penetrance is low, then high-incidence families are more informative, as there are more individuals in whom the disease can be detected.