WEEK 3: GENETIC VARIATION
Small-scale variation
Single nucleotide substitution
SNV
= single nucleotide variants
2+ DNA variants exceeding frequency of 0.01 in population
SNP
= single nucleotide polymorphism (two alleles)
RFLP (restriction fragment length polymorphism): gain/loss of an RE (restriction endonuclease) site, caused by a subset of SNPs
At any SNP locus, many individuals (each with two haploid genomes) will be homozygous
~1/100 NTs (vast majority rare in population)
Non-random patterns
Different regions undergo different mutation rates
Mitochondrial DNA > nuclear
Excess of C→T substitutions (deamination of methylated cytosine)
Evolutionary ancestry
Alternative SNPs
mark alternative ancestral chromosome segments common in present day population
Certain NTs polymorphic, others rarely show variants
Mutation rate: 1.1 x 10^-8 per nucleotide per generation, i.e. about 1 per 100 Mb
1 per 1000bp between maternal/paternal (personal sequencing)
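The two rates above can be checked with simple arithmetic. This is a minimal sketch: the haploid genome size of 3.2 x 10^9 bp is an assumption that does not appear in these notes, and the function name is illustrative only.

```python
# Rates quoted in the notes.
MUTATION_RATE = 1.1e-8      # new substitutions per nucleotide per generation
HET_SITE_RATE = 1 / 1000    # maternal/paternal differences per bp

# Assumption (not in the notes): approximate haploid human genome size in bp.
HAPLOID_GENOME_BP = 3.2e9


def expected_count(rate_per_bp: float, n_bases: float) -> float:
    """Expected number of events across n_bases at a given per-base rate."""
    return rate_per_bp * n_bases


if __name__ == "__main__":
    # New substitutions per 100 Mb per generation.
    print(expected_count(MUTATION_RATE, 100e6))
    # New substitutions per diploid genome per generation (assumed size).
    print(expected_count(MUTATION_RATE, 2 * HAPLOID_GENOME_BP))
    # Sites differing between maternal and paternal copies (assumed size).
    print(expected_count(HET_SITE_RATE, HAPLOID_GENOME_BP))
```

Under the assumed genome size, this suggests roughly 70 new substitutions per diploid genome per generation and a few million sites that differ between the two parental copies.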
Indels
Technically should be copy number variants, but modern convention defines as deletions/insertions up to 50 nucleotides
1/10th frequency of single nucleotide substitution
Short insertions more common than long
90% are 1-10 nucleotides
9% are 11-100 nucleotides
1% are greater than 100 nucleotides
In population-based genome data, ~75% of DNA changes are single nucleotide changes (i.e. the most common variation type)
Consequences of variation
Neutral
Majority neutral effect on phenotype
Many DNA changes have no effect, even within the small target of sequences important for gene function (coding, regulatory, non-coding RNA)
Functional genetic variation
(i.e. variants with effect on gene function)
Difficult to estimate how much of genome is functionally important
Extremes
Virtually all amino acids can be replaced while maintaining original function
New function gained
Single mutation may be sufficient
If >1 mutation needed, order of mutation events may be important (many evolutionary failures)
Mutation nomenclature
Base replacement
5162G>A = guanine to adenine at base position 5162
Indel
197delAG
2552insT
Amino acid replacement
R197G = arginine (R) to glycine (G) at amino acid position 197
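The naming patterns above are regular enough to parse mechanically. The sketch below handles only the four forms listed in these notes, not any full standard nomenclature; `parse_mutation` and the field names are hypothetical, and the substitution pattern accepts the base pair with or without a `>` separator.

```python
import re

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Each entry pairs a mutation kind with the pattern that recognises it.
PATTERNS = [
    ("substitution", re.compile(r"^(?P<pos>\d+)(?P<ref>[ACGT])>?(?P<alt>[ACGT])$")),
    ("deletion", re.compile(r"^(?P<pos>\d+)del(?P<seq>[ACGT]+)$")),
    ("insertion", re.compile(r"^(?P<pos>\d+)ins(?P<seq>[ACGT]+)$")),
    (
        "amino_acid_replacement",
        re.compile(rf"^(?P<ref>[{AMINO_ACIDS}])(?P<pos>\d+)(?P<alt>[{AMINO_ACIDS}])$"),
    ),
]


def parse_mutation(name: str) -> dict:
    """Split a mutation name into its kind, position and changed residues."""
    for kind, pattern in PATTERNS:
        match = pattern.match(name)
        if match:
            fields = match.groupdict()
            fields["pos"] = int(fields["pos"])
            return {"kind": kind, **fields}
    raise ValueError(f"unrecognised mutation name: {name!r}")


if __name__ == "__main__":
    for example in ["5162G>A", "197delAG", "2552insT", "R197G"]:
        print(example, parse_mutation(example))
```

The patterns are tried in order, and a name that fits none of them raises `ValueError` rather than being guessed at.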
Research
Databases
dbVar
Genomic structural variation
DGV
dbSNP
SNPs and other short genetic variations
ALFRED
Allele frequencies in human populations
Building variant maps for gene-finding
Human Genome Project
Good for consensus
Not good for individual differences
Identify genetic variants
Anonymous with respect to traits
Assay genetic variants
Verify polymorphisms, catalogue correlations amongst sites
SNP Discovery
Two phases
Phase 1: SNP Discovery
Phase 2: SNP Characterisation
Goals
Identify 300,000 SNPs
Determine allele frequency of SNPs
Need reference genome to find SNPs: HGP
Projects
HapMap
Produce fine-scale genetic map: common resource for biomedical researchers
Genotype 600,000-1,000,000 SNPs genome-wide
Four populations: CEPH (Europe), Yoruba (Africa), Japanese/Chinese (Asia)
Phases
One:
1M common (minor allele freq. >= 0.05) SNPs (every 5kb across genome) genotyped in 269 DNA samples from four populations
Two:
Additional 4.6M SNPs genotyped
1000 Genomes
Phase 1
14 populations: Europe, East Asia, sub-Saharan Africa, the Americas
Genotyping 1092 individuals
Whole-genome (low coverage; 2-6x) and exome sequencing (deep coverage; 50-100x)
Phase 3
Most recent
2535 individuals
26 populations
Exome and whole-genome data
OMIM
System of cataloging human genes and genetic diseases
ENCODE
Encyclopedia of DNA Elements (2007)
Preceding projects
2003: Human genome complete
2005: Human Epigenome Project
(aimed to identify, catalogue, and interpret genome-wide DNA methylation patterns of all human genes in all major tissues)
2006: International Human Epigenome Project (HIEP)
(aimed to decipher at least 1000 epigenomes within 7-10 years, and provide high-resolution maps of histone modifications, DNA methylation, transcription start sites, and non-coding RNAs)
Progress
Began as pilot project on 1% of genome
2007: Effort scaled to whole-genome assays followed by expansion to similar assays in mouse
Comprehensive catalogue of gene and functional elements in human and mouse genomes
Measure RNA expression levels
Identify proteins that interact with RNA/DNA e.g. modified histones, transcription factors, RNA-binding proteins
Measure levels of DNA methylation
Identify regions of DNase hypersensitivity
Mid- to large-scale variation
Repetitive DNA accounts for large fraction of human genome
Tandem copies
(1-200bp) are common
Sections with multiple repeats are prone to variation
Minisatellite DNA
Telomeres, subtelomeric regions
100bp - 20kb
Diversity
Meiotic recombination between mispaired repeats changes unit number
Misaligned chromatids on homologous chromosomes
Unequal crossover
Misaligned sister chromatids
Unequal sister chromatid exchange
resulting in two chromatids (one with an extra repeat unit, one with a unit missing)
Microsatellite DNA
<100bp
Euchromatin
Have multiple alleles (unlike SNPs)
Markers
More informative than SNPs for distinguishing between individuals or following chromosome segments through pedigree
Early years HGP devoted to defining and mapping microsatellites (~150 000 identified)
Genetic marker of choice since 1990s
Not as easy to automate as SNPs
Satellite DNA
20kb - 100s kb
Centromeres, heterochromatic regions
Repeat sequence instability
Variants differ in number of repeats
Copy number variation
Results from replication slippage or unequal crossover
Slippage causes insertion when newly synthesised strand loops out (repeat units copied twice)
Slippage causes deletion when template strand loops out (repeat units skipped)
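A minimal sketch of the standard slipped-strand model, in which a loop in the newly synthesised strand adds repeat units and a loop in the template strand removes them. The function name and the strand labels `"nascent"` and `"template"` are illustrative, and the repeat tract is reduced to a simple unit count.

```python
def repeats_after_slippage(repeat_units: int, looped_strand: str, units_looped: int = 1) -> int:
    """Repeat count on the newly synthesised strand after one slippage event."""
    if repeat_units < 0 or units_looped < 1:
        raise ValueError("repeat_units must be >= 0 and units_looped >= 1")
    if looped_strand == "nascent":
        # The looped-out units are copied a second time: insertion.
        return repeat_units + units_looped
    if looped_strand == "template":
        if units_looped > repeat_units:
            raise ValueError("cannot skip more units than the tract contains")
        # The looped-out template units are never copied: deletion.
        return repeat_units - units_looped
    raise ValueError("looped_strand must be 'nascent' or 'template'")


if __name__ == "__main__":
    print(repeats_after_slippage(10, "nascent"))
    print(repeats_after_slippage(10, "template"))
```

Repeated events of either kind are what produce alleles that differ in number of repeats.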
Structural variation
Balanced
DNA variants have same DNA content but differ in some DNA sequences located in different positions in genome
Chromosomes break and fragments are incorrectly rejoined, without loss or gain of DNA (i.e. inversions/translocations)
Unbalanced
DNA variants differ in DNA content: rare case where person has gained/lost chromosomal region often resulting in disease
Also includes commonly occurring CNV (copy number variants) along moderately to very long DNA sequence, some contributing to disease
= 25% of mutation events, dominated by CNV
Nature of variation (change to base sequence)
Human populations
Within populations
Nucleotide diversity in introns, regulatory sequences, flanking sequences
Comprises 85% of total genetic variation
33% of protein-encoding loci are polymorphic
Between populations
Frequencies of alleles may vary, esp. for morphological traits
Types
Affect DNA content
Net loss/gain of DNA sequence
Change in copy number of sequence (large or small)
Abnormal chromosome segregation
Indel ranging from a single NT or short sequence up to Mb of DNA
Do not affect DNA content
Number of nucleotides unchanged
Multiple nucleotides move location without net loss (rare)
Translocation
Inversion
Single nucleotide replaced
DNA variants
Alternative form of DNA produced by mutation
≥0.01 frequency
Polymorphism
<0.01 frequency
Rare
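The frequency cut-off can be applied directly to genotype counts. This is a sketch with hypothetical function names; treating a frequency of exactly 0.01 as a polymorphism is an assumption about the boundary case.

```python
def allele_frequencies(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Frequencies of alleles A and B from counts of AA, AB and BB individuals."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("at least one individual is required")
    # Each AA individual carries two A alleles, each AB individual carries one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return freq_a, 1.0 - freq_a


def classify_variant(frequency: float, threshold: float = 0.01) -> str:
    """Label an allele by the 0.01 cut-off (boundary handling is an assumption)."""
    return "polymorphism" if frequency >= threshold else "rare"


if __name__ == "__main__":
    freq_a, freq_b = allele_frequencies(90, 9, 1)
    print(freq_a, freq_b, classify_variant(freq_b))
```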
Venter & Watson diploid genome sequencing compared to reference
3.2M SNPs
290k heterozygous indel variants (1-571 bp)
559k homozygous indel variants (1-82,711 bp)
90 large inversions
62 large copy-number variants
Total 12M+ nucleotides different (majority non-coding)
44% Venter genes had sequence variant (17% encoded altered protein)
Origins
Errors in replication or recombination
Unavoidable
Usually quickly corrected by DNA polymerase
Damage and chemical alteration of DNA by endo-/exogenous sources
Errors in chromosome segregation
Abnormal gametes
Fewer/more chromosomes
MHC polymorphism
Pathogen-driven: strong selection pressure due to emergence of mutant pathogens that seek to evade MHC-mediated detection
Gene duplication: multiple MHC genes with different peptide-binding specificities
Many MHC genes are extraordinarily polymorphic, the most polymorphic of all proteins
Most polymorphic loci
A
B
C
DPB1
DRB1
DQB1
Population genetics
Areas of investigation
Genetic variation within population (genetic composition)
Comparison of populations
Processes that lead to genetic composition changes
Causes of genetic change in populations
New alleles introduced by mutation
Migration changing population composition
Differential reproduction by different genotypes resulting in natural selection
Mating may be random/assortative with in-/outbreeding
Recombination produces new allele combinations
Random fluctuation in reproductive rates resulting in genetic drift in allele frequencies
Mutation rates
Probability that a copy of an allele changes to another allelic form in one generation
Increase in frequency of a mutant allele = mutation rate x frequency of non-mutant allele
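The relation above can be iterated over generations. This is a sketch that assumes one-way mutation only, with no back mutation, selection or drift; the function names and the example rate are illustrative.

```python
def mutant_frequency_after(generations: int, mu: float, q0: float = 0.0) -> float:
    """Iterate delta_q = mu * p, where p = 1 - q is the non-mutant frequency."""
    q = q0
    for _ in range(generations):
        q += mu * (1.0 - q)
    return q


def mutant_frequency_closed_form(generations: int, mu: float, q0: float = 0.0) -> float:
    """Equivalent closed form: q_t = 1 - (1 - q0) * (1 - mu) ** t."""
    return 1.0 - (1.0 - q0) * (1.0 - mu) ** generations


if __name__ == "__main__":
    # Hypothetical rate of 1e-5 per allele per generation.
    print(mutant_frequency_after(1000, 1e-5))
    print(mutant_frequency_closed_form(1000, 1e-5))
```

With realistic mutation rates the change per generation is tiny, which is why mutation alone shifts allele frequencies very slowly.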
Mating
Assortative mating (+ or -)
Trait-specific
Alleles identical by state (alike in structure/function but not origin)
Inbreeding
Whole genome
Increase in homozygosity
Causes departure from Hardy-Weinberg frequencies
Alleles identical by descent (copies descended from single allele present in ancestor)
Statistics
Inbreeding coefficient F
Probability two alleles are identical by descent
0 = mating occurs randomly in large population
1 = all alleles identical by descent
Measured by pedigree analysis or reduction of heterozygosity in population
Japanese study 1965
10% increase in F = 6pt drop in IQ
Children of 1st cousins = 40% increased mortality
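The way F shifts genotype frequencies away from Hardy-Weinberg proportions can be sketched with the standard textbook formula for two alleles: homozygotes gain F x p x q each, heterozygotes are reduced by the factor (1 - F). The function name is illustrative.

```python
def genotype_frequencies(p: float, f: float = 0.0) -> tuple[float, float, float]:
    """Frequencies of (AA, Aa, aa) for allele frequency p and inbreeding coefficient f."""
    if not 0.0 <= p <= 1.0 or not 0.0 <= f <= 1.0:
        raise ValueError("p and f must both lie between 0 and 1")
    q = 1.0 - p
    homozygous_aa = p * p + f * p * q
    heterozygous = 2.0 * p * q * (1.0 - f)
    homozygous_bb = q * q + f * p * q
    return homozygous_aa, heterozygous, homozygous_bb


if __name__ == "__main__":
    print(genotype_frequencies(0.5, 0.0))      # random mating
    print(genotype_frequencies(0.5, 1 / 16))   # some inbreeding
    print(genotype_frequencies(0.5, 1.0))      # all alleles identical by descent
```

At F = 0 the result is the Hardy-Weinberg proportions; at F = 1 no heterozygotes remain.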
Coefficient of relationship R
Proportion of alleles shared by two persons due to common genetic descent from one or more recent common ancestors
= 2F
Self-fertilisation
Repeated generations of inbreeding split a heterozygous population into a series of completely homozygous lines
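The loss of heterozygotes under selfing can be sketched numerically: each generation of self-fertilisation halves the heterozygous fraction, since a heterozygote produces half heterozygous offspring and homozygotes breed true. The function name and starting value are illustrative.

```python
def heterozygosity_after_selfing(h0: float, generations: int) -> float:
    """Heterozygous fraction after a number of generations of pure selfing."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * 0.5 ** generations


if __name__ == "__main__":
    # Start from a fully heterozygous population (hypothetical).
    for generation in range(0, 11, 2):
        print(generation, heterozygosity_after_selfing(1.0, generation))
```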
Consanguineous marriage
1st degree relative
share 1/2 genes
Parents (always)
Full siblings (on average)
2nd degree relative
share 1/4 genes
Grandparents/grandchildren, uncles/aunts, nephews/nieces, half-siblings (on average)
3rd degree relative
share 1/8 genes on average
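The fractions above follow a halving pattern that ties back to R = 2F. The sketch below assumes the common ancestors are not themselves inbred; the function names are illustrative.

```python
from fractions import Fraction


def coefficient_of_relationship(degree: int) -> Fraction:
    """Expected shared fraction R for an n-th degree relative: (1/2) ** n."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return Fraction(1, 2 ** degree)


def offspring_inbreeding_coefficient(degree: int) -> Fraction:
    """F for a child of two relatives of the given degree, using R = 2F."""
    return coefficient_of_relationship(degree) / 2


if __name__ == "__main__":
    for degree in (1, 2, 3):
        print(degree, coefficient_of_relationship(degree), offspring_inbreeding_coefficient(degree))
```

First cousins are third-degree relatives, so under these assumptions their children have F = 1/16.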
Darwinian Evolution
Principle of heredity
Offspring resemble parents more than individuals to which they are unrelated
Principle of variation
Variation exists in morphology, physiology, behaviour among members of population
Principle of selection
Some variants more successful at surviving and reproducing than other variants in given environment; variants of higher fitness are naturally selected
Sickle cell anaemia
Autosomal recessive disease
AA (normal)
SS
Severe anaemia
Hb crystallises at low O2 levels causing RBCs to become sickle-shaped and rupture
AS
Mild anaemia
Protects against malaria; higher fitness in malaria areas
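Higher fitness of AS carriers keeps the S allele in the population instead of removing it. The sketch below uses the standard one-locus selection model with fitnesses 1 - s (AA), 1 (AS) and 1 - t (SS). The values of s and t are hypothetical and are not taken from these notes, and the function names are illustrative.

```python
def next_s_frequency(q: float, s: float, t: float) -> float:
    """Frequency of allele S after one generation of selection."""
    p = 1.0 - q
    w_aa, w_as, w_ss = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_aa + 2.0 * p * q * w_as + q * q * w_ss
    # S alleles come from half of the surviving AS and all of the surviving SS.
    return (p * q * w_as + q * q * w_ss) / mean_fitness


def simulate_s_frequency(q0: float, s: float, t: float, generations: int) -> float:
    """Apply selection repeatedly, starting from S allele frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_s_frequency(q, s, t)
    return q


def equilibrium_s_frequency(s: float, t: float) -> float:
    """Stable S frequency under heterozygote advantage: s / (s + t)."""
    return s / (s + t)


if __name__ == "__main__":
    # Hypothetical selection coefficients for a malaria area.
    s, t = 0.1, 0.8
    print(simulate_s_frequency(0.01, s, t, 2000))
    print(equilibrium_s_frequency(s, t))
```

Setting s to 0, as in an area without malaria, removes the advantage and the model then drives the S allele towards loss.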
Adaptation to altered environments
Malaria-infested environment
RBC physiology alterations affecting transmission of P. falciparum or P. vivax and increased resistance to malaria
Pathogenic mutations in HBB or G6PD for P. falciparum; inactivated DARC variants not expressing Duffy antigen for P. vivax malaria
Lifelong intake of fresh milk
Persistence of lactase production in adults allowing efficient digestion of lactose
13910T allele about 14kb upstream of lactase gene LCT
High altitude (low O2 tension)
Lowered haemoglobin levels and high density of blood capillaries provide protection against hypoxia
EPAS1 variants (key gene in hypoxia response)
High dietary starch
Increased production of enzyme needed to digest starch efficiently
High AMY1A copy number
Reduced sunlight (low UV)
Decreased pigmentation allowing more efficient transmission of depleted UV to deep layer of dermis to synthesise Vitamin D
SLC24A5 variant replacing ancestral alanine at position 111 by threonine
Selective sweep
Variant becomes fixed in a population