Tools for understanding of complex disease (Quantitative Genetic Traits ( …
Tools for understanding of complex disease
Heritability: a statistical term for the extent to which individual differences in a trait within a population can be explained by genetics
Variance (Phenotype) = Variance (Environment) + Variance (Genetic)
Heritability (h2) = VarG / VarP
Genetic variance (VarG) split by:
Additive genetic effects (VarA)
Non-additive genetic effects: dominance/epistasis (VarD)
Environmental variance split by:
Individual Environment (VarE)
Common Environment (VarC)
VarA/VarP = NARROW-SENSE HERITABILITY
VarP = VarA + VarD + VarC + VarE
Behavioural genetics models usually leave out non-additive effects (dominance/epistasis) -> standardised VarP = a2 + c2 + e2 = 1
BUT HOW CAN YOU RULE OUT ONE EFFECT - SURELY THIS WOULD NOT BE ACCURATE?
Two independent people can have very similar environment... socioeconomic status, city/rural, healthcare etc
Estimate effect of
of relative with
e2 (0) = c2 + a2 + VarP
Heritability = 2 x (MZ correlation - DZ correlation)
If the trait is purely genetic, in theory, MZ correlation should be twice DZ.
If it's not twice, common environment must play a role.
VarE = 1 - MZ correlation
rMZ = a2 + c2 = 0.80
rDZ = .5a2 + c2 = 0.50
rMZ - rDZ = 0.30
a2 = 2(0.3) = 0.6 = genes
c2 = 0.8 - 0.6 = 0.2 = common e
e2 = 1 - 0.6 - 0.2 = 0.2 = unique e
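The worked example above can be reproduced with a few lines of Python. This is only a sketch of the twin-correlation arithmetic in these notes; the function name falconer_ace is made up for illustration.

```python
def falconer_ace(r_mz, r_dz):
    """Rough ACE estimates from MZ and DZ twin correlations."""
    a2 = 2 * (r_mz - r_dz)  # additive genetic share
    c2 = r_mz - a2          # common (shared) environment share
    e2 = 1 - r_mz           # unique environment share, same as 1 - a2 - c2
    return a2, c2, e2

# Correlations taken from the example in the notes.
a2, c2, e2 = falconer_ace(0.80, 0.50)
print(round(a2, 2), round(c2, 2), round(e2, 2))  # 0.6 0.2 0.2
```

If c2 comes out negative (rMZ more than twice rDZ), that hints at non-additive effects rather than common environment, so the simple ACE split no longer fits.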
No interaction or correlation between genetics and environment
Generalisability to the general population
CRITIQUES OF ABOVE
rGE - Can genetic factors affect the choice of environment?
GxE - Genetic control of the sensitivity to the environment - e.g. some are more likely to get depressed than others. (Bigger family input can give more info)
No two environments are equal anyway, but MZ twins have an even greater shared environment than DZ twins as they are treated more similarly - therefore heritability is overestimated. Studies of twins whose zygosity was mislabelled can test this criticism.
Random mating does not hold in all populations - partners who choose each other on specific traits (assortative mating) can decrease estimated heritability - e.g. delinquency.
Are twins generalisable to the population?
SOLUTIONS TO CRITICISMS
Including parents can:
enable estimation of
dominant genetic variance
split common environment by sibling effects
VarC = 1, VarA = 0.5
- rSIB = 0.5a2 + 1c2
MONOZYGOTIC TWINS (REARED APART)
VarC = 0, VarA = 1
VarC = 1, VarA = 0
VarC = 1, VarA = 0.5
VarC = 1, VarA = 1
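The coefficient pairs above say how much of a2 and c2 each type of relative pair shares. A small sketch of the expected correlation for each design follows; apart from "MZ twins reared apart", the design labels are an assumption read off the coefficients, not something stated in the notes.

```python
# (VarA coefficient, VarC coefficient) per design.
# Labels other than "MZ reared apart" are assumptions.
DESIGNS = {
    "MZ reared apart": (1.0, 0.0),
    "adoptive siblings": (0.0, 1.0),
    "DZ twins / full siblings": (0.5, 1.0),
    "MZ reared together": (1.0, 1.0),
}

def expected_r(a2, c2, coef_a, coef_c):
    """Expected correlation between relatives under the ACE model."""
    return coef_a * a2 + coef_c * c2

# a2 and c2 reuse the values from the worked example (0.6 and 0.2).
for name, (coef_a, coef_c) in DESIGNS.items():
    print(name, round(expected_r(0.6, 0.2, coef_a, coef_c), 2))
```

With a2 = 0.6 and c2 = 0.2 the MZ and DZ rows give back the 0.80 and 0.50 correlations used earlier.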
rMZ-rDZ = 0.5a2
a2 = 2(rMZ-rDZ)
c2 = rMZ - a2
e2 = 1 - (a2 + c2)
Estimates the % of trait variation captured by SNPs by estimating the genetic relationship between all individuals.
The genetic relationships are then correlated with the phenotypic similarity between the individuals and used to estimate variance.
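A rough sketch of the two steps just described: build a genetic relationship matrix (GRM) from SNP genotypes, then relate genetic similarity to phenotypic similarity. Real tools usually fit this with REML; the sketch swaps in the simpler Haseman-Elston regression, and all data, sizes and function names are hypothetical.

```python
import random

def standardise(values):
    """Centre and scale a list to mean 0 and (population) SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def grm(genotypes):
    """genotypes[i][k] = allele count (0/1/2) of person i at SNP k."""
    n, m = len(genotypes), len(genotypes[0])
    # Standardise each SNP column, then go back to one row per person.
    cols = [standardise([row[k] for row in genotypes]) for k in range(m)]
    z = [[cols[k][i] for k in range(m)] for i in range(n)]
    # A[i][j] = average product of standardised genotypes over all SNPs.
    return [[sum(a * b for a, b in zip(z[i], z[j])) / m for j in range(n)]
            for i in range(n)]

def he_regression_slope(a, y):
    """Slope (through the origin) of y_i*y_j on A_ij over all pairs i < j."""
    num = den = 0.0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            num += a[i][j] * y[i] * y[j]
            den += a[i][j] ** 2
    return num / den

# Toy data: 60 people x 40 SNPs, all simulated (hypothetical).
random.seed(1)
genotypes = [[random.choice([0, 1, 2]) for _ in range(40)] for _ in range(60)]
phenotype = standardise([sum(row) + random.gauss(0, 3) for row in genotypes])
A = grm(genotypes)
print("rough SNP-heritability estimate:", round(he_regression_slope(A, phenotype), 2))
```

At this toy sample size the estimate is very noisy; the method only becomes useful with thousands of unrelated individuals.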
'burden of liability' greater with families
difficult to apply to continuous traits, which are not simply affected or unaffected, e.g. height.
POPULATION MEASURE - cannot be used for individuals
Expression is a seesaw between heritability and environment.
TwinsUK and Phenotypes collected
Compares 12k MZ and DZ twins - stays with the critique of shared environment.
Healthy, 80% female, aged 18-102, highly engaged and recallable
Determine the contributory factor of the genes associated with common complex diseases of aging and associated traits
Integrated biomedical resource for research community
• Data is available to bona fide researchers upon application
• Many projects are collaborator driven
• Healthy, unselected population allows use as controls in many studies
Receive questionnaires once a year
Linked with medical history
Clinical visit 6 hours, every 4 years (continuous)
They collect 'omic data (whole genome, microbiome, epigenetics etc), Biochemical, Physiological, Lifestyle, Family history, and hospital/death records.
GWASs with over 100 traits and 500
- prioritised understanding the mechanism
MuTHER - expression project
Multiple Tissue Human Expression Resource
Gene expression is an essential cellular function whose regulation determines a significant amount of
Gene regulation is under genetic, epigenetic and environmental control
Tissue-specificity of gene expression regulation points to the importance of studying appropriate cell types (preferably in vivo primary tissue)
Took adipose tissue, whole blood, lymphoblastoid cell lines, and whole skin.
Some expression is specific to a single tissue, but for most genes expression overlaps across tissues.
Regulation of genes can be local (cis) or distal (trans)
High heritability in most twins but not always a unique environment effect (but could be prone to batch bias)
eQTL studies have a large effect size, have a proximal endophenotype, and a reduced search space (multiple testing)
Integration of 'omics data and phenotypes
Intronic variants in the FTO gene are robustly associated with obesity, but the expression effect is seen JUST IN BRAIN, NOT IN ADIPOSE - the variants regulate IRX3 (touching in a chromatin loop)
GWAS signals mediated by cis-eQTLs are highly tissue specific - therefore sampling needs to be done in the appropriate tissue.
The maternally expressed transcription factor KLF14 has an eQTL seen only in adipose tissue, BUT it causes regulation to change at various places across the genome in adipose tissue (trans effects). PROTECTS FROM T2D BY SHIFTING FAT DISTRIBUTION.
Metabolomics can help measure things to avoid questionnaires e.g. nicotine / caffeine.
There are many metabolite - disease associations
One SNP can explain 30% variance
Different things are metabolised at different rates - this is linked to eQTLs
Can use Mendelian randomisation to look at causal SNPs
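One common way to turn a SNP into a causal estimate is the Wald ratio: the SNP's effect on the outcome divided by its effect on the exposure (here, a metabolite). The notes do not say which estimator was used, so this is an assumed, minimal sketch with hypothetical effect sizes.

```python
def wald_ratio(beta_snp_exposure, beta_snp_outcome, se_snp_outcome):
    """Single-SNP Mendelian randomisation estimate.

    Returns the estimated effect of the exposure on the outcome and a
    first-order standard error (ignores uncertainty in the exposure effect).
    """
    estimate = beta_snp_outcome / beta_snp_exposure
    se = abs(se_snp_outcome / beta_snp_exposure)
    return estimate, se

# Hypothetical numbers: the SNP raises the metabolite by 0.5 units
# and raises disease risk (log odds) by 0.1 with SE 0.02.
estimate, se = wald_ratio(0.5, 0.1, 0.02)
print(round(estimate, 3), round(se, 3))
```

The estimate is only causal if the SNP affects the outcome solely through the exposure, which is an assumption that has to be argued, not computed.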
Microbiome sequencing to find heritability and associations
Found one heritable microbe associated with the lean twin - Christensenellaceae - gave it to mice and they lost weight (Goodrich et al 2014)
Big data projects to disease.
Try to link genetic variation to disease, either through GWAS or QTL.
OR link common genetic variation with the environment to disease. Through transcriptome (eQTL) -> Proteome (pQTL) -> Metabolome (mQTL) ->
CARTaGENE - Quebec
COLLECTS from the ENTIRE COHORT:
Blood and urine -> for DNA and PMC
Questionnaires -> gender/age, Demographics, location (pollution), life habits (nutrition), mental state, psychosocial environment (stress/life events), disease history (individual/family/medication)
Blood data - cell counts/ cell measures / serum measures
Physical tests / cognitive tests / anthropometric tests
- RNA Sequencing - transcriptome (1k ppts)
Deep Sequencing - DNA Seq - exome sequencing (1k ppts)
Genotyping - DNA - whole genome (96 ppts) - does 2.5 mil SNPs.
1% urban population of Quebec (60k)
Complex diseases require deep phenotyping
PROSPECTIVE cohort of healthy ppts (there is self-reported disease/risk traits)
Can access medical, pharmacy, and genealogy records
use variation in gene expression to explain phenotype effect
Big differences between city ppts and rural ppts - many of the genes involved are in O2 transport - pollution?
Integration between DNA and RNA - differences in alleles are associated with expression of the gene
RNA sequencing data gives more than just expression:
GWAS with SNPs from RNA Seq can find mutually significant results.
Allele Specific Expression - different alleles have different expression - SNPs or bias in this expression can be associated with disease.
Ongen 2014 sequenced tumour cells and normal cells from the same individual - there was significant ASE in the cancer cells - gene dysregulation.
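Allele-specific expression at a heterozygous SNP is often checked by asking whether RNA-seq reads split evenly between the two alleles. A minimal sketch using an exact two-sided binomial test follows; the read counts are hypothetical and real pipelines also correct for mapping bias, which this ignores.

```python
from math import comb

def ase_binomial_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

# Hypothetical read counts at one heterozygous site.
print(ase_binomial_p(30, 10))  # small p-value suggests imbalance
print(ase_binomial_p(20, 20))  # balanced expression
```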
RNA Methylation - can change base pairs post-transcription - changes to proteins.
Mitochondria patterns - only 16.5k bps long (more DNA has moved over to nucleus) - codes for 13 proteins for ETC - associated with 580 diseases (really bad diseases as well).
Each gene is separated by a tRNA - important for cleavage.
rs11156878 significantly correlates with mRNA expression - explains 22% variation - is also associated with BMR
Quantitative Genetic Traits
Traits which can be
and show a
resemblance between relatives
whereby they can be determined by genetic and environmental factors.
Have a population distribution,
Different quantitative traits can affect susceptibility/resistance to complex diseases
The background to the genetic differences in populations can be increasingly useful in mixed populations where ethnicity is less black and white. GENETIC MARKERS CAN REPLACE ETHNIC LABELS.
Highly Complex Traits (polygenic)
Less Complex Traits (Oligogenic)
Variable Gene Activity (Mono/oligogenic)
Altered Gene Function (Monogenic)
You can learn the effect of one allele using the regression coefficient of the trait across genotypes (homozygous/heterozygous/other homozygous)
The r2 and p-value are dependent on the variability of effect.
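A minimal sketch of that regression: code each genotype as the number of copies of one allele (0, 1 or 2) and regress the trait on it. The slope is the per-allele effect and r2 is the share of trait variance explained. The trait values below are hypothetical.

```python
def additive_effect(dosages, trait):
    """Least-squares slope and r2 of trait on allele dosage (0/1/2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx              # change in trait per extra allele copy
    r2 = sxy ** 2 / (sxx * syy)   # variance explained by the SNP
    return beta, r2

# Hypothetical data: two people per genotype group.
dosages = [0, 0, 1, 1, 2, 2]
trait = [10.0, 10.4, 11.1, 10.9, 12.0, 11.6]
beta, r2 = additive_effect(dosages, trait)
print(round(beta, 2), round(r2, 2))
```

The same slope gives a smaller r2 when the trait is noisier or the allele is rarer, which is why r2 and the p-value depend on variability and not just on effect size.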
Study the process of one trait to work out the variability of the effects of the disease.
This can be the 'hidden heritability'
If you find significant effects then the variant is most likely a QUANTITATIVE TRAIT LOCUS
you can work out the QTLs to find the missing heritability
a person's drug response based on genetics
leading to stratified medicine
Can be fatal
2 billion NHS funding
6.5% hospital admissions
Reactions can vary with:
Important to understand:
Want to split people into different genotypes and predict the effect the drug will have
- 44 DRUGS REQUIRE GENETIC TEST IN USA OR EU
Adverse skin reactions
DRESS: Drug rash with eosinophilia and systemic symptoms
Fever / Facial oedema / Rash
Hepatic / pulmonary / cardiac issues
10% people with HIV get very severe reaction
AGEP: acute generalised exanthematous pustulosis
TEN: Toxic Epidermal Necrolysis
Blistering / inflammation / can't swallow / death
Screening for the HLA-B*57:01 allele can help reduce DRESS from abacavir
Pharmacokinetics (what the body does to the drug)
Pharmacodynamics (what the drug does to the body)
Genetic determinants of response / non-response
Genetic predictors of serious adverse events
Optimised drug development
PGx study design
Informative PGx studies already fulfil the following criteria:
Evidence that the PGx trait has a heritable basis
Unambiguous diagnostic criteria
Low background incidence
Availability of powerful cohort
Good biological knowledge of drug action
CAN TAKE 15 YEARS TO DO.
effect of genetic variation on response phenotypes
Cases vs. controls
• Selected from patients who receive drug
• Efficacy: cases are responders, controls are non‐responders
• Safety: cases have an adverse event, controls do not
Continuous/quantitative trait analysis
• All patients receiving drug
• Efficacy: cholesterol levels, blood pressure, glucose, FEV1, virus levels, etc
• Safety: liver enzymes, kidney enzymes, ECG QT interval, etc
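For the cases vs. controls design, the usual summary is an odds ratio comparing carriers and non-carriers of a variant. A minimal sketch with a Wald 95% confidence interval follows; every count is hypothetical and the function name is made up for illustration.

```python
from math import exp, log, sqrt

def odds_ratio(carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls):
    """Odds ratio for an adverse event (or response) in carriers vs non-carriers."""
    ratio = (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)
    # Standard error of log(OR) from the four cell counts.
    se_log = sqrt(1 / carrier_cases + 1 / noncarrier_cases
                  + 1 / carrier_controls + 1 / noncarrier_controls)
    lower = exp(log(ratio) - 1.96 * se_log)
    upper = exp(log(ratio) + 1.96 * se_log)
    return ratio, lower, upper

# Hypothetical safety study: cases had the adverse event, controls did not.
ratio, lower, upper = odds_ratio(20, 10, 10, 40)
print(round(ratio, 1), round(lower, 1), round(upper, 1))
```

With few cases the interval is wide, which is why a low background incidence and a powerful cohort appear in the study criteria above.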
Mayo clinic did introduce this but only 30% clinicians actually used it...
Warfarin - Blood thinner
Therapeutic dose varies from patient to patient (20-fold difference in effective dose among Caucasian patients)
Can trigger fatal haemorrhaging if dose too high, or stroke if dose too low
Warfarin is carefully monitored via regular INR
(International Normalised Ratio) measurements (of blood
Narrow therapeutic window - INR needs to be between 2.0-3.9 Hylek 2003
- dose can vary between 0.5 - 7mg.
There is interindividual variability in dose requirement for one population
Many drug-food interactions e.g. mango/fish oil
40% of dose-variability can be explained with
Splitting ppts by genotype with a point-of-care test significantly improved their outcomes (by 10%) (Pirmohamed 2013), now in 2017 guidelines.
2013 Kimmel trial - didn't work, but genotype-guided dosing didn't start until 3-5 days after the drug was first taken.
Azathioprine - Anti-inflammatory
Induces T-cell apoptosis
Used as an immunosuppressant agent (transplantation / inflammatory disease)
Is broken down either to inactive metabolites or to active metabolites, depending on TPMT genotype (homo/heterozygous) (drug interactions can do the same thing)
Test is available but isn't used for half of ppts - and TPMT variants only account for 27% of myelosuppression and don't predict other adverse events.
There are so many other factors involved.
Treated by Methotrexate
5% toxicity in withdrawal
IL36RN mutations are found in a different form of psoriasis
Half of melanomas have an active mutation in the BRAF gene.
Drug pathway took 2 years through specificity
There is a specific change in melanoma tumours which can be used to identify specific drug targets and have a greater effect.
High risk of muscular myopathy (20%)
2008 GWAS found significant SNP with only 85 cases.
SLCO1B1 - regulates hepatic uptake of statins
Odds ratio of myopathy is 16.9
Carbamazepine in Hong Kong
Can induce TEN in 15% Asians
2011 HK introduced mandatory screening for HLA-B*15:02
BUT carbamazepine prescriptions just dropped and different drugs were given instead: carbamazepine ADRs fell, but ADRs from the other drugs increased.
Better outcomes for patient
More cost effective
Genotype use - prediction for therapeutic response
RNA/protein expression - can change - good for monitoring response
SHOULD BE PREEMPTIVE
eQTLs - Expression Quantitative Trait Loci
Levels of gene expression are highly heritable
Sequence variants that influence gene expression are known as eQTLs - they can locate regulatory DNA
Many causal variants may be exerting their effects by altering the expression levels of a gene
GWAS signals are enriched for eQTLs in a tissue-specific or cell type-specific manner
Key is analysis in physiologically relevant cell types, and in cells exposed to relevant stimuli
e/p/mQTLs can be used to better understand causal genes
Does the SNP significantly affect the expression?
Does the effect of the SNP vary in diff tissues - YES
Different treatments and time can also change expression
Important for immune disorders
Most cis-eQTLs are about 100kb upstream (not far)
Some are MASTER REGULATORS and can have an effect ACROSS THE ENTIRE GENOME (but these are hard to find)
Different splicing can change eQTLs
eQTL - SORT1 and risk of myocardial infarction
Variant creates binding site for C/EBP which regulates SORT1
Variant upregulates SORT1 in the liver - reduces risk of MI
Variant downregulates SORT1 in liver - increases risk of MI
SNP WASN'T IN CAUSAL GENE - 40 KB AWAY
rs4731702 effect on KLF14 a master trans-regulator - also associated with T2D and HDL cholesterol
To use GWAS you have to convert SNP info into gene info into pathway info
most SNPs are found in NON-coding regions
may not even be close in location -> therefore you can't really gauge the effect of the SNP
IMPORTANT TO UNDERSTAND WHAT THE NON-CODING BPS DO. - Same number of coding genes but different in regulation.
PATHWAY ANALYSIS - interactions of biological processes - essential tool to understand biology
Can be very simple or very complicated
Free online database resource
Curated resource of core pathways and reactions
Uses human data generally, but mouse/other animal models are used to infer
Curated to maintain truth
Focus on reactions:
Largest and best known database
Metabolic networks - provide energy and materials
Signalling networks - sense the outside/coordinate activities within and between cells.
Regulatory networks - control processes/ set limits/ control molecular composition of cells.
SCALE OF PATHWAYS:
Early pathways were pieced together individually by studying biochemical reactions and systematically encoding them.
New technology allows the measurement of thousands of different molecules
Many biochem pathways are connected - biological networks
Pictures are incomplete and can omit details/truth, BUT are very useful to understand network.
There's no way to understand the direction of proteins and traits - use pathway analysis
Find whether significantly associated genes are over-represented in a pathway, and whether they are up- or down-regulated (hypergeometric test - similar to chi-square).
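A minimal sketch of the hypergeometric (over-representation) test: given how many of the significant genes fall in a pathway, how likely is an overlap at least that large by chance? The gene counts are hypothetical and the function name is made up for illustration.

```python
from math import comb

def pathway_enrichment_p(total_genes, pathway_genes, hit_genes, overlap):
    """P(overlap or more) if hit_genes were drawn at random from total_genes."""
    upper = min(pathway_genes, hit_genes)
    favourable = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, hit_genes - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, hit_genes)

# Hypothetical: 20,000 genes, a 100-gene pathway, 200 significant genes, 8 in the pathway.
p_value = pathway_enrichment_p(20000, 100, 200, 8)
print(p_value)
```

By chance only about one of the 200 hits would be expected in the pathway, so an overlap of 8 gives a very small p-value. Testing many pathways still needs a multiple-testing correction.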
Create gene sets
- USE GWAS DATA - PATHWAY ASSOCIATION - DRUG TARGET
Creation of gene sets
Have a source of info - protein-protein interaction/co-expression
Need young brains to understand psychiatric pathways - before tainted by medication
Need to map GWAS locus to gene
See if it's a good drug target after
Gene expression can give general pathology
Understanding of etiopathogenesis
use meta-analysis for SNPs in gene
MUST CORRECT FOR LD
Derive a gene-wide statistic
Assess gene set individually or treat a pathway as a whole gene set
SNP analysis - Take GWAS for single SNPs
Gene-based analysis - SNP-set analysis with gene as unit
Gene-set analysis - SNP-set analysis with sets of genes as unit of analysis
Targeted gene-set pathways
All known gene-set pathways
INDIRECT: map drug features on to network of known genes.
DIRECT: set of genes with proteins with known effect of drugs.
HOW MUCH MORE EFFECTIVE IS ONE GENE SET THAN ANOTHER? Rank them.
1/4 people will have a major mental health event or chronic illness
Heritability of psychiatric disorders is very high e.g. schizophrenia, autism, bipolar.... BUT TWINS SHARE VERY SIMILAR ENVIRONMENTS WHICH COULD AFFECT THIS.
Rare mutations don't play a major role - genetic loading more important
most work done in SNPs but there's much more in terms of other variation (CNVs, insertions/deletions, variable number tandem repeats)
People studied genetics of height to understand the complex traits. (But also height is very easy to measure).
Hard to define mental disorders
Need around 250k schizophrenia cases to find most variants
CommonMind Consortium found 20 hits from over 100 and found evidence of function of expression in schizophrenia and bipolar.
1 mil for MDD cases (but common disease so it's fine)
Understanding disease mechanism (functional organisation of the genome)
3 bil bps which define the products of life - variation enables diversity
HOW TO FIND FUNCTIONAL ELEMENTS?
Identify everything involved in:
transcription factor association
where they bind
how they've been modified
80% of human genome is associated with at least one biochem function
Protein coding genes start with a methionine codon
Continue in frame (3 bp codons)
End with a stop codon
Interrupted by splice sites (which have a conserved sequence)
it is relatively straightforward to computationally identify regions of the genome that are consistent with gene models
BUT Low accuracy, since complexity from splicing leads to many false positives
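The simplest version of such a gene model is an open reading frame: a start codon, in-frame codons, then a stop codon. A minimal sketch that scans the forward strand only and ignores splicing (which is exactly why this kind of prediction gives many false positives); the sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) positions of ATG...stop stretches in all three frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first methionine codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end is exclusive, includes the stop
                start = None
    return orfs

example = "CCATGAAATTTTAGGG"
print(find_orfs(example))  # [(2, 14)] -> ATGAAATTTTAG
```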
vertebrate gene annotation relies on computational alignment of transcriptome data to genome sequence to define genes.
High accuracy, but false negatives since transcriptomics data incomplete and can be limiting. - could be missing certain transcripts in certain cell types at certain times e.g. foetal growth.
Gene models can be checked manually and through experimental validation to improve accuracy
FOUND 20K coding genes, 16K non-coding genes.
Functional annotation of the human genome
Using databases like ENCODE to find a suggested functional SNP from the lead SNP
Testable hypothesis of the biological mechanism underlying the observed association (knock-out)
OTHER FUNCTIONAL ANNOTATIONS
DNase I hypersensitivity sites:
Open chromatin is transcriptionally active and sensitive to DNase I
Such regions can be assessed by digesting with DNase I and sequencing the resulting fragments
ATAC-sequencing is starting to replace this though.
Addition of a methyl group to the 5 position (C5) of cytosine nucleotides
Broadly associated with transcriptional silencing at promoters and transcriptional activity within genes
Assessed by reduced representation bisulfite sequencing (RRBS) in the ENCODE project
Long range interactions between distant chromosomal regions
Uses chromatin immunoprecipitation and parallel sequencing to locate genome-wide protein-DNA binding events.
Proteins touching DNA are fixed in place with a cross-linking agent.
DNA is fragmented and complexes are harvested with targeted antibodies.
Cross-links are broken and only DNA fragments from binding sites remain, which are sent for sequencing.
Mapping the sequence reads back to the genome defines loci where the antibody-targeted protein is bound.
Explains the functional role.
RNA-Sequencing - Transcriptomics
2nd generation sequencing to catalog RNA in cell
The reads are re-aligned
Number of reads for each exon is proportional to the number of copies in each cell
Identify RNA transcripts
Identify regions of the genome that bind proteins
Identify regions of the genome that do not bind proteins
INTERSPECIES SEQUENCE COMPARISON
Many functional regions of the genome are likely to be conserved across evolution
Comparisons of genomic sequences will identify such regions
Functional regions may contain sequence motifs that define their function
Identification of such motifs can inform this
EXPERIMENTAL -> TRANSCRIPTION APPROACHES
Nucleosome = DNA wrapped around a group of histones.
Hard to find everything in humans - could be missing micro-exons.
99% of the genome is within 1.7 KB of a biochemical event
PREDICTION AND RISK MODELLING
CALCULATING POLYGENIC RISK SCORE
Using data from GWAS, create a TARGET STUDY:
Independent of GWAS
Construct polygenic risk scores for individuals
PRS = Sum over SNPs of (number of risk alleles * log odds ratio)
Individual-level measures of genetic loading for disorder or trait
(BUT WE DON'T KNOW IF THE SNPS ARE FUNCTIONAL, JUST ASSOCIATED... BUT THERE'S HIGH LD, IF YOU HAVE THE LEAD SNP YOU PROBS HAVE THE FUNCTIONAL ONE TOO)
Uses a log scale as it creates a better symmetry of risk (<0 and >0) rather than the odds ratio scale (0-1, 1+)
It's used for individuals but you can compare to the population and screen those at highest risk.
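A minimal sketch of the score calculation: count each person's risk alleles per SNP (0, 1 or 2) and weight by the log odds ratio from the GWAS. The SNP names, odds ratios and genotypes are hypothetical placeholders.

```python
from math import log

def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Sum of (risk allele count * log odds ratio) over the GWAS SNPs."""
    return sum(risk_allele_counts[snp] * log(odds_ratios[snp]) for snp in odds_ratios)

# Hypothetical GWAS odds ratios per risk allele.
gwas_odds_ratios = {"snp1": 1.2, "snp2": 1.5, "snp3": 0.9}
# Hypothetical genotypes for one person in the target study.
person = {"snp1": 2, "snp2": 0, "snp3": 1}

score = polygenic_risk_score(person, gwas_odds_ratios)
print(round(score, 3))

# On the log scale, doubling and halving the odds are symmetric around 0.
print(round(log(2.0), 3), round(log(0.5), 3))
```

A single score means little on its own; it is read against the distribution of scores in the target population, e.g. which quintile the person falls in.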
FAMILY HISTORY INCLUDES GENETIC AND ENVIRONMENT
Talmud 2010 - using genetics with non-genetic risk factors to predict diabetes
Used the Framingham risk scores (odd that the risk score doesn't include physical activity)
Used 20 genetic risk factors (0-40 risk alleles)
5.5K people, 303 developed T2D in 10 years
use baseline info and genetics
Without adjusting for weighting of SNP - FOUND NO SIGNIFICANT RISK
SNPs only account for 10% BUT MZ CONCORDANCE IS 70%
SNPs might not be the causal variants
Need larger sample sizes
Rare variants not yet tested
Resilience not tested
Physical activity not accounted for
GENE-ENVIRONMENT NOT INCLUDED
77 Variants for polygenic breast cancer (NOT ABOUT THE MONOGENIC FORM BRCA1)
Bottom quintile have lifetime risk of 5.2%
Top quintile have lifetime risk of 16.6%
Top 1% have 30% risk
Women aged 47-73 every three years
Aimed for when risk reaches over 2.5%
There are often follow-up tests for benign biopsies
Women at high risk from family history have a separate programme
Using genetic screening could help find the people at higher risk earlier and reduce wasted time and cost for those at lower risk
Coronary Heart Disease
Do genetic and environmental risk factors increase risk of coronary artery disease events?
Khera et al 2016
52 Variants for CHD
Healthy lifestyle score (smoking / PA / diet)
Those with low genetic risk + low environmental risk = 1 (reference), those with high for both had 3.5 BUT NO INTERACTION EFFECT - CUMULATIVE
23andMe (reputable end of the market)