GENOME-WIDE ASSOCIATION STUDIES IN BACTERIA - Coggle Diagram
GENOME-WIDE ASSOCIATION STUDIES IN BACTERIA
definition: association analysis performed with a panel of polymorphic markers adequately spaced to capture most of the linkage disequilibrium information in the entire genome in the study population.
usually: 500k - 4m SNPs
population structure
does the affected or control group exhibit population stratification?
population stratification is when subpopulations exhibit allelic variation because of ancestry
can cause false positives in an association study if there are SNP differences in the case and control population structures
control for this artefact by testing control SNPs for a general elevation of the χ² statistic between cases and controls (genomic control)
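A minimal sketch of the check described above, assuming each control SNP is summarised as a 2x2 table of (cases with allele, cases without, controls with, controls without). The function names and the table layout are illustrative assumptions, not part of the original notes. The inflation factor is the median observed χ² divided by the median expected under the null (about 0.455 for 1 degree of freedom); values well above 1 suggest stratification.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the table carries no association signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def inflation_factor(tables):
    """Median observed chi-square over the median expected under the null."""
    stats = [chi2_2x2(*t) for t in tables]
    return median(stats) / CHI2_1DF_MEDIAN


# Hypothetical counts for three control SNPs.
control_snps = [(12, 8, 10, 10), (9, 11, 10, 10), (15, 5, 6, 14)]
print(round(inflation_factor(control_snps), 2))
```
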
sometimes a gene isn't driving the phenotype itself but is associated with a gene that is
linkage disequilibrium
organizes the genome into haplotype blocks
genes around positive adaptive genes are often co-inherited - they hitchhike on
nucleotides within a gene are more likely to be found together
physical proximity means that neighbouring genes are more likely to be inherited together
genes that are found together are likely to transfer together
SNPs found together = linkage disequilibrium
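To make "SNPs found together" concrete, here is a small sketch of two standard linkage disequilibrium measures, D and r², for a pair of biallelic sites. It assumes each isolate is one haploid genome coded as a pair of 0/1 alleles; the function name and the toy data are assumptions for illustration.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic sites.

    haplotypes: list of (allele_at_site_A, allele_at_site_B), each coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 is undefined when either site is monomorphic.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Two sites that always occur together: complete LD.
print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))
# Two sites that occur independently: no LD.
print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)]))
```
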
campylobacter populations are structured by host
most common cause of bacterial gastroenteritis in first world countries
by looking at a population of campylobacter and which hosts are close to clinical hosts you can tell where the human caught it from
bottom up approach
starts with DNA sequence (genes) and tests the effect on the phenotype
more antiquated approach
top down approach
starts with phenotype and associates it with particular genomic elements
GWAS approach
old vs new signals of adaptation
old signals
host associated clonal complexes
new signals
host associated mobile elements
when a bacterium changes host it adapts to the environment it finds itself in
sometimes the whole gene brings in an adaptation, not just the allele
most important adaptations stick in the population
looking only at SNPs is less effective in bacteria than in humans because of this - it can miss the gain or loss of a whole gene
association study method
sort k-mers and see which are more common in particular hosts
look at hotspots of genetic variation plotted along the chromosome - can find host associated variation
example: Campylobacter
survival through the food chain
investigating the function of candidate associated elements
deletion mutants have altered function
nuoK is involved in NADH activity and in switching from an anaerobic to an oxygen-rich environment
mutants grow better in enhanced O2
example: Staphylococcus epidermidis, an accidental pathogen?
an under reported pathogen
the most commonly cultured bacteria in clinical microbiology laboratories are the coagulase-negative staphylococci (CoNS), especially Staphylococcus epidermidis
despite their importance as nosocomial pathogens (>40% of cultured isolates from cerebrospinal fluid or blood samples) they are not routinely surveyed
contrasting evolutionary models of infection and associated variation in genomic data
has clearly adapted to different niches - how has this happened?
pathogenic clones
specific pathogenic differences, all present in skin but only some can go into blood
true opportunistic pathogenicity
any strain can move into any niche
divided genome
this is the model the data support
think about movement of genes into different clones - HGT etc.
genes associated with infection are moving to different genetic backgrounds therefore can cause disease
sample and sequence populations - clear that pathogenic clones is not the case
do GWAS next, shows that there are determinants of disease and some strains have genetic variation associated with infection - not opportunistic pathogen
sampling program
infection strains are distributed across the tree
contain genomic determinants of pathogenicity
disease-causing S. epidermidis are a pathogenic sub-population
found on everyone's skin and in nasal passages
one of the most common organisms cultured in clinical microbiology
we have increasingly large data sets and need techniques to understand them
historically microbiology was bottom up
by following top down approach and factoring in population structure to model, we think about the tree
study design
- family based, case control, case-cohort design
cases
presumed to have a high prevalence of susceptibility alleles
ascertained for the phenotype of interest
focus on extreme cases?
controls
not ascertained for the phenotype
presumed to have a lower prevalence of such susceptibility alleles
misclassification bias - e.g. failure to identify latent diagnoses in the control group. leads to loss of power, especially for common traits, e.g. hypertension
significance of hits
contingency tables - Fisher's exact test
sum the probabilities of the observed table and of all more extreme tables with the same marginal totals to get the p-value: the probability of data at least this extreme if the null hypothesis is true
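The summation can be sketched for a single 2x2 contingency table as follows. This is a plain-Python illustration, not a library implementation: the function name and counts are assumptions, and "more extreme" is taken here to mean any table no more probable than the observed one (a common two-sided convention).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c          # fixed marginal totals
    total = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)     # feasible top-left values
    # Sum the observed table and every table at least as unlikely;
    # the small tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: rows = cases/controls, columns = element present/absent.
print(fisher_exact_two_sided(3, 1, 1, 3))
```
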