Lecture 2: Analysis of genome variation
Why is looking at genome variation practically relevant?
Clinical diagnostics, personalization
Research (e.g. cancer, mapping, breeding)
Variant effect prediction interprets a variant in its local mechanistic context, whereas association studies try to relate the variant state to a measurable phenotype
How is genome variation defined and how can it be extracted from raw data?
Variant calling (e.g. the MAQ algorithm, the HaplotypeCaller of GATK)
MAQ (Mapping and Assembly with Quality) maps shotgun reads to a reference genome and produces a consensus sequence in which each consensus genotype carries a Phred quality score (a log-scaled error probability)
HaplotypeCaller of GATK (Genome Analysis Toolkit): whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region
NGS produces short reads that have to be aligned to a reference genome (difficult because a short read can match several positions equally well)
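The Phred score mentioned for the MAQ consensus is a log-scaled error probability, Q = -10 * log10(P). A minimal sketch of the conversion in both directions; the function names are made up for illustration and are not part of MAQ or GATK:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability for a few commonly quoted quality scores.
    for q in (10, 20, 30):
        print(q, phred_to_error_prob(q))
```

Each step of 10 in Q corresponds to a tenfold drop in the probability that the call is wrong.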
What are the quality criteria and representation formats of variation data?
Linkage blocks, phasing variants, germline vs. somatic variation
In somatic mutation calling, the allelic fraction (the number of times a mutated base is observed divided by the total number of times any base is observed at the locus), together with tumor purity and ploidy, can be used to estimate the depth of coverage required to detect mutations with a given power
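The relation between allelic fraction, purity, ploidy and required depth can be sketched with a simple binomial model. Everything beyond the allelic-fraction definition is an assumption made for illustration: the expected fraction is taken as purity * mutated_copies / (purity * tumor_copies + (1 - purity) * normal_copies), sequencing errors are ignored, and a mutation counts as "detected" when at least `min_alt_reads` reads support it. Real somatic callers use more elaborate models.

```python
from math import comb


def allelic_fraction(alt_reads: int, total_reads: int) -> float:
    """Observed allelic fraction: mutated-base observations / all observations at the locus."""
    return alt_reads / total_reads


def expected_allelic_fraction(purity: float, tumor_copies: int = 2,
                              mutated_copies: int = 1, normal_copies: int = 2) -> float:
    """Expected fraction of reads carrying the mutation (assumed mixing model)."""
    return purity * mutated_copies / (purity * tumor_copies + (1 - purity) * normal_copies)


def detection_power(depth: int, fraction: float, min_alt_reads: int = 3) -> float:
    """P(at least min_alt_reads mutant reads) when mutant reads ~ Binomial(depth, fraction)."""
    p_below = sum(
        comb(depth, k) * fraction ** k * (1 - fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_below


def required_depth(fraction: float, power: float = 0.8,
                   min_alt_reads: int = 3, max_depth: int = 10000):
    """Smallest depth reaching the requested power, or None if max_depth is not enough."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_power(depth, fraction, min_alt_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    # Hypothetical sample: 50 % tumor purity, diploid tumor, heterozygous mutation.
    f = expected_allelic_fraction(purity=0.5)
    print("expected allelic fraction:", f)
    print("depth needed for 80 % power:", required_depth(f))
```

Lower purity or higher tumor copy number pushes the expected fraction down, so more coverage is needed for the same power.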
Terms
Genotype calling:
Determine the genotype for each individual at each site
Phasing:
determining which variants are from the same copy of a chromosome (in “cis”) and which are from different copies (in “trans”)
Separate the consensus sequence (the most frequent residue at each position) into its individual strands to identify which variants occur together, i.e. in phase
With short-read data, the output sequence is unphased (represented as a single consensus sequence). For example, given two mutations, phasing separates the consensus into two distinguishable strands, which shows whether one mutation lies on each strand or both lie on the same strand.
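The cis/trans distinction can be illustrated with VCF-style genotype strings, where `0/1` is an unphased and `0|1` a phased diploid genotype. This is a minimal sketch under stated assumptions: the helper names are hypothetical, genotypes are diploid without missing calls, and both variants are assumed to belong to the same phase set.

```python
def parse_genotype(gt: str):
    """Return (alleles, is_phased) for a diploid VCF-style genotype string."""
    if "|" in gt:
        return tuple(int(a) for a in gt.split("|")), True
    return tuple(int(a) for a in gt.split("/")), False


def cis_or_trans(gt_a: str, gt_b: str) -> str:
    """Classify two variants of one individual as 'cis', 'trans' or undetermined."""
    alleles_a, phased_a = parse_genotype(gt_a)
    alleles_b, phased_b = parse_genotype(gt_b)
    if not any(alleles_a) or not any(alleles_b):
        # One of the sites carries no alternate allele, so there is nothing to phase.
        return "not applicable"
    if not (phased_a and phased_b):
        # Without phase information the consensus cannot be split into strands.
        return "unknown (unphased)"
    # Position i in each tuple is the allele on chromosome copy i.
    for copy_a, copy_b in zip(alleles_a, alleles_b):
        if copy_a > 0 and copy_b > 0:
            return "cis"
    return "trans"


if __name__ == "__main__":
    print(cis_or_trans("0|1", "0|1"))
    print(cis_or_trans("0|1", "1|0"))
    print(cis_or_trans("0/1", "0/1"))
```

The unphased case corresponds to the consensus representation described above: both mutations are known to be present, but not which copy carries which.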
How do we gain knowledge about the biological effect of these variants?
Variant effect prediction: local mechanistic effect
Missense, splice-site, promoter variant
Pathogenic, benign, VUS (variant of uncertain significance)
GWAS (genome-wide association study); see slides for pros and cons
Regression
Linear mixed models
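The regression step of an association test can be sketched for a single variant: the phenotype is regressed on the genotype coded as the number of alternate alleles (0, 1 or 2). This is only an illustrative ordinary least-squares fit on made-up toy data; it computes no p-value and contains none of the relatedness or population-structure correction that linear mixed models add.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Toy data, invented for illustration: one value per individual.
    genotypes = [0, 0, 1, 1, 2, 2]                # alternate-allele counts
    phenotypes = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]   # measured trait
    slope, intercept = simple_linear_regression(genotypes, phenotypes)
    # The slope is the estimated change in phenotype per additional alternate allele.
    print("effect per allele:", slope, "intercept:", intercept)
```

In a genome-wide scan this fit would be repeated for every variant, which is why multiple-testing correction matters in practice.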