SNPs
SNPs (Single Nucleotide Polymorphisms) - base difference
Major Allele - the most common allele at a SNP site in the general population
Minor Allele - the less common allele at that site, occurring in fewer individuals
Caused by random mutations that are then inherited
Documented when the minor allele is present in at least 5% of the population (the conventional cut-off is often given as 1%)
Types
somatic SNPs
- leading to changes within an individual over a lifetime - such as tumours, cancers, etc.
germline SNPs
- leading to changes between individuals through phenotypic differences
Can occur in protein coding and regulatory regions of genes
Non-synonymous change
- SNP changes the translated protein sequence
Synonymous change
- SNP does not change the translated protein sequence
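The difference between the two can be sketched by translating a codon before and after a single-base change. This is a minimal illustration, not a tool from the lectures: `classify_snp` is a hypothetical helper, and the code assumes the standard genetic code and a coding-strand codon in DNA letters.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base change inside one codon (hypothetical helper).

    codon:    three-letter reference codon, e.g. "GAG"
    position: 0-based index of the changed base within the codon
    alt_base: the variant base at that position
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# GAA and GAG both encode glutamate, so a third-position change is silent.
print(classify_snp("GAA", 2, "G"))
# GAG (glutamate) to GTG (valine) changes the protein sequence.
print(classify_snp("GAG", 1, "T"))
```

A change that creates a stop codon would also be reported as non-synonymous by this sketch, since `*` differs from any amino acid letter.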
Linkage
- when 2 or more SNPs are almost always co-inherited because they lie close together on the same chromosome
Applications/Computational challenges
Variant Calling
- discovering new SNPs causing phenotypic differences
Genotyping
- identifying variants that already exist in current databases and have been characterised, and hence grouping genomes based on their SNPs: heterozygous or homozygous for a major/minor allele
Probabilistic Genotyping
Approaches to SNP discovery
Genome wide SNP/SNV microarrays
Probes of variant and reference ssDNA oligo sequences, representing different genes, are attached to a surface in a known order to form the microarray.
Fragments of fluorescently tagged target ssDNA are washed over the array.
Strong fluorescence is detected at array positions where target DNA has bound strongly to complementary probes, indicating the presence/absence of SNPs.
Lecture 8, slide 13; Lecture 10, slides 20-22
Cannot detect novel SNPs/SNVs
Genome wide SNP/SNV calling by DNA sequencing
WGS using NGS
Read Mapping to reference
Evaluation using curated SNP databases
Tool for calling SNPs
SomaticSniper
Identifies SNPs between normal and tumour sequences, using a Bayesian model.
Considers minimum mapping and base quality
Can do joint genotyping
Modelling assumptions
Ploidy
Clonality
Snippy - a straightforward command-line way to run multiple alignment and variant calling (SNPs/SNVs, indels and rearrangements)
Can detect both novel and known SNPs/SNVs
Lecture 10, slide 25-26
Database - dbSNP
Lecture 8, slide 15
Record the locations of SNPs in reference genomes
Record the phenotype change caused by SNP
Viewing SNPs
IGV
Lecture 8, slide 18
SNP can be misread due to
read errors and misaligned reads
mpileup - SAMtools
Lecture 8, slide 20
SNP can be misread due to poor quality base calls in a read (read errors), alignment errors, multi-mapping reads and read coverage affecting expected frequencies (homozygous, heterozygous)
Deterministic SNP genotyping
- If the variant allele frequency among the reads at a site is between 20% and 80%, the genotype is called heterozygous; otherwise it is called homozygous.
Probabilistic SNP genotyping
- Uses Bayesian posterior probabilities, taking into account population allele frequencies, error rates, which allele changes are more likely, etc.
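The two genotyping approaches can be compared on the same read counts. The sketch below is illustrative only: the function names, the binomial read model, the 1% error rate and the prior values are all assumptions made for the example, not the model used by any particular caller.

```python
from math import comb


def deterministic_genotype(ref_reads, alt_reads, low=0.2, high=0.8):
    """Threshold rule: alt fraction within [low, high] means heterozygous."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    alt_fraction = alt_reads / total
    if low <= alt_fraction <= high:
        return "heterozygous"
    return "homozygous alt" if alt_fraction > high else "homozygous ref"


def genotype_posteriors(ref_reads, alt_reads, error_rate=0.01, priors=None):
    """Posterior probability of each genotype under a simple binomial model.

    RR = homozygous reference, RA = heterozygous, AA = homozygous alternate.
    The default priors are made-up illustrative values (assumption).
    """
    if priors is None:
        priors = {"RR": 0.985, "RA": 0.01, "AA": 0.005}
    # Probability that a single read shows the alt base, given the genotype.
    p_alt = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    n = ref_reads + alt_reads
    unnormalised = {
        g: priors[g]
        * comb(n, alt_reads)
        * p_alt[g] ** alt_reads
        * (1.0 - p_alt[g]) ** ref_reads
        for g in priors
    }
    evidence = sum(unnormalised.values())
    return {g: value / evidence for g, value in unnormalised.items()}


print(deterministic_genotype(12, 8))   # alt fraction 0.4
posteriors = genotype_posteriors(12, 8)
print(max(posteriors, key=posteriors.get), posteriors)
```

Unlike the fixed 20%-80% rule, the posterior depends on read depth as well as the allele fraction, so the same fraction can give a confident call at high coverage and an uncertain one at low coverage.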
Detection characteristics
Sensitivity
Higher sensitivity = fewer false negatives = Out of all the actual variants, how many have been found?
Specificity
Higher specificity = fewer false positives = Out of all the sites that are not true variants, how many have been correctly rejected?
Confusion Matrix
Lecture 10, slide 29-30
False Discovery Rate (FDR)
Of all the variants reported, how many of them are actually false positives (not true variants)?
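These three measures follow directly from the four cells of the confusion matrix. A minimal sketch, with a hypothetical `detection_metrics` helper and invented counts used purely for illustration:

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def detection_metrics(tp, fp, tn, fn):
    """Compute variant-calling metrics from confusion-matrix counts.

    tp: true variants that were called      fp: non-variants that were called
    tn: non-variants correctly rejected     fn: true variants that were missed
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # found / all actual variants
        "specificity": _ratio(tn, tn + fp),  # rejected / all non-variants
        "fdr": _ratio(fp, tp + fp),          # false calls / all calls made
    }


# Invented example counts, not data from the lectures.
print(detection_metrics(tp=90, fp=10, tn=880, fn=20))
```

Note that the FDR is one minus the precision, so a caller reporting 100 variants of which 10 are false has an FDR of 0.1.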
While viewing SAMtools output (e.g. from samtools flagstat)
Supplementary relates to reads that are split, with the fragments mapping to different loci (usually not allowed).
Duplicate reads can arise from PCR and can be identified with certain tools (this will count the number with a duplicate flag).
The 'with itself and mate mapped' field only counts pairs where both the forward and reverse read are mapped.
The 'properly paired' field is more stringent and requires that both the forward and reverse read are mapped on the same contig, in the correct orientation and at an expected distance apart.
Secondary relates to multi-mapping reads.
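Each of these categories corresponds to a bit in the FLAG field of a SAM record. The sketch below decodes a flag value using the bit values from the SAM specification; the helper names are hypothetical, and `itself_and_mate_mapped` only approximates the summary field of the same name, since it ignores how the tool treats secondary and supplementary records.

```python
# Bit values for the SAM FLAG field (subset relevant to the notes above).
FLAG_BITS = {
    "paired": 0x1,
    "properly paired": 0x2,
    "unmapped": 0x4,
    "mate unmapped": 0x8,
    "secondary": 0x100,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def decode_flag(flag):
    """Return the names of the listed properties that are set in a SAM flag."""
    return [name for name, bit in FLAG_BITS.items() if flag & bit]


def itself_and_mate_mapped(flag):
    """True when a paired read and its mate are both mapped (approximation)."""
    paired = bool(flag & FLAG_BITS["paired"])
    read_mapped = not flag & FLAG_BITS["unmapped"]
    mate_mapped = not flag & FLAG_BITS["mate unmapped"]
    return paired and read_mapped and mate_mapped


print(decode_flag(99))     # a typical first-in-pair, properly paired read
print(decode_flag(1123))   # the same read with the duplicate bit set
print(decode_flag(2064))   # a supplementary alignment on the reverse strand
```

Bits not listed in `FLAG_BITS` (such as read strand or first/second in pair) are simply ignored by this decoder.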