Improving Variant Calls
after variant calling
Filter the called variants
- based on quality metrics, etc.
Run several variant callers and take only the consensus variants
Increase in specificity, but decrease in sensitivity
By filtering poor quality reads
Example - GATK protocol
Lecture 11, slide 4
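A minimal sketch of the two post-calling ideas above (quality filtering and consensus across callers). The call sets, the `QUAL` values, the threshold of 30 and the function names are all made up for illustration; this is not the GATK protocol itself.

```python
# Each caller's output is modelled as {(chrom, pos, ref, alt): QUAL}.
# All values below are invented for illustration.

def hard_filter(calls, min_qual=30.0):
    """Keep only variants whose quality is at least min_qual (assumed threshold)."""
    return {variant for variant, qual in calls.items() if qual >= min_qual}

def consensus(*call_sets):
    """Keep only variants reported by every caller."""
    return set.intersection(*call_sets)

caller_a = {
    ("chr1", 100, "A", "G"): 50.0,
    ("chr1", 200, "C", "T"): 12.0,   # low quality, filtered out
    ("chr2", 50, "G", "GA"): 40.0,
}
caller_b = {
    ("chr1", 100, "A", "G"): 45.0,
    ("chr2", 50, "G", "GA"): 20.0,   # low quality in this caller
    ("chr3", 10, "T", "C"): 60.0,    # only seen by this caller
}

kept = consensus(hard_filter(caller_a), hard_filter(caller_b))
print(sorted(kept))
```

Only the variant that passes the filter in both callers survives. If the chr3 call were real, it would be lost, which is the drop in sensitivity noted above.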
before variant calling
Local Realignment around SNVs/indels
Why is this required?
True indels near the ends of reads are usually not captured in the alignment, because a few mismatches at a read end cost less than opening an indel. This leads to incorrect variant calls.
Reads are aligned one at a time
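A small worked example of why the aligner prefers mismatches at a read end. The scoring values (match +1, mismatch -4, gap open -6, gap extend -1) and the sequences are illustrative assumptions, not the settings of any particular aligner.

```python
# Illustrative affine-gap scoring scheme (assumed values).
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 1, -4, -6, -1

def ungapped_score(ref, read, start=0):
    """Score the read against the reference with no gaps allowed."""
    window = ref[start:start + len(read)]
    return sum(MATCH if a == b else MISMATCH for a, b in zip(window, read))

def deletion_score(read, deletion_len):
    """Score when every read base matches but one deletion is opened."""
    return len(read) * MATCH + GAP_OPEN + deletion_len * GAP_EXTEND

ref = "GATTACAGGCTTACCGT"
# The read truly spans a 3 bp deletion (GGC), with only 1 base after it.
read = "GATTACA" + "T"

no_gap = ungapped_score(ref, read)     # 7 matches, 1 mismatch at the read end
with_gap = deletion_score(read, 3)     # 8 matches, one 3 bp deletion

print(no_gap, with_gap)
```

With these values the ungapped alignment scores 3 and the correct gapped alignment scores -1, so an aligner that handles one read at a time reports a mismatch at the read end instead of the true deletion.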
Can do local assembly
- reassemble the reads from those regions de novo (no new sequencing is needed) and realign the result
Local multiple realignment
- after the reads are aligned, select the sets of reads around candidate indels (e.g. known indel sites from dbSNP) and realign each set jointly with multiple alignment.
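A sketch of the read-selection step only: pick, for each known indel site, the reads that overlap a window around it. The read tuples, the site list, the window size and the function name are hypothetical. The joint realignment itself is not shown.

```python
def reads_near_indels(reads, indel_sites, window=10):
    """Map each indel site to the names of reads overlapping site +/- window.

    reads: iterable of (name, start, end) with inclusive coordinates.
    indel_sites: positions of known or candidate indels (e.g. from a database).
    """
    selected = {}
    for site in indel_sites:
        lo, hi = site - window, site + window
        selected[site] = [
            name for name, start, end in reads
            if start <= hi and end >= lo   # interval overlap test
        ]
    return selected

reads = [("r1", 90, 140), ("r2", 160, 210), ("r3", 300, 350)]
groups = reads_near_indels(reads, indel_sites=[150], window=10)
print(groups)
```

Each selected group would then be realigned together, so that evidence for an indel in one read can correct the placement of the others.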
Removing duplicate reads
Why do duplicates occur?
Optical duplicates
- the camera/scanner reads a single sequence cluster multiple times, as on Illumina instruments
PCR amplification bias
- Some DNA fragments are amplified more than others (especially short fragments)
Remove reads that have the same length and map to the same location
With paired-end reads, fewer reads are removed by mistake, because both mates have to map to the same locations
Duplicates should not be removed
- when read depth is a measure of expression, as in RNA-seq
- when sequencing is targeted at high depth for a small region
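A minimal sketch of the "same length, same location" rule, and of why paired-end information reduces mistaken removals. The `Read` fields, the example reads and the choice to keep the highest-quality copy are assumptions for illustration, not the behaviour of a specific tool.

```python
from collections import namedtuple

# Hypothetical, simplified read record.
Read = namedtuple("Read", "name chrom start length mate_start mean_qual")

def dedup(reads, paired=False):
    """Keep one read per duplicate group (the one with the best mean quality).

    Single-end key: (chrom, start, length).
    Paired-end key: the same, plus the mate's start position.
    """
    best = {}
    for r in reads:
        key = (r.chrom, r.start, r.length)
        if paired:
            key += (r.mate_start,)
        if key not in best or r.mean_qual > best[key].mean_qual:
            best[key] = r
    return sorted(best.values(), key=lambda r: r.name)

reads = [
    Read("r1", "chr1", 100, 50, 300, 30.0),
    Read("r2", "chr1", 100, 50, 300, 35.0),  # true duplicate of r1
    Read("r3", "chr1", 100, 50, 420, 32.0),  # different fragment, same read start
]

single = [r.name for r in dedup(reads)]
paired = [r.name for r in dedup(reads, paired=True)]
print(single, paired)
```

Treated as single-end, all three reads collapse into one, so r3 is removed by mistake. With the mate position in the key, r3 is recognised as a separate fragment and kept.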
Base quality score recalibration (BQSR)
For a particular position in the sequence, if some of the reads forming the consensus are very different from the other reads, then removing them improves the quality score for that position.
For high read depth, the software automatically does this.
- chromosome unknown in database
Assigned contigs in the database whose chromosomal location is unknown
Mainly there to increase read mapping accuracy and decrease false positive variant calls
Guest lect. functional genomics, slide 3