Week 3 Bioinformatics tools for WGS analysis of infectious diseases and Kmerfinder
KmerFinder
k is any positive integer
Sequences with a high similarity must share k-mers
A k-mer is a contiguous sequence of k bases
The unknown sequence is compared against a database of k-mers to find the template that shares the most k-mers with it
This uses almost all the information in the WGS data
The database consists of 16-mers: the query sequence is split into 16-mers, and the bacteria whose 16-mers match the most are reported
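As a minimal sketch of what "splitting a sequence into k-mers" means, the function below lists every contiguous substring of length k. The function name `kmers` and the toy sequence are illustrative assumptions, not part of KmerFinder itself.

```python
def kmers(sequence, k=16):
    """Return every contiguous substring of length k (the k-mers) of sequence."""
    sequence = sequence.upper()
    # A sequence of length n has n - k + 1 k-mers (none if n < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Toy example with k = 3 instead of 16 so the result is easy to read.
print(kmers("ATGCGT", k=3))
```

With k = 3 the six-base toy sequence gives four overlapping k-mers; a real run uses k = 16 on whole-genome data.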
Query coverage = Score (total number of k-mers in the query sequence that match k-mers in the template sequence) / Total number of k-mers in the query sequence
Depth = Score (total number of k-mers in the query sequence that match k-mers in the template sequence) / Total number of k-mers in the template sequence (database sequence)
Template coverage = Score (total number of k-mers in the template sequence that match k-mers in the query sequence) / Total number of k-mers in the template sequence (database sequence)
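The three ratios above can be sketched as follows. This is a simplified illustration that follows the definitions in these notes literally; the function names, the dictionary keys, and the toy sequences are assumptions, and the real KmerFinder implementation may count k-mers differently.

```python
def kmer_list(sequence, k):
    """All contiguous k-mers of sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_metrics(query, template, k=16):
    """Score, query coverage, depth and template coverage as defined in the notes."""
    query_kmers = kmer_list(query, k)
    template_kmers = kmer_list(template, k)
    if not query_kmers or not template_kmers:
        raise ValueError("both sequences must be at least k bases long")

    query_set = set(query_kmers)
    template_set = set(template_kmers)

    # Score: query k-mers that are also found in the template.
    score = sum(1 for kmer in query_kmers if kmer in template_set)
    # Template k-mers that are also found in the query.
    template_hits = sum(1 for kmer in template_kmers if kmer in query_set)

    return {
        "score": score,
        "query_coverage": score / len(query_kmers),
        "depth": score / len(template_kmers),
        "template_coverage": template_hits / len(template_kmers),
    }


# Toy example with k = 3; the template is two bases longer than the query.
print(kmer_metrics("ATGCGT", "ATGCGTAA", k=3))
```

In the toy example every query k-mer is found in the template, so query coverage is 1.0, while depth and template coverage are below 1 because the template contains k-mers the query lacks.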
Output columns when you run your sample on the website
tot_query_coverage
tot_coverage
Depth
tot_depth
Query_coverage
Q_value
Template length
p_value
Expected
Accession number
Score
description
Num
TAXID
Assembly
Taxonomy
Template
TAXID species
Species
PlasmidFinder
Most known plasmids have been identified because they confer phenotypes that are subject to positive selection in the bacterial host, such as antimicrobial resistance or virulence genes
It is important to study not only the molecular epidemiology of different bacterial clones but also the molecular epidemiology of transferable plasmids
Plasmids are double-stranded circular or linear DNA molecules. They can replicate and transfer between different bacterial species and clones
For this purpose, plasmid typing is needed
Hierarchical typing strategy
Common genes: pMLST
Common resistance genes: ResFinder
Replicon: PlasmidFinder
Identical plasmid sequence: WGS alignment
You upload an unknown sequence to the tool, which matches it against the known replicons in the PlasmidFinder database and displays the replicons that were found
ResFinder
It is a tool and a database
Active and updated regularly
Accessibility: available on the website and as a standalone tool via Computerome
Approach: assembly-based and read-based tool
ResFinder details
ResFinder will detect the presence of whole resistance genes, AND chromosomal point mutations causing resistance in the whole genome sequence data (raw reads or assembled genomes)
ResFinder 4.0 provides in silico antibiograms as reliable as those obtained by phenotypic antimicrobial susceptibility testing
ResFinder has a user-friendly web interface and is freely accessible (it is also available as a stand-alone tool)
High concordance (>95%) between phenotypic and predicted antimicrobial susceptibility was observed. Discrepancies were mainly linked to criteria for interpretation of phenotypic tests and suboptimal sequence quality, and not to ResFinder 4.0 performance.
ResFinder is based on a curated database, public databases, and scientific papers
ResFinder 4.0 contains 4 databases
Translation of genotypes into phenotypes
Species-specific panels for in silico antibiograms
Chromosomal point mutation (PointFinder)
AMR genes (ResFinder)
You load raw reads or an assembled genome into the tool, which compares the uploaded sample against known resistance determinants to find the resistance genes it contains
Output on the website
The light green colour indicates a warning due to a non-perfect match: %ID < 100% while the alignment length equals the resistance gene length.
The grey colour indicates a warning due to a non-perfect match: the alignment length is shorter than the resistance gene length, with %ID <= 100%.
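The colour rules above can be written as a small decision function. This is only a sketch of the two rules stated in these notes; the function name, the "perfect match" label for a 100% identity full-length hit, and the "unclassified" fallback are assumptions and not taken from the ResFinder output itself.

```python
def match_colour(percent_identity, alignment_length, gene_length):
    """Label a resistance gene hit using the colour rules from the notes."""
    full_length = alignment_length == gene_length
    if full_length and percent_identity == 100:
        # Assumed label: the notes only describe the two warning colours.
        return "perfect match"
    if full_length:
        # %ID < 100% but the whole gene is covered.
        return "light green"
    if alignment_length < gene_length:
        # Alignment is shorter than the gene, %ID <= 100%.
        return "grey"
    # Case not described in the notes (alignment longer than the gene).
    return "unclassified"


print(match_colour(99.5, 861, 861))   # full length, imperfect identity
print(match_colour(100.0, 700, 861))  # partial alignment
```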
MLST (multilocus sequence typing): the gold standard for typing
The nucleotide sequences of internal regions of approximately 7 housekeeping genes are determined by PCR followed by Sanger sequencing
Each distinct allele is assigned an arbitrary number
First developed in 1998 for Neisseria meningitidis
The unique combination of alleles is the sequence type (ST)
For most species, seven genes are used
Automatic, local download of MLST database once a week
For each species, we get an ST profile table file and an allele file
Implementation of MLST using WGS data
Method implementation of MLST
For each gene, the perfectly matching allele is picked (all nucleotides must match across the whole length of the allele)
If there is no perfectly matching allele, the closest matching allele is output along with warnings
For the specified species, all alleles for all genes are aligned to the database (using BLAST for assembled genomes / using KmerFinder (KMA) for raw reads)
The ST is determined from the combination of alleles
The genome is converted to a BLAST database
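The steps above can be sketched in a few lines. This is a toy illustration, not the tool's implementation: exact substring search stands in for BLAST, closest-match reporting is left out, and the gene names, allele sequences, and ST numbers are all made up for the example.

```python
def call_alleles(genome, alleles):
    """For each gene, return the number of the allele found perfectly in the genome.

    `alleles` maps gene name -> {allele number: allele sequence}.
    Exact substring search is used here as a stand-in for BLAST.
    """
    calls = {}
    for gene, variants in alleles.items():
        calls[gene] = next(
            (number for number, seq in variants.items() if seq in genome),
            None,  # no perfect match; the real tool reports the closest allele
        )
    return calls


def sequence_type(calls, profiles, gene_order):
    """Look up the ST for the combination of allele numbers (None if unknown)."""
    key = tuple(calls[gene] for gene in gene_order)
    return profiles.get(key)


# Hypothetical two-gene scheme; real schemes usually use seven genes.
alleles = {
    "geneA": {1: "ACGTAC", 2: "ACGTTC"},
    "geneB": {1: "GGGATT", 2: "CGGATT"},
}
profiles = {(1, 1): 10, (1, 2): 11}  # allele combination -> ST (made up)
genome = "TTACGTACCGGATTT"

calls = call_alleles(genome, alleles)
print(calls, sequence_type(calls, profiles, ["geneA", "geneB"]))
```

The profile table plays the role of the ST profile file and the `alleles` dictionary the role of the allele file that are downloaded for each species.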
The visualization of the result on the webpage
Green = perfect match
Red = not a perfect match
VirulenceFinder
Bacterial pathogenicity and virulence
Virulence
Signifies the degree of pathogenicity of a given strain.
Virulence is therefore an index of the qualitative individual nature of the pathogenic microorganism; in short, it describes how much trouble the strain can cause.
Pathogenicity
This is the potential capacity of a given species of microbe to cause an infectious process
E. coli is one example: the different types are not equally virulent, and some are worse than others. They carry different virulence factors
Virulent and non-virulent types
VTEC, STEC (EHEC): A hallmark virulence factor is the Shiga toxin. This is encoded by the stx2A and the stx2B genes.
EAEC: Many different virulence factors (especially aggregative adherence fimbriae (AAFs) located on a 100-kb pAA plasmid, mucinases such as the one encoded by the pic gene, and toxins such as those encoded by the pet and astA genes) are believed to be responsible for the EAEC phenotype. However, it has recently been found that the regulator encoded by the aggR gene, also located on the pAA plasmid, coordinates the virulence factors. Therefore, detection of the aggR gene is a good marker for EAEC.
UPEC: Adhesion factors are needed to avoid being flushed out of the urinary tract. Siderophore proteins for iron chelation in urine, such as the one encoded by the iroN gene, can also be relevant because the bladder is an iron-limiting environment. The presence of Pap (P) fimbriae (papG adhesin) can be a sign of increased virulence, as these fimbriae are associated with progression of a urinary tract infection into pyelonephritis (kidney infection).
ETEC: This pathotype is known for its production of heat-stable toxin (ST) or heat-labile toxin (LT). The former can be encoded by the sta1 or stb genes and the latter by the elt or ltcA genes.
Non-pathogenic: Few or no obvious virulence factors (you already have information regarding which this could be).
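As a rough illustration of how the marker genes listed above could be used, the sketch below maps a set of detected genes to candidate pathotypes. The gene-to-pathotype table only restates the genes named in these notes; the function name and the rule "any one marker is enough" are simplifying assumptions, and a real interpretation needs more than a lookup.

```python
# Marker genes per pathotype, as named in the notes above.
MARKERS = {
    "VTEC/STEC (EHEC)": {"stx2A", "stx2B"},
    "EAEC": {"aggR"},
    "UPEC": {"iroN", "papG"},
    "ETEC": {"sta1", "stb", "elt", "ltcA"},
}


def suggest_pathotypes(detected_genes):
    """Return the pathotypes for which at least one marker gene was detected."""
    detected = set(detected_genes)
    hits = [name for name, genes in MARKERS.items() if detected & genes]
    # Assumption: no listed marker at all is reported as non-pathogenic.
    return hits or ["Non-pathogenic (no listed marker found)"]


print(suggest_pathotypes(["aggR", "pic", "astA"]))
print(suggest_pathotypes(["stx2A", "elt"]))
```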
Salmonella TypeFinder: accepts only FASTQ files
SerotypeFinder
Advantages of WGS
By allowing the entire genome of a person to be sequenced, every gene can be turned into digital data for analysis. While this results in a large amount of data, the genetic variations also result in big opportunities.
Genomic information has been instrumental in identifying inherited disorders, characterizing the mutations that drive cancer progression, and tracking disease outbreaks.
Whole-genome sequencing (WGS) is a comprehensive method for analyzing entire genomes.
Disadvantages of WGS
Most physicians are not trained in how to interpret genomic data.
Once an organism's genome sequence has been determined, how do scientists generally start identifying all the genes within the genome?
WGS has difficulty with repeated sequences because there is no corresponding physical map.
Although our knowledge in genomics is growing, the roles of many genes are still undetermined and huge numbers of variants across the genome have not yet been distinguished as being benign or pathogenic
Center for Genomic Epidemiology
Client side
Server Side
K-Mers
Limitations: the computer may not have enough memory to store all the k-mers
K-mer based methods work well for species identification
One of the first questions that arises when a bacterial organism of interest is encountered is what it is, that is, which species it belongs to
The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy
The 16S rRNA gene has been found to have a number of shortcomings
KmerFinder, which examines the number of co-occurring k-mers, had the highest overall accuracy and correctly identified 93% to 97% of the isolates in the evaluation sets
gyrB gene, rMLST, species-specific functional protein domain profiles
MLST tool
Updates from the MLST databases are downloaded monthly
The best matching MLST alleles of the specified MLST scheme are found using BLAST-based ranking methods
We developed a web-based tool for MLST based on WGS data
The sequence type is then determined by the combination of alleles identified
The tool and how it works
First you upload an unknown sequence, which is searched with BLAST
Then the allele is identified for each of the 7 genes
Combining these alleles gives the sequence type
SNP detection
A SNP is a change in one of the four letters that our DNA consists of
These mutations are often referred to as noise when looking at samples
The question is how much of the noise is due to errors and how much represents real genomic variation
Variant Call Format (VCF)
Concatenated SNPs
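One way to read "concatenated SNPs" is sketched below: from a set of already aligned sequences, keep only the positions where the samples differ and join the bases at those positions into one short string per sample. The function name and toy sequences are assumptions, and real pipelines first filter out low-quality calls before this step.

```python
def concatenated_snps(aligned):
    """Return the variable positions and the concatenated SNP string per sample.

    `aligned` maps sample name -> sequence; all sequences must have equal length.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must be aligned to the same length")

    # A position is variable if the samples do not all carry the same base.
    variable = [
        i for i in range(length)
        if len({aligned[name][i] for name in names}) > 1
    ]
    snps = {name: "".join(aligned[name][i] for i in variable) for name in names}
    return variable, snps


# Toy alignment of three samples.
positions, snps = concatenated_snps({"s1": "ACGTA", "s2": "ACCTA", "s3": "TCGTA"})
print(positions, snps)
```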