Comparative genomics via rRNA operons and their neighbor genes
Working process
1. Get database table
DATABASE TABLE
complete_genome
_Biosample_GCF_FTP.txt
.Biosample.txt
.Nucinfo.txt
_Biosample_SRS.txt
SRSID.txt
_SRS_SRR_Sequencer.txt
_Biosample_NucID_Genome.txt
2. Filter genome
seq by
Illumina
PacBio
3. Create artificial read
4. Assembly with short read
5. Find gap patterns and flanking regions within the same bacterium
SPAdes?
label gap assembly
complete genome
6. Identify flanking of conserved repeat
Can it be used to identify bacteria?
neighbor
conserved?
position
base
number
same pattern in four different bacteria?
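Step 3 (create artificial reads) could be sketched roughly as below. This is only a minimal illustration, not the method used in the study: it tiles error-free, fixed-length reads across a complete genome at a regular step. The function names `tile_reads` and `mean_coverage`, the default read length and step, and the option to treat the chromosome as circular are all assumptions made for this example.

```python
def tile_reads(genome, read_len=150, step=50, circular=True):
    """Cut fixed-length, error-free reads from a genome at a regular step.

    Returns a list of (start_position, read_sequence) tuples.
    All parameter defaults are illustrative assumptions.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    n = len(genome)
    if n < read_len:
        return []
    if circular:
        # Extend the sequence so that reads can span the origin.
        seq = genome + genome[: read_len - 1]
        last_start = n - 1
    else:
        seq = genome
        last_start = n - read_len
    return [(start, seq[start:start + read_len])
            for start in range(0, last_start + 1, step)]


def mean_coverage(reads, genome_len):
    """Average number of simulated reads covering each base."""
    return sum(len(seq) for _, seq in reads) / genome_len
```

A smaller step gives higher coverage; because these reads carry no sequencing errors, any gap that remains after assembly would reflect sequence content (for example repeats) rather than read quality.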
Genome database
Extracted Biosample ID
Nucleotide ID
Assembly ID
SRA run
nanopore ?
gene annotation
functional genomics studies
isolate individual strains
capture small plasmids?
correct resolution of all large plasmid sequences
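The filtering step (genomes sequenced by Illumina and PacBio) might look like the sketch below. The actual layout of the database tables is not given in these notes, so the tab-separated format and the column names `biosample`, `run` and `sequencer` are hypothetical, as are the function names. Requiring both platforms for the same BioSample is also an assumption about what the filter is meant to do.

```python
import csv
import io
from collections import defaultdict


def platforms_by_biosample(table_text):
    """Map each BioSample ID to the set of platforms seen in its runs.

    Expects tab-separated text with a header row containing the
    (hypothetical) columns 'biosample' and 'sequencer'.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    platforms = defaultdict(set)
    for row in reader:
        name = row["sequencer"].lower()
        if "illumina" in name:
            platforms[row["biosample"]].add("Illumina")
        elif "pacbio" in name:
            platforms[row["biosample"]].add("PacBio")
    return dict(platforms)


def with_all_platforms(platforms, required=("Illumina", "PacBio")):
    """Return sorted BioSample IDs that have runs on every required platform."""
    needed = set(required)
    return sorted(b for b, seen in platforms.items() if needed <= seen)
```

Samples with runs on other platforms only (for example nanopore) are simply absent from the result, which leaves room to add them later as a separate category.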
Background and objective
Previous study
generate read
Gap
rRNA operon
Cas9
tRNA (16S)
transposable elements
genes
tree ?
resistance gene ?
mutation rate -> evolution
sequencer
SRA run
types of genomic repeats
high sequence identity
high copy number
strongly affect genome function and evolution
large phage-mediated repeats
segmental duplications or large tandem arrays
16S
microbial community diversity
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies -> identify and verify gaps using long and short reads
finishing using
super assembly
supporting Illumina data
to obtain high-quality genome assembly
long read
post-assembly polishing steps
gap closure strategies
identify gaps using the complete PacBio genome
GC content
similar across all gaps
gap length
longer than Illumina read length
read coverage
lower than the recommended coverage (>100x)
ability to form strong secondary structures
randomly distributed gaps -> rejected
corresponding annotations
active transposon
interferes with the circularization process
multi rRNA operon
phage integration
megaplasmid
Transposon-related proteins
1 kb flanking regions -> self-BLAST
hits to several regions at > 95% similarity -> repetitive DNA sequences contribute to assembly gaps
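The GC-content check and the flank-versus-genome comparison can be illustrated with the toy sketch below. It is not BLAST: as a stand-in it uses an ungapped, position-by-position identity scan, which is far slower and less sensitive than a real self-BLAST of the 1 kb flanking region. The function names and parameters are assumptions for this example; only the 1 kb flank length and the 95% threshold come from the notes.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def identity(a, b):
    """Ungapped identity between two equal-length sequences."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def repeat_hits(genome, flank_start, flank_len=1000, min_identity=0.95):
    """Positions elsewhere in the genome where the flank matches
    with identity above min_identity (ungapped scan, same strand only)."""
    flank = genome[flank_start:flank_start + flank_len]
    if not flank:
        return []
    hits = []
    for pos in range(len(genome) - len(flank) + 1):
        if pos == flank_start:
            continue  # skip the flank's own location
        if identity(flank, genome[pos:pos + len(flank)]) > min_identity:
            hits.append(pos)
    return hits
```

A flank that returns one or more hits is a candidate repetitive sequence. The scan ignores the reverse strand and indels, so for real genomes an alignment tool would be the appropriate choice.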
RiboFR-Seq: a novel approach to linking 16S rRNA amplicon profiles to metagenomes -> combination of gap and neighbor region to provide consensus classification
RiboFR-Seq (Ribosomal RNA gene flanking region sequencing)
Advantages
Limitations
short 16S rRNA fragments (shorter than the recognition site) might be mis-cut -> fail
can correct errors in traditional 16S rRNA based taxonomic classification
requires much less memory
long runtime
classification by clustering the non-ribosomal reads of BRPs
provides more accurate 16S copy numbers than the rrnDB database
classifies 16S amplicons and metagenomic contigs without bias
short bridge -> multiple alignments