Comparative genomics via rRNA operons and their neighbor genes

Working process

1. Get database table

DATABASE TABLE

complete_genome

_Biosample_GCF_FTP.txt

.Biosample.txt

.Nucinfo.txt

_Biosample_SRS.txt

SRSID.txt

_SRS_SRR_Sequencer.txt

_Biosample_NucID_Genome.txt

2. Filter genome

seq by

Illumina

PacBio

3. Create artificial reads

4. Assemble with short reads

5. Find gap pattern and flanking region in the same bacterium

SPAdes ?

label gap assembly

complete genome

6. Identify flanking of conserved repeat

Can be used for identification of bacteria ?

neighbor

conserved ?

position

base

number

same pattern in four different bacteria?
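Step 3 of the working process (creating artificial reads from a complete genome) could be sketched as below. This is a minimal illustration, not the pipeline actually used: the function name `simulate_reads`, the default read length and coverage, and the simplifications (error-free reads, forward strand only, circular genome, uniform sampling) are all assumptions.

```python
import random


def simulate_reads(genome, read_len=150, coverage=30, seed=0):
    """Sample error-free, fixed-length reads uniformly from a circular genome.

    Hypothetical helper: real read simulators also model sequencing errors,
    both strands and paired ends, none of which is done here.
    """
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)
    # Number of reads needed to reach the requested mean coverage.
    n_reads = (coverage * len(genome)) // read_len
    # Append the first read_len - 1 bases so reads can wrap around the origin.
    doubled = genome + genome[:read_len - 1]
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome))
        reads.append(doubled[start:start + read_len])
    return reads
```

The resulting reads would then be written to FASTQ/FASTA and passed to the short-read assembler in step 4.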

Genome database

Extracted Biosample ID

Nucleotide ID

Assembly ID

SRA run

nanopore ?

gene annotation

functional genomics studies

isolate individual strains

catch small plasmid?

correct resolution of all large plasmid sequences

Background and objective

Previous study

generate read

Gap

rRNA operon

Cas9

tRNA (16S)

transposable elements

genes

tree ?

resistance gene ?

mutation rate -> evolution

sequencer

SRA run

type of genomic

high sequence identity

high copy number

strongly affect genome function and evolution

large phage-mediated repeats

segmental duplications or large tandem arrays

16S

microbial community diversity

A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies --> identify and verify gaps from long and short reads

finishing using

super assembly

supporting Illumina data

to obtain high-quality genome assembly

long read

post-assembly polishing steps

gap closure strategies

identify gaps from complete PacBio genomes

GC content

all gaps are similar

gap length

longer than Illumina read length

read coverage

lower than recommended coverage (>100x)

ability to form strong secondary structures

randomly distributed gap -> rejected
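The gap properties listed above (GC content, gap length versus read length, read coverage) can be computed with a few lines of code. The sketch below is illustrative only: the function names and the `read_len=150` default are assumptions, and mean coverage is taken as the usual estimate (number of reads x read length / genome length).

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_coverage(n_reads, read_len, genome_len):
    """Expected mean coverage: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


def describe_gap(genome, start, end, read_len=150):
    """Summarise one gap given its 0-based, end-exclusive genome coordinates."""
    gap = genome[start:end]
    return {
        "length": end - start,
        "gc": gc_content(gap),
        # Gaps longer than a read cannot be spanned by any single read.
        "longer_than_read": (end - start) > read_len,
    }
```

Comparing `mean_coverage(...)` against the recommended >100x threshold gives a quick check of whether low coverage alone could explain a gap.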

corresponding annotations

active transposon

interferes with the circularization process

multi rRNA operon

phage integration

megaplasmid

Transposon-related proteins

1 kb flanking -> self blast

BLAST hits to several regions with > 95% similarity -> repetitive DNA sequences contribute to assembly gaps
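The flank-extraction step and a rough repeat check can be sketched as follows. Note the substitution: instead of a self-BLAST with a 95% similarity cutoff, this example uses exact shared k-mers as a crude stand-in, so it only detects near-identical repeats. The function names and the `k=31` default are assumptions made for illustration.

```python
def flanks(genome, gap_start, gap_end, flank=1000):
    """Return the sequences flanking a gap (0-based, end-exclusive coordinates)."""
    left = genome[max(0, gap_start - flank):gap_start]
    right = genome[gap_end:gap_end + flank]
    return left, right


def kmer_positions(genome, k):
    """Index every k-mer of the genome by its start positions."""
    index = {}
    for i in range(len(genome) - k + 1):
        index.setdefault(genome[i:i + k], []).append(i)
    return index


def repeated_fraction(genome, region_start, region_end, k=31):
    """Fraction of k-mers in a region that occur more than once in the genome.

    Exact-match stand-in for a self-BLAST: values near 1.0 suggest the
    region is a repeat, values near 0.0 suggest it is unique.
    """
    index = kmer_positions(genome, k)
    total = repeated = 0
    for i in range(region_start, region_end - k + 1):
        total += 1
        if len(index[genome[i:i + k]]) > 1:
            repeated += 1
    return repeated / total if total else 0.0
```

For real data, the extracted flanks would be written to FASTA and aligned against the genome with an actual aligner, since exact k-mer matching misses diverged repeat copies.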

RiboFR-Seq: a novel approach to linking 16S rRNA amplicon profiles to metagenomes -> combination of gap and neighbor region to provide consensus classification

RiboFR-Seq (Ribosomal RNA gene flanking region sequencing)

Advantages

Limitations

short 16S rRNA (shorter than the recognition site) might be miscut -> fail

can correct errors in traditional 16S rRNA based taxonomic classification

required much less memory

long runtime

classification by clustering the non-ribosomal reads of BRPs

provides 16S copy numbers more accurately than the rrnDB database

unbiasedly classify 16S amplicons and metagenomic contigs.

short bridge -> multiple alignments