2016 - De novo transcriptomic analysis of Chlorella sorokiniana reveals differential genes expression in photosynthetic carbon fixation and lipid production
Six cultivation conditions:
Nitrogen-limited: A, B, E, F
Nitrogen-replete: C, D, E
Glucose: A, B, C, D, E
CO2: F
37 °C: A, B, C, D, E
Room temperature: F
Light: F
Dark: E
220 rpm: A, B, C, D, E
0 rpm: F
Sequencing and de novo assembly
Illumina HiSeq 2000 paired-end sequencing
244,291,069 raw reads were generated; they are available in the NCBI SRA database
All raw reads were trimmed based on base quality score and read length, yielding 229,288,757 clean reads (Additional files 1 and 2), which were de novo assembled into 72,902 contigs with an N50 of 2,502 bp.
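N50 summarizes assembly contiguity: it is the contig length L such that contigs of length L or longer together hold at least half of all assembled bases. A minimal Python sketch of that standard definition, applied to a made-up list of lengths (not the study's contigs):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    sorted_lengths = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half_total:  # half of all bases reached
            return length
    raise ValueError("lengths must be a non-empty collection of positive integers")


# Toy lengths only; the study reports N50 = 2,502 bp for 72,902 contigs.
print(n50([2000, 1500, 1000, 500, 200]))
```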
After clustering, 63,811 non-redundant contigs were obtained, ranging from 200 bp to 15,932 bp with an average length of 1,022 bp (Fig. 2a, Additional file 3); these were used for the following analysis.
The Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GAPD00000000.
Sequence Read Archive (SRA): available through multiple cloud providers and NCBI servers, SRA is the largest publicly available repository of high-throughput sequencing data. It accepts data from all branches of life as well as metagenomic and environmental surveys, and stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries.
SRA accession numbers of the samples:
Sample A: SRX352462
Sample B: SRX354137
Sample C: SRX354139
Sample D: SRX354141
Sample E: SRX354143
Sample F: SRX354142
Annotation of contigs
When the contigs were compared against the NCBI Nr database using BLASTX, 23,496 contigs (36.8 % of the total) had homologous sequences in Nr (Fig. 3, Additional file 4).
Due to the lack of genome information, a large proportion of the contigs (40,298; 63.2 %) could not be matched to a homologous sequence in any database. Among these, 10,471 potential coding regions were predicted using TransDecoder (Additional file 5); these predicted coding regions may represent new genes, and their functions remain to be confirmed.
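The two percentages can be checked directly from the reported counts. The groups need not sum to the total, since "hit in Nr" and "no match in any database" are not complementary categories. A small arithmetic sketch using only the numbers above:

```python
total_contigs = 63_811   # non-redundant contigs after clustering
nr_hits = 23_496         # contigs with a homologous sequence in Nr
unmatched = 40_298       # contigs with no match in any database

print(f"With Nr hit:       {nr_hits / total_contigs:.1%}")
print(f"No match anywhere: {unmatched / total_contigs:.1%}")
```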
EC numbers and KEGG Orthology (KO) identifiers were also assigned from the KEGG annotation results; 2,789 contigs were assigned an EC number (Fig. 3, Additional file 4).
2,371 contigs matched homologous sequences in every database used (Fig. 3).
Most C. sorokiniana genes were between 500 bp and 2,500 bp long.
The homologous sequences matched in Nr came from closely related green microalgae species, including:
C. variabilis (80.24 % of all annotated contigs)
Coccomyxa subellipsoidea C-169 (6.86 %)
Volvox carteri f. nagariensis (2.03 %) (Fig. 2c)
On this basis, Chlorella sp. NC64A was selected as the reference species for predicting transcription factors.
Functional classification and transcription factor analysis
The most abundant transcription factor family was the SBP family, which is related to flower development in plants [23].
Nitrogen-limited condition:
20 transcription factors were at least 2-fold up-regulated
4 transcription factors were at least 2-fold down-regulated
Light condition:
12 transcription factors were at least 2-fold up-regulated
17 transcription factors were at least 2-fold down-regulated
Dof-type and bHLH-family transcription factors regulate lipid accumulation in plants [25–27].
In this study, two bHLH-family transcription factors (IGS.gm_27_00071 and IGS.gm_8_00085) were identified, and both were up-regulated under the nitrogen-limited condition.
Gene expression quantification
FPKM
Comparing Sample A (nitrogen-limited condition, 48 h) to Sample C (nitrogen-replete condition, 48 h):
533 genes were at least 2-fold up-regulated and 219 genes were at least 2-fold down-regulated under the nitrogen-limited condition
To determine gene expression abundance, high-quality reads from each condition were mapped to the non-redundant contigs, and FPKM values [35] were calculated using RSEM (v1.2.7).
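FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) normalizes a fragment count by transcript length and library size. The sketch below uses the textbook formula; RSEM derives FPKM from expected counts and effective transcript lengths, so its values will differ somewhat. The inputs in the usage line are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: count * 1e9 / (length in bp * total mapped fragments)."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical: 500 fragments on a 1,022 bp contig in a 20 M-fragment library.
print(round(fpkm(500, 1022, 20_000_000), 2))
```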
Methods
C. sorokiniana (UTEX 1602)
Cultivation: Kuhl medium
De novo assembly method
The 100 bp paired-end raw reads generated on the Illumina HiSeq 2000 were checked for quality with FastQC (v0.10.1) [37] and preprocessed using Python scripts (Additional file 8), which:
(a) remove low-quality bases with Phred score < 20,
(b) remove ambiguous base ‘N’,
(c) discard short reads with length < 25 bp.
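The three preprocessing steps can be sketched in Python as below. This is not the authors' script (Additional file 8). It assumes Phred+33 quality encoding and that bad bases are trimmed from the read ends only; the source does not say how internal 'N' or low-quality bases were handled.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Illumina 1.8+) quality encoding
MIN_QUALITY = 20   # step (a): bases with Phred score < 20 are removed
MIN_LENGTH = 25    # step (c): reads shorter than 25 bp are discarded


def is_bad(base, qual_char):
    """True if a base is ambiguous ('N', step b) or low quality (step a)."""
    return base.upper() == "N" or ord(qual_char) - PHRED_OFFSET < MIN_QUALITY


def trim_read(seq, qual):
    """Trim bad bases from both read ends; return None if the read gets too short.

    Assumption: only terminal bases are trimmed; internal 'N' or low-quality
    bases are left in place.
    """
    start, end = 0, len(seq)
    while start < end and is_bad(seq[start], qual[start]):
        start += 1
    while end > start and is_bad(seq[end - 1], qual[end - 1]):
        end -= 1
    if end - start < MIN_LENGTH:
        return None
    return seq[start:end], qual[start:end]


# Toy read: the leading 'N' and the trailing Q2 base ('#') are trimmed.
print(trim_read("N" + "A" * 30 + "C", "I" * 31 + "#"))
```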
The high-quality reads were then de novo assembled into contigs using Trinity (v2.0.6) [38] with default parameters.
Final clustering of the contigs was performed with CD-HIT-EST from the Cluster Database at High Identity with Tolerance (CD-HIT) suite [39], using a minimum similarity cut-off of 90 %, to generate the non-redundant contigs used in the following analysis.
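The idea behind this step is greedy incremental clustering: process sequences from longest to shortest, and let each one join the first existing representative it matches at 90 % identity or above, otherwise become a new representative. The sketch below illustrates only that idea. It uses Python's difflib as a stand-in similarity measure (an assumption), not CD-HIT's own filtering and alignment, so it is suitable for toy inputs only.

```python
from difflib import SequenceMatcher


def identity(shorter, longer):
    """Approximate identity: matched bases / length of the shorter sequence."""
    if not shorter:
        return 0.0
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(shorter)


def greedy_cluster(contigs, threshold=0.90):
    """Greedy incremental clustering in the spirit of CD-HIT-EST.

    contigs: dict mapping contig name -> nucleotide sequence.
    Returns a dict mapping each representative name -> list of member names.
    """
    clusters = {}
    # Longest first, so each representative is the longest member of its cluster.
    for name in sorted(contigs, key=lambda n: len(contigs[n]), reverse=True):
        seq = contigs[name]
        for rep in clusters:
            if identity(seq, contigs[rep]) >= threshold:
                clusters[rep].append(name)  # joins the first matching cluster
                break
        else:
            clusters[name] = [name]  # no match: becomes a new representative
    return clusters


toy = {
    "c1": "ACGT" * 10,
    "c2": "ACGT" * 9 + "ACGA",  # differs from c1 at one position
    "c3": "TTTTGGGGCCCC" * 3,   # unrelated sequence
}
print(greedy_cluster(toy))
```

Assigning each sequence to the first matching cluster, rather than the best one, keeps the loop simple; the representatives retained are the "non-redundant" set.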
For functional annotation, the non-redundant contigs were searched against the NCBI non-redundant (Nr) database and the Clusters of Orthologous Groups (COG) database [40, 41] using the BLASTX algorithm [42], with E-value cut-offs of ≤ 10⁻⁵ and ≤ 10⁻¹⁰, respectively, and otherwise default parameters.
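Applying the E-value cut-offs can be sketched by filtering BLAST+ tabular output (`-outfmt 6`). The source gives only the cut-offs; keeping the single highest-bit-score hit per contig is an added assumption, and the contig and subject IDs below are made up.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, bitscore, evalue)} for each query's best hit.

    Hits with E-value above max_evalue are dropped; among the rest, the hit
    with the highest bit score is kept (an assumed selection rule).
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore, evalue)
    return best


# Made-up hits for two hypothetical contigs.
rows = [
    ["contig_1", "subjA", "85.0", "300", "40", "2", "1", "900", "1", "300", "1e-50", "200.0"],
    ["contig_1", "subjB", "60.0", "120", "48", "3", "10", "370", "5", "125", "1e-8", "90.0"],
    ["contig_2", "subjC", "35.0", "80", "52", "4", "1", "240", "1", "80", "0.01", "30.0"],
]
example = "\n".join("\t".join(r) for r in rows)
print(best_hits(example))                    # Nr-style cut-off, E <= 1e-5
print(best_hits(example, max_evalue=1e-10))  # COG-style cut-off, E <= 1e-10
```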
Tools and databases: BLASTX, Nr, COG, KEGG, KAAS, EC number, PlnTFDB (Plant Transcription Factor Database)
Due to the lack of biological replicates, only genes with FPKM values greater than 0 in all six conditions were used to study differential expression; genes whose FPKM value changed by more than 2-fold between two conditions were identified as differentially expressed.
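The selection rule can be sketched as a filter plus a fold-change test. Gene names and FPKM values below are hypothetical. The threshold is inclusive (at least 2-fold), matching the results wording; use a strict comparison for the "greater than 2-fold" reading.

```python
def differential_genes(fpkm, cond_x, cond_y, min_fold=2.0):
    """Return (up, down) gene lists for cond_x relative to cond_y.

    fpkm maps gene -> {condition: FPKM}. Genes without FPKM > 0 in every
    condition are skipped, which also avoids division by zero.
    """
    up, down = [], []
    for gene, values in fpkm.items():
        if not all(v > 0 for v in values.values()):
            continue
        ratio = values[cond_x] / values[cond_y]
        if ratio >= min_fold:
            up.append(gene)
        elif ratio <= 1 / min_fold:
            down.append(gene)
    return up, down


# Hypothetical FPKM values for four made-up genes across conditions A to F.
toy = {
    "gene1": dict(A=40.0, B=5.0, C=10.0, D=3.0, E=1.0, F=2.0),  # 4-fold up in A
    "gene2": dict(A=2.0, B=5.0, C=8.0, D=3.0, E=1.0, F=2.0),    # 4-fold down in A
    "gene3": dict(A=10.0, B=5.0, C=12.0, D=3.0, E=1.0, F=2.0),  # unchanged
    "gene4": dict(A=50.0, B=5.0, C=0.0, D=3.0, E=1.0, F=2.0),   # excluded: zero FPKM
}
up, down = differential_genes(toy, "A", "C")  # e.g. Sample A vs Sample C
print(up, down)
```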