Lecture 3: Transcriptomics

RNA-Sequencing protocol and implications for alignment

Gene expression quantification

Common goals of RNA-Seq based transcriptomics

Experiment design > RNA preparation > library preparation > sequencing > analysis

Optional: cDNA conversion, fragmentation, amplification

biases due to cDNA library construction, sequencing, read alignment

direct RNA sequencing and long-read sequencing reduce these biases

gene-level vs transcript-level expression counting
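A minimal sketch of how the two levels relate, assuming transcript-level counts are already available and that a transcript-to-gene mapping exists. The function name gene_level_counts and the identifiers tx1, tx2, geneA are hypothetical and only serve as illustration: gene-level counts are obtained here by summing over all transcripts (isoforms) of a gene.

```python
from collections import defaultdict


def gene_level_counts(transcript_counts, transcript_to_gene):
    """Sum transcript-level counts into gene-level counts.

    transcript_counts: dict mapping transcript ID -> count
    transcript_to_gene: dict mapping transcript ID -> gene ID
    (both are hypothetical inputs for this sketch)
    """
    gene_counts = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_counts[transcript_to_gene[transcript_id]] += count
    return dict(gene_counts)


# Two isoforms of geneA and a single isoform of geneB.
counts = gene_level_counts(
    {"tx1": 10, "tx2": 5, "tx3": 7},
    {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"},
)
```

Summing loses the information about which isoform is expressed, which is exactly what transcript-level counting keeps.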

alignment- vs assembly-based transcript reconstruction

alignment vs pseudoalignment (a pseudoalignment of a read is simply the set of target sequences that the read is compatible with)
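A toy sketch of the pseudoalignment idea, assuming compatibility is decided by shared k-mers: a read is compatible with a target if every k-mer of the read occurs in that target. The function names, the k-mer size, and the example sequences are assumptions for illustration; production pseudoaligners use more elaborate index structures and do not look up every k-mer.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the set of transcripts compatible with all k-mers of the read.

    No base-level coordinates are computed, only the compatibility set.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        targets = index.get(read[i:i + k], set())
        compatible = set(targets) if compatible is None else compatible & targets
        if not compatible:
            return set()  # some k-mer matches no remaining target
    return compatible or set()  # reads shorter than k have no k-mers


# Hypothetical toy transcripts sharing a common prefix.
transcripts = {"t1": "ACGTACGGA", "t2": "ACGTACCTT"}
index = build_kmer_index(transcripts, k=4)
shared = pseudoalign("ACGTAC", index, k=4)   # compatible with both
unique = pseudoalign("GTACGG", index, k=4)   # compatible with t1 only
```

A full alignment would additionally report where and how each read matches; the pseudoalignment stops at the compatibility set, which is enough for quantification.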

Expression normalization

normalize by a) length of transcript and b) total number of reads
-> RPKM / FPKM: reads / fragments per kilobase of transcript per million mapped reads
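The two normalization steps can be sketched as a small function. This is a minimal illustration, assuming transcript length is given in base pairs; the function name rpkm and the example numbers are made up for this sketch. For paired-end data the same formula applied to fragments instead of reads gives FPKM.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count / (length_in_kb * mapped_reads_in_millions)
         = read_count * 1e9 / (length_in_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# 10 million mapped reads in the sample.
value = rpkm(500, 2000, 10_000_000)  # 500 / (2 kb * 10 M) = 25.0
```

Dividing by length makes long and short transcripts comparable within a sample; dividing by the total number of mapped reads makes samples of different sequencing depth comparable.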

Transcript quantification and reconstruction

if transcript structure is known: (pseudo-)alignment-based

no known transcripts available: optimization of transcript weights

Differential analysis of gene expression

Linear models, p-value, confounding factors, multiple testing correction, QQ-plot
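Since one test is run per gene, the p-values have to be corrected for multiple testing. The notes do not name a specific method; the sketch below assumes the Benjamini-Hochberg procedure (control of the false discovery rate) as one common choice. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The p-value with rank r (1 = smallest) among m tests is scaled by m / r;
    a running minimum from the largest rank downwards keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
# approximately [0.02, 0.04, 0.04, 0.02]
```

Genes whose adjusted p-value falls below the chosen threshold (e.g. 0.05) are then reported as differentially expressed.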

Transcriptome = complete RNA content of a cell at a given time (varies over time, unlike the genome)