Alternative empirical Bayes models
Abstract
Combining datasets beneficial
Processing / reagents batches, experimenters, protocols, profiling platforms
Confound true biological relationship
May lead to spurious results
New batch compensation methods
Less severe
Reference-based
Background
Combining studies
Often wanted, but difficult
Old approaches
:mag: Singular Value Decomposition (SVD)
:mag: Machine learning classification methods (DWD)
Supervised methods
:mag: Block linear models (XPN)
Supervised
New approaches, even for unbalanced designs
ComBat
Control probes (RUV)
Unsupervised data decomposition (SVA)
However
ComBat removes batch effects impacting both the mean and the variance of each gene
If one batch is superior, it is beneficial to use it as the reference
Set bias: a sample's adjusted values depend on which samples it is processed together with
ComBat alters the variance, which in turn influences test statistics
Methods
ComBat
Model: Y_ijg = alpha_g + X_ij * beta_g + gamma_ig + delta_ig * epsilon_ijg
alpha_g: Overall gene expression
X_ij: Known design matrix for sample conditions
beta_g: Vector of regression coefficients for X
gamma_ig: Additive batch effect of batch i for gene g, with impact on the mean of gene g within batch i
epsilon_ijg: Error terms, assumed to follow a normal distribution
delta_ig: Multiplicative batch effect of batch i for gene g, with impact on the variance of gene g within batch i
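To make the roles of the model terms concrete, here is a minimal simulation sketch of Y_ijg = alpha_g + X_ij * beta_g + gamma_ig + delta_ig * epsilon_ijg. The function name, the single binary condition, and all distribution parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

def simulate_combat_model(n_genes=5, batch_sizes=(4, 6), seed=0):
    """Draw Y_ijg = alpha_g + X_ij * beta_g + gamma_ig + delta_ig * epsilon_ijg."""
    rng = np.random.default_rng(seed)
    n_batches = len(batch_sizes)
    n_samples = sum(batch_sizes)
    batch = np.repeat(np.arange(n_batches), batch_sizes)    # batch index i per sample
    x = rng.integers(0, 2, size=n_samples).astype(float)    # one binary condition (assumption)
    alpha = rng.normal(8.0, 1.0, size=n_genes)              # overall expression per gene
    beta = rng.normal(0.0, 0.5, size=n_genes)               # condition effect per gene
    gamma = rng.normal(0.0, 1.0, size=(n_batches, n_genes)) # additive batch effects
    delta = rng.uniform(0.5, 2.0, size=(n_batches, n_genes))# multiplicative batch effects
    eps = rng.normal(0.0, 1.0, size=(n_samples, n_genes))   # unit-variance errors (assumption)
    y = alpha + np.outer(x, beta) + gamma[batch] + delta[batch] * eps
    return y, batch, x

y, batch, x = simulate_combat_model()
print(y.shape, batch)
```

In this sketch gamma shifts every sample of a batch by the same amount per gene, while delta rescales the noise, which is why the two terms show up as mean and variance differences between batches.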
Empirical Bayes
Assumes parametric or nonparametric hierarchical Bayesian priors
Estimates the parameters
Shrink towards overall mean of estimates
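The shrinkage idea can be sketched with a simplified normal-normal model for the additive effect of one batch. This is an illustration under stated assumptions, not ComBat's full procedure: the per-gene error variance is treated as known, the prior is estimated by simple moments across genes, and the function name is hypothetical.

```python
import numpy as np

def shrink_batch_means(gamma_hat, n_i, var_g):
    """Shrink per-gene batch-effect estimates towards their across-gene mean.

    gamma_hat: per-gene estimates of the additive batch effect for one batch
    n_i:       number of samples in that batch
    var_g:     per-gene error variance in that batch (treated as known here)
    """
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    var_g = np.asarray(var_g, dtype=float)
    prior_mean = gamma_hat.mean()          # hyper-parameter estimated from all genes
    prior_var = gamma_hat.var(ddof=1)      # moment estimate of the prior variance
    # Weight on the gene's own estimate: close to 1 with many samples or low noise.
    weight = n_i * prior_var / (n_i * prior_var + var_g)
    return weight * gamma_hat + (1.0 - weight) * prior_mean

shrunk = shrink_batch_means([-1.0, 0.0, 1.0, 4.0], n_i=5, var_g=np.ones(4))
print(shrunk)
```

Genes with noisy estimates (large var_g, small n_i) are pulled more strongly towards the overall mean, which is the borrowing of information across genes that the empirical Bayes step relies on.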
Moment-based diagnostics
Gene-level movement
Check hyper-movements within each gene
Compare movement across batches
:question: More specifically, following the ComBat hierarchical model assumption, by first estimating parameters drawn from the hyper-distribution
Robust F-test
Testing with many parameters: artificially low p-values
Modification: Add a variance inflation factor to the F statistic
:question: Particularly useful for the gene-level test, where degrees of freedom = number of genes times number of batches
Overview
Sometimes we need more or less complex batch compensation
:bulb: We can standardize the data and estimate its shape: Z_ijg = (Y_ijg - alpha_g - X_ij * beta_g) / sigma_g
We assume that the batch effects on mean, variance, skewness and kurtosis of all genes are drawn from a common distribution
Test the distribution at either the sample or the gene level
Gene-level test is more powerful when many samples are present
Sample-level test is not applicable to quantile-normalized data
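The standardization Z_ijg = (Y_ijg - alpha_g - X_ij * beta_g) / sigma_g can be sketched as below. The estimation choices are assumptions made for illustration: ordinary least squares with one intercept per batch, alpha_g taken as the sample-size-weighted average of the batch intercepts, and sigma_g taken as the pooled residual standard deviation. The function name is hypothetical.

```python
import numpy as np

def standardize(y, batch, x):
    """Return Z_ijg = (Y_ijg - alpha_g - X_ij * beta_g) / sigma_g.

    y: (samples, genes) expression matrix; batch: (samples,) batch labels;
    x: (samples,) or (samples, covariates) design of known sample conditions.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    labels = np.unique(batch)
    onehot = (np.asarray(batch)[:, None] == labels[None, :]).astype(float)
    design = np.hstack([onehot, x])                    # batch intercepts + conditions
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares fit for every gene
    n_batches = len(labels)
    weights = onehot.mean(axis=0)                      # n_i / n for each batch
    alpha = weights @ coef[:n_batches]                 # overall expression per gene
    beta = coef[n_batches:]                            # condition coefficients per gene
    resid = y - design @ coef
    sigma = np.sqrt((resid ** 2).mean(axis=0))         # pooled residual sd per gene
    return (y - alpha - x @ beta) / sigma

rng = np.random.default_rng(1)
z = standardize(rng.normal(size=(12, 3)), np.repeat([0, 1], 6), np.tile([0.0, 1.0], 6))
print(z.mean(axis=0))
```

After this step the remaining structure in Z is what the batch terms and the error contribute, so moments of Z can be compared across batches.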
Sample-level movement
Process
Summarize sample movements
Conduct standard or robust F-test
:bulb: So: Kind of testing whether there are different types of movements present?
Example: For mean
Calculate mean gene expression for each sample
Then F-test for gamma-values between batches
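The mean example above can be sketched as a standard one-way ANOVA on per-sample means. This covers only the standard F-test, not the robust variant; the function name is hypothetical, and the p-value (not computed here) would come from an F distribution with the returned degrees of freedom.

```python
import numpy as np

def sample_mean_f_statistic(z, batch):
    """One-way ANOVA F statistic on per-sample mean expression, grouped by batch."""
    z = np.asarray(z, dtype=float)
    batch = np.asarray(batch)
    sample_means = z.mean(axis=1)                  # one summary value per sample
    labels = np.unique(batch)
    grand = sample_means.mean()
    groups = [sample_means[batch == b] for b in labels]
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(labels) - 1
    df_within = len(sample_means) - len(labels)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

z = np.array([[1, 1], [2, 2], [3, 3], [5, 5], [6, 6], [7, 7]], dtype=float)
print(sample_mean_f_statistic(z, np.array([0, 0, 0, 1, 1, 1])))
```

The same pattern applies to the other moments by replacing the per-sample mean with the per-sample variance, skewness or kurtosis.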
Mean-only batch adjustment
If only mean variation is present
If variance differences between batches are expected (and should be preserved)
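A minimal sketch of a mean-only adjustment, assuming no covariates and no empirical Bayes shrinkage (both simplifications relative to ComBat). The function name is hypothetical.

```python
import numpy as np

def mean_only_adjust(y, batch):
    """Remove additive batch effects only; within-batch variances are left untouched."""
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch)
    adjusted = y.copy()
    grand_mean = y.mean(axis=0)                        # overall mean per gene
    for b in np.unique(batch):
        rows = batch == b
        gamma_hat = y[rows].mean(axis=0) - grand_mean  # additive batch effect per gene
        adjusted[rows] -= gamma_hat                    # shift only, no rescaling
    return adjusted

rng = np.random.default_rng(2)
y = rng.normal(size=(8, 3)) + np.repeat([0.0, 3.0], 4)[:, None]
print(mean_only_adjust(y, np.repeat([0, 1], 4)).mean(axis=0))
```

Because each batch is only shifted, any variance difference between batches survives the adjustment, which matches the two use cases listed above.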
Reference batch adjustment
Y_ijg = alpha_rg + X_ij * beta_rg + gamma_rig + delta_rig * epsilon_ijg
alpha_rg: Average gene expression in the chosen reference batch r
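The reference-batch idea can be sketched as a plain location and scale alignment towards one chosen batch. This is a simplified illustration, assuming no covariates and no empirical Bayes shrinkage; the function name and the `ref` argument are hypothetical.

```python
import numpy as np

def reference_batch_adjust(y, batch, ref):
    """Align every batch to the per-gene mean and sd of batch `ref`."""
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch)
    ref_rows = batch == ref
    ref_mean = y[ref_rows].mean(axis=0)
    ref_sd = y[ref_rows].std(axis=0, ddof=1)
    adjusted = y.copy()
    for b in np.unique(batch):
        if b == ref:
            continue                                   # reference samples stay unchanged
        rows = batch == b
        mean_b = y[rows].mean(axis=0)
        sd_b = y[rows].std(axis=0, ddof=1)
        # Standardize within the batch, then map onto the reference scale.
        adjusted[rows] = (y[rows] - mean_b) / sd_b * ref_sd + ref_mean
    return adjusted

rng = np.random.default_rng(3)
y = rng.normal(size=(10, 4)) * np.repeat([1.0, 2.5], 5)[:, None]
adj = reference_batch_adjust(y, np.repeat([0, 1], 5), ref=0)
print(np.allclose(adj[:5], y[:5]))
```

Leaving the reference batch untouched means a model trained on it does not need to be refit when new batches are added later.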
Software implementation
Dataset description
Pathway simulation
Bladder cancer
Nitric oxide
Oncogenic signature
Lung cancer
Overview
Evaluation procedure
Mean, variance, skewness, kurtosis
Solution for when harmonizing mean is enough
Reference batch: Allows your training samples to stay constant
Results
Moments-based tests of significance for batch effect
Mean-only batch adjustment
Selecting appropriate ComBat for each dataset
Higher order moment-based batch adjustment
Batch adjustment based on a reference batch
Simulation study
EGFR signature and drug prediction
Discussion
Reference approaches
Conclusions