How to deal with missing values?
data
Why are certain values missing? Rubin DB (1976). "Inference and missing data"
missing-completely-at-random (MCAR)
missingness occurs completely at random
analysis of MCAR data is unbiased, but MCAR is generally rare
missingness is independent of both observable variables and unobservable parameters
f(r | y_obs, y_mis, θ) = f(r | θ)
not-missing-at-random (NMAR)
aka nonignorable nonresponse
the value of the missing variable is related to the reason why it is missing
EXAMPLE: men fail to fill in a depression survey BECAUSE of their level of depression
missing-at-random (MAR)
missingness can be fully accounted for by observed variables with complete information (variables without missing values)
cannot be verified statistically and can induce bias
EXAMPLE: males are less likely to fill in depression surveys, but not because of their depression level
f(r | y_obs, y_mis, θ) = f(r | y_obs, θ)
MAR values should not be spatially concentrated
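The three mechanisms can be illustrated with a small simulation based on the depression-survey example. This is only a sketch under stated assumptions: the function name `simulate_missingness`, the row layout (`male`, `score`), the threshold `score > 0`, and the doubling of the rate are all hypothetical choices made for illustration, not part of the cited literature.

```python
import random

def simulate_missingness(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows in which 'score' is set to None under one mechanism.

    rows: list of dicts with 'male' (bool, always observed) and 'score' (float).
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        if mechanism == "MCAR":
            # Same probability for every row, unrelated to any variable.
            p = rate
        elif mechanism == "MAR":
            # Depends only on the fully observed variable 'male'.
            p = 2 * rate if row["male"] else 0.0
        elif mechanism == "NMAR":
            # Depends on the value that goes missing (assumed threshold: score > 0).
            p = 2 * rate if row["score"] > 0 else 0.0
        else:
            raise ValueError("mechanism must be 'MCAR', 'MAR' or 'NMAR'")
        new_row = dict(row)
        if rng.random() < p:
            new_row["score"] = None
        out.append(new_row)
    return out

# Hypothetical toy data: alternating sex, standard-normal depression score.
data_rng = random.Random(42)
rows = [{"male": i % 2 == 0, "score": data_rng.gauss(0, 1)} for i in range(200)]

mcar = simulate_missingness(rows, "MCAR")
mar = simulate_missingness(rows, "MAR")
nmar = simulate_missingness(rows, "NMAR")
```

In the MAR output every missing score belongs to a male respondent, so conditioning on `male` explains the missingness; in the NMAR output the missing scores are exactly the ones that cannot be recovered from the observed columns.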
test for underlying missingness mechanism
Colubri A, Silver T, Fradet T, Retzepi K, Fry B, Sabeti P (2016)
MCAR test
Little RJA. A Test of Missing Completely at Random for Multivariate Data with Missing Values. Journal of the American Statistical Association. 1988;83(404):1198–202.
Jamshidian M, Jalal S. Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika. 2010;75(4):649–74. pmid:21720450
methods
evaluation criteria
Root mean square error (RMSE)
e.g. hierarchical clustering with Pearson correlation
Unsupervised classification error(UCE)
assesses preservation of internal structure by measuring how the clustering of the imputed data set differs from that of the complete data set --> misclassified samples / all samples
Schmitt P, Mandel J, Guedj M (2015) A Comparison of Six Methods for Missing Data Imputation.
Supervised classification error (SCE)
comparison of predicting subgroups in complete/imputed data set
EXAMPLE: linear discriminant analysis (LDA), then compare SCE = 1 - AUC
Schmitt P, Mandel J, Guedj M (2015) A Comparison of Six Methods for Missing Data Imputation.
bias, coverage, width of the confidence interval, and estimated proportion of the variance attributable to the missing data (Doove, van Buuren, Recursive partitioning for missing data imputation, 2013)
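As a minimal sketch of the RMSE criterion: values are removed from a complete data set, imputed, and the error is then computed only over the entries that were removed. The function name `imputation_rmse` and the flat-list layout are assumptions made for illustration.

```python
import math

def imputation_rmse(true_values, imputed_values, missing_mask):
    """RMSE over the entries that were masked out and then imputed."""
    squared_errors = [
        (t - i) ** 2
        for t, i, masked in zip(true_values, imputed_values, missing_mask)
        if masked
    ]
    if not squared_errors:
        raise ValueError("no masked entries to evaluate")
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical example: entries 1 and 3 were masked and imputed.
truth = [1.0, 2.0, 3.0, 4.0]
imputed = [1.0, 2.5, 3.0, 3.0]
mask = [False, True, False, True]
rmse = imputation_rmse(truth, imputed, mask)
```

Restricting the sum to masked entries keeps the untouched observed values from diluting the error.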
imputing missing values
Machine Learning based approaches
Maximum Likelihood approach (Ibrahim et al. (2005))
naive bayesian classifier (Lin,Haug,2008)
unsupervised Bayesian clustering (Lakshminarayan,1996)
EM algorithm
k-NN (Jerez, Molina, 2010)
Liu, Y.; Gopalakrishnan, V. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.
ANNs (Seffens et al. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study. Bioinformatics and Biology Insights 2015:) (Richman,Trafalis,2009)
Extreme Learning Machine (Sovilj, Eirola, 2015) (Huang,Liu,Yu,DE19250500000199961566 )
Multi-layer perceptron (Jerez, Molina, 2010)
Decision Trees based
missForest (Stekhoven,Bühlmann, 2012) (Luo,2016) (Shah, Bartlett,2012)
DT (J48) Rahman, M. M. and Davis, D. N. (2013) “Machine Learning-Based Missing Value Imputation Method for Clinical Datasets”
C4.5 (Lakshminarayan,1996)
Liu, Y.; Gopalakrishnan, V. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.
Fuzzy unordered Rule Induction Algorithm Rahman, M. M. and Davis, D. N. (2013) “Machine Learning-Based Missing Value Imputation Method for Clinical Datasets”
SVM (Richman, Trafalis, 2009) Rahman, M. M. and Davis, D. N. (2013) “Machine Learning-Based Missing Value Imputation Method for Clinical Datasets”
Self-organizing maps (Jerez, Molina, 2010)
Liu, Y.; Gopalakrishnan, V. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.
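Of the machine-learning approaches listed above, k-NN imputation is the simplest to sketch. The version below is an illustrative assumption, not the procedure of any cited paper: it uses a Euclidean distance over the columns that both rows have observed, and fills each gap with the unweighted mean of the k nearest donors. The name `knn_impute` is hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest donor rows."""

    def distance(a, b):
        # Compare only columns observed in both rows, normalised by their count.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which column j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable donor exists
                new_row[j] = sum(v for _, v in nearest) / len(nearest)
        completed.append(new_row)
    return completed

# Hypothetical toy data: the last row is close to the first two rows.
data = [
    [1.0, 2.0],
    [1.1, 2.2],
    [5.0, 10.0],
    [1.05, None],
]
filled = knn_impute(data, k=2)
```

Here the gap in the last row is filled from the first two rows, since they are nearest in the observed column; the distant third row does not contribute.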
classical statistical approaches
standard polynomial regression
stochastic regression including error term
hot/cold-deck imputation (Jerez, Molina, 2010)
mean/median imputation Rahman, M. M. and Davis, D. N. (2013) “Machine Learning-Based Missing Value Imputation Method for Clinical Datasets”
Multivariate Imputation with chained equations (Luo,2016) (Shah, Bartlett,2012)
Amelia II, Hmisc, MICE (Colubri A, Silver T, Fradet T, Retzepi K, Fry B, Sabeti P (2016))
listwise-/pairwise deletion (case deletion)
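Two of the classical approaches above, listwise deletion and mean/median imputation, can be sketched in a few lines. The function names and the list-of-lists layout are assumptions for illustration only.

```python
import statistics

def listwise_delete(rows):
    """Keep only complete cases (rows without any None)."""
    return [row for row in rows if all(v is not None for v in row)]

def impute_column_stat(rows, stat=statistics.mean):
    """Replace each None with a statistic of the observed values in its column."""
    n_cols = len(rows[0])
    fill = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observed values cannot be filled and stays None.
        fill.append(stat(observed) if observed else None)
    return [[fill[j] if v is None else v for j, v in enumerate(row)] for row in rows]

# Hypothetical toy data with one gap in each column.
data = [[1.0, 10.0], [2.0, None], [None, 30.0], [6.0, 50.0]]

complete_cases = listwise_delete(data)
mean_filled = impute_column_stat(data)
median_filled = impute_column_stat(data, stat=statistics.median)
```

Listwise deletion discards half of this toy data set, while mean and median imputation keep every row but shrink the variance of the filled columns.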
explicit modeling of "missingness" instead of imputation (Lin,Haug,2008)
feature selection
first preprocessing using Mirador; MINE, and then using subsets of variables as inputs
Derksen S, Keselman HJ. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology. 1992;45(2):265–82. doi: 10.1111/j.2044-8317.1992.tb00992.x.
modeling of underlying distribution
Gaussian Mixture Model (Sovilj,Eirola,2015)