Missing Data: Basics Vol 1
sources of missing values
by type
unintentionally missing
forgot to input
technical extraction
error
intentionally missing
e.g.
a respondent with no children is recorded as NaN instead of 0
by severity
unit nonresponse
the row is recorded, but contains no data at all
e.g.
asking kindergarteners data science questions; they don't answer
item nonresponse
missing data in some columns only
caused by partial information being given, incomplete extraction, etc.
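The two severities can be told apart by counting missing cells per row. A minimal sketch, assuming a hypothetical survey table in which `respondent_id` is always recorded and `age` and `income` are the answer columns:

```python
import numpy as np
import pandas as pd

# Hypothetical survey data: the id is recorded even when nothing else is.
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "age": [34, np.nan, 29, np.nan],
    "income": [5200.0, np.nan, np.nan, 4100.0],
})

# Only the answer columns matter for classifying nonresponse.
answers = df.drop(columns="respondent_id")

# Unit nonresponse: every answer column is missing in the row.
unit_nonresponse = answers.isna().all(axis=1)

# Item nonresponse: some, but not all, answer columns are missing.
item_nonresponse = answers.isna().any(axis=1) & ~unit_nonresponse

print(df[unit_nonresponse])
print(df[item_nonresponse])
```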
missing value types
standard missing value
null values that pandas detects by default
e.g. "NA" and empty fields
non-standard missing value
null values that pandas does not detect by default
ways to overcome
list the extra markers in na_values when calling pd.read_csv
e.g. "--", "?", "missing" ("N/A" is already detected by default)
unexpected missing value
a value that does not fit the column's context
e.g.
a float in a boolean column
analogy
mekeengan
ways to overcome
loop over the values manually and mark the ones that do not fit as missing
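One way the manual loop could look for the float-in-a-boolean-column case. This is only a sketch on a made-up Series; the rule for what counts as "out of context" has to be written per column.

```python
import numpy as np
import pandas as pd

# Hypothetical column that is supposed to hold only booleans.
s = pd.Series([True, False, 0.7, True, 12.5], dtype=object)

cleaned = []
for value in s:
    # Keep genuine booleans; anything else is treated as missing.
    if isinstance(value, (bool, np.bool_)):
        cleaned.append(bool(value))
    else:
        cleaned.append(np.nan)

s_clean = pd.Series(cleaned, index=s.index)
print(s_clean)
```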
imputation methods
deductive
using logical relationships between fields to deduce the missing value
pros
accurate
no inference needed
cons
time consuming
requires case-specific programming
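A sketch of one deductive rule, using hypothetical columns `has_children` and `n_children`: if a respondent has no children, the count must be 0. Rows where nothing can be deduced stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data; both column names are assumptions for illustration.
df = pd.DataFrame({
    "has_children": [True, False, False, True],
    "n_children": [2.0, np.nan, 0.0, np.nan],
})

# Rule: no children implies a count of 0. Only missing counts are touched.
can_deduce = df["n_children"].isna() & ~df["has_children"]
df.loc[can_deduce, "n_children"] = 0.0

print(df)
```

The last row stays NaN because the rule says nothing about respondents who do have children, which is why this method needs a separate rule for each case.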
regression
using a regression line fitted on the observed rows to impute
cons
distorts histogram
underestimates variance
often inaccurate
pros
simple
better than mean/median/mode imputation
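A minimal sketch of regression imputation with one predictor, assuming made-up columns `x` (fully observed) and `y` (partly missing). The line is fitted with `np.polyfit` on the observed rows only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y is roughly linear in x, with two missing values.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.1, 3.9, np.nan, 8.2, np.nan, 12.0],
})

observed = df["y"].notna()

# Fit y = intercept + slope * x on the rows where y is known.
slope, intercept = np.polyfit(df.loc[observed, "x"], df.loc[observed, "y"], deg=1)

# Fill each missing y with the value on the fitted line.
df["y_imputed"] = df["y"]
df.loc[~observed, "y_imputed"] = intercept + slope * df.loc[~observed, "x"]

print(df)
```

Every imputed point lies exactly on the line, which is where the underestimated variance comes from.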
stochastic regression
using the regression line plus a random error term
pros
simple
better preserves the true variance
cons
still underestimates variance
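A sketch of the stochastic variant on the same kind of made-up data. The assumption here is that the random term is drawn from a normal distribution whose spread is the residual standard deviation of the fitted line; other choices are possible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: y is roughly linear in x, with two missing values.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.1, 3.9, np.nan, 8.2, np.nan, 12.0],
})

observed = df["y"].notna()
x_obs = df.loc[observed, "x"].to_numpy()
y_obs = df.loc[observed, "y"].to_numpy()

# Fit the regression line on the observed rows.
slope, intercept = np.polyfit(x_obs, y_obs, deg=1)

# Residual standard deviation (n - 2 because two parameters were fitted).
residuals = y_obs - (intercept + slope * x_obs)
resid_sd = np.sqrt(np.sum(residuals**2) / (len(y_obs) - 2))

# Imputed value = point on the line + random noise of that spread.
x_missing = df.loc[~observed, "x"].to_numpy()
noise = rng.normal(0.0, resid_sd, size=x_missing.shape[0])

df["y_imputed"] = df["y"]
df.loc[~observed, "y_imputed"] = intercept + slope * x_missing + noise

print(df)
```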
hot-deck
donor
a complete record
gives its value to the recipient
recipient
a record with a missing value
receives the value from the donor
methods
random hot deck
deterministic hot deck
like KNN, but copies the nearest donor's value instead of averaging the neighbors
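Both hot-deck methods can be sketched on a small made-up table. The assumption is that `age` is the matching variable and `income` is the column to impute; the deterministic version here uses the single nearest donor by age.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: income is missing for two records.
df = pd.DataFrame({
    "age": [25, 31, 38, 47, 52],
    "income": [3000.0, np.nan, 4200.0, 5100.0, np.nan],
})

donors = df[df["income"].notna()]      # complete records
recipients = df[df["income"].isna()]   # records with a missing value

# Random hot deck: each recipient copies a randomly chosen donor's value.
random_deck = df["income"].copy()
random_deck.loc[recipients.index] = rng.choice(
    donors["income"].to_numpy(), size=len(recipients)
)

# Deterministic hot deck: each recipient copies the value of the donor
# closest in age (one neighbor, no averaging).
nearest_deck = df["income"].copy()
for idx, age in recipients["age"].items():
    donor_idx = (donors["age"] - age).abs().idxmin()
    nearest_deck.loc[idx] = donors.loc[donor_idx, "income"]

print(random_deck)
print(nearest_deck)
```

Because values are copied rather than computed, every imputed value is one that actually occurs in the data.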