ML (Learn about data (Dimensions & attributes (nb of variable : ncol,…
ML
Learn about data
-
-
-
Missing data
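is.na() returns a logical vector: TRUE (counts as 1) where a value is missing, FALSE (counts as 0) otherwise
sum(is.na(x)) counts the missing values in x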
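A minimal sketch of counting missing values in base R. The vector `x` and the data frame `df` are made-up examples, not part of the notes, and `colSums()` is added here only as one common way to get per-column counts.

```r
# Hypothetical example data (not from the original notes)
x <- c(4.2, NA, 3.1, NA, 5.0)

# is.na() returns a logical vector: TRUE where the value is missing
missing_flags <- is.na(x)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of NAs
n_missing <- sum(missing_flags)

# Same idea applied to every column of a data frame
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))
missing_per_column <- colSums(is.na(df))
```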
-
Basic descriptive Stats
Goal: describe the main features of numerical and categorical information with simple summaries.
They can be presented as a single numeric measure, a frequency distribution, or a summary table.
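The three kinds of summary can be sketched in base R as follows. The choice of `mtcars` columns and of `mean()`, `median()`, `table()`, `prop.table()`, and `summary()` is an assumption made for illustration; the notes do not name specific functions here.

```r
# Single numeric measures for a numerical variable
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

# Frequency distribution for a categorical variable (number of cylinders)
cyl_counts <- table(mtcars$cyl)

# Relative frequencies (the proportions sum to 1)
cyl_props <- prop.table(cyl_counts)

# Summary table: min, quartiles, mean, and max in one call
mpg_summary <- summary(mtcars$mpg)
```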
numerical data
-
-
-
percentiles
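fivenum() returns Tukey's five-number summary: min, lower hinge, median, upper hinge, max (the hinges are close to the 25% and 75% quantiles)
quantile() returns the 0%, 25%, 50%, 75%, and 100% quantiles by default
custom: quantile(mtcars$mpg, probs = seq(from = 0, to = 1, by = .1))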
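A short side-by-side sketch of the three calls on `mtcars$mpg`, using only base R. The variable names `five`, `quartiles`, and `deciles` are illustrative.

```r
mpg <- mtcars$mpg

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(mpg)

# Default quantiles: 0%, 25%, 50%, 75%, 100%
quartiles <- quantile(mpg)

# Custom cut points: deciles from 0% to 100% in steps of 10%
deciles <- quantile(mpg, probs = seq(from = 0, to = 1, by = 0.1))
```

The hinges from `fivenum()` and the quartiles from `quantile()` are computed differently, so the second and fourth values can differ slightly between the two results.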
-
-
Outliers
Distort predictions and reduce accuracy
If outliers are present, which observations are outliers?
outlier() (outliers package) returns the observation farthest from the mean
scores() (outliers package) computes normalized scores (z, t, chisq, etc.) to find observations that lie beyond a given value:
z-score rule: observations whose absolute z-score is greater than 2, i.e. observations more than 2 standard deviations from the mean.
z_scores <- scores(mtcars$mpg, type = "z")
which(abs(z_scores) > 2)
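The same z-score rule can be sketched in base R, which avoids depending on the package that provides `scores()` (assumed here to be `outliers`). The names `z_manual`, `z_outlier_idx`, and `z_outlier_rows` are illustrative.

```r
mpg <- mtcars$mpg

# z-score: distance from the mean in units of standard deviation
z_manual <- (mpg - mean(mpg)) / sd(mpg)

# Indices of observations more than 2 standard deviations from the mean
z_outlier_idx <- which(abs(z_manual) > 2)

# Look up the corresponding rows to see which cars they are
z_outlier_rows <- mtcars[z_outlier_idx, "mpg", drop = FALSE]
```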
IQR rule: values beyond the "whiskers" of a boxplot (1.5 x IQR or more below the 1st quartile or above the 3rd quartile)
which(scores(mtcars$mpg, type = "iqr", lim = 1.5))
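A base-R sketch of the 1.5 x IQR rule, computing the fences explicitly from `quantile()` and `IQR()`. The names `lower_fence`, `upper_fence`, and `iqr_outlier_idx` are illustrative and not from the notes.

```r
mpg <- mtcars$mpg

# First and third quartiles, and the interquartile range
q1  <- quantile(mpg, 0.25, names = FALSE)
q3  <- quantile(mpg, 0.75, names = FALSE)
iqr <- IQR(mpg)

# Fences: 1.5 x IQR below the 1st quartile and above the 3rd quartile
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Indices of observations outside the fences
iqr_outlier_idx <- which(mpg < lower_fence | mpg > upper_fence)
```

`boxplot()` draws its whiskers from the hinges returned by `fivenum()` instead of these quantiles, so the points it marks can differ slightly from this result.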