Chapter 14: Feature Understanding and Selection
Descriptive Statistics
Var Type
Unique
Missing
Mean
Std. Deviation
Minimum
Maximum
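The summary columns listed above can be reproduced by hand on a small column of values. This is a minimal sketch in plain Python, not DataRobot's own computation. It assumes missing entries are represented as None and that the standard deviation is the sample standard deviation (these notes do not say which one DataRobot reports). The function name `describe` and the sample data are hypothetical.

```python
import statistics

def describe(values):
    """Summarize one numeric feature; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    return {
        "unique": len(set(present)),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample std. deviation (assumption)
        "min": min(present),
        "max": max(present),
    }

# Hypothetical feature column with one missing entry
summary = describe([2, 4, 4, None, 6])
```

Here the column has 3 unique values, 1 missing value, a mean of 4, a minimum of 2, and a maximum of 6.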
*Any feature can be clicked for more details
*Use histograms
Notice the cutoffs between each range of values
()
Exclusive range: doesn't include the number next to (
[]
Inclusive range: includes the number next to [
DataRobot uses [1,3) --> 1, 2 (3 is excluded)
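The half-open range notation above can be checked with a short comparison. This is a minimal sketch; the helper name `in_half_open` is hypothetical and only illustrates the notation, not how DataRobot builds its histogram bins.

```python
def in_half_open(x, low, high):
    """True when x lies in [low, high): low is included, high is excluded."""
    return low <= x < high

# Whole numbers from 0 to 4 that fall in the bin [1, 3)
members = [x for x in range(0, 5) if in_half_open(x, 1, 3)]
```

`members` contains 1 and 2 only: the lower cutoff belongs to the bin, and the upper cutoff belongs to the next one.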
Data Types
Categorical
Boolean: True or False
Numerical
Carefully organized data that is standardized
Improve pattern detection
Be able to view features beyond index 50
Convert Measurements to standardized numbers
DataRobot checks whether an auto-generated feature already exists and, if it does, does not generate a duplicate
Evaluations of Feature Content
DataRobot ignores features with minimal unique values
Missing Values
Missing Column
Shows how many values are missing from specific feature
? is coded as a missing value by DataRobot
Convert them all to one consistent type:
Analyst can avoid the feature being categorized as having multiple unique values
Avoids treating missing values as text
Algorithms that struggle with missing values
regression
neural networks
support vector machines
Nulls
Values that DataRobot recognizes as missing are converted to nulls
Other Codes for Missing
N/A, na, n/a, #N/A
inf, Inf, INF
Empty fields
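Converting every missing-value code to one consistent marker before upload can be sketched as follows. The code set is taken from these notes. The function name `normalize_missing`, the use of None as the marker, and the sample column are assumptions for illustration; DataRobot's own matching logic is not shown in the notes.

```python
# Missing-value codes listed in these notes (the empty string stands for an empty field)
MISSING_CODES = {"?", "N/A", "na", "n/a", "#N/A", "inf", "Inf", "INF", ""}

def normalize_missing(column):
    """Replace every recognized missing-value code with None."""
    cleaned = []
    for value in column:
        # Strip surrounding whitespace so " ? " is matched like "?"
        text = value.strip() if isinstance(value, str) else value
        cleaned.append(None if text is None or text in MISSING_CODES else value)
    return cleaned

# Hypothetical raw column as read from a CSV file
raw = ["12", "?", "n/a", "", "7", "INF"]
cleaned = normalize_missing(raw)
missing_count = cleaned.count(None)
```

With one consistent marker, the count of missing values is unambiguous (4 of the 6 entries here), and the remaining entries can be converted to numbers instead of being read as text.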
DETECT ERRORS EARLY ON