Ch. 14 Feature Understanding and Selection
Descriptive Statistics
Unique Column
shows how many unique values exist for each feature
Expand Feature Details
option shown when feature name is clicked
shows a bar chart
additional bar for missing values is added
click feature again to close visualization
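The Unique column and the bar chart with its extra missing-values bar can be approximated outside the tool. This is a minimal sketch using hypothetical sample data; the names `rows`, `unique_count`, and `bar_chart_counts` are illustrative, and whether DataRobot counts a missing value as its own unique value is not stated in these notes, so it is excluded here as an assumption.

```python
from collections import Counter

# Hypothetical sample data: each key is a feature, each list its column values.
# None stands in for a missing value.
rows = {
    "color": ["red", "blue", "red", None, "green", "red"],
    "size": ["S", "M", "M", "L", None, None],
}

def unique_count(values):
    """Number of distinct non-missing values (assumption: missing is not counted)."""
    return len({v for v in values if v is not None})

def bar_chart_counts(values):
    """Row count per value, plus one extra bar when any values are missing."""
    bars = dict(Counter(v for v in values if v is not None))
    missing = sum(1 for v in values if v is None)
    if missing:
        bars["(missing)"] = missing
    return bars

for name, values in rows.items():
    print(name, unique_count(values), bar_chart_counts(values))
```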
Standard Descriptive Stats
to the right of the "Unique" column
mean, std deviation, median, min, max
only available for numeric data
show data visualization by clicking on feature name
histogram
can change bin size with orange arrow
hovering over a bar shows the number of rows and the bin range
[] brackets show inclusive range
() parentheses show exclusive range
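The five statistics and the bracket notation for bin ranges can be illustrated with a small sketch. The data, the `histogram` helper, and the equal-width binning rule are assumptions made for illustration; the notes do not say how DataRobot chooses bin edges or which standard deviation formula it uses (the sample formula is used here).

```python
import statistics

values = [1, 2, 2, 3, 5, 8, 13, 21]  # hypothetical numeric feature

summary = {
    "mean": statistics.mean(values),
    "std": statistics.stdev(values),  # sample standard deviation (assumption)
    "median": statistics.median(values),
    "min": min(values),
    "max": max(values),
}

def histogram(values, n_bins):
    """Equal-width bins: each is [lo, hi) except the last, which is [lo, hi]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Every value is identical, so a single closed bin holds all rows.
        return [(f"[{lo:g}, {hi:g}]", len(values))]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    labels = []
    for i in range(n_bins):
        left, right = lo + i * width, lo + (i + 1) * width
        close = "]" if i == n_bins - 1 else ")"
        labels.append(f"[{left:g}, {right:g}{close}")
    return list(zip(labels, counts))

print(summary)
for label, count in histogram(values, 4):
    print(label, count)
```

Changing `n_bins` plays the role of adjusting the bin size: a value that sits exactly on a shared edge is counted in the bin whose bracket is square on that side.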
Data Types
the nature of the data inside of a feature
categorical, numeric, etc.
make sure that categorical data coded as numbers are categorical, not numeric
Boolean: true or false
Text data
categorical
things DataRobot can automatically do
detect currency with existence of $ sign
detect length measurements based on words like "inches" and "meters"
extract days of the week and month
does not autogenerate feature if it finds that the feature already exists (ex: Year)
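The automatic detection described above can be pictured with a rough sketch. None of this is DataRobot's actual logic: `infer_type`, `derive_date_features`, the matching rules, and the derived feature names are hypothetical and only show the idea of rule-based type detection and of skipping a derived feature that already exists.

```python
import re
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def infer_type(values):
    """Rough type guess for a column of strings (illustrative rules only)."""
    if not values:
        return "unknown"
    if all(v.strip().startswith("$") for v in values):
        return "currency"
    if all(re.search(r"\b(inches|meters)\b", v) for v in values):
        return "length"
    if all(v.strip().lower() in ("true", "false") for v in values):
        return "boolean"
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        return "categorical/text"

def derive_date_features(d, existing_features):
    """Derive date parts, skipping any feature name that already exists."""
    candidates = {
        "Day of Week": DAY_NAMES[d.weekday()],
        "Day of Month": d.day,
        "Month": d.month,
        "Year": d.year,
    }
    return {k: v for k, v in candidates.items() if k not in existing_features}

print(infer_type(["$5.00", "$12"]))
print(derive_date_features(date(2024, 3, 15), existing_features={"Year"}))
```

Note that a column such as a ZIP code would come back as "numeric" here, which is exactly the case where the type should be corrected to categorical by hand.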
Evaluations of Feature Content
unique identifiers marked with [Reference ID]
[Duplicate] tag
if several other features contain the exact same values
[Target] tag
when target feature is selected
[Few Values]
[Too Many Values]
applies to non-numeric columns
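A sketch of how such tags could be assigned is shown below. The rules and the `too_many_ratio` threshold are assumptions chosen for illustration, since the notes do not give DataRobot's actual criteria; `tag_features` and the sample columns are hypothetical.

```python
def tag_features(columns, target=None, too_many_ratio=0.5):
    """Assign illustrative content tags to each column (assumed rules)."""
    tags = {name: [] for name in columns}
    seen = set()
    for name, values in columns.items():
        distinct = set(values)
        numeric = all(isinstance(v, (int, float)) for v in values)
        has_float = any(isinstance(v, float) for v in values)

        if name == target:
            tags[name].append("Target")

        # Duplicate: an earlier column holds exactly the same values.
        key = tuple(values)
        if key in seen:
            tags[name].append("Duplicate")
        seen.add(key)

        if len(values) > 1 and len(distinct) == len(values) and not has_float:
            # Every row differs, so the column looks like an identifier.
            tags[name].append("Reference ID")
        elif len(distinct) <= 1:
            tags[name].append("Few Values")
        elif not numeric and len(distinct) / len(values) > too_many_ratio:
            # Only non-numeric columns can receive this tag.
            tags[name].append("Too Many Values")
    return tags

columns = {
    "customer_id": [101, 102, 103, 104],
    "state": ["UT", "UT", "CO", "CO"],
    "state_copy": ["UT", "UT", "CO", "CO"],
    "country": ["US", "US", "US", "US"],
    "comment": ["a", "b", "c", "c"],
    "churned": [True, False, False, True],
}
print(tag_features(columns, target="churned"))
```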
Missing Values
codes
? (question mark)
null, Null, NULL
N/A, na
inf, INF
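When preparing data outside the tool, the same codes can be mapped to a single missing marker. This is a minimal sketch: `MISSING_CODES` copies only the codes listed above, and the exact, case-sensitive matching is an assumption, so a spelling not listed (such as "n/a") is left untouched.

```python
# Codes copied from the list above; matching is exact and case-sensitive
# (an assumption, since only these spellings are listed).
MISSING_CODES = {"?", "null", "Null", "NULL", "N/A", "na", "inf", "INF"}

def normalize_missing(values):
    """Replace recognized missing-value codes with None."""
    return [
        None if v is None or v.strip() in MISSING_CODES else v
        for v in values
    ]

print(normalize_missing(["7", "?", "NULL", "n/a", "inf", None]))
```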