Feature Understanding and Selection
Descriptive Statistics
use the index number to specify features
the Unique column shows how many unique values each feature has
clicking a feature name shows details about that feature
the unique values of each feature are shown in a bar chart
next to Unique are summary statistics such as mean, standard deviation, median, minimum, and maximum
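As a rough illustration of what those columns contain, the Python sketch below computes the same kind of summary statistics for a single numeric feature using only the standard library. The function name `describe` and the sample ages are made up for illustration, and the standard deviation here is the sample (n - 1) version, which may not match the exact definition DataRobot uses.

```python
import statistics

def describe(values):
    """Summary statistics for one numeric feature (hypothetical helper)."""
    return {
        "unique": len(set(values)),           # number of distinct values
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); a tool may define it differently
        "std_dev": statistics.stdev(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

ages = [23, 35, 35, 41, 52, 60]
print(describe(ages))
```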
data will only be presented in the 3rd position
the data type describes the kind of data a feature contains
some categorical variables have only two categories (binary); others have more than two (multi-class)
another common data type is numeric
any number, integer or decimal
some categorical features, such as category codes stored as numbers, are treated as numeric features by ML tools
boolean is a type of categorical feature
its value is always either true or false
the last data type is text
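The rules below are a hypothetical heuristic, not DataRobot's actual type-detection logic, for guessing a feature's type from its raw string values. They also show why categorical codes can be misread: ZIP codes parse as numbers, so a naive rule labels them numeric. The function `infer_type` and the three-word threshold for text are assumptions chosen for illustration.

```python
def _is_number(s):
    """True if the string parses as a float (integers and decimals both count)."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def infer_type(values):
    """Hypothetical heuristic for guessing a feature's data type from raw strings."""
    distinct = {v.strip().lower() for v in values}
    if distinct <= {"true", "false"}:
        return "boolean"
    if all(_is_number(v) for v in values):
        return "numeric"  # numeric-looking codes such as ZIP codes also land here
    # Assumed threshold: values with more than three words are treated as free text
    if any(len(v.split()) > 3 for v in values):
        return "text"
    return "categorical (binary)" if len(distinct) == 2 else "categorical (multi-class)"

print(infer_type(["yes", "no", "yes"]))  # categorical (binary)
print(infer_type(["90210", "10001"]))    # numeric, although ZIP codes are categories
```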
Evaluations of Feature Content
if a feature has been flagged as a unique identifier, the system will not use it to predict the target
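A simple way to approximate this check, assumed here for illustration and not necessarily how DataRobot decides, is to flag a feature when every row holds a different integer or string value. Decimal measurements can also be all-unique, which is why this sketch excludes floats; real tools likely combine more signals. `looks_like_identifier` and the sample columns are hypothetical.

```python
def looks_like_identifier(values):
    """Hypothetical check: flag a feature whose every row holds a different ID-like value."""
    if not values or len(set(values)) != len(values):
        return False  # empty, or some value repeats, so not a row identifier
    # IDs are usually integer or string codes; all-unique decimals are more likely measurements
    return all(isinstance(v, (int, str)) and not isinstance(v, bool) for v in values)

columns = {
    "customer_id": [101, 102, 103, 104],
    "age": [34, 51, 34, 29],
    "income": [52000.5, 61000.0, 48000.25, 75000.75],
}
# Keep only the features that could reasonably be used to predict a target
usable = [name for name, col in columns.items() if not looks_like_identifier(col)]
print(usable)  # ['age', 'income']
```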
some features look identical, and it can be hard to tell whether they really are
AutoML cannot always tell which of the duplicate features to remove, which can distort the results
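Exact duplicates are the easy case: the hypothetical `find_duplicate_features` below compares columns value by value and reports pairs that match. It misses features that carry the same information in a different form (for example, a height in centimeters and the same height in inches), which is part of what makes duplicates hard to spot.

```python
def find_duplicate_features(columns):
    """Return (first, duplicate) name pairs for columns with identical values row by row."""
    seen = {}  # column contents (as a tuple) -> first feature name with those contents
    duplicates = []
    for name, values in columns.items():
        key = tuple(values)  # assumes the values are hashable
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates

columns = {
    "height_cm": [170, 182, 165],
    "height_cm_copy": [170, 182, 165],
    "height_in": [66.9, 71.7, 65.0],  # same information in different units: not caught
    "weight_kg": [70, 85, 60],
}
print(find_duplicate_features(columns))  # [('height_cm', 'height_cm_copy')]
```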
a question mark is coded as a missing value in DataRobot
there are many ways to handle missing values, so treat each case individually
convert the different missing-value codes to a single representation so they aren't counted as distinct values
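The sketch below shows why this conversion matters: before cleaning, "?", "N/A", and "null" count as three separate values, and after mapping them to Python's `None` they collapse into one missing marker. The set of codes in `MISSING_CODES` and the helper `normalize_missing` are assumptions for illustration; whether a string like "NA" really means missing (and not, say, an abbreviation) has to be decided case by case.

```python
# Hypothetical set of strings treated as missing; compared after stripping and upper-casing
MISSING_CODES = {"", "?", "NULL", "N/A", "NA"}

def normalize_missing(values):
    """Replace every recognized missing-value code with a single marker, None."""
    return [None if v is None or v.strip().upper() in MISSING_CODES else v for v in values]

raw = ["red", "?", "blue", "N/A", "null", "red"]
clean = normalize_missing(raw)
print(len(set(raw)))    # 5: the three codes look like distinct values
print(len(set(clean)))  # 3: "red", "blue", and None
```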
some algorithms ignore missing values, which can distort the results
other missing values are coded as "nulls"
joining tables can introduce nulls, for example when a row has no match in the other table
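To see where join-induced nulls come from, here is a minimal left join over lists of dictionaries in plain Python: every customer row is kept, and customers with no matching order get `None` in the order columns. `left_join` is a hypothetical helper that assumes keys on the right side are unique, that all right-side rows share the same columns, and that the two tables share no column names other than the key; in practice a library such as pandas would do this with `merge(how="left")`.

```python
def left_join(left, right, key):
    """Keep every row of `left`; fill columns from `right` with None where no row matches."""
    index = {row[key]: row for row in right}  # assumes keys in `right` are unique
    right_cols = [col for col in right[0] if col != key] if right else []
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        for col in right_cols:
            merged[col] = match.get(col)  # None when this key has no match: a join-induced null
        joined.append(merged)
    return joined

customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 1, "total": 40.0}]
print(left_join(customers, orders, "id"))
# [{'id': 1, 'name': 'Ana', 'total': 40.0}, {'id': 2, 'name': 'Ben', 'total': None}]
```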
values that do not apply are often stored as "N/A"
some calculations result in infinity (for example, dividing a nonzero number by zero in floating-point arithmetic), which is coded as "inf"
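Python's `float()` accepts "inf", "-inf", and "Infinity", so infinite values can slip into a numeric column without raising an error. (Plain Python raises `ZeroDivisionError` on division by zero, whereas NumPy float arithmetic returns inf with a warning.) The hypothetical `parse_numeric` below converts such values to `None`; treating infinity as missing is one assumed option, and capping it at a large value is another.

```python
import math

def parse_numeric(values):
    """Parse numeric strings; infinite results become None (treated as missing)."""
    parsed = []
    for v in values:
        x = float(v)  # "inf", "-inf", and "Infinity" all parse without error
        parsed.append(None if math.isinf(x) else x)
    return parsed

print(parse_numeric(["2.5", "inf", "-inf", "4"]))  # [2.5, None, None, 4.0]
```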