Feature Understanding and Selection
Descriptive Statistics
Feature Name: features are listed in rows under the Feature Name header
any feature name can be clicked on to display more details
Index: the order the data was read into DataRobot
used to specify which feature is being discussed
Unique: notes how many unique values exist for each feature
Mean, Std Dev, Median, Min, Max: displayed only for numeric features
brackets [ ] indicate an inclusive range; parentheses ( ) indicate an exclusive range
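The summary columns above can be reproduced outside DataRobot as a sanity check. The following is a minimal sketch using only the Python standard library; the helper name describe_numeric and the sample ages list are hypothetical, and the use of the sample standard deviation is an assumption rather than a statement of how DataRobot computes it.

```python
import statistics

def describe_numeric(values):
    """Summarize a numeric column (hypothetical helper, not a DataRobot API)."""
    return {
        "unique": len(set(values)),            # count of distinct values
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),   # sample standard deviation (assumed)
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

# Made-up example column
ages = [23, 35, 35, 41, 58]
summary = describe_numeric(ages)
print(summary)
```

Comparing a hand-computed summary like this against the values shown in the feature list is a quick way to confirm the data was read in as expected.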
Data Types in DataRobot
indicates the nature of the data inside a feature
categorical data
binary categorical
multi-class categorical
numeric data
can be any type of number, including decimals and integers
always scrutinize the data types DataRobot assigns automatically; they can be wrong
boolean
categorical, but only ever holds one of two values
text type data
DataRobot can also identify and tag currencies by their unique currency symbols
DataRobot detects measurements as well
DataRobot is proficient at extracting dates
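To see why automatically assigned types deserve scrutiny, here is a rough sketch of a naive type guesser. It is not DataRobot's detection logic; the function infer_type, the single date format, and the two-value rule for booleans are all assumptions made for illustration.

```python
from datetime import datetime

def infer_type(values):
    """Guess a coarse type for a list of strings (illustrative only)."""
    def all_parse(parser):
        # True if every value parses without raising ValueError
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):  # assumed date format
        return "date"
    if len(set(values)) == 2:
        return "boolean"
    return "categorical"

print(infer_type(["1.5", "2", "3"]))             # numeric
print(infer_type(["2020-01-01", "2021-06-30"]))  # date
print(infer_type(["yes", "no", "yes"]))          # boolean
print(infer_type(["10001", "02139"]))            # numeric, although these are ZIP codes
```

The last call shows the kind of mistake worth checking for: ZIP codes parse as numbers but are really categorical labels.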
Evaluations of feature content
DataRobot finds unique identifiers, such as RowID, and won't include them in target analysis
DataRobot will identify outliers, or features with few unique values, and not include them in the model
features with too many unique values may also be flagged
Target feature will be selected and tagged as TARGET
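The checks on feature content can be approximated by counting unique values per column. This is a simplified sketch, not DataRobot's actual rules; the function flag_feature, its threshold, and the sample columns are hypothetical.

```python
def flag_feature(values, low_unique_threshold=1):
    """Classify a column by its unique-value count (illustrative thresholds)."""
    n_unique = len(set(values))
    # Every row distinct: behaves like a row identifier
    if n_unique == len(values) and len(values) > 1:
        return "possible identifier"
    # Constant (or near-constant) column carries little information
    if n_unique <= low_unique_threshold:
        return "too few values"
    return "ok"

# Made-up example columns
columns = {
    "RowID": [1, 2, 3, 4, 5],
    "country": ["US", "US", "US", "US", "US"],
    "age": [23, 35, 35, 41, 58],
}
flags = {name: flag_feature(vals) for name, vals in columns.items()}
print(flags)
```

Note that a continuous numeric column whose values all happen to differ would also be flagged as a possible identifier here, so a check like this is a prompt for review rather than a verdict.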
Missing Values
the Missing column shows how many values are missing from each feature
convert all representations of missing data to a single value
Nulls: values that were never entered or retrieved
DataRobot will automatically IMPUTE missing values when the data is run through an algorithm
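The two steps above, collapsing every missing marker into one value and then imputing, can be sketched as follows. The marker set MISSING_MARKERS and both helper names are hypothetical, and median imputation is used here only as one common strategy; the notes do not say which method DataRobot applies.

```python
import statistics

# Assumed set of strings that stand for "missing" in the raw data
MISSING_MARKERS = {"", "NA", "N/A", "null", "?"}

def normalize_missing(values):
    """Map every missing marker (and None) to a single value: None."""
    return [
        None if (v is None or str(v).strip() in MISSING_MARKERS) else v
        for v in values
    ]

def impute_median(values):
    """Replace None with the median of the observed values."""
    present = [v for v in values if v is not None]
    fill = statistics.median(present)
    return [fill if v is None else v for v in values]

raw = [10, "NA", 30, None, "?", 20]
cleaned = normalize_missing(raw)
missing_count = cleaned.count(None)   # what a Missing column would report
imputed = impute_median(cleaned)
print(missing_count, imputed)
```

Normalizing first matters: if "NA" and "?" stayed as strings, the column would look non-numeric and the missing count would be understated.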