Chapters 13-15
Ch 13: Startup Process
A local file is the easiest data source to import
Use a sample of the data to speed up calculations
DataRobot accepts CSV files
Text data is less likely to cause parsing problems in a tab-separated file (.tsv), since free text often contains commas
Make sure feature names are unique; correct the data structure if they are not
Follow the screenshots in Ch 13 for help with startup
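The Ch 13 preparation tips (sampling, tab-separated text, unique feature names) can be tried outside DataRobot with Python's standard library. This is a minimal sketch, not DataRobot's import logic. The helper names make_unique and load_tsv_sample, and the name_2, name_3 renaming scheme, are hypothetical. A free-text value such as "great, fast shipping" stays a single field because the delimiter is a tab, not a comma.

```python
import csv
import io
import random


def make_unique(names):
    """Rename repeated header names (hypothetical scheme: name_2, name_3, ...)."""
    used = set()
    result = []
    for name in names:
        base = name.strip()
        candidate, suffix = base, 2
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result


def load_tsv_sample(text, sample_size, seed=0):
    """Parse tab-separated text, make the header unique, and return a random row sample."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = make_unique(next(reader))
    rows = [dict(zip(header, row)) for row in reader]
    if sample_size >= len(rows):
        return header, rows  # small file: no need to sample
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return header, rng.sample(rows, sample_size)


# The free-text column contains commas, which the tab delimiter leaves intact.
# The third column was accidentally also named "text".
data = (
    "id\ttext\ttext\n"
    "1\tgreat, fast shipping\t5\n"
    "2\tslow, but fine\t3\n"
    "3\tno complaints\t4\n"
)
header, sample = load_tsv_sample(data, 2)
print(header)  # ['id', 'text', 'text_2']
print(len(sample))  # 2
```

Sampling with a fixed seed keeps repeated experiments comparable. On a real file you would read from disk instead of a string.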
Ch 14: Feature Understanding and Selection
Descriptive Statistics
Unique: the number of distinct values a feature takes
Mean, min, max, mode, standard deviation
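The descriptive statistics listed above can be reproduced for a single numeric feature with Python's statistics module. This is an illustrative sketch. The function name describe and the sample ages are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption, since the notes do not say which variant DataRobot reports.

```python
import statistics


def describe(values):
    """Summary statistics for one numeric feature (hypothetical helper)."""
    return {
        "unique": len(set(values)),  # number of distinct values
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "mode": statistics.mode(values),  # most frequent value
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }


ages = [23, 35, 35, 41, 52, 35, 29]
print(describe(ages))
```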
Data Types
Numeric: both integers and decimals
Categorical: may be treated as numeric if not entered properly
Boolean: True or False
Text
Currency: DataRobot can detect and tag currencies based on the presence of currency symbols such as $ (dollar), € (euro), EUR (also euro), £ (pound), and ¥ (yen)
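To see why category codes can end up typed as numeric and how symbol-based currency tagging might work, here is a toy type-inference heuristic. It is a hypothetical sketch, not DataRobot's actual detection logic. The function names, the check order, and the rule that multi-word values mean "text" are all assumptions made for illustration.

```python
CURRENCY_SYMBOLS = ("$", "€", "EUR", "£", "¥")


def is_number(text):
    """True if text parses as an integer or decimal."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def strip_currency(text):
    """Remove currency symbols and thousands separators, e.g. '$1,200.50' -> '1200.50'."""
    for symbol in CURRENCY_SYMBOLS:
        text = text.replace(symbol, "")
    return text.replace(",", "").strip()


def infer_type(values):
    """Guess a feature type from raw string values (hypothetical heuristic)."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return "empty"
    if all(v.lower() in ("true", "false") for v in cleaned):
        return "boolean"
    if all(is_number(v) for v in cleaned):
        # Category codes such as 1, 2, 3 also land here, which is how a
        # categorical feature can end up treated as numeric.
        return "numeric"
    if all(any(s in v for s in CURRENCY_SYMBOLS) and is_number(strip_currency(v))
           for v in cleaned):
        return "currency"
    if any(len(v.split()) > 1 for v in cleaned):
        return "text"  # assumption: multi-word values indicate free text
    return "categorical"


print(infer_type(["1", "2", "3"]))  # numeric, even if these are category codes
print(infer_type(["$1,200.50", "€30", "EUR 12"]))  # currency
```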
Missing Values
Encoded as ?, NULL, or the text "N/A"
Rows may be excluded if certain columns are ? or NULL; for numeric cells, the mean can be substituted instead
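The two missing-value strategies noted above, dropping rows and substituting the mean for numeric cells, can be sketched as follows. This is a minimal illustration under stated assumptions: the set of missing tokens ("?", "NULL", "N/A", blank, matched case-insensitively) and the helper names are hypothetical, and DataRobot's own handling may differ.

```python
import statistics

# Assumed spellings of "missing"; matched after stripping and upper-casing.
MISSING_TOKENS = {"?", "NULL", "N/A", ""}


def is_missing(value):
    """True for None or any of the assumed missing-value tokens."""
    return value is None or value.strip().upper() in MISSING_TOKENS


def impute_mean(column):
    """Replace missing numeric entries with the mean of the observed values."""
    observed = [float(v) for v in column if not is_missing(v)]
    fill = statistics.mean(observed)  # raises StatisticsError if nothing is observed
    return [fill if is_missing(v) else float(v) for v in column]


def drop_rows(rows, required):
    """Keep only rows where every required column has a value."""
    return [r for r in rows if not any(is_missing(r.get(c)) for c in required)]


income = ["52000", "?", "61000", "NULL", "47000"]
print(impute_mean(income))
```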
Ch 15: Build Candidate Models
Most of these models serve primarily to improve understanding of which combinations of data, preprocessing, parameters, and algorithms work well when constructing models
LogLoss (Accuracy): rather than evaluating the model directly on whether it assigns each case (row) to the correct label (False or True), LogLoss evaluates the probabilities the model generates and their distance from the correct answer
Read through the advanced options in Section 15.2
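The LogLoss idea above can be made concrete with the standard binary log loss, -(1/N) Σ [y log p + (1 - y) log(1 - p)], where p is the predicted probability of True. The notes do not give DataRobot's exact formula, so this sketch assumes the standard definition. The clipping constant eps and the example predictions are illustrative. Both prediction sets label every row correctly at a 0.5 threshold, yet the confident set scores a lower (better) LogLoss because its probabilities sit closer to the correct answers.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Binary log loss: average negative log-probability assigned to the true label.

    labels are True/False; probabilities are the model's P(True) for each row.
    """
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += math.log(p) if y else math.log(1 - p)
    return -total / len(labels)


labels = [True, False, True, True]
confident = [0.9, 0.1, 0.8, 0.95]
hesitant = [0.6, 0.4, 0.55, 0.6]
print(log_loss(labels, confident))  # about 0.121
print(log_loss(labels, hesitant))  # about 0.533
```

A confident wrong answer is penalized heavily: predicting 0.01 for a row that is actually True contributes -log(0.01), roughly 4.6, to the sum.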