Chapter 13-15
Chapter 14 Feature understanding and selection
once the project and data have been set up, the next step is to interpret the data contained in each feature
descriptive statistics
all features are listed in rows, with their names under the Feature Name header and the order in which they were read into DataRobot under the Index header
the index number will be used to specify which features are being discussed
the Unique column is listed after Var Type
notes how many unique values exist for each feature
will only present the data in the third position above
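A rough local equivalent of these descriptive statistics can be built with pandas; this is only an illustrative sketch, and the file name below is a placeholder, not part of DataRobot.

```python
import pandas as pd

# Placeholder file name; substitute your own dataset.
df = pd.read_csv("lending_data.csv")

# Per-feature summary roughly mirroring DataRobot's feature list:
summary = pd.DataFrame({
    "var_type": df.dtypes.astype(str),   # nature of the data in each feature
    "unique": df.nunique(),              # count of unique values per feature
    "missing": df.isna().sum(),          # missing-value count per feature
})
summary.index.name = "feature_name"
print(summary.reset_index())
```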
Data types
this indicates the nature of the data inside a feature
one of the most common types is numeric
any number, whether integer or decimal
DataRobot does examine whether any auto-generated features already exist in the dataset and, in such cases, does not generate a new feature
evaluations of feature content
once the target feature is selected, it is tagged as the target for the dataset
Missing values
by converting missing values to one consistent type, an analyst can avoid them being categorized as multiple unique values during an analysis
DataRobot applies treatments for missing values
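A minimal pandas sketch of the idea behind converting missing values to one consistent type, so that strings such as "NA", "?", and blanks are not counted as separate unique values; the sentinel list and column are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Assumed sentinel strings that all mean "missing" in the raw data.
MISSING_SENTINELS = ["", "NA", "N/A", "n/a", "null", "?", "-"]

df = pd.DataFrame({"income": ["52000", "NA", "", "61000", "?"]})

# Map every sentinel to a single consistent missing marker (NaN),
# so they are no longer counted as distinct unique values.
df = df.replace(MISSING_SENTINELS, np.nan)
df["income"] = pd.to_numeric(df["income"])

print(df["income"].nunique())  # 2 real values; NaN is excluded from the count
```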
Chapter 15 Build candidate models
first deep learning
Starting the process
select the target feature
once the target is chosen, the top window will change
automatically downsamples
DataRobot can be trusted to choose a good measure
LogLoss means that rather than evaluating the model directly on whether it assigns cases to the correct label, it evaluates the probability the model assigns to the correct label, penalizing predictions that are confident but wrong
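A small sketch of how LogLoss scores predicted probabilities rather than hard labels; the toy numbers are made up for illustration.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the
    predicted probabilities (binary case)."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

y_true = [1, 0, 1, 1]

# Both classifiers get every hard label "right" at a 0.5 threshold,
# but the more confident (and correct) one earns a much lower LogLoss.
print(log_loss(y_true, [0.9, 0.1, 0.8, 0.95]))  # ~0.12
print(log_loss(y_true, [0.6, 0.4, 0.6, 0.6]))   # ~0.51
```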
Advanced options
the random and stratified partitioning methods offer identical options
once the lock box has been filled with holdout cases and appropriately locked, the remaining cases are split into n folds
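A scikit-learn sketch of this lock-box-then-folds idea; the 20% holdout size, five folds, file name, and target column are assumptions for illustration, not DataRobot's fixed settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold

df = pd.read_csv("lending_data.csv")                         # placeholder file name
X, y = df.drop(columns="was_delinquent"), df["was_delinquent"]  # assumed target

# 1) Fill the "lock box": set aside holdout cases and do not touch them
#    until final model evaluation (20% and stratification are assumptions).
X_rest, X_holdout, y_rest, y_holdout = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# 2) Split the remaining cases into n folds for cross-validation.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold_id, (train_idx, valid_idx) in enumerate(folds.split(X_rest, y_rest)):
    print(f"fold {fold_id}: train={len(train_idx)}, validation={len(valid_idx)}")
```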
a partition feature is a method for determining exactly which cases are used in which folds
the group approach accomplishes much the same as the partition feature, but with some key differences: it allows for the specification of a group membership feature
DataRobot then decides where each case is partitioned but always keeps each group together in only one partition
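scikit-learn's GroupKFold illustrates the same keep-each-group-in-one-partition behavior; the customer groups below are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: each case belongs to a customer, and all cases for a
# customer must land in the same partition.
X = np.arange(12).reshape(-1, 1)
groups = ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4", "c5", "c5", "c6", "c6"]

for train_idx, valid_idx in GroupKFold(n_splits=3).split(X, groups=groups):
    valid_groups = {groups[i] for i in valid_idx}
    train_groups = {groups[i] for i in train_idx}
    # No group ever appears on both sides of a split.
    assert valid_groups.isdisjoint(train_groups)
    print("validation groups:", sorted(valid_groups))
```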
the date/time option deals with a critical evaluation issue: making sure that all validation cases occur in a time period after the cases used to create the models
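A hedged sketch of the idea behind date/time partitioning: sort by the date feature and ensure every validation case occurs after every training case; the file name, column name, and 80/20 cut are placeholders.

```python
import pandas as pd

# Placeholder file and column names.
df = pd.read_csv("loans.csv", parse_dates=["application_date"])

# Order cases in time, then cut so validation comes strictly after training.
df = df.sort_values("application_date")
cut = int(len(df) * 0.8)          # 80/20 time-ordered split is an assumption
train, valid = df.iloc[:cut], df.iloc[cut:]

# Every validation case occurs after the latest training case.
assert train["application_date"].max() <= valid["application_date"].min()
```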
starting the analytical process
this will prepare the data according to the prescribed options
autopilot, quick, and manual
features can combine in powerful ways
model selection process
includes running almost every effective algorithm invented for ML
Chapter 13 Startup processes
uploading data
the easiest way to bring data into DataRobot is to read in a local file
it is recommended to downsize the dataset
accepts comma-separated values (CSV) files
files where the first row contains the column names, with a comma between each
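A minimal sketch of downsizing a dataset and writing it as a CSV of this shape; the file names and sampling fraction are placeholders, and the commented-out upload calls reflect the DataRobot Python client as I understand it, so verify them against the current client documentation.

```python
import pandas as pd

# Placeholder file names; the 25% sampling fraction is an assumption.
df = pd.read_csv("full_extract.csv")
df.sample(frac=0.25, random_state=42).to_csv("project_data.csv", index=False)
# index=False keeps the first row as the column names, separated by commas,
# which matches the CSV layout described above.

# Starting a project from the file with the DataRobot Python client
# (assumed usage; check the client docs before relying on it):
# import datarobot as dr
# dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")
# project = dr.Project.create(sourcedata="project_data.csv",
#                             project_name="Chapter 13 example")
```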
an important thing to do while the data is being processed is to click on Untitled Project in the upper-right part of the screen and name the project
after creating multiple projects, consider creating tags to keep track of them
Manage Projects will list all prior and currently running projects