Chapters 13-15
Chapter 14: Feature Understanding and Selection
14.1 Descriptive Statistics
When looking at any feature name, you can click on it to see more than is displayed by default. For example, you can click on it to see the row Id, which might show you how many of that specified name are found in the dataset.
An easy way to remember what the numbers mean is to mentally place a less-than sign in front of each number. When it states a 3, turn it into a <3 (less than 3, not a heart).
14.2 Data Types
DataRobot shows the first 50 features by default, and it is good to be aware of how to view features beyond index 50. Find the last two features by using the navigation at the bottom of the window. There is an example of this in the reading.
14.3 Evaluations of Feature Content
An AutoML would not know which one to remove correctly, and it is possible that a researcher might draw incorrect conclusions about which drug had the positive or negative effe
14.4 Missing Values
Why care about missing values? The reality is that there are many ways for missing values to be handled even within the same dataset.
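To make the point concrete, here is a minimal sketch that counts how many cells in one column look missing under several different encodings. The marker set and the helper name `count_missing` are assumptions made up for illustration; they are not DataRobot's actual list or API.

```python
# Hypothetical set of strings that often stand in for a missing value.
# This is an illustrative assumption, not DataRobot's own list.
MISSING_MARKERS = {"", "na", "n/a", "nan", "null", "none", "?"}


def count_missing(values):
    """Count entries that are None or match a marker (case-insensitive)."""
    count = 0
    for value in values:
        if value is None or str(value).strip().lower() in MISSING_MARKERS:
            count += 1
    return count


# One column in which missing values are encoded five different ways.
column = ["12", "NA", "", "7", "null", None, "?", "3"]
print(count_missing(column))
```

A single column can mix all of these encodings, which is why it helps to check how a tool interprets each one before trusting a missing-value count.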
Chapter 15: Build Candidate Models
15.1 Starting the Process
When starting, DataRobot thankfully does a lot of the work on its own, but one of the most important tools to look at is the Use the Target tool.
Note that DataRobot offers the option of which metric to optimize the produced models for. In general, DataRobot can be trusted to select a good measure.
15.2 Advanced Options
There is a button literally called Advanced Options. Although it is not intended for first-time users, it is still important to look at.
The group approach is something critical to look at. First of all, it allows for the specification of a group membership feature. Secondly, DataRobot makes decisions about where a case is to be partitioned but always keeps each group (those with the same value in
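The idea of keeping a group together can be sketched as follows. This is an illustration of group-based partitioning in general, assuming that every case with the same group value should land in the same fold; it is not DataRobot's implementation, and the names `assign_fold`, `group_partition`, and `patient_id` are hypothetical.

```python
import zlib


def assign_fold(group_value, n_folds=5):
    """Map a group value to a fold so equal values always share a fold."""
    # crc32 is deterministic across runs, unlike the built-in hash() on str.
    return zlib.crc32(str(group_value).encode("utf-8")) % n_folds


def group_partition(rows, group_key, n_folds=5):
    """Split rows into folds without ever splitting a group across folds."""
    folds = {i: [] for i in range(n_folds)}
    for row in rows:
        folds[assign_fold(row[group_key], n_folds)].append(row)
    return folds


# Hypothetical data: several visits per patient.
rows = [
    {"patient_id": "A", "visit": 1},
    {"patient_id": "A", "visit": 2},
    {"patient_id": "B", "visit": 1},
    {"patient_id": "C", "visit": 1},
    {"patient_id": "C", "visit": 2},
]
folds = group_partition(rows, "patient_id")
```

Because the fold is computed from the group value alone, two rows for the same patient can never end up on opposite sides of a training and validation split.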
15.3 Starting the Analytic Process
The analytic process deals with a few different variables, and these prepare the data through the prescribed options: Autopilot, Quick, and Manual. The informative features list is another important list to look at.
The green bar in the Importance column indicates the relative importance of a particular feature when examined against the target independently of all other features. Each
15.5 Model Selection Process
Even though each different type of algorithm has very different run-times and processing needs, each is assigned here to a worker. We can think of a worker as a dedicated computer.
Chapter 13: Startup Processes
13.1 Uploading Data
The best way to upload data into DataRobot is to read it in through "Local File".
It is strongly recommended to stick to the smaller downsampled datasets while learning. There is a serious risk that a larger dataset will churn through the semester's processing allocation in a single analysis, depending on its size.
DataRobot will use commas to separate the values in each row when you are looking at the data. The current data limit for DataRobot is that the uploaded dataset must be 100 data rows or more, less than or equal to 20,000 columns, and l
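The two limits that are legible in these notes (at least 100 data rows, at most 20,000 columns) can be checked locally before uploading. The sketch below is a rough pre-check under those assumptions only; the function name `check_limits` is hypothetical, and any further limits are not covered.

```python
import csv
import io

MIN_ROWS = 100        # minimum data rows, as stated in the notes
MAX_COLUMNS = 20_000  # maximum columns, as stated in the notes


def check_limits(csv_text):
    """Return a list of problems found; an empty list means both checks pass."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    # Count non-empty data rows; the header line is not a data row.
    n_rows = sum(1 for row in reader if row)
    problems = []
    if n_rows < MIN_ROWS:
        problems.append(f"only {n_rows} data rows, need at least {MIN_ROWS}")
    if len(header) > MAX_COLUMNS:
        problems.append(f"{len(header)} columns, limit is {MAX_COLUMNS}")
    return problems


too_small = "a,b\n1,2\n3,4\n5,6\n"
big_enough = "a,b\n" + "1,2\n" * 100
print(check_limits(too_small))
print(check_limits(big_enough))
```

For a file on disk, the same function can be fed the result of `open(path, newline="").read()`, though very large files would be better streamed than read whole.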