Chapters 13, 14, & 15
Chapter 13- Startup Processes
uploading data
opening page is often the most recent file worked on
easiest way to bring a dataset into DR is to read in a local file; this can be a file carefully prepared through the approaches outlined in Section III or Appendix A
generally several datasets are supplied
stick to downsampled datasets while learning; larger datasets carry serious risks
While data is being processed, click on "Untitled Project"
name the project; tags can be created to track projects. To do this, click the file folder inside the circle symbol to bring up a new menu where you can create and manage projects -> click Manage Projects -> lists all prior and currently running projects -> click Tags and type in a general name for the type of project, which can then be applied to similar projects
tags become more important with more projects completed
DR will accept comma-separated values files (.csv)
these are files where the first row contains the column names with a comma between each; following rows contain the data, with a comma between each value
on the left side of a .csv file, anything listed before the first comma on a line belongs in column 1
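a minimal sketch of such a file, read with pandas (the column names and values here are invented for illustration, not from the book):

```python
import pandas as pd
from io import StringIO

# First row: column names separated by commas; later rows: data values
csv_text = """loan_id,amount,term
1,5000,36
2,2400,60
3,10000,36"""

# Everything before the first comma on each line lands in column 1 (loan_id)
df = pd.read_csv(StringIO(csv_text))
print(df)
```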
if the data is online, use the "URL" button; if the data is from a database, use "ODBC"; if it's from Hadoop, use "HDFS"
compressing data as .gzip, .bzip2, .zip, .tar, .gz, .tgz, or .tar.bz2 speeds up data upload
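a minimal sketch of compressing a file to .gz before upload, using Python's standard library (the filename is a placeholder):

```python
import gzip
import shutil

# Compress train.csv (placeholder name) to train.csv.gz; smaller files upload faster
with open("train.csv", "rb") as src, gzip.open("train.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```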
Chapter 14- Feature Understanding and Selection
after setting up data, interpret the data contained in each feature (feature understanding)
descriptive stats
the index number is used to specify which feature is being discussed
Unique is listed after Var Type and notes how many unique values exist for each specific feature
any feature name can be clicked to show more details
to the right of the Unique column: info on standard descriptive stats like mean, SD, median, min, max (only available for numeric columns)
[] denotes an inclusive range while () denotes an exclusive range (the range includes the number next to a bracket but not the one next to a parenthesis)
68-95-99.7 rule
68% of data is within -1 to 1 SD of the mean, 95% of data within -2 to 2 SDs, and 99.7% within -3 to 3 SDs
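a quick numpy check of the rule on simulated normal data (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # simulated normally distributed feature

for k in (1, 2, 3):
    frac = np.mean(np.abs(x - x.mean()) <= k * x.std())
    print(f"within +/-{k} SD: {frac:.3f}")  # ~0.683, ~0.954, ~0.997
```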
Data types
data type indicates the nature of the data inside a feature
binary categorical (two categories); multi-class categorical (many categories)
the other most common type is numeric
any type of number including integers and decimals
boolean
two values, true or false
text type
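a rough pandas analogue of how these types show up in a data frame (toy values, not DR's actual type inference):

```python
import pandas as pd

df = pd.DataFrame({
    "approved": [True, False, True],        # boolean
    "grade": ["A", "B", "A"],               # binary categorical
    "state": ["CA", "NY", "TX"],            # multi-class categorical
    "amount": [5000.0, 2400.0, 10000.0],    # numeric
    "notes": ["paid early", "late", "ok"],  # text
})

print(df.dtypes)     # rough analogue of the Var Type column
print(df.nunique())  # rough analogue of the Unique column
```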
DR does examine whether any auto-generated features already exist in the dataset and, in such cases, doesn't generate a new feature
Missing Values
the Missing column outlines how many values are missing from a specific feature
? = missing value in DR
why care about these?
many ways for missing values to be handled; one option is to convert all of them to one consistent type; even if a missing value is treated correctly, some algorithms will ignore a whole row if there is a single missing value in any cell of that row, leading to deleterious effects for the model
algorithms that struggle with missing values: regression, neural networks, support vector machines
missing values can also appear as nulls
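a small pandas sketch of the trade-off: counting missing values, the row-dropping behavior some algorithms effectively apply, and one consistent imputation (column names invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [34, np.nan, 51], "income": [72000, 58000, None]})

print(df.isna().sum())         # missing count per feature (the Missing column idea)
print(df.dropna())             # one NaN anywhere in a row discards the whole row
print(df.fillna(df.median()))  # one consistent treatment: impute with the median
```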
Chapter 15- Build Candidate Models
data is now ready to be used in creating the first deep-learning, neural network, and other models
these models serve to improve understanding of which combinations of data, preprocessing, parameters, and algorithms work well when constructing models
Starting the process
Select target feature
once the selection is made, the top of the window changes
logloss (accuracy)
means that rather than evaluating the model directly on whether it assigns cases (rows) to the correct "label" (T or F), the model is evaluated based on the probabilities it generates and their distance from the correct answer
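a minimal example with scikit-learn's log_loss (the labels and probabilities are made up):

```python
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]          # correct labels (T/F as 1/0)
y_prob = [0.9, 0.2, 0.6, 0.8]  # model-generated probabilities of the positive class

# Confident wrong probabilities are penalized far more than near misses
print(log_loss(y_true, y_prob))
```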
there are advanced options
starting the analytical process
DR prepares data through Autopilot, Quick, and Manual modes
before this, reset the options to where they originally were
quick run
abbreviated version of Autopilot that produces almost-as-good models by shortcutting DR's best-practice machine learning process
informative features
represents all data features except for the ones automatically excluded and tagged
step 4- loading dataset and preparing data
relevant if the data is large (over 500 MB); all initial evaluations before this step will have been conducted with a 500 MB sample of the dataset, and now the rest of the dataset is loaded
step 5- saving the target and partitioning the info
step 6- importance scores calculated
step 7- calculating a list of models, where info from steps 3-6 is used to determine which blueprints to run in Autopilot
model selection process
RMSE
root mean square error
frequently used measure of the difference between values predicted by a model or estimator and the values actually observed
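a minimal sketch of the calculation in numpy (illustrative values only):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the mean squared prediction error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```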