Pluralsight: Understanding Machine Learning in R - Machine Learning Workflow
Machine Learning Workflow
An orchestrated and repeatable pattern that systematically transforms and processes information to create prediction solutions
4) Select a subset of the data to train your model
Training Overview
1) Split Data
Training data ~ 70%
Testing data ~ 30%
Data Splitting Requirements
divide into two data frames by a user-specified percentage
make sure the proportion of the rare event in the entire data set is preserved in both the training and testing data
caret's createDataPartition function can ensure this
createDataPartition(data.frame$column.to.be.predicted, p = 0.70, list = FALSE)
the 1st argument is the column containing the value the model should predict; the function ensures this column keeps the same proportion of the rare event in both the training and testing data frames
the 2nd argument, p, specifies the proportion of the data that goes into the training set, usually 70%
the 3rd argument, list = FALSE, tells the function to return the row indices as a matrix rather than a list
the returned row indices identify the rows that go into the training partition
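A minimal sketch of the split (the data frame df and its label column outcome are hypothetical names):
library(caret)
set.seed(42)                                   # fix the random seed so the split is reproducible
train.rows <- createDataPartition(df$outcome, p = 0.70, list = FALSE)
training <- df[train.rows, ]                   # ~70% of rows, rare-event proportion preserved
testing  <- df[-train.rows, ]                  # remaining ~30%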
2) Train the model using the selected algorithm
3) Trained model is produced from the training data
Selecting the Features/Columns to be used for the training data
train with only the minimum number of features (columns) needed
decide which columns will be relevant to finding your predicted value: what factors might come into play?
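A sketch of trimming to a minimal feature set (all column names hypothetical):
keep.cols <- c("glucose", "age", "bmi", "outcome")   # chosen features plus the label column
df <- df[, keep.cols]                                # drop everything else before training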
Caret Package
stands for "
C
lassification
A
nd
Re
gression
T
raining
has a common interface across algorithms
a toolset for training and evaluation tasks
data-splitting
pre-processing
feature selection
model tuning
1) Asking the right question
2) Preparing data
check whether any columns in the data frame are correlated, meaning they change together. If two columns give the same information, one of them should be eliminated; otherwise the duplicated information gives that factor twice the importance it should have, which can make the downstream modelling inaccurate.
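A sketch of the correlation check using caret's findCorrelation() (the 0.9 cutoff is an assumption; column names hypothetical):
library(caret)
numeric.df <- df[, sapply(df, is.numeric)]             # cor() needs numeric columns
cor.matrix <- cor(numeric.df)
high.cor <- findCorrelation(cor.matrix, cutoff = 0.9)  # indices of highly correlated columns
names(numeric.df)[high.cor]                            # candidate columns to eliminate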
3) Select the proper algorithm
train()
this function invokes the training algorithm; the training data is passed in the function call
the data contains "examples" of the value I'm trying to predict, and "features" that enable predicting that value
the algorithm executes its logic and produces a "trained" model
this model is another program?
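A sketch of a train() call (the formula and data frame reuse the hypothetical names from the split above; method = "glm" is just one choice):
library(caret)
model <- train(outcome ~ .,        # predict outcome from all remaining features
               data = training,    # for classification, outcome should be a factor
               method = "glm")     # the method string selects the algorithm; the interface stays the same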
predict()
the prediction model produced by the training algorithm is used here
to call this function, the trained model and real (new) data are passed
which factors should be considered when selecting the correct algorithm?
Decision Factors
2) Result
1) Regression: continuous values
2) Classification: discrete values
3) Complexity
ensemble?
container algorithms that have multiple child algorithms
can be difficult to debug
often used to fine tune the model to increase performance
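Random forest is a common example of an ensemble: a container of many decision trees. In caret it is selected with just another method string (a sketch, reusing the hypothetical training data from above):
rf.model <- train(outcome ~ ., data = training, method = "rf")   # ensemble of decision trees; needs the randomForest package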
1) Learning Type
1) Supervised
2) Unsupervised
4) Basic or Enhanced
Basic
easier to understand
Candidate Algorithms (see the caret training sketch after this list)
Naive Bayes
Based on Bayes' Theorem
assumes that all features are independent of each other and that every feature has the same weight
based on likelihood and probability
Requires a smaller amount of data
Logistic Regression
returns a binary result
relationships between features are weighted
works fast
measures the relationship of each feature and weights them based on their impact on the result
stable to data changes
Decision Tree
based on a binary tree
each node in the tree contains a decision
requires enough data to determine nodes and splits
simpler
Enhanced
Variation of basic
used for performance improvements
additional functionality
more complex
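A sketch of training the three basic candidates through caret's common interface (names hypothetical; each method string pulls in its own underlying package):
nb.model   <- train(outcome ~ ., data = training, method = "nb")     # Naive Bayes (klaR)
glm.model  <- train(outcome ~ ., data = training, method = "glm")    # logistic regression
tree.model <- train(outcome ~ ., data = training, method = "rpart")  # decision tree (rpart)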
5) Test the model
1) Evaluate the model against test data
prediction.object <- predict(R.object.storing.trained.model, test.data.object)
2) Interpret the results
confusionMatrix(prediction.object, test.data.object[, "column.name.of.what.to.predict"])
improve results
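Putting the two test-step calls together (a sketch, assuming the model and split from the earlier examples and a label column named outcome):
predictions <- predict(model, testing)          # score the held-out testing data
confusionMatrix(predictions, testing$outcome)   # accuracy plus per-class statistics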