Machine Learning
Hypothesis Testing
Null Hypothesis can have an "=", "<=" or ">=" comparison
Alternate Hypothesis can have a "!=", ">" or "<" comparison
Non-directional - Two-Tail Test - the Null Hypothesis is rejected if the observed value is > the high critical value or < the low critical value
e.g. Average Score = 70
Upper/Lower Critical Value = Mean +/- (Z-Score * Standard Error)
Z would be taken at the 97.5th percentile for a 95% two-tailed test, for example
If the value falls between the critical boundaries, we fail to reject the Null Hypothesis; otherwise, we reject it in favour of the Alternate Hypothesis
P-Value - probability read from the Z-table: the share of the normal distribution at least as extreme as the observed Z-score (doubled for a two-tailed test)
Z-Score = (Observed Value - Mean) / Standard Error
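A minimal sketch of this two-tailed test in Python, assuming scipy and a made-up sample mean, sample size and known population standard deviation:
```python
from scipy import stats
import numpy as np

# Hypothetical example: H0 says the population mean score is 70.
mu0, pop_std, n = 70, 10, 50          # assumed population std dev and sample size
sample_mean = 72.8                    # observed sample mean (made-up number)

std_error = pop_std / np.sqrt(n)
z_crit = stats.norm.ppf(0.975)        # 97.5th percentile for a 95% two-tailed test

lower = mu0 - z_crit * std_error      # low critical value
upper = mu0 + z_crit * std_error      # high critical value

z_score = (sample_mean - mu0) / std_error
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))   # two-tailed p-value

print(f"critical region: outside [{lower:.2f}, {upper:.2f}]")
print(f"z = {z_score:.2f}, p = {p_value:.3f}")
# Reject H0 only if sample_mean falls outside the critical interval (p < 0.05).
```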
t-Distribution
If the sample size is smaller than 30, we cannot use the Z-distribution table, as the sampling distribution's bell curve is not as tall as a Normal curve (it has fatter tails).
In cases where the degrees of freedom (sample size - 1) are small (< 30), we use the t-table instead of the Z-table to find the probability.
Because we are using the sample Std Dev instead of the Population Std Dev, there is extra variance, so the curve differs from the Normal Distribution curve (and approaches it as the sample grows).
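A quick check of this with scipy (assumed here), comparing the t and z critical values for a small sample:
```python
from scipy import stats

n = 15                                   # small sample, so df = n - 1 = 14
t_crit = stats.t.ppf(0.975, df=n - 1)    # t critical value for a 95% two-tailed test
z_crit = stats.norm.ppf(0.975)           # corresponding z critical value

print(t_crit, z_crit)   # ~2.145 vs ~1.960: the t cut-off is wider for small samples
```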
Classification
K-NN is a supervised method usable for both Classification and Regression; the closely related K-Means algorithm handles Clustering (unsupervised)
Challenges
Population of one class vastly out-numbers the other.
Can be overcome by under/over-sampling - e.g. with the imbalanced-learn (imblearn) package or the class_weight options in scikit-learn
For multi-class, we assign the item to the class with the highest probability
If we are not able to find good features for an item, we can classify it using similarity to known items, e.g. pharmacy products
If data behaviour is time-sensitive, we might need to split train/test by date range instead of randomly. This helps us see whether a model built on earlier data still holds good now.
Use forward selection, backward elimination, VIF for multicollinearity, etc. to optimise the number of features
For categorical data, we can use one-hot encoding (sklearn's OneHotEncoder or pandas get_dummies).
This might not work where there are a lot of unique values, like country names; we might need to fill in meaningful numbers instead (e.g. target or frequency encoding).
Missing value imputation - mean, median or mode. Sometimes imputing within a subset/class of the data is better.
We can also predict the missing feature as a target (y) from the other features using the training/test set
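A minimal encoding-plus-imputation sketch, assuming pandas and scikit-learn and hypothetical column names and values:
```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: a low-cardinality categorical column and a numeric column.
df = pd.DataFrame({
    "segment": ["retail", "pharmacy", "retail", None],
    "spend":   [120.0, None, 80.0, 95.0],
})

# One-hot encode the categorical column (pandas get_dummies here;
# sklearn's OneHotEncoder does the same job inside a pipeline).
dummies = pd.get_dummies(df["segment"], prefix="segment")

# Impute the missing numeric value with the median.
imputer = SimpleImputer(strategy="median")
df["spend"] = imputer.fit_transform(df[["spend"]]).ravel()

X = pd.concat([df[["spend"]], dummies], axis=1)
print(X)
```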
Curse of Dimensionality - as the number of features grows, we need an exponentially growing quantity of data to make well-generalised predictions - hence the need for dimensionality reduction
Trade off bias and variance appropriately: high bias leads to under-fitting and high variance leads to over-fitting.
Plain Logistic Regression is a binary classifier (multi-class needs one-vs-rest or multinomial extensions), whereas KNN can act as either a binary or multi-class classifier
Naive Bayes
If the input text contains a new word that is not part of the dictionary, we get a probability of 0. To avoid this, we add a constant to all word counts - Laplace Smoothing
In scikit-learn, Laplace smoothing is applied by default (alpha=1.0 in MultinomialNB)
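A small sketch of this with scikit-learn's MultinomialNB on made-up spam/ham texts:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["free prize now", "meeting at noon", "win a free prize", "project meeting notes"]
labels = [1, 0, 1, 0]   # 1 = spam, 0 = ham (toy data)

# alpha=1.0 is the default and corresponds to Laplace smoothing,
# so unseen words do not force a zero probability.
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(texts, labels)

print(model.predict(["free meeting prize"]))
```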
Neural Networks
Activation Function
ReLU / Leaky ReLU - most used
ReLU returns 0 if x < 0 and x itself if x >= 0
Leaky ReLU returns a small negative value (e.g. 0.01x) if x < 0
Support Vector Machine
Naive Bayes and Logistic Regression produce essentially linear decision boundaries; SVM is also good at non-linear problems
The basic SVM builds a hyper-plane with the maximum margin of separation between the classes. The limitation is that this works only for linearly separable data.
SVM supports non-linear classification using kernels, which map the features to a higher dimension where a separating hyper-plane can be built.
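A small illustration of the kernel idea, assuming scikit-learn and its make_circles toy dataset (not from the original notes):
```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles are not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

print("linear kernel:", linear_svm.score(X_test, y_test))
print("RBF kernel:   ", rbf_svm.score(X_test, y_test))
# The RBF kernel implicitly maps the points to a higher-dimensional space
# where a separating hyper-plane exists.
```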
Decision Tree
Very good at handling categorical data; numeric data is handled by splitting it into ranges/thresholds
Decision Tree Regression - each leaf predicts a value (commonly the mean of its training samples; model trees fit a linear regression per leaf)
In Classification, the leaf leads to a label
A decision tree should try to split the data into homogeneous sets - Gini impurity and Entropy both measure how mixed a node is, so lower values after the split are better and the split with the largest reduction (information gain) is chosen
Adv :
- Handle any kind of data
- No Normalization needed
- Intuitive
Dis-adv
- Prone to over-fitting
- Very unstable (small changes in the data can produce a very different tree)
CART trees - better suited for binary splits (classification/regression)
CHAID trees - good for multi-way splits and multi-class classifications
Measuring
Accuracy %. But it does not work well when
- we have an imbalanced data set
- we need probability-based accuracy, which it does not measure
Confusion Matrix. For multi-class classification, the diagonal counts (where Y = Y-Pred) should be high
"False Positive" - the "Positive" refers to the prediction; the "False" means the prediction did not match the actual label
The same convention gives True Negative, True Positive and False Negative
Precision = True Positive/(TP+FP)
Recall = TP/(TP+FN)
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
TPR = True Positive / Positive = TP/(TP+FN)
FPR = False Positive/Negatives = FP/(FP+TN)
Log Loss function --> to minimise log loss, the model should assign high probability where the actual Y = 1 and low probability where the actual Y = 0. Total log loss should then tend towards zero - used in Classification
MAD = Median Absolute Deviation of the errors
MAD = median(|e_i - median(e)|)
This is robust to the noise from large outlier errors
For Classification, use the Area Under the ROC Curve - a chart of TPR vs FPR for various positive-class probability thresholds.
The model with the larger area under the curve is the better model.
If both curves have the same area, the curve that rises faster is better, as it eliminates many negatives at lower probability thresholds
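A sketch computing these metrics with scikit-learn on made-up labels and predicted probabilities:
```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, log_loss, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # actual labels (toy data)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                    # hard predictions
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]    # predicted P(class = 1)

print(confusion_matrix(y_true, y_pred))              # rows: actual, cols: predicted
print("precision:", precision_score(y_true, y_pred)) # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))    # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))
print("log loss: ", log_loss(y_true, y_prob))        # needs probabilities, not labels
print("ROC AUC:  ", roc_auc_score(y_true, y_prob))   # area under the TPR-vs-FPR curve
```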
Anomaly Detection
Methods
Density Based : DBScan, LOF
Distance Based: K-NN, K-Means, Regression Hyper-plane Distance
Parametric: GMM, One-Class SVMs, Extreme Value Theory
Approaches
Using K-Means
- Build clusters using K-Means
- Calculate the distance between each data point and its centroid, and pick an outlier distance threshold (needs manual verification)
- Any point whose distance is more than the largest allowed distance is marked as an outlier
DBSCAN - Density-Based Spatial Clustering of Applications with Noise
We need to choose the neighbourhood value (eps, the distance within which points count as clusterable) and the min-points parameter (minimum number of points needed to form a cluster)
Create superpixels for image segmentation before clustering; that way we reduce the number of pixels to cluster
Apply a StandardScaler on the obtained features.
Principal Component Analysis with n_components = 512.
Pass the reduced features to a One-Class SVM model or Isolation Forest
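A minimal sketch of this scale -> reduce -> detect chain with scikit-learn, assuming synthetic data and a small n_components instead of 512 so the toy example runs:
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 20)),     # normal points
               rng.normal(6, 1, size=(5, 20))])      # a few injected outliers

# StandardScaler -> PCA -> IsolationForest (a One-Class SVM would slot in the same way).
detector = make_pipeline(StandardScaler(),
                         PCA(n_components=10),
                         IsolationForest(contamination=0.03, random_state=0))
labels = detector.fit_predict(X)      # +1 = inlier, -1 = outlier

print("flagged outliers:", np.where(labels == -1)[0])
```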
Regression
Simple Linear Regression
R^2 = 1-RSS/TSS
R^2 0 is bad and ~1 is good
We use R^2 so that the error measure does not change with the unit of measure (UOM),
e.g. when the measure changes from kg to grams.
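The R^2 = 1 - RSS/TSS formula computed by hand and via scikit-learn on toy numbers:
```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.6, 9.4])

rss = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
print(1 - rss / tss, r2_score(y_true, y_pred))  # both give the same R^2
```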
Regularization
To avoid over-fitting in linear regression, we add a regularization (penalty) term to the cost function
Lasso Regression
More computationally costly than ordinary least squares (no closed-form solution),
but the advantage is that it checks whether each feature contributes any predictive power to the output; if not, it drives that feature's weight to 0 so it becomes inconsequential
sklearn.linear_model.Lasso
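A quick sketch showing Lasso zeroing out irrelevant coefficients, using synthetic data and scikit-learn's Lasso class:
```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features actually drive y; the rest are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)   # coefficients of the irrelevant features shrink to (near) zero
```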
Ensemble
Build multiple models on the same data set and take the mean of their outputs (regression) or the majority vote (classification) to get the answer
Random Forest
Rule of thumb for the number of features considered per split: sqrt(number of features) for Classification and number of features / 3 for Regression - see the sketch below
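In scikit-learn this rule of thumb maps to the max_features parameter; a sketch on toy datasets:
```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

Xc, yc = make_classification(n_samples=500, n_features=16, random_state=0)
Xr, yr = make_regression(n_samples=500, n_features=15, noise=5.0, random_state=0)

# Classification rule of thumb: consider sqrt(n_features) candidates per split.
clf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0).fit(Xc, yc)

# Regression rule of thumb: about n_features / 3 candidates per split.
reg = RandomForestRegressor(n_estimators=200, max_features=1/3, random_state=0).fit(Xr, yr)

print(clf.score(Xc, yc), reg.score(Xr, yr))
```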
Boosting
AdaBoost (classification) - builds models sequentially on weighted samples; after each model is built, it is evaluated and the weights of the misclassified data points are increased so they are more likely to be picked up by the next model
XGBoost
In XGBoost (gradient boosting), we create a simple regression model and calculate the difference (residual) between the actual and predicted output. In the second round, we fit another regression model that predicts that residual from the same input features, and keep adding stages.
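A hand-rolled two-stage sketch of this residual-fitting idea using plain decision trees on synthetic data (the real XGBoost library adds regularization and many refinements):
```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# Round 1: fit a simple model to y.
model_1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
residual = y - model_1.predict(X)          # difference between actual and predicted

# Round 2: fit a second model to the residual, using the same input features.
model_2 = DecisionTreeRegressor(max_depth=2).fit(X, residual)

# The final prediction is the sum of the stages.
y_hat = model_1.predict(X) + model_2.predict(X)
print("MSE stage 1:", np.mean((y - model_1.predict(X)) ** 2))
print("MSE stage 2:", np.mean((y - y_hat) ** 2))
```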
Time Series Analysis
Stationary data is obtained by:
Differencing - e.g. remove seasonality with a seasonal (t - 12) difference, then take a t - 1 difference to remove the trend and make the series stationary
Once the data is stationary, we can use an ARIMA model to predict
For ARIMA we have 3 parameters
p - AR lag (order)
d - differencing order
q - MA lag (order)
p and q are found using the PACF and ACF plots respectively
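A minimal statsmodels ARIMA sketch on synthetic data (in practice p and q would be read off plot_pacf/plot_acf of the differenced series):
```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series with a linear trend plus noise (stand-in for real data).
rng = np.random.default_rng(0)
y = pd.Series(np.arange(120) * 0.5 + rng.normal(scale=2.0, size=120))

# order = (p, d, q): AR lag, differencing order, MA lag.  d=1 removes the trend.
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.params)
print(model.forecast(steps=6))   # forecast the next 6 periods
```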
Neural Networks
ANN
Neuron (node) and synapse (the weighted connection carrying one neuron's output to the next)
Activation function - types
Threshold function (Yes/No)
Sigmoid Function (1/(1+e^-x))
Rectifier Function (most popular) = Max(x,0)
Hyperbolic Tangent - similar to sigmoid but goes between -1 to +1 = (1-e^-2x)/(1+e^-2x)
Batch Gradient Descent - update weights based on the whole data set to minimise the error
Stochastic Gradient Descent - update the weights one sample at a time
Mini-batch - doing gradient descent on a few rows at a time
The weight updates are computed by back-propagating the error
One full forward-and-backward pass over the training data is called an epoch
CNN
Consists of the following layers
- Convolution layer - applies many filters (products) on the original image
  - Apply a non-linearity like ReLU or Leaky ReLU to the filtered output
- Apply a pooling algorithm like MaxPooling - taking the max value of, say, each 2x2 patch. Others include AvgPooling. The opposite of pooling (encode) is UpSampling (decode).
- Flattening - convert the 2-D feature maps into a single 1-D vector
- Pass the flattened features to a fully-connected neural network
- Go through backpropagation and epochs including filter changes to come up with a good classification algorithm.
SoftMax function - when we have 2 or more classes, the probabilities across all classes should sum to 1. So for classes A and B, if Prob(A) = 0.75 then Prob(B) should = 0.25.
To achieve this, softmax exponentiates each score and divides by the sum of the exponentials: e^A / (e^A + e^B), and similarly for B.
Cross-Entropy is similar to Log Loss: -Sum(y * log(y-hat))
This is used to measure the error in a CNN classifier.
Better than MSE because when y-hat is very far off, MSE produces a small gradient for a change in y-hat, whereas Cross-Entropy generates a big gradient.
Only used for classification problems, not regression problems.
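A small numpy sketch of softmax and cross-entropy as described above:
```python
import numpy as np

def softmax(scores):
    # Exponentiate, then divide by the sum so the class probabilities add up to 1.
    exps = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return exps / exps.sum()

def cross_entropy(y_true_onehot, y_prob):
    # -sum(y * log(y_hat)); only the true class term is non-zero.
    return -np.sum(y_true_onehot * np.log(y_prob))

scores = np.array([2.0, 0.5, -1.0])          # raw network outputs for 3 classes
probs = softmax(scores)
print(probs, probs.sum())                    # probabilities summing to 1

y_true = np.array([1, 0, 0])                 # true class is class 0
print(cross_entropy(y_true, probs))          # small when the true class gets high probability
```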
Inferential Statistics
Probability
Random Variables: numeric values assigned to logical/text outcomes, e.g. "3 blue balls + 1 red ball" -> 1, "4 red balls" -> 4, Yes/No -> 1/0, etc.
Permutation Probability
When calculating the probability of drawing 3 red balls, we should find the probability of each permutation (ordering) and add them up.
Binomial Distribution
If there are only 2 possibilities, say red and blue balls, then the Probability of Red = 1 - Probability of Blue
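This is the binomial setting; a quick scipy check with illustrative numbers (5 draws, P(red) = 0.4):
```python
from scipy import stats

n, p = 5, 0.4                         # 5 draws, P(red) = 0.4, so P(blue) = 0.6
print(stats.binom.pmf(3, n, p))       # P(exactly 3 red) = C(5,3) * 0.4^3 * 0.6^2
print(stats.binom.cdf(3, n, p))       # P(at most 3 red)
```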
Steps in Machine Learning
- Study the problem statement and identify the type of problem - Regression, Classification etc
- Read and join the data appropriately
- Drop duplicate columns
- Map values to numeric features
- Use Label+One Hot Encoder or get_dummies to encode values - Drop one column
- Check for outliers
- Check for correlation and drop highly correlated features. Also, chart features against each other, using raw values or a log scale for high-variance features.
- Impute missing values
- Normalize data
  - Balance the data if it is used for Classification
- Train-test split
- Use statsmodels.api to get the p-values - a feature with p > 0.10 is a candidate for elimination
OR
- Use RFE (Recursive Feature Elimination) to eliminate unwanted features. We can use scikit-learn's hyper-parameter tuning (e.g. GridSearchCV) to help select the optimal number of features
- Calculate VIF to identify multicollinearity to eliminate colinear features. Drop features with high VIF
- Fit and Predict values
- Evaluate outcomes
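A compressed sketch tying several of these steps together on a built-in scikit-learn dataset (real projects would add the EDA, imputation and VIF steps described above):
```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Train-test split, then scale (fit the scaler on the training data only).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Recursive Feature Elimination to keep the 10 most useful features.
selector = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X_train, y_train)
X_train, X_test = selector.transform(X_train), selector.transform(X_test)

# Fit, predict, evaluate.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```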
Steps in Machine Learning Pipeline
- Define Problem Statement
- Data Ingestion
- Data Preparation
- Data Segregation (Train, Test, Validation)
- Model Training
- Candidate Model Evaluation
- Model Deployment
- Performance Monitoring