Data science learning map, legend explanation: :red_flag: knowledge…

- - - - chain rule
- - - - relational data for a specific use, e.g. transactional
      - usually dynamic, online data
      - :check: SQL databases
  - - - can be anything
        
        objects + schema
      - :check: NoSQL databases
  - - - datasets merged from multiple DBs
      - static, historical data
      - :<3: Join
        
        left / right / inner / outer
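
A minimal sketch of the four join types in plain Python. It assumes each table is a list of dict rows and the join key is a single column. `join`, `customers` and `orders` are illustrative names that do not come from the map. In practice, pandas does this with `pandas.merge(left, right, on=..., how=...)`.

```python
from collections import defaultdict

def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; how is 'inner', 'left', 'right' or 'outer'."""
    right_by_key = defaultdict(list)
    for rrow in right:
        right_by_key[rrow[key]].append(rrow)
    left_keys = {lrow[key] for lrow in left}
    out = []
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            out.extend({**lrow, **rrow} for rrow in matches)  # matched pairs (the inner part)
        elif how in ("left", "outer"):
            out.append(dict(lrow))  # keep unmatched left rows
    if how in ("right", "outer"):
        # keep unmatched right rows
        out.extend(dict(rrow) for rrow in right if rrow[key] not in left_keys)
    return out

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "amount": 30}, {"id": 3, "amount": 5}]
for how in ("inner", "left", "right", "outer"):
    print(how, join(customers, orders, "id", how))
```

In this sketch, an unmatched row simply lacks the other table's columns. pandas would fill those columns with NaN instead.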
- - - - Gaussian / Normal
      - Student's t
      - Bernoulli
        
        Binomial
        
        Categorical
      - Beta
      - Poisson
        
        frequency data
      - Chi-squared
  - - - for counts of categorical data
        
        to test differences in group means
      - :moneybag: A/B testing
    - - to test whether a set of Xs is related to Y
    - - :<3: confidence interval
        
        :<3: statistical interpretation
      - :<3: P-Value
  - - - prior
        
        P(Y=1)
      - likelihood
        
        P(X | Y=1)
      - posterior
        
        P(Y=1 | X)
    - - conjugate prior: prior from family A × likelihood from family B → posterior also in family A (e.g. Beta prior × Binomial likelihood → Beta posterior)
  - - - Markov Chain Monte Carlo
  - - - Variance / Standard deviation / Covariance / Correlation coefficient
    - - skewness
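
The prior, likelihood and posterior nodes can be made concrete for a binary Y. The sketch below assumes Y takes the values 0 and 1, and that both likelihoods, P(X | Y=1) and P(X | Y=0), are known. The second half shows the Beta-Binomial pair as one instance of a conjugate prior, where updating the prior only means adding counts. All function names are hypothetical.

```python
def posterior(prior, lik_pos, lik_neg):
    """P(Y=1 | X) from prior P(Y=1) and likelihoods P(X | Y=1), P(X | Y=0)."""
    evidence = lik_pos * prior + lik_neg * (1 - prior)  # P(X), by total probability
    return lik_pos * prior / evidence

def beta_binomial_update(alpha, beta, successes, trials):
    """Beta(alpha, beta) prior x Binomial likelihood -> Beta posterior parameters."""
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

# Rare condition (prior 1%), test with 90% sensitivity and a 5% false-positive rate
print(round(posterior(0.01, 0.9, 0.05), 4))  # 0.1538

# Uniform prior Beta(1, 1), then 7 conversions observed in 20 trials
a, b = beta_binomial_update(1, 1, successes=7, trials=20)
print(a, b)                       # 8 14
print(round(beta_mean(a, b), 4))  # 0.3636
```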
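
The map does not say which test backs A/B testing. For conversion rates, one common choice is the two-proportion z-test, sketched here with a p-value and a Wald confidence interval. The function name and the example counts are assumptions. The normal approximation also assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for p_b - p_a, plus a (1 - alpha) Wald confidence interval."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Standard error under H0 (p_a == p_b) uses the pooled proportion
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_h0 = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_h0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The confidence interval uses the unpooled standard error
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, p_value, (diff - z_crit * se, diff + z_crit * se)

# Variant A: 100 conversions / 1000 users; variant B: 150 / 1000
diff, p_value, ci = two_proportion_ztest(100, 1000, 150, 1000)
print(diff, p_value, ci)
```

Statistically, a 95% interval means the procedure would cover the true difference in about 95% of repeated experiments. It is not a 95% probability statement about this particular interval.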
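
Markov Chain Monte Carlo draws samples from a distribution known only up to a constant, such as an unnormalized posterior. Below is a minimal sketch of one MCMC method, random-walk Metropolis, assuming a 1-D target given as a log density. The target, step size and burn-in length are illustrative choices.

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: a simple MCMC sampler for a 1-D log density."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples.append(x)  # on rejection the current state is repeated
    return samples

def log_target(x):
    # Normal(mean=3, sd=2), up to an additive constant in log space
    return -((x - 3.0) ** 2) / (2 * 2.0 ** 2)

draws = metropolis(log_target, x0=0.0, n_samples=20000, step=2.0)
kept = draws[2000:]  # discard burn-in
print(sum(kept) / len(kept))  # close to 3; the exact value depends on the seed
```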
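
The dispersion and shape measures above reduce to a few sums. This plain-Python sketch uses the sample (n - 1) denominator for variance and covariance, and the population moment definition for skewness. Other libraries may use different conventions. Python 3.10+ also provides `statistics.covariance` and `statistics.correlation`.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs, ddof=1):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

def covariance(xs, ys, ddof=1):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - ddof)

def correlation(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

def skewness(xs):
    # Moment skewness: third central moment divided by sd^3 (population form)
    m, n = mean(xs), len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(variance(x), math.sqrt(variance(x)), covariance(x, y), correlation(x, y))
print(skewness(x), skewness([1, 1, 1, 10]))  # 0 for symmetric data, > 0 for a right tail
```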
- - - - :lock:Naive Bayes
        
        uses Bayes' rule, assuming Xs are conditionally independent given Y
      - :lock:Logistic Regression
        
        Xs need not follow a Gaussian
        
        log(odds) = log(p / (1 - p)) = WᵀX
        
        :moneybag:market basket analysis
      - :lock:Linear Discriminant Analysis
        
        Xs follow a Gaussian within each class, shared covariance matrix
        
        :lock:Quadratic Discriminant Analysis
        
        Xs follow a Gaussian within each class, class-specific covariance matrices
        
        :moneybag: discriminant analysis
    - - :lock:KNN
      - :lock:Linear Regression / Ordinary Least Squares Regression
        
        residuals (errors) follow a Gaussian
        
        :<3: Regularization
        
        L2 penalty (squared coefficients)
        
        :lock: Ridge Regression
        
        Gaussian Prior
        
        L1 penalty (absolute coefficients)
        
        :lock: Lasso Regression
        
        Laplace Prior
        
        feature selection (sparsity), reduce collinearity
        
        reduce overfitting
      - :lock:Survival Regression
        
        probability that the event has not happened by time T: S(T) = P(event time > T)
        
        :moneybag:customer lifetime value
  - - - :moneybag: segmentation
    - - :moneybag: understand your customers
  - - - :lock:ARIMA(X)
        
        :lock:SARIMAX
        
        :moneybag:financial forecast
    - - Lags of target (Y)
      - Seasonality
      - White noise
      - Random Walk
      - Difference
    - - multiple means, multiple time
        
        :moneybag: store revenue forecast
      - fixed vs. random effects
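
Logistic regression models the log-odds as a linear function WᵀX and is usually fit by minimizing log-loss. This is a minimal sketch using batch gradient descent. The learning rate, epoch count and toy data are assumptions for illustration. Real code would normally use a library solver.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))  # log-odds
            err = sigmoid(z) - yi  # derivative of log-loss with respect to z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_proba(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
print(w, predict_proba(w, [0.5]))
```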
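
For a single feature with no intercept, both penalties have closed-form solutions, which makes the L1/L2 difference visible. Ridge shrinks the coefficient smoothly. Lasso's soft-thresholding sets it exactly to zero once the penalty is large enough, which is the sparsity behind feature selection. The objectives in the docstrings, including the factor 1/2, are one common scaling and are assumed here.

```python
def ridge_1d(x, y, lam):
    """Minimize 0.5*sum((y - w*x)^2) + 0.5*lam*w^2 (one feature, no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)  # shrinks toward 0 but never reaches it for finite lam

def lasso_1d(x, y, lam):
    """Minimize 0.5*sum((y - w*x)^2) + lam*|w| (one feature, no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    # Soft-thresholding: exactly 0 whenever |sxy| <= lam
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]
for lam in (0.0, 1.0, 20.0):
    print(lam, ridge_1d(x, y, lam), lasso_1d(x, y, lam))
```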
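
KNN has no training step: it stores the data and votes among the nearest points at prediction time. Here is a minimal sketch with Euclidean distance and a majority vote. `k=3` and the toy points are assumptions, and ties are broken by whichever label `Counter` counted first.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.5, 1.5)), knn_predict(train_X, train_y, (6.5, 6.5)))
```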
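
Survival analysis estimates S(T), the probability that the event has not happened by time T. It also handles censored subjects, such as customers who have not churned yet. The sketch below is the Kaplan-Meier estimator, a non-parametric estimate of S(T) without covariates. It is swapped in as the simplest example. Survival regression models such as Cox proportional hazards add covariates on top.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until the event or until censoring.
    observed: 1 if the event happened, 0 if the subject was censored.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            events += data[i][1]
            removed += 1
            i += 1
        if events:
            surv *= 1 - events / at_risk  # conditional survival past t
            curve.append((t, surv))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve

curve = kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 1])
print(curve)
```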
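
The map does not name a segmentation algorithm. K-means (Lloyd's algorithm) is one common choice, sketched here on 2-D points with an assumed k and random initial centroids drawn from the data.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster (keep the old one if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```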
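
Differencing is what turns a random walk into white noise: if y_t = y_{t-1} + e_t, then y_t - y_{t-1} = e_t. This small simulation assumes Gaussian noise. Calling the hypothetical `difference` helper with `lag=s` gives the seasonal difference used by SARIMA-style models.

```python
import random

def difference(series, lag=1):
    """y_t - y_{t-lag}: first difference for lag=1, seasonal difference for lag=s."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # white noise e_t

walk = []
level = 0.0
for e in noise:
    level += e  # random walk: y_t = y_{t-1} + e_t
    walk.append(level)

diffed = difference(walk)  # recovers noise[1:] up to floating-point rounding
print(walk[:3], diffed[:3], noise[1:4])
```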
- - - - :lock:Bagging tree / Random Forest
        
        feature importance
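    - - :<3: bagging
        
        resample the data with replacement (bootstrapping), fit one model per sample, then average or vote
      - :<3: boosting
        
        each new model learns from the previous models' errors
      - :<3: stacking
        
        train a meta-model on the predictions of multiple base models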
  - - - Loss function / Cost
      - Learning rate
      - Optimizer
      - Batch
    - - K-fold Cross-Validation
      - Grid Search Cross-Validation
  - - - :lock:Thompson Sampling
        
        stochastic approach
        
        :moneybag: A/B testing
    - - :moneybag:self driving car
  - - - RBF / Polynomial / Linear
      - map data into a higher-dimensional space (kernel trick)
  - - - each model's target can be either discrete or continuous
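
The training knobs listed above fit together in one loop:

- the loss function (mean squared error)
- the learning rate (step size)
- the optimizer (plain stochastic gradient descent here)
- the batch (rows per update)

Below is a minimal sketch on noise-free data y = 2x + 1. All hyperparameter values are assumptions for illustration.

```python
import random

def minibatch_gd(X, y, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error with mini-batch SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new batch composition every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradients of the batch MSE loss with respect to w and b
            gw = sum(2 * (w * X[i] + b - y[i]) * X[i] for i in batch) / len(batch)
            gb = sum(2 * (w * X[i] + b - y[i]) for i in batch) / len(batch)
            w -= lr * gw  # optimizer step (plain SGD)
            b -= lr * gb
    return w, b

X = [i / 10 for i in range(20)]
y = [2 * x + 1 for x in X]
w, b = minibatch_gd(X, y)
print(w, b)
```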
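
K-fold cross-validation scores one hyperparameter setting by averaging validation error over k train/validation splits. Grid search repeats this for every candidate value and keeps the best one. The sketch assumes contiguous (unshuffled) folds and uses a hypothetical 1-D ridge model whose penalty `lam` is the hyperparameter.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds covering range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def cv_mse(fit, predict, X, y, k):
    """Average validation mean squared error over the k folds."""
    fold_errors = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq = [(predict(model, X[i]) - y[i]) ** 2 for i in val]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / len(fold_errors)

def grid_search(make_fit, predict, grid, X, y, k=5):
    """Return the hyperparameter value with the lowest cross-validated MSE."""
    return min(grid, key=lambda param: cv_mse(make_fit(param), predict, X, y, k))

# Hypothetical model: 1-D ridge regression through the origin with penalty lam
def make_ridge(lam):
    def fit(xs, ys):
        return sum(a * b for a, b in zip(xs, ys)) / (sum(a * a for a in xs) + lam)
    return fit

def predict(w, x):
    return w * x

X = [float(i) for i in range(1, 21)]
y = [3.0 * x for x in X]
best_lam = grid_search(make_ridge, predict, [0.0, 1.0, 10.0, 100.0], X, y)
print(best_lam)
```

If the rows are ordered (by time, class or ID), shuffle them before splitting. For time series, use forward-chaining splits instead of random folds.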
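
This is a bagging sketch with decision stumps as the base learner, assuming 1-D regression. Each stump is fit on a bootstrap sample, and the predictions are averaged. A random forest also samples a random subset of features at each split, which this single-feature sketch leaves out.

```python
import math
import random

def fit_stump(xs, ys):
    """Base learner: a one-split regression tree (decision stump)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # all x identical: fall back to the overall mean
        m = sum(ys) / len(ys)
        return math.inf, m, m
    return best[1:]  # (threshold, left mean, right mean)

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def bagging(xs, ys, n_models=50, seed=0):
    """Fit each stump on a bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def bagged_predict(stumps, x):
    # Aggregate by averaging (a majority vote would be used for classification)
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

xs = list(range(20))
ys = [0.0 if x < 10 else 5.0 for x in xs]
stumps = bagging(xs, ys)
print([round(bagged_predict(stumps, x), 2) for x in (2, 9, 10, 12, 17)])
```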
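
Thompson sampling treats an A/B test as a multi-armed bandit. It keeps a posterior over each variant's conversion rate, draws one sample from every posterior, and shows the variant whose sample is highest. This minimal sketch assumes Bernoulli rewards and Beta(1, 1) priors. The true rates exist only to simulate users.

```python
import random

def thompson_sampling(true_rates, n_rounds=3000, seed=0):
    """Bernoulli bandit with a Beta(successes + 1, failures + 1) posterior per arm."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Sample a plausible conversion rate per arm, then play the best sample
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = max(range(len(draws)), key=draws.__getitem__)
        # Simulated user response; the algorithm never reads true_rates directly
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

s, f = thompson_sampling([0.05, 0.15])
pulls = [a + b for a, b in zip(s, f)]
print(pulls)  # most traffic should flow to the better variant
```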
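
The three kernels can be written as similarity functions. The last check demonstrates the kernel trick for the degree-2 polynomial kernel in 2-D. (x·z)² equals an ordinary dot product after the explicit mapping φ(x) = (x₁², √2·x₁x₂, x₂²), so the higher-dimensional features never need to be built. Parameter names such as `gamma` and `c` are assumptions.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, c=0.0):
    return (linear_kernel(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=1.0):
    # Radial basis function: similarity decays with squared distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def phi_degree2(x):
    """Explicit feature map matching the degree-2, c=0 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z))
print(linear_kernel(phi_degree2(x), phi_degree2(z)))  # same value as polynomial_kernel(x, z)
```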
- - - - :lock:Convolutional Neural Network
        
        Good for data where dimension reduction and feature extraction are especially important
        
        image data
        
        :<3: Convolutional layer
        :<3: Max pooling
        :<3: Flattening
      - :lock:Recurrent Neural Network
        
        Good for data where sequence order matters
        
        time series data & text data (NLP)
        
        :<3: Long Short-Term Memory (LSTM)
  - - - :moneybag:segmentation
    - - :unlock:Auto-encoder
      - :unlock:Deep Belief Networks
        
        :moneybag:recommender system
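
Here is a toy forward pass through the three CNN building blocks in plain Python. It uses a 5x5 single-channel image and a hand-picked 2x2 filter. In a real CNN, the filter weights are learned. Like most deep-learning libraries, the hypothetical `conv2d` helper computes cross-correlation. The sketch also assumes no padding, a stride of 1, and a 2x2 pooling window.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation with stride 1 (deep learning's 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window (stride = size)."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def flatten(fmap):
    # Turn the 2-D feature map into a 1-D vector for a dense layer
    return [v for row in fmap for v in row]

image = [[0, 0, 1, 1, 0] for _ in range(5)]  # a vertical bright stripe
vertical_edge = [[1, -1], [1, -1]]  # positive where brightness drops left to right

fmap = conv2d(image, vertical_edge)
print(fmap)                     # 4x4 feature map
print(flatten(max_pool(fmap)))  # [0, 2, 0, 2]
```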
- - - - ordinal
      - discretization
        
        reduces variance
      - Monotone revalue
      - Frequency
      - Probability ratio
      - Weight of evidence
    - - Mean / Median / Mode / Random
    - - Smoothing
  - - - Normalized
        
        rescale by min and max to [0, 1]: (x - min) / (max - min)
      - Standardized
        
        rescale by μ and σ: (x - μ) / σ, centered at 0 with unit variance, range (-∞, ∞)
  - - - list / vector / tuple / matrix / array / dataframe / datatable
    - - character / string / integer / float / Null / Boolean / datetime
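
Normalization and standardization side by side. This sketch assumes the input is not constant (max > min, σ > 0). It uses the population standard deviation (ddof = 0); some tools use the sample version instead.

```python
import statistics

def min_max_normalize(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]  # assumes hi > lo

def standardize(xs):
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation
    return [(x - mu) / sigma for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
print(min_max_normalize(data))  # values in [0, 1]
print(standardize(data))        # mean 0, standard deviation 1
```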
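
Frequency encoding and weight of evidence both replace a category with a number. This minimal sketch assumes a binary target where 1 counts as "good". WoE sign conventions differ between sources. The sketch also assumes every category contains both classes; otherwise the log is undefined and some smoothing is needed. Both function names are hypothetical.

```python
import math
from collections import Counter

def frequency_encode(values):
    """Replace each category with its share of all rows."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]

def weight_of_evidence(categories, target):
    """WoE per category = ln(share of all goods / share of all bads), target 1 = good."""
    goods = Counter(c for c, t in zip(categories, target) if t == 1)
    bads = Counter(c for c, t in zip(categories, target) if t == 0)
    total_good, total_bad = sum(goods.values()), sum(bads.values())
    return {
        c: math.log((goods[c] / total_good) / (bads[c] / total_bad))
        for c in set(categories)
    }

cats = ["A", "A", "A", "B", "B", "B"]
target = [1, 1, 0, 1, 0, 0]
print(frequency_encode(["x", "x", "y"]))
print(weight_of_evidence(cats, target))  # A: ln 2, B: -ln 2
```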
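
Mean, median and mode imputation plus equal-width discretization, in plain Python. The sketch assumes missing values are stored as `None`. Equal-width binning is only one discretization scheme; quantile bins are another. The helper names are hypothetical.

```python
import statistics

def impute(xs, strategy="mean"):
    """Replace None with the mean, median or mode of the observed values."""
    observed = [x for x in xs if x is not None]
    fill = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy](observed)
    return [fill if x is None else x for x in xs]

def equal_width_bins(xs, n_bins):
    """Map each value to a bin index 0..n_bins-1 over [min, max]; assumes max > min."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in xs]  # max goes in the last bin

print(impute([1, None, 3, 3], "median"))
print(equal_width_bins(list(range(11)), 5))
```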