MACHINE LEARNING
ENSEMBLE METHODS
Also known as multiple classifier systems, committees of classifiers, or mixtures of experts
Stacking
Bagging
- Diversity is obtained by using different subsets of the training data
- Each subset trains one classifier of the same type
- For a given instance, the class with the most votes is the answer (see the sketch after the variations below)
Bagging - variations:
- Random forest
- Varies both the amount of data and the features used
- Uses decision trees with different initializations
- Pasting Small Votes
- Same idea as bagging, but aimed at large volumes of data
- The dataset is divided into subsets called bites
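A minimal bagging sketch, assuming scikit-learn and a synthetic dataset: several classifiers of the same type are trained on bootstrap subsets and combined by majority vote, with a random forest shown for comparison.

```python
# Bagging sketch: same classifier type, different data subsets, majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain bagging: each tree sees a different bootstrap sample of the data.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_train, y_train)

# Random forest variation: bagging plus random feature subsets at each split.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)

print("bagging:", bag.score(X_test, y_test), "random forest:", rf.score(X_test, y_test))
```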
Boosting
- Also creates ensemble-based systems (EBS) by resampling the data
- The resampling is strategically designed to provide the most informative training set for each classifier (a sketch follows the examples below)
Exemplos:
- AdaBoost
- CatBoost
- XGBoost
- LightGBM
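Of these, AdaBoost is the classic illustration of the reweighting idea: each round reweights the training samples so the next weak learner focuses on the previous round's mistakes. A minimal sketch, assuming scikit-learn:

```python
# AdaBoost sketch: sequentially reweighted weak learners (decision stumps by default).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(boost, X, y, cv=5).mean())
```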
XGBoost (Optimization)
- Set up the train-test split, the evaluation metric, and early stopping
- Time to fine-tune the model:
- Fill in reasonable starting values for the key parameters
- Run model.fit(eval_set, eval_metric), diagnose the first run, and settle on the n_estimators parameter
- Optimize the max_depth parameter
- Then tune the learning rate and the parameters that help avoid overfitting (a sketch of this loop follows the remarks below)
- Other remarks
Look at the feature_importance table and identify variables that explain more than they should: your data may be biased, and if so both your model and its tuned parameters are irrelevant.
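A sketch of the tuning loop above, assuming xgboost >= 1.6 (where eval_metric and early_stopping_rounds moved to the estimator constructor) and a synthetic dataset:

```python
# XGBoost tuning sketch: generous n_estimators cap, early stopping picks the real count.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(
    n_estimators=1000,        # upper bound; early stopping finds the useful count
    max_depth=6,              # tune this next
    learning_rate=0.1,        # lower it once the tree count is stable
    subsample=0.8,            # row sampling, one of the anti-overfitting knobs
    colsample_bytree=0.8,     # feature sampling, another one
    eval_metric="logloss",
    early_stopping_rounds=20,
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

print("best iteration:", model.best_iteration)
print("top feature importances:", model.feature_importances_[:5])
```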
Mixture of experts
- Similar to Stacked Generalization, where there is an extra classifier or meta-classifier
- This second-level classifier is used to assign weights to the base classifiers
- It is a gating network, trained with gradient descent or expectation-maximization (EM); a toy sketch follows
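A toy sketch of the idea using only numpy: two linear experts and a softmax gating network trained jointly with gradient descent on a piecewise-linear target. The dataset and all names are illustrative assumptions, not a reference implementation.

```python
# Mixture-of-experts toy: the gate learns which expert to trust per input region.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Piecewise target: each half-plane is best fit by a different linear expert.
y = np.where(X[:, 0] > 0, X @ np.array([2.0, 1.0]), X @ np.array([-1.0, 3.0]))

n_experts, lr = 2, 0.05
W_exp = rng.normal(size=(n_experts, 2))   # one linear expert per row
W_gate = rng.normal(size=(n_experts, 2))  # gating network weights

for _ in range(2000):
    experts = X @ W_exp.T                               # (n, k) expert outputs
    logits = X @ W_gate.T
    gate = np.exp(logits - logits.max(axis=1, keepdims=True))
    gate /= gate.sum(axis=1, keepdims=True)             # softmax gating weights
    pred = (gate * experts).sum(axis=1)                 # weighted combination
    err = pred - y                                      # squared-error gradient
    # Expert gradient: each expert is pulled toward y in proportion to its gate weight.
    grad_exp = (gate * err[:, None]).T @ X / len(X)
    # Gate gradient: reward experts that beat the current blended prediction.
    grad_gate = (gate * (experts - pred[:, None]) * err[:, None]).T @ X / len(X)
    W_exp -= lr * grad_exp
    W_gate -= lr * grad_gate

print("final MSE:", np.mean((pred - y) ** 2))
```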
CLASSICAL LEARNING
SUPERVISED
CLASSIFICATION
- K-NN (K-nearest neighbour)
- Bayesian
- Naive Bayes
- Averaged One-Dependence Estimators (AODE)
- Bayesian Belief Network (BBN)
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Bayesian Network (BN)
- SVM (support vector machine)
- Decision Tree
- Classification and regression tree (CART)
- Iterative Dichotomiser 3 (ID3)
- C4.5
- C5.0
- Chi-squared automatic interaction detection (CHAID)
- Decision Stump
- Conditional Decision Trees
- M5
- Logistic regression
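A compact sketch, assuming scikit-learn (whose DecisionTreeClassifier is a CART-style learner), comparing a few of the classifiers above on the same split:

```python
# Side-by-side accuracy of several classifier families on one dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("k-NN", KNeighborsClassifier()),
                  ("Gaussian NB", GaussianNB()),
                  ("SVM", SVC()),
                  ("CART", DecisionTreeClassifier()),
                  ("Logistic regression", LogisticRegression(max_iter=500))]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```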
REGRESSION
(well suited when the target is a continuous value, e.g. one that depends on time)
- Linear regression
- Polynomial regression
- Ridge / Lasso regression
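A short sketch of the regression families above, assuming scikit-learn and a noisy quadratic target:

```python
# Linear vs polynomial vs regularized (ridge/lasso) regression on one target.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = 1.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.3, size=200)

for name, model in [
    ("linear", LinearRegression()),
    ("polynomial", make_pipeline(PolynomialFeatures(2), LinearRegression())),
    ("ridge", make_pipeline(PolynomialFeatures(2), Ridge(alpha=1.0))),
    ("lasso", make_pipeline(PolynomialFeatures(2), Lasso(alpha=0.01, max_iter=5000))),
]:
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))  # R^2 on the training data
```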
CLASSIFICATION EVALUATION
- holdout (separate test and training sets)
- k-fold (applied to the training set to obtain the technique's average performance)
- leave-one-out
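All three evaluation schemes in one minimal sketch, assuming scikit-learn:

```python
# Holdout, k-fold, and leave-one-out evaluation of the same classifier.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=500)

# Holdout: a single train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print("holdout:", clf.fit(X_train, y_train).score(X_test, y_test))

# k-fold: average performance over k rotating test folds.
print("5-fold:", cross_val_score(clf, X, y, cv=5).mean())

# Leave-one-out: k-fold with k equal to the number of samples.
print("LOO:", cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())
```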
UNSUPERVISED
CLUSTERING
- K-means
- K-medians
- Expectation Maximization
- Fuzzy C-Means
- Mean-Shift
- Agglomerative
- DBSCAN
- Hierarchical clustering
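A quick sketch, assuming scikit-learn, contrasting a centroid-based method (k-means) with a density-based one (DBSCAN) on synthetic blobs:

```python
# Two clustering styles: k-means needs k up front, DBSCAN finds it from density.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)

print("k-means clusters:", len(set(kmeans.labels_)))
print("DBSCAN labels (-1 is noise):", set(dbscan.labels_))
```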
ASSOCIATION RULE LEARNING
MARKET BASKET ANALYSIS
PATTERN SEARCH
DIMENSION REDUCTION
(generalization)
- t-SNE (for visualization)
- UMAP (Uniform Manifold Approximation and Projection) (for visualization)
- PCA (Principal Component Analysis)
- LSA, pLSA, GLSA (Latent Semantic Analysis variants)
- SVD (Singular Value Decomposition)
- LDA (Latent Dirichlet Allocation)
- NMF (non-negative matrix factorization)
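A brief sketch, assuming scikit-learn, reducing the 64-dimensional digits dataset to 2-D with PCA (linear) and t-SNE (non-linear, for visualization only):

```python
# Dimension reduction: PCA preserves global variance, t-SNE local structure.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print("PCA output:", X_pca.shape, "t-SNE output:", X_tsne.shape)
```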
INTERPRETING
1. Overall interpretation:
determine which variables (or combinations of variables) have the most predictive power, and which have the least
- identify the variables with the best predictive power
- raise issues / correct bugs: flag variables whose importance is suspiciously high compared to the others
- update your model with new variables
- compare different models
2. Local interpretation:
for a given data point and associated prediction, determine which variables (or combinations of variables) explain this specific prediction
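For overall interpretation, permutation importance is one model-agnostic way to rank variables by predictive power; a sketch assuming scikit-learn follows. For local interpretation, libraries such as SHAP or LIME attribute a single prediction to individual variables.

```python
# Overall interpretation sketch: shuffle each feature and measure the score drop.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank the variables from most to least predictive power.
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```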
REINFORCEMENT LEARNING
The goal is to minimize errors rather than to predict every possible situation
- Genetic Algorithm
- A3C
- SARSA
- Q-Learning
- Deep Q-Network (DQN)
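A toy tabular Q-learning sketch (numpy only, on a corridor environment invented for illustration): the agent learns by trial and error to walk right toward a reward, reducing its mistakes over episodes rather than modeling every situation up front.

```python
# Tabular Q-learning on a 1-D corridor with a reward at the last cell.
import numpy as np

n_states, n_actions = 6, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(500):                # episodes
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: mostly exploit, occasionally explore.
        a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: bootstrap from the best next action.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# Learned greedy policy; the last entry is the terminal state.
print("policy (1 = right):", Q.argmax(axis=1))
```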
FEATURE SELECTION METHODS
- Filter Methods
- Wrapper Methods
- Embedded Methods
- Feature importance in decision-tree-based algorithms (an embedded method); a combined sketch follows
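A condensed sketch of the three families, assuming scikit-learn: a filter (univariate F-test), a wrapper (recursive feature elimination), and an embedded method (tree importances):

```python
# Filter vs wrapper vs embedded feature selection on the same dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: score each feature independently with an ANOVA F-test.
filt = SelectKBest(f_classif, k=5).fit(X, y)

# Wrapper: recursively drop the weakest features of a fitted model.
wrap = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5).fit(X, y)

# Embedded: importances fall out of training the model itself.
emb = RandomForestClassifier(random_state=0).fit(X, y)

print("filter picks:", filt.get_support().nonzero()[0])
print("wrapper picks:", wrap.get_support().nonzero()[0])
print("embedded top-5:", emb.feature_importances_.argsort()[::-1][:5])
```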