Python ML
Dev. Rules
Main
- Don't be afraid to launch a product w/o ML
- Make metrics design and implementation a priority
- Choose ML over complex heuristics
- Keep the first model simple and get the infrastructure right
- Detect problems before exporting models (check model performance before moving to production)
- Don’t overthink which objective you choose to directly optimize
- Make it Simple, Observable and Attributable
- Rule #29: The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
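A minimal sketch of this feature-logging idea in Python (extract_features, serve_and_log, and the JSON-lines log are illustrative assumptions, not actual production tooling):

import json
import time

def extract_features(raw_request):
    # Hypothetical featurizer; in practice this is the real feature pipeline.
    title = raw_request.get("title", "")
    return {"title_len": len(title), "num_words": len(title.split())}

def serve_and_log(model, raw_request, log_path="serving_features.jsonl"):
    # Compute features once at serving time, log them, then predict,
    # so training can later replay exactly what the server saw.
    features = extract_features(raw_request)
    with open(log_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), **features}) + "\n")
    return model.predict([list(features.values())])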
ML Workflow
- Step 2: Explore Your Data
Metrics
Number of samples per class (topic/category): in a balanced dataset, all classes have a similar number of samples; in an imbalanced dataset, the number of samples per class varies widely.
Frequency distribution of words: the number of occurrences of each word in the dataset.
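Both metrics are quick to compute; a sketch with pandas and collections.Counter (the toy DataFrame and column names are made up):

from collections import Counter
import pandas as pd

df = pd.DataFrame({"text": ["spam spam eggs", "eggs ham", "spam ham ham"],
                   "label": ["spam", "ham", "ham"]})

# Number of samples per class: a skewed distribution signals imbalance.
print(df["label"].value_counts())

# Frequency distribution of words across the whole dataset.
word_counts = Counter(word for doc in df["text"] for word in doc.split())
print(word_counts.most_common(5))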
- Step 2.5: Choose a Model
“How do we present the text data to an algorithm that expects numeric input?” (this is called data preprocessing and vectorization).
Categories
sequence models
- convolutional neural networks (CNNs)
- recurrent neural networks (RNNs)
n-gram models
- logistic regression
- simple multi-layer perceptrons (MLPs / fully-connected neural networks)
- gradient boosted trees
- support vector machines
- Step 3: Prepare Your Data (MLP)
Vectorization
One-hot encoding: [1,0,1,1,0,1,0]
Count encoding: [1,0,1,2,3,1,0]
Tf-idf encoding: term frequency weighted by inverse document frequency, tf(t, d) * log(N / df(t)), where N is the number of samples and df(t) the number of samples containing term t; e.g. [0.33, 0, 0.23, 0.45]
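A sketch of the three encodings with scikit-learn (CountVectorizer with binary=True gives the one-hot-style presence vectors; the two-sentence corpus is made up):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the mouse ran up the clock", "the mouse ran down"]

# One-hot encoding: 1 if the token appears in the sample, else 0.
onehot = CountVectorizer(binary=True).fit_transform(corpus)

# Count encoding: raw occurrence counts per token.
counts = CountVectorizer().fit_transform(corpus)

# Tf-idf encoding: counts re-weighted by inverse document frequency.
tfidf = TfidfVectorizer().fit_transform(corpus)

print(onehot.toarray(), counts.toarray(), tfidf.toarray(), sep="\n")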
Feature selection
Keep the top ~20,000 features; using more than that tends not to improve accuracy (sketched below, together with normalization)
Normalization
Objective: convert all feature/sample values to small and similar ranges. This simplifies gradient descent convergence in learning algorithms
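Both steps can be sketched with scikit-learn (SelectKBest with the f_classif score and unit-norm scaling are assumptions; the 20,000 cap follows the note above):

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import normalize

TOP_K = 20000  # beyond ~20,000 features, accuracy tends not to improve

def select_and_normalize(x_train, y_train):
    # Keep only the k best features by ANOVA F-score.
    k = min(TOP_K, x_train.shape[1])
    selector = SelectKBest(f_classif, k=k).fit(x_train, y_train)
    x_selected = selector.transform(x_train)
    # Scale each sample to unit norm so values are small and similar.
    return normalize(x_selected)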
Options for sequence (word) models
Word embeddings
As a result, we can represent word tokens in a dense vector space (~a few hundred real numbers), where the location of and distance between words indicate how semantically similar they are. This representation is called word embeddings.
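In Keras this dense vector space is an Embedding layer; a minimal sketch (the vocabulary size, dimension, and toy architecture are placeholders):

from tensorflow import keras

VOCAB_SIZE = 20000  # number of distinct tokens
EMBED_DIM = 200     # "a few hundred" real numbers per word

model = keras.Sequential([
    # Each integer token id becomes a learned 200-dim dense vector;
    # semantically similar words end up close together in this space.
    keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(1, activation="sigmoid"),
])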
- Step 4: Build, Train, and Evaluate Your Model
Fine-tuned embeddings: allow the embedding layer to keep learning, making fine adjustments to all weights in the network
Options for sequence models
sepCNN (depthwise separable CNN) performed best in the tests (the others were CNN, RNN, and CNN-RNN)
Train
Loss function: A function that is used to calculate a loss value that the training process then attempts to minimize by tuning the network weights. For classification problems, cross-entropy loss works well.
Optimizer: A function that decides how the network weights will be updated based on the output of the loss function. We used the popular Adam optimizer in our experiments.
We repeat training over the dataset for a predetermined number of epochs. We can optimize this by stopping early, when the validation accuracy stabilizes between consecutive epochs, showing that the model is no longer improving.
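Wiring loss, optimizer, and early stopping together in Keras might look like this sketch (binary classification and the patience value are assumptions):

from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam",             # Adam optimizer
              loss="binary_crossentropy",   # cross-entropy loss
              metrics=["accuracy"])

# Stop early once validation accuracy stops improving between epochs.
early_stop = keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                           patience=2,
                                           restore_best_weights=True)

# model.fit(x_train, y_train, epochs=100,
#           validation_data=(x_val, y_val), callbacks=[early_stop])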
- Step 5: Tune Hyperparameters
- Step 6: Deploy Your Model
Algorithms
Regression
Main Algorithms
Linear, Polynomial, Lasso, Stepwise, Ridge
Classification
Main Algorithms
Decision Tree (ID3, C4.5, C5.0)
K-nearest neighbor
Use held-out test data to choose k, increasing it until accuracy stops improving
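A sketch of that search with scikit-learn (the synthetic data and the k range of 1-20 are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Increase k and keep the value with the best held-out accuracy.
best_k, best_acc = 1, 0.0
for k in range(1, 21):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, best_acc)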
Evaluation Metrics
Log loss
Used when the model outputs the probability of a class label instead of a hard label
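scikit-learn's log_loss scores those predicted probabilities directly; a tiny example with made-up values:

from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted probability of class 1, not hard labels
print(log_loss(y_true, y_prob))  # lower is better; confident mistakes cost the most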
Decision Trees
Decision trees work by testing an attribute and branching the cases based on the result of the test
Entropy: the amount of information disorder, or the amount of randomness in the data
Recursively compute the information gain, e.g. Gain(S, "Sex"), and split on the attribute that best reduces entropy
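A small sketch of entropy and information gain on a toy DataFrame (the Sex/Survived columns are illustrative):

import math
import pandas as pd

def entropy(labels):
    # H(S) = -sum(p * log2(p)) over the class proportions.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in labels.value_counts())

def information_gain(S, attribute, target="Survived"):
    # Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)
    weighted = sum(len(part) / len(S) * entropy(part[target])
                   for _, part in S.groupby(attribute))
    return entropy(S[target]) - weighted

S = pd.DataFrame({"Sex": ["M", "F", "F", "M"], "Survived": [0, 1, 1, 0]})
print(information_gain(S, "Sex"))  # split on the attribute with the highest gain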
Logistic Regression
Predicts a class, not a number as linear regression does
Binary categories: 0/1, true/false
Applications
- You need the probability of your prediction
- Your data is linearly separable (the decision boundary of logistic regression is a line, a plane, or a hyperplane)
- You need to understand the impact of the features
Definition: logistic regression fits a special S-shaped curve by taking the linear regression output and transforming the numeric estimate into a probability with the sigmoid function:
sigmoid(z) = 1 / (1 + e^(-z)), where z is the regression result
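Written out as code, a minimal NumPy sketch (the coefficients and sample are made up):

import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.2, 2.0])  # learned coefficients (illustrative)
x = np.array([1.0, 0.3, 0.8])       # one sample's features
probability = sigmoid(theta @ x)    # linear regression result, squashed to P(class = 1 | x)
print(probability)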
SVM
Process
- SVM works by first mapping the data to a high-dimensional feature space so that the data points can be categorized
- A separator is then estimated for the data (a hyperplane; e.g. a plane in 3D space)
Kernelling: the procedure of mapping data into a higher-dimensional space
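With scikit-learn the kernel is one argument to SVC; a sketch on synthetic, non-linearly-separable data:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by a line in 2D.
X, y = make_circles(noise=0.1, factor=0.4, random_state=0)

# The RBF kernel implicitly maps the points into a higher-dimensional
# space where a separating hyperplane exists.
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))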
Python
Packages
Pandas
hist(bins=n): set the number of bins to control the interval width (fewer bins reduce granularity)
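For example (the column and bin count are made up):

import pandas as pd

df = pd.DataFrame({"age": [22, 25, 31, 35, 41, 47, 52, 58, 63]})

# Fewer bins -> wider intervals -> a coarser, smoother histogram.
df["age"].hist(bins=3)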