Machine Learning - Concepts
Supervised learning
Supervised learning makes sense if you typically know what 'bad' values look like. If you only know what is 'normal'/'good' and the 'bad' values can be very different every time, anomaly detection is a better fit.
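A minimal sketch of the anomaly detection case, assuming a one-dimensional signal and a simple z-score rule. The readings, the `is_anomaly` helper and the threshold of 3 standard deviations are illustrative assumptions, not part of the notes. The detector only learns from 'normal' values and flags anything far from them, whatever the 'bad' value looks like.

```python
import numpy as np

# Hypothetical "normal" readings: only good values are known in advance.
normal = np.array([10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0])

# Learn what "normal" looks like from the good values only.
mean = normal.mean()
std = normal.std()

def is_anomaly(value, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    return abs(value - mean) / std > threshold

# The "bad" values can be very different from each other (too high, too low).
new_values = [10.1, 25.0, -4.0, 9.9]
flags = [bool(is_anomaly(v)) for v in new_values]
print(flags)
```

No labelled 'bad' examples are needed here, which is the difference from a supervised classifier.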
Unsupervised Learning ???
Reinforcement Learning ???
Q-Learning ???
Steps
Data
Features
Column : feature or variable, dimension, attribute
Row : one observation; its known output y (already known in supervised learning) is one entry of the target vector
Columns are vectors X; multiple columns form a matrix
Table based
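A small sketch of the table vocabulary above, assuming NumPy and a made-up housing table (the column meanings and numbers are hypothetical). It shows the feature matrix X, the target vector y, and how one column or one row is picked out.

```python
import numpy as np

# Hypothetical table: each row is one house, each column is one feature.
# Columns: surface in m2, number of rooms.
X = np.array([
    [50.0, 2.0],
    [80.0, 3.0],
    [120.0, 5.0],
])

# Target vector y: the known price of each row (supervised learning).
y = np.array([150.0, 230.0, 390.0])

n_rows, n_features = X.shape
surface = X[:, 0]   # one column = one feature vector
first_row = X[0]    # one row = one observation
print(n_rows, n_features, surface, first_row)
```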
Feature engineering
Also called "Data Scrubbing"
Feature Selection : select the right variables X to compare with the target y
Column and row compression : reduce the number of feature combinations and merge rows with similar results
One hot encoding : translate non-numeric values to 0/1 values because most algorithms only deal with numeric data
One hot encoding downside : increases the number of features, and thus the processing time
Binning : convert numeric values whose exact magnitude is not relevant into categories, true/false or 0/1
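One hot encoding and binning can be sketched with pandas as follows. The `color` and `age` columns, the bin edges and the group labels are illustrative assumptions. Note how the single `color` column becomes three 0/1 columns, which is the downside mentioned above.

```python
import pandas as pd

# Hypothetical table with one categorical and one numeric column.
df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "age": [12, 35, 67, 41],
})

# One hot encoding: one 0/1 column per category of "color".
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

# Binning: replace the exact age by a range label.
encoded["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 65, 120],
    labels=["minor", "adult", "senior"],
)
encoded = encoded.drop(columns=["age"])
print(encoded)
```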
Missing data
Negative impact on algorithms
Use the mode (most common value) for categorical and binary values
Use the mean for continuous values (infinite number of possible values)
Remove the row, but having less data to analyse is not good either
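The three options above can be sketched with pandas. The `city` and `income` columns and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value per column.
df = pd.DataFrame({
    "city": ["Paris", None, "Lyon", "Paris"],
    "income": [30.0, 50.0, np.nan, 40.0],
})

filled = df.copy()

# Categorical column: fill with the mode (most common value).
city_mode = df["city"].mode()[0]
filled["city"] = filled["city"].fillna(city_mode)

# Continuous column: fill with the mean of the known values.
income_mean = df["income"].mean()
filled["income"] = filled["income"].fillna(income_mean)

# Alternative: remove incomplete rows, at the cost of having less data.
dropped = df.dropna()
print(filled)
print(len(dropped))
```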
Data setup
80/20 or 70/30 split between training and test datasets
Randomize before splitting
Train to determine the right model & test to measure the accuracy of the chosen one
Cross validation ???
How much data ?
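A minimal sketch of the randomized 80/20 split described in this section, written with NumPy only. The helper name `train_test_split_simple`, the seed and the toy data are assumptions for illustration; scikit-learn's `train_test_split` offers the same behaviour with shuffling on by default.

```python
import numpy as np

def train_test_split_simple(X, y, test_ratio=0.2, seed=0):
    """Shuffle row indices, then cut them into a test part and a training part."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))          # randomize before splitting
    n_test = int(round(len(X) * test_ratio))   # 20% of the rows by default
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 rows, 2 features; row i holds [2*i, 2*i + 1] and target i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split_simple(X, y)
print(len(X_train), len(X_test))
```

The same shuffled indices are applied to X and y so that every row keeps its own target value.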
Apply algorithms
Regression ???
Clustering ???
Error calculation ???
Performance analysis ???
Bias & Variance ???
Neural Network ???
Decision Trees ???
Ensemble Modeling ???
Deep Learning ???
Tools
Language
Python
easy to learn
lots of libraries
But needs to be compiled to run on a GPU
C, C++
Better for advanced use
Faster, runs directly on the GPU
Libraries
Python
NumPy : manage datasets and matrices
Scikit-learn : popular algorithms
Pandas : spreadsheet-like tables, read CSV files
Seaborn : visualization
Resources
Data
Kaggle.com : lots of datasets as CSV
Roadmap
https://github.com/JsonChao/ML-Roadmap
http://cdn.ttgtmedia.com/visuals/LeMagIT/hero_article/ComparoML.jpg
TensorFlow ???
Kibana ?? Elastic ??
Advanced
Big Data
TensorFlow : Deep Learning & Neural Network
AWS
Cloud
Advanced Algorithm
Libraries : Torch, Keras, Caffe
Production : MLlib or H2O on Apache Spark for scalability
Automatic discovery : DataRobot, H2O.ai, or Google Cloud AutoML
Algorithms
Regression
Logistic Regression ???
Multinomial Logistic Regression ???
SVM
Linear Regression ???
Clustering
K-nearest neighbours ???
K-means clustering ???
Neural Networks
Artificial NN ???
Black box dilemma ???
Building it ???
Multi Layer Perceptron ???
Perceptron ???
Common usage scenario and techniques ??? MLP , Convolution
Decision Trees ???
Ensemble Modeling ???
How to choose ?
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Project
FCA : Investigate the Kibana option
OCH : Docker image with the data
Elastic connector => Python table
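One possible shape for the Elastic-to-table step, as a sketch only. It assumes the usual Elasticsearch search response layout (`hits.hits[]` entries with `_id` and `_source`) and uses a hard-coded sample instead of a real connection. The helper name `hits_to_dataframe` and the `host` and `cpu` fields are hypothetical.

```python
import pandas as pd

def hits_to_dataframe(response):
    """Turn an Elasticsearch-style search response (a dict) into a pandas table."""
    hits = response.get("hits", {}).get("hits", [])
    # One table row per hit: the document id plus the fields of its _source.
    rows = [{"_id": h.get("_id"), **h.get("_source", {})} for h in hits]
    return pd.DataFrame(rows)

# Hypothetical response, standing in for what a search call would return.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"host": "web-1", "cpu": 0.42}},
            {"_id": "2", "_source": {"host": "web-2", "cpu": 0.87}},
        ]
    }
}

table = hits_to_dataframe(sample_response)
print(table)
```

The resulting table can then go through the same feature engineering and split steps as a CSV loaded with pandas.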