ML
Intermediate ML
Data Cleaning
Scaling & Normalizing
Normalization
The point of normalization is to change your observations
so that they can be described as a normal distribution.
In general, you'll only want to normalize your data if you're going to be using a machine learning or statistics technique that assumes your data is normally distributed.
Linear regression, Gaussian..., LDA, t-tests, ANOVA
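A minimal sketch of normalization with SciPy's Box-Cox transform (assuming scipy is available; the data here is synthetic and only illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=1000)  # positive, right-skewed data

# Box-Cox transforms positive data toward a normal distribution;
# it returns the transformed values and the fitted lambda parameter.
normalized, fitted_lambda = stats.boxcox(skewed)

print(normalized.shape)
```

After the transform, the skewness should be much closer to zero than for the raw exponential sample.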
Text cleaning
Inconsistent data entry
~80% of inconsistent entries can be fixed with .str.lower() and .str.strip()
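A quick pandas sketch of those two fixes (the country values are made up):

```python
import pandas as pd

countries = pd.Series([' Germany', 'germany ', 'GERMANY'])

# Lowercasing plus stripping whitespace collapses most inconsistent entries.
cleaned = countries.str.lower().str.strip()

print(cleaned.nunique())  # → 1: the three variants collapse to one value
```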
Categorical values
One Hot Encoding
high cardinality: one-hot encoding does not perform well when the categorical
variable takes on a large number of values (roughly > 15)
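One-hot encoding in pandas, on a small illustrative column:

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})

# Each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=['color'])

print(list(encoded.columns))  # → ['color_blue', 'color_green', 'color_red']
```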
Cross-validation
For small datasets, where extra computational burden
isn't a big deal, you should run cross-validation.
For larger datasets, a single validation set is sufficient.
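A minimal cross-validation sketch with scikit-learn (synthetic data; model and fold count are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

# 5-fold cross-validation: each fold serves once as the validation set.
scores = cross_val_score(
    RandomForestRegressor(n_estimators=50, random_state=0),
    X, y, cv=5, scoring='neg_mean_absolute_error',
)
print(len(scores))  # → 5: one score per fold
```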
Xgboost
n_estimators
Typical values range from 100 to 1000,
though this depends a lot on the learning_rate
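A sketch of the trade-off using scikit-learn's GradientBoostingRegressor, which exposes the same n_estimators / learning_rate pair as XGBoost's XGBRegressor (the values below are illustrative, not recommendations):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=0.3, random_state=0)

# A smaller learning_rate generally needs more boosting rounds
# (n_estimators) to reach the same fit; the two trade off against each other.
model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                  random_state=0)
model.fit(X, y)
print(model.n_estimators_)  # rounds actually fitted
```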
Data leakage
train-test contamination
If your validation is based on a simple train-test split, exclude the validation data from any type of fitting, including the fitting of preprocessing steps
When using cross-validation, it's even more
critical that you do your preprocessing inside the pipeline
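A sketch of preprocessing inside a scikit-learn pipeline (imputer and model are illustrative choices), so each cross-validation fold fits the preprocessor on its own training data only:

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Because the imputer lives inside the pipeline, it is re-fit on each
# training fold, so no validation-fold statistics leak into training.
pipeline = make_pipeline(SimpleImputer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, X, y, cv=5)
print(len(scores))  # → 5
```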
Embeddings
purpose
Embeddings suit a categorical variable with many possible values (high cardinality), where only a few of them (often just one) are present in any given observation. Words are a good example.
Embedding layers
maps each element in a set of discrete things (like words, users, or movies) to a dense vector of real numbers (its embedding)
A key implementation detail is that embedding layers
take as input the index of the entity being embedded.
You can think of it as a sort of 'lookup table'
An object's embedding, if it's any good, should capture some useful latent properties of that object.
It's up to the model to discover whatever properties of the entities are useful for the prediction task, and encode them in the embedding space.
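The 'lookup table' view can be sketched in plain NumPy (vocabulary and dimensions are made up; in a real model the table's weights are learned):

```python
import numpy as np

vocab = {'cat': 0, 'dog': 1, 'fish': 2}
embedding_dim = 4

# An embedding layer is just a (vocab_size, embedding_dim) weight matrix;
# looking up a word means indexing into it with the word's integer id.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

vector = embedding_table[vocab['dog']]
print(vector.shape)  # → (4,)
```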
Matrix Factorization
Predicted score = dot product of the two factor vectors (which have the same length)
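For example, in a recommender setting the predicted rating is the dot product of a user vector and an item vector (the factor values below are made up):

```python
import numpy as np

# Hypothetical learned factors, embedding size 4.
user_vec = np.array([0.5, -1.0, 0.2, 0.8])
movie_vec = np.array([1.0, 0.3, -0.5, 0.4])

# Matrix factorization predicts a rating as the dot product of the
# user embedding and the item embedding (same length by construction).
predicted_rating = user_vec @ movie_vec
print(round(float(predicted_rating), 2))  # → 0.42
```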
t-SNE
purpose
It learns a mapping from a set of high-dimensional vectors, to a space with a smaller number of dimensions (usually 2), which is hopefully a good representation of the high-dimensional space.
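A minimal t-SNE sketch with scikit-learn (random data; perplexity is an illustrative choice and must be smaller than the number of samples):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
high_dim = rng.normal(size=(60, 50))  # 60 points in 50 dimensions

# t-SNE maps the points to 2D while trying to keep similar points close.
low_dim = TSNE(n_components=2, perplexity=10,
               random_state=0).fit_transform(high_dim)
print(low_dim.shape)  # → (60, 2)
```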
AI + AR + Mobile
Mobile AR components
Pattern Matching
Object, Room and Scene Recognition
Anchoring, Tracking, and Persistence
Data Visualization
Trends
sns.lineplot
show trends over a period of time, and multiple lines
can be used to show trends in more than one group
Relationships
Scatter plots
sns.lmplot
draws multiple regression lines when the scatter
plot contains multiple, color-coded groups
if color-coded, we can also show the
relationship with a third categorical variable
Distribution
show the possible values that we can expect to
see in a variable, along with how likely they are
sns.kdeplot
show an estimated, smooth distribution
of a single numerical variable
sns.jointplot
simultaneously displaying a 2D KDE plot with the
corresponding KDE plots for each individual variable.
SQL & BigQuery
BigQuery
hints
For initial exploration, look at just part of the table instead of the whole thing.
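One way to peek at part of a table is a LIMIT clause (the table name below is only illustrative); note that in BigQuery, LIMIT caps the rows returned but not the bytes scanned, so for pure previews the Python client's list_rows with max_results avoids running a query at all:

```sql
-- Preview a handful of rows instead of reading the whole table.
-- (dataset/table name is hypothetical)
SELECT *
FROM `bigquery-public-data.hacker_news.full`
LIMIT 5;
```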