Chapter 4 Building Training Sets + Preprocessing
Missing Values Problem
Handle missing values with a pandas DataFrame
.isnull().sum() counts missing values per column
dropna(axis=0) drops rows with missing values; dropna(axis=1) drops columns; dropna(how='all') drops rows where all values are NaN; dropna(thresh=x) drops rows with fewer than x non-NaN values
dropna(subset=['Name']) drops rows with missing values in the 'Name' column
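A minimal runnable sketch of the dropna variants above on a toy DataFrame (the column names are made up for illustration):

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1.0, 2.0, np.nan],
                   'B': [4.0, np.nan, np.nan],
                   'C': [7.0, 8.0, 9.0]})

print(df.isnull().sum())        # missing values per column
print(df.dropna(axis=0))        # drop rows containing any NaN
print(df.dropna(axis=1))        # drop columns containing any NaN
print(df.dropna(how='all'))     # drop rows where every value is NaN
print(df.dropna(thresh=2))      # keep only rows with at least 2 non-NaN values
print(df.dropna(subset=['A']))  # drop rows where column 'A' is NaN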
Mean imputation: replace each missing value with the mean of its feature column
sklearn.preprocessing.Imputer; imr = Imputer(missing_values='NaN', strategy='mean', axis=0); imr.fit(df.values); imputed_data = imr.transform(df.values)
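Note that Imputer is the older scikit-learn API; current versions replaced it with sklearn.impute.SimpleImputer. A minimal sketch with the newer class on a toy array:

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

imr = SimpleImputer(missing_values=np.nan, strategy='mean')
imr.fit(X)                    # learn the per-column means
X_imputed = imr.transform(X)  # NaNs replaced by the learned column means
print(X_imputed)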
L1 and L2 regularization
Generalization Error Problem
Collect more data
Introduce a complexity penalty
Choose a simpler model with fewer params
Reduce data dimensionality
L2 Regularization introduces a penalty for large individual weights
L1 Regularization varies from L2 by replacing the square of the weights with their absolute values
L1 Reg yields sparse feature vectors, with most weights zero. Useful when many features are irrelevant.
.linear_model.LogisticRegression(penalty='l1')
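A runnable sketch of L1-regularized logistic regression; the wine dataset and C value are chosen here just for illustration, and penalty='l1' needs a solver that supports it (liblinear or saga):

from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# C is the inverse regularization strength: smaller C -> stronger L1 penalty
lr = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
lr.fit(X_std, y)
print(lr.coef_)  # many coefficients are exactly zero -> sparse weight vectors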
Estimators: classifiers with an API conceptually similar to the transformer class
have a predict method
can also have a transform method
Feature Scaling
Important for scale-dependent algorithms, e.g. gradient descent
Normalization: rescale features to the range [0, 1]
Standardization: center features at zero mean with unit variance (subtract the mean, divide by the standard deviation)
.preprocessing.MinMaxScaler
.preprocessing.StandardScaler
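A small sketch contrasting the two scalers on the same toy feature matrix:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

X_norm = MinMaxScaler().fit_transform(X)   # each column rescaled to [0, 1]
X_std = StandardScaler().fit_transform(X)  # each column: zero mean, unit variance
print(X_norm)
print(X_std)

In practice the scaler is fit on the training data only, and the same fitted scaler is reused to transform the test data.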
Handling Categorical Data
2 types: Nominal (categories with no order, e.g. color)
Use one-hot encoding, dropping one dummy column per feature to avoid a linearly dependent (non-invertible) design matrix
.preprocessing.OneHotEncoder(categorical_features=[column_num])
pandas get_dummies creates one-hot columns; use drop_first=True to avoid multicollinearity
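The categorical_features argument of OneHotEncoder shown above belongs to older scikit-learn versions and is gone in current releases, so here is a sketch of the pandas route instead (column names made up for illustration):

import pandas as pd

df = pd.DataFrame({'color': ['green', 'red', 'blue'],
                   'price': [10.1, 13.5, 15.3]})

# drop_first=True removes one dummy column per feature to avoid multicollinearity
dummies = pd.get_dummies(df, columns=['color'], drop_first=True)
print(dummies)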
Ordinal (categories with an inherent order)
e.g. clothing sizes
create a mapping dictionary from size to number
sklearn.preprocessing.LabelEncoder; cle=LabelEncoder(); y=cle.fit_transform(df['label'].values)
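A sketch combining both ideas: a hand-written mapping for the ordinal size feature and LabelEncoder for the class labels (the values are made up for illustration):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'size': ['M', 'L', 'XL'],
                   'label': ['class1', 'class2', 'class1']})

size_mapping = {'M': 1, 'L': 2, 'XL': 3}  # explicit order for an ordinal feature
df['size'] = df['size'].map(size_mapping)

cle = LabelEncoder()
y = cle.fit_transform(df['label'].values)  # class labels -> integers
print(df)
print(y)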
scikit-learn Transformer classes, used for data transformation
Imputer, a transformer
key methods
fit: learn parameters from the training data
transform: apply the learned parameters; the array being transformed must have the same number of features as the array used for fitting
LabelEncoder class in sklearn.preprocessing for encoding labels
Topics
Remove/impute missing values
Get categorical data into shape via one hot
Select relevant features
Dimensionality Reduction
Feature Selection-select a subset of features
Sequential feature selection algorithms are greedy: they reduce a d-dimensional feature space to a k-dimensional subspace by selecting the most relevant features
Sequential Backward Selection (SBS) reduces the dimensionality of the feature subspace with a minimal decay in classifier performance (see the sketch after these steps)
Steps: initialize with k = d, the dimensionality of the full feature space
Determine the feature x⁻ whose removal maximizes the criterion J(X_k − x⁻), i.e. whose removal hurts performance least
Remove x⁻ from the feature set; k ← k − 1
Terminate when k equals the desired number of features; otherwise repeat
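A minimal sketch of the SBS loop under these steps, assuming a scikit-learn style estimator and using accuracy as the criterion J; this is the greedy core only, not the book's full implementation:

from itertools import combinations
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def sbs(estimator, X, y, k_features, test_size=0.25, random_state=1):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    indices = tuple(range(X.shape[1]))
    while len(indices) > k_features:
        scores, subsets = [], []
        # evaluate every subset with exactly one feature removed
        for subset in combinations(indices, len(indices) - 1):
            est = clone(estimator)
            est.fit(X_train[:, list(subset)], y_train)
            y_pred = est.predict(X_test[:, list(subset)])
            scores.append(accuracy_score(y_test, y_pred))
            subsets.append(subset)
        # keep the subset with the best score, i.e. the smallest performance decay
        indices = subsets[int(np.argmax(scores))]
    return indices

Usage would look like sbs(KNeighborsClassifier(n_neighbors=5), X_std, y, k_features=3), where the classifier and data names are placeholders; the returned tuple holds the column indices of the retained features.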
Feature Extraction: derive information from the features to construct a new feature subspace
Assess feature importance with Random Forests
Measure feature importance from averaged impurity decrease computed from all decision trees
access via the feature_importances_ attribute after fitting a RandomForestClassifier
Note: if two features are highly correlated, one may be ranked highly while the information in the other is not fully captured, so it appears less important
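A short runnable sketch (wine data used for illustration):

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=1)
forest.fit(X, y)
print(forest.feature_importances_)  # one importance per feature, summing to 1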
Chapter 5 Compressing Data via Dimensionality Reduction
Unsupervised dimensionality reduction via principal component analysis
Main steps to PCA
Applications include exploratory data analysis and de-noising of signals
Identifies patterns in data based on the correlation between features: finds the directions of maximum variance in high-dimensional data and projects the data onto a new subspace spanned by those axes
Highly sensitive to Data Scaling
Steps
Standardize d-dimensional dataset
Construct covariance matrix
Decompose the covariance matrix into its eigenvectors and eigenvalues
Sort the eigenvalues in decreasing order
Select the k eigenvectors corresponding to the k largest eigenvalues, where k is the dimensionality of the new subspace
Construct the projection matrix W from the top k eigenvectors
Transform the d-dimensional input using the projection matrix W
Extracting PC step by step
Total and explained Variance
The explained variance ratio of an eigenvalue is that eigenvalue divided by the sum of all eigenvalues
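A NumPy sketch of the steps above, including the explained variance ratios (wine data and k = 2 assumed for illustration):

import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)     # 1. standardize

cov_mat = np.cov(X_std.T)                     # 2. covariance matrix
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)  # 3. eigendecomposition

order = np.argsort(eig_vals)[::-1]            # 4. sort eigenvalues, decreasing
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

var_exp = eig_vals / eig_vals.sum()           # explained variance ratios
print(var_exp[:3])

k = 2
W = eig_vecs[:, :k]                           # 5./6. projection matrix W (d x k)
X_pca = X_std @ W                             # 7. project onto the new subspace
print(X_pca.shape)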
Feature Transformation
PCA in scikit
A transformer class: fit the model on the training data, then transform the training data and the test dataset with the same model parameters
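A sketch of that workflow (wine data and 2 components assumed for illustration):

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

sc = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(sc.transform(X_train))

X_train_pca = pca.transform(sc.transform(X_train))
X_test_pca = pca.transform(sc.transform(X_test))  # same fitted parameters reused
print(pca.explained_variance_ratio_)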
Supervised Data Compression via Linear Discriminant Analysis
PCA vs LDA
LDA increases computational efficiency and reduces the degree of overfitting due to the curse of dimensionality in non-regularized models
LDA finds feature subspace that maximizes class separability
LDA is supervised, PCA is unsupervised
Inner workings of LDA: steps
Standardize d-dimensional data
Compute d-dimensional mean vector for each class
Construct between-class scatter matrix and within class scatter matrix
Compute the eigenvectors and corresponding eigenvalues of S_W^(-1) S_B
Sort the eigenvalues in decreasing order
Choose the k eigenvectors corresponding to the k largest eigenvalues to construct the d×k transformation matrix W
Project samples using transformation matrix W
Compute Scatter Matrices
Select Linear Discriminants for new feature subspace
Projecting samples onto new feature space
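A compact NumPy sketch of the LDA steps above: scatter matrices, eigendecomposition of S_W^(-1) S_B, and projection onto the top discriminants (wine data and k = 2 assumed for illustration):

import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)     # 1. standardize
d = X_std.shape[1]

mean_overall = X_std.mean(axis=0)
S_W = np.zeros((d, d))                        # within-class scatter
S_B = np.zeros((d, d))                        # between-class scatter
for label in np.unique(y):
    X_c = X_std[y == label]
    mean_c = X_c.mean(axis=0)                 # 2. per-class mean vector
    S_W += (X_c - mean_c).T @ (X_c - mean_c)  # 3. within-class scatter
    diff = (mean_c - mean_overall).reshape(d, 1)
    S_B += X_c.shape[0] * (diff @ diff.T)     # 3. between-class scatter

eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)  # 4.
order = np.argsort(eig_vals.real)[::-1]                       # 5. sort
W = eig_vecs[:, order[:2]].real                               # 6. top-k eigenvectors
X_lda = X_std @ W                                             # 7. project
print(X_lda.shape)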
LDA via scikit
.discriminant_analysis.LinearDiscriminantAnalysis
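A short sketch of the scikit-learn route; unlike PCA, the fit also takes the class labels y (wine data used for illustration):

from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)  # at most (n_classes - 1) components
print(X_lda.shape)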
Using Kernel principal component analysis for nonlinear mappings
Kernel functions and kernel trick
Define a nonlinear mapping function φ from R^d to R^k, where R^k is a higher-dimensional space (k > d)
Computing the mapping explicitly is expensive; the solution is the kernel trick
Compute the similarity between two samples in the high-dimensional space via a kernel function evaluated in the original feature space
Most commonly used kernels
Polynomial kernel
Sigmoid kernel
Radial Basis Function/Gaussian Kernel
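As a concrete example, the RBF kernel computes the similarity k(x, y) = exp(-gamma * ||x - y||^2); a minimal sketch with an assumed gamma value:

import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    # similarity of two samples, evaluated directly in the original feature space
    return np.exp(-gamma * np.sum((x - y) ** 2))

print(rbf_kernel(np.array([1.0, 2.0]), np.array([2.0, 3.0])))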
Implementing kernel PCA in Python
Projecting new datapoints
KPCA in scikit learn
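A sketch with scikit-learn's KernelPCA on the half-moon toy data; the RBF kernel and gamma value are assumptions for illustration:

from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

X, y = make_moons(n_samples=100, random_state=1)
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X)  # nonlinearly separable data -> new subspace
print(X_kpca.shape)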
Summary: compress data by transforming it onto a lower-dimensional feature subspace