ch 2 ML Simple ML Classification Algorithms
Perceptron Learning Rule
compares true class labels to predicted class labels; weights update only on misclassification
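A minimal sketch of the perceptron learning rule described above (function and variable names are illustrative, not from a library): for each sample, compare the true label to the thresholded prediction and nudge the weights by eta * (true - predicted) * x.

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, n_epochs=10):
    """Train weights (bias stored in w[0]) on labels in {-1, 1}."""
    w = np.zeros(1 + X.shape[1])
    for _ in range(n_epochs):
        for xi, target in zip(X, y):
            # predicted class label: threshold the net input at 0
            prediction = 1 if np.dot(xi, w[1:]) + w[0] >= 0.0 else -1
            update = eta * (target - prediction)  # zero when prediction is correct
            w[1:] += update * xi
            w[0] += update
    return w

# AND-like linearly separable toy data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w = perceptron_train(X, y)
preds = np.where(X.dot(w[1:]) + w[0] >= 0.0, 1, -1)
```

On linearly separable data like this, the rule converges and all four points end up correctly labeled.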
Introduce One Versus All Technique
array - similar to Python lists, but vectorizes arithmetic operations, meaning an operation is applied element-wise to all elements at once. Much faster than looping in pure Python.
x=np.where(cond, a, b) builds an array taking elements from a where cond is True, else from b; with scalars like 1/-1 it maps a boolean array to class labels
np.meshgrid(x1,...,xn) return coordinate matrices from coordinate vectors
np.arange(min, max, step) return evenly spaced values within the interval [min, max)
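Short illustrations of the three NumPy helpers above:

```python
import numpy as np

# np.where(condition, a, b): pick from a where condition is True, else from b
z = np.array([-1.5, 0.0, 2.0])
labels = np.where(z >= 0.0, 1, -1)     # maps signs to class labels

# np.arange(min, max, step): evenly spaced values in [min, max)
xs = np.arange(0.0, 1.0, 0.25)         # 0.0, 0.25, 0.5, 0.75

# np.meshgrid: coordinate matrices from coordinate vectors, e.g. every
# (x1, x2) pair on a grid for a decision-boundary plot
xx1, xx2 = np.meshgrid(np.arange(0, 3), np.arange(0, 2))
# xx1 and xx2 both have shape (2, 3): one entry per grid point
```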
Demonstrate scatter plot, epochs line graph, decision boundary graph
Adaline Neural Network
compares true class labels with the linear activation function's continuous-valued output to model the error
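A minimal Adaline sketch (names illustrative): unlike the perceptron, the error is measured against the continuous-valued linear output rather than the thresholded label, and the weights move by a batch gradient-descent step.

```python
import numpy as np

def adaline_train(X, y, eta=0.01, n_epochs=50):
    w = np.zeros(1 + X.shape[1])
    costs = []
    for _ in range(n_epochs):
        output = X.dot(w[1:]) + w[0]       # continuous-valued linear output
        errors = y - output                # compared to the true labels
        w[1:] += eta * X.T.dot(errors)     # batch gradient-descent step
        w[0] += eta * errors.sum()
        costs.append((errors ** 2).sum() / 2.0)  # sum-of-squared-errors cost
    return w, costs

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, costs = adaline_train(X, y)
# with a well-chosen eta, the cost shrinks from epoch to epoch
```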
Introduce problem of selecting eta, learning rate
feature scaling through standardization
shifts each feature so that it is centered at zero with a std dev of 1. Often reduces the number of training epochs needed to converge.
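Standardization in one line of NumPy (toy data made up for illustration):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 220.0],
              [3.0, 240.0]])

# subtract each column's mean, divide by each column's std dev
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# each column of X_std now has mean 0 and standard deviation 1
```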
Stochastic gradient descent
update weights incrementally for each training sample, rather than from the error accumulated over the whole training set
used in online learning as new data arrives, and can help escape shallow local minima
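A stochastic gradient descent sketch (names illustrative): the same Adaline-style error, but the weights change after every sample, with the data shuffled each epoch to avoid cycles.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_train(X, y, eta=0.05, n_epochs=50):
    w = np.zeros(1 + X.shape[1])
    for _ in range(n_epochs):
        idx = rng.permutation(len(y))       # shuffle each epoch
        for xi, target in zip(X[idx], y[idx]):
            output = xi.dot(w[1:]) + w[0]   # incremental, per-sample step
            error = target - output
            w[1:] += eta * error * xi
            w[0] += eta * error
    return w

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w = sgd_train(X, y)   # learns a positive slope separating low from high x
```

The same per-sample loop is what makes online learning possible: a new sample can update the model immediately without retraining on the whole set.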
Techniques and Libraries
Graphing and visualization concepts
scatter(x,y, color, marker, label) creates a scatter plot
xlabel, ylabel,legend(loc=''),show to display
cmap=ListedColormap(colorslist[:len(np.unique(y))]) to create a colormap with one color per unique class label in y
df=pd.read_csv(url, header=None) pulls a dataset from a CSV file at a URL; specify header handling with header=; read_csv also accepts a local file path
df.iloc[row,column].values returns values
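A sketch of the read_csv / iloc pattern above, using an in-memory CSV as a stand-in for a URL or local file (the Iris-style rows are illustrative):

```python
import io
import numpy as np
import pandas as pd

csv_data = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
df = pd.read_csv(csv_data, header=None)    # header=None: file has no header row

y = df.iloc[:, 4].values                   # labels from the last column
X = df.iloc[:, [0, 2]].values              # two feature columns as a NumPy array
y = np.where(y == "Iris-setosa", -1, 1)    # map string labels to -1 / 1
```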
ch 3 scikit-learn Classifiers
Problem of choosing a classification algorithm
No single classifier works best across scenarios, so try several
Tune via regularization with reg param lambda
Approach to selecting a classifier
Select features, collect training data
Choose performance metric
Choose classifier and algorithm
Tune the algorithm
introduce scikit-learn, which conveniently has sample data sets
train_test_split method to choose split ratio
StandardScaler class, which does our feature scaling for us
fit(X train, y train) trains the model, e.g. runs the perceptron learning rule
predict(X test) predicts labels for the test set
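The whole workflow above in one hedged sketch, using the bundled Iris sample dataset (feature choice and hyperparameters are illustrative):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

iris = datasets.load_iris()
X, y = iris.data[:, [2, 3]], iris.target       # petal length & width

# choose the split ratio: 30% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# StandardScaler does the feature scaling for us;
# the test set is scaled with the training set's parameters
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

ppn = Perceptron(eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)                  # runs the perceptron learning rule
y_pred = ppn.predict(X_test_std)               # predict labels for the test set
accuracy = (y_pred == y_test).mean()
```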
RandomForestClassifier, good performance and ease of use; requires relatively little hyperparameter tuning
KNeighborsClassifier, non-parametric, adapts as we collect new data; computational cost grows with the number of samples.
x.ravel() returns a contiguous flattened array.
.contourf(X, Y, Z, alpha=...) draws filled contours; X and Y are grid coordinates (e.g. from meshgrid) and Z holds the values to contour.
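The usual decision-region recipe ties ravel() and contourf together: build a grid with meshgrid, flatten it with ravel() so a classifier can score every grid point, reshape, and draw filled contours. A sketch with a stand-in linear rule instead of a trained classifier:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # render off-screen, no display needed
import matplotlib.pyplot as plt

xx1, xx2 = np.meshgrid(np.arange(-3, 3, 0.1), np.arange(-3, 3, 0.1))
grid = np.array([xx1.ravel(), xx2.ravel()]).T   # one row per grid point

# stand-in "classifier": a fixed linear decision rule (illustrative only)
Z = np.where(grid[:, 0] + grid[:, 1] >= 0.0, 1, -1)
Z = Z.reshape(xx1.shape)                        # back to grid shape

plt.contourf(xx1, xx2, Z, alpha=0.3)            # filled contours per region
plt.xlabel("x1")
plt.ylabel("x2")
```

With a real model, the `np.where` line becomes `classifier.predict(grid)`.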
odds ratio: p/(1-p). approaches infinity as p->1
logit function: log(p/(1-p))
logistic sigmoid function: 1/(1+e^-z)
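The three functions above written out in plain Python; the sigmoid is the inverse of the logit, so it maps log-odds back to a probability.

```python
import math

def odds_ratio(p):
    return p / (1.0 - p)                  # grows without bound as p -> 1

def logit(p):
    return math.log(p / (1.0 - p))        # log-odds

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))     # inverse of the logit

# sigmoid(logit(p)) recovers p; sigmoid(0) is exactly 0.5
```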
Logistic regression performs well on linearly separable classes, and is easy to implement. Widely used.
implement with sklearn.linear_model.LogisticRegression
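A minimal sklearn.linear_model.LogisticRegression sketch on the bundled Iris data (C and max_iter values are illustrative; in scikit-learn, C is the inverse of the regularization strength, roughly 1/lambda):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
lr = LogisticRegression(C=100.0, max_iter=1000)  # C ~ 1/lambda (illustrative)
lr.fit(X, y)

proba = lr.predict_proba(X[:1])   # class probabilities from the model
accuracy = lr.score(X, y)         # training accuracy
```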
Kernel Trick: Find Separating Hyperplanes
map samples with a mapping function phi into a higher-dimensional space, train the SVM model in that new space
Common kernel: Radial Basis Function/Gaussian kernel
with parameter gamma controlling the cutoff, i.e. how far a single training sample's influence reaches
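An RBF-kernel SVM sketch on an XOR-style dataset that no linear separating hyperplane can handle (the gamma and C values are illustrative, not tuned):

```python
import numpy as np
from sklearn.svm import SVC

# XOR pattern: label depends on whether the two signs disagree
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = np.where(np.logical_xor(X[:, 0] > 0, X[:, 1] > 0), 1, -1)

svm = SVC(kernel="rbf", gamma=0.5, C=10.0)   # gamma sets the kernel's reach
svm.fit(X, y)
accuracy = svm.score(X, y)   # the RBF kernel separates the XOR pattern well
```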
Decision Tree Learning
Good for interpretability
split data on feature resulting in largest information gain
prune trees to reduce overfitting
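A decision-tree sketch on the bundled Iris data: criterion picks the information-gain measure, and max_depth caps tree growth, a simple form of pre-pruning against overfitting (the chosen values are illustrative).

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy",  # entropy-based information gain
                              max_depth=3,          # pre-pruning: cap the depth
                              random_state=1)
tree.fit(X, y)
accuracy = tree.score(X, y)
depth = tree.get_depth()     # never deeper than max_depth
```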
Random Forest DTree Learning
Trade some interpretability for performance
K-Nearest Neighbors (KNN): a lazy learner; it doesn't learn a model, it simply memorizes the training data.
The model grows with the size of the data presented; non-parametric = no fixed set of parameters. Kernel SVM and decision trees are also non-parametric.
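A KNN sketch on the bundled Iris data: fit() just stores the (scaled) training data, and the work happens at predict() time. Because KNN is distance-based, feature scaling matters (n_neighbors and metric values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)   # scale before distance comparisons

knn = KNeighborsClassifier(n_neighbors=5, p=2, metric="minkowski")  # p=2: Euclidean
knn.fit(X_std, y)                           # "training" = memorizing the data
accuracy = knn.score(X_std, y)
```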