Python - Machine learning
Types
Supervised (labeled data)
Unsupervised (unlabeled data)
Reinforcement (learn from experience)
Process
Data Acquisition
Data Cleaning
Model Training & Building
Model Testing
Model Deployment
Library
TensorFlow
scikit-learn
Algorithms
Linear Regression
Linear Regression
Data creation
X = dataframe[['CHOSEN COLUMNS, FEATURES']]
y = dataframe['TARGET VARIABLE, VALUE']
Model training
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)
Coefficients
lm.coef_
coefficient example: a 1-unit increase in Avg. Area Income is associated with a $21.52 increase in the target, holding the other features fixed
Predicting
predictions = lm.predict(X_test) -> predict for X_test; y_test holds the true answers
plt.scatter(y_test, predictions) -> visual check: points close to a straight diagonal mean good predictions
Metrics
MAE - mean_absolute_error
MSE - mean_squared_error
RMSE - root mean squared error (np.sqrt of the MSE)
from sklearn import metrics
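The three metrics above can be sketched on toy arrays (the values below are made up purely for illustration):

```python
import numpy as np
from sklearn import metrics

# Hypothetical true values and predictions, just to show the calls
y_test = np.array([3.0, 5.0, 7.0])
predictions = np.array([2.5, 5.5, 8.0])

mae = metrics.mean_absolute_error(y_test, predictions)   # average |error|
mse = metrics.mean_squared_error(y_test, predictions)    # average squared error
rmse = np.sqrt(mse)                                      # RMSE is just the square root of MSE
```

MAE is in the units of the target; RMSE punishes large errors more heavily than MAE because of the squaring.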
Logistic Regression
Logistic
Data
delete/fill missing data
sex = pd.get_dummies(df['Sex'], drop_first = True) -> e.g. male = 1, female = 0 (string -> numeric; drop_first avoids a redundant dummy column)
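A minimal sketch of that dummy-encoding step, on a tiny made-up frame:

```python
import pandas as pd

# Tiny made-up frame standing in for the real dataset
df = pd.DataFrame({'Sex': ['male', 'female', 'male']})

# drop_first=True drops the alphabetically-first category ('female'),
# leaving a single 'male' indicator: 1 = male, 0 = female
sex = pd.get_dummies(df['Sex'], drop_first=True)
```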
Train Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size, random_state)
Training and predicting
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
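The steps above, chained end to end; `make_classification` is a synthetic stand-in for a real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Synthetic binary-classification data as a stand-in for real data
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)

# Precision, recall, f1-score per class
print(classification_report(y_test, predictions))
```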
K Nearest Neighbors
KNN - K Nearest Neighbors
Standardize the variables
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS', axis = 1))
scaled_features = scaler.transform(df.drop('TARGET CLASS', axis = 1))
df_new = pd.DataFrame(scaled_features, columns = df.columns[:-1])
Train Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
KNN
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = k)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
Choosing K
error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors = i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
Evaluations
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
Decision Trees
Decision Trees and Random Forests
Train Test Split
Decision Trees
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
Random Forests
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators = 200)
rfc.fit(X_train, y_train)
Predictions
rfc_pred = rfc.predict(X_test)
pred = dtree.predict(X_test)
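Both models side by side on synthetic stand-in data (`make_classification` replaces the real dataset here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

dtree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
rfc = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

pred = dtree.predict(X_test)        # single tree
rfc_pred = rfc.predict(X_test)      # ensemble of 200 trees
```

The forest usually generalizes better than the single tree; `accuracy_score(y_test, rfc_pred)` versus `accuracy_score(y_test, pred)` makes the comparison concrete.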
Support Vector Machines
Support Vector Machines
Train Test Split
Train the SVC
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
Predictions
pred = model.predict(X_test)
Gridsearch
param_grid = {'parameter':[value1, value2]} ex. {'C':[0.1, 1, 10, 100, 1000], 'gamma':[1, 0.1, 0.01]}
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3)
grid.fit(X_train, y_train)
grid.best_params_
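The grid search end to end, again on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data
X, y = make_classification(n_samples=200, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=0)
grid.fit(X_train, y_train)

best = grid.best_params_            # the winning parameter combination
grid_pred = grid.predict(X_test)    # refit=True lets grid predict directly
```

With `refit=True`, the grid object retrains an SVC on the whole training set using the best parameters, so it can be used like a fitted model.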
K Means Clustering
K Means Clustering
Creating clusters
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = x)
kmeans.fit(data)
Unsupervised learning algorithm
Values
kmeans.cluster_centers_
kmeans.labels_
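A quick sketch with `make_blobs` standing in for real unlabeled data:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Three synthetic blobs standing in for real unlabeled data
data, _ = make_blobs(n_samples=150, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(data)

centers = kmeans.cluster_centers_   # one coordinate per cluster
labels = kmeans.labels_             # cluster index assigned to each sample
```

Note that the cluster indices are arbitrary: K Means never sees labels, so index 0 need not match any "true" class.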
PCA
Principal Component Analysis
Unsupervised learning
Scaling the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df)
scaled_data = scaler.transform(df)
PCA
from sklearn.decomposition import PCA
pca = PCA(n_components = x)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)
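The scale-then-project sequence above, sketched on a random matrix standing in for a real feature DataFrame:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Random 100x5 matrix standing in for a real feature DataFrame
rng = np.random.default_rng(0)
df = rng.normal(size=(100, 5))

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)   # fit + transform in one call

pca = PCA(n_components=2)
x_pca = pca.fit_transform(scaled_data)   # 5 features reduced to 2 components
```

`pca.explained_variance_ratio_` reports how much of the total variance each retained component captures.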
Natural Language Processing
Natural Language Processing - NLP
Text Preprocessing
import string
nopunc = [char for char in mess if char not in string.punctuation]
nopunc = ''.join(nopunc)
from nltk.corpus import stopwords
[word for word in nopunc.split() if word.lower() not in stopwords.words('english')]
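The two preprocessing steps, runnable on a made-up message; a tiny hand-rolled stopword set stands in for `stopwords.words('english')`, which needs a one-time `nltk.download('stopwords')`:

```python
import string

mess = 'Hello, world! This is a test.'

# Strip punctuation characters
nopunc = ''.join(char for char in mess if char not in string.punctuation)

# Tiny hand-rolled stopword set standing in for the nltk English list
stop_words = {'this', 'is', 'a'}
tokens = [word for word in nopunc.split() if word.lower() not in stop_words]
```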
Vectorization
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
pipeline = Pipeline([('bow', CountVectorizer()), ...]) -> list of (name, step) tuples covering all steps
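A complete pipeline of that shape, exercised on a tiny made-up spam/ham corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('bow', CountVectorizer()),        # text -> token counts
    ('tfidf', TfidfTransformer()),     # counts -> TF-IDF weights
    ('classifier', MultinomialNB()),   # Naive Bayes on the TF-IDF vectors
])

# Tiny made-up corpus, just to exercise the pipeline
texts = ['free money now', 'meeting at noon', 'win free prize', 'lunch meeting today']
labels = ['spam', 'ham', 'spam', 'ham']

pipeline.fit(texts, labels)
pred = pipeline.predict(['free prize'])
```

The pipeline applies every step in order on both `fit` and `predict`, so raw strings go in and class labels come out.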
Deep Learning
Deep Learning with TensorFlow
import tensorflow as tf
Contrib Learn
train_test_split the data
import tensorflow.contrib.learn as learn  # TF 1.x only; tf.contrib was removed in TensorFlow 2
classifier = learn.DNNClassifier(hidden_units = [10, 20, 10], n_classes = 3)
classifier.fit(X_train, y_train)
classifier.predict(X_test)
Multi-Layer Perceptron
learning_rate -> step size of each optimizer update (how quickly the cost function is minimized)
training_epochs -> how many training cycles to go through
batch_size -> size of the 'batches' of training data
n_hidden_1; n_hidden_2; n_input; n_classes; n_samples
x, y
FUNCTION
layer_1, layer_2, out_layer
Settings
weights = {'h1':...}
biases = {'b1':...}
COST AND OPTIMIZATION
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
RUNNING THE SESSION
for epoch in range(training_epochs):
batch_x, batch_y = mnist.train.next_batch(batch_size)