Machine learning
Model evaluation metrics
Mean absolute error
\( MAE = \frac{1}{n}\sum_{i=1}^{n} \left | y_{i}-\hat{y}_{i}\right |\)
np.mean(np.absolute(test_y_ - test_y))
\(\hat{y}_{i}\): the predicted value of \(y_{i}\)
Root Mean Squared Error
\(RMSE= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_{i}-\hat{y}_{i})^2}\)
np.sqrt(np.mean((test_y_ - test_y) ** 2))
Mean squared error
\(MSE= \frac{1}{n}\sum_{i=1}^{n} (y_{i}-\hat{y}_{i})^2\)
np.mean((test_y_ - test_y) ** 2)
Relative Absolute Error
\( RAE = \frac{\sum_{i=1}^{n} \left | y_{i}-\hat{y}_{i}\right |}{\sum_{i=1}^{n} \left | y_{i}-\bar{y}\right |}\)
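A minimal NumPy sketch of RAE, assuming test_y holds the actual values and test_y_ the predictions (as in the other metric snippets here):
np.sum(np.absolute(test_y_ - test_y)) / np.sum(np.absolute(test_y - np.mean(test_y)))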
Relative Squared Error
\( RSE = \frac{\sum_{i=1}^{n} \left ( y_{i}-\hat{y}_{i}\right )^{2}}{\sum_{i=1}^{n} \left (y_{i}-\bar{y}\right )^{2}}\)
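A matching sketch for RSE, under the same test_y / test_y_ assumption:
np.sum((test_y_ - test_y) ** 2) / np.sum((test_y - np.mean(test_y)) ** 2)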
\(R^2\)
\( R^{2} = 1 - RSE\)
from sklearn.metrics import r2_score
r2_score(test_y, test_y_)
Supervised learning
Classification
K-Nearest neighbors
from sklearn.neighbors import KNeighborsClassifier
k = 4
Train Model and Predict
neigh = KNeighborsClassifier(n_neighbors = k).fit(X_train,y_train)
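The node above mentions prediction as well; a minimal sketch using the fitted model (standard scikit-learn predict, with X_test named as in the train/test split under Prepare data):
yhat = neigh.predict(X_test)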
Logistic regression
hypothesis function: logistic/sigmoid function
\( h_{\theta}(x) = g(\theta^{T}x) \)
\( z = \theta^{T}x\)
\( g(z) = \frac{1}{1+e^{-z}} \)
decision boundary:
when \( \theta^{T}x \geq 0\), \(g(z) \geq 0.5\), so predict y = 1
and vice versa: when \( \theta^{T}x < 0\), predict y = 0
The cost function for logistic regression is log-based: the cost becomes very large when the prediction differs from the actual label.
\( J(\theta) = \frac{1}{m} \sum_{i=1}^{m} Cost(h_{\theta}(x^{(i)}),y^{(i)})\)
\(Cost(h_{\theta}(x),y) = -log(h_{\theta}(x))\) if y =1;
\(Cost(h_{\theta}(x),y) = -log(1-h_{\theta}(x))\) if y =0
With the two expressions above, \(cost \to \infty\) as \(h_{\theta}(x) \to (1-y) \)
The two cost expressions above are equivalent to:
\(Cost(h_{\theta}(x),y) = -(y)log(h_{\theta}(x)) -(1-y)log(1-h_{\theta}(x))\)
\( J(\theta) = -\frac{1}{m} \sum^{m}_{i=1}[y^{(i)}log(h_{\theta}(x^{(i)})) + (1-y^{(i)})log(1-h_{\theta}(x^{(i)}))] \)
with \( h = g(X\theta)\):
\( J(\theta) = \frac{1}{m} \left( -y^{T}log(h) - (1-y)^{T}log(1-h) \right) \)
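A minimal NumPy sketch of this vectorized cost; X is assumed to include the intercept column and y to be a 0/1 vector (names are illustrative):
import numpy as np
def logistic_cost(theta, X, y):
    h = 1 / (1 + np.exp(-X @ theta))  # h = g(X theta)
    m = len(y)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m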
Gradient descent
\( \theta_{j} := \theta_{j} - \alpha\frac{1}{m} \sum_{i=1}^{m}(h_{\theta }(x^{(i)})-y^{(i)})x^{(i)}_{j} \)
\(\theta := \theta - \frac{\alpha}{m} X^{T}(g(X\theta) - \vec{y}) \)
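A sketch of this vectorized update loop, under the same assumptions as the cost sketch above; alpha and n_iters are illustrative choices:
def logistic_gradient_descent(X, y, alpha=0.1, n_iters=1000):
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        h = 1 / (1 + np.exp(-X @ theta))        # g(X theta)
        theta -= (alpha / m) * (X.T @ (h - y))  # theta := theta - (alpha/m) X^T (g(X theta) - y)
    return theta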
Regression
Simple Regression
Linear
For polynomial regression, first transform the features:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
train_x = poly.fit_transform(train_x)
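When predicting later, apply the same transform to the test features first; a small reminder sketch (test_x as built in the predict node below):
test_x = poly.transform(test_x)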
train
from sklearn import linear_model
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
print ('Coefficients: ', regr.coef_)
print ('Intercept: ',regr.intercept_)
predict
test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
test_y_ = regr.predict(test_x)
Linear regression analysis is used to predict the value of a variable based on the value of another variable.
Non-Linear
Cubic
y = 1*(x**3) + 1*(x**2) + 1*x + 3
Quadratic
y = np.power(x,2)
Exponential
Y= a + b*np.exp(X)
Logarithmic
Y = np.log(X)
Sigmoidal/Logistic
Y = 1-4/(1+np.power(3, X-2))
curve_fit
from scipy.optimize import curve_fit
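curve_fit needs the model function; the map does not define sigmoid, so here is a minimal sketch of one common logistic parameterization (beta_1, beta_2 are illustrative names matching the printout below):
import numpy as np
def sigmoid(x, beta_1, beta_2):
    # logistic curve with slope beta_1 and midpoint beta_2 (assumed form)
    return 1 / (1 + np.exp(-beta_1 * (x - beta_2)))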
popt, pcov = curve_fit(sigmoid, xdata, ydata)
print the final parameters
print(" beta_1 = %f, beta_2 = %f" % (popt[0], popt[1]))
predict using test set
y_hat = sigmoid(test_x, *popt)
Multiple Regression
Linear
from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit (x, y)
print ('Coefficients: ', regr.coef_)
Theory
Cost function:
\( J(\theta_{0} ,\theta_{1} ) = \frac{1}{2m} \sum_{i=1}^{m}(\hat{y_{i}}-y_{i})^{2}= \frac{1}{2m} \sum_{i=1}^{m}(h_{\theta }(x_{i})-y_{i})^{2} \)
gradient descent
theta := theta - learningRate * (partial derivative of the cost function with respect to theta):
\( \theta_{j} := \theta_{j} - \alpha\frac{\partial }{\partial\theta_{j}}J(\theta_{0} ,\theta_{1} ) \)
\( \theta_{0} := \theta_{0} - \alpha\frac{1}{m} \sum_{i=1}^{m}(h_{\theta }(x_{i})-y_{i})\)
\( \theta_{1} := \theta_{1} - \alpha\frac{1}{m} \sum_{i=1}^{m}(h_{\theta }(x_{i})-y_{i})x_{i} \)
where \( x^{(i)}_{j} \) is the value of feature j in the \(i^{th}\) training example
\( \theta_{j} := \theta_{j} - \alpha\frac{1}{m} \sum_{i=1}^{m}(h_{\theta }(x^{(i)})-y^{(i)})x^{(i)}_{j} \)
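A compact NumPy sketch of this batch update for linear regression; X is assumed to already carry a leading column of ones for \(\theta_{0}\):
import numpy as np
def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        error = X @ theta - y                 # h_theta(x) - y for every example
        theta -= (alpha / m) * (X.T @ error)  # simultaneous update of all theta_j
    return theta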
hypothesis function (linear)
\(h_{\theta }(x) = \theta_{0} + \theta_{1}x_{1}\)
\(h_{\theta }(x) = \theta_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} + ... + \theta_{n}x_{n}\)
\(h_{\theta }(x) = \theta^{T}X\)
Feature scaling
\(x_{i}= \frac{x_{i}-\mu_{i}}{s_{i}}\)
where \( \mu_{i}\) is avg(x) and \(s_{i}\) is the range (max(x) - min(x))
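A one-line NumPy sketch of this scaling for a feature matrix x, applied column-wise with the range as \(s_{i}\):
x_scaled = (x - x.mean(axis=0)) / (x.max(axis=0) - x.min(axis=0))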
Learning rate \( \alpha \): too small makes training slow; too large can keep gradient descent from converging
Normal equation: instead of using gradient descent to search for \(\theta\) iteratively, it can be computed directly:
\(\theta = (X^{T}X)^{-1}X^{T}y\)
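A direct NumPy sketch of the normal equation; pinv keeps it safe when \(X^{T}X\) is singular:
theta = np.linalg.pinv(X.T @ X) @ X.T @ y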
Avoid overfitting with regularization (adds the term \(\frac{\lambda}{m}\theta_{j}\) inside the update):
\( \theta_{j} := \theta_{j} - \alpha\left[\frac{1}{m} \sum_{i=1}^{m}(h_{\theta }(x^{(i)})-y^{(i)})x^{(i)}_{j} + \frac{\lambda}{m}\theta_{j}\right]\)
the normal equation then becomes:
\(\theta = (X^{T}X +\lambda L)^{-1}X^{T}y \)
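A sketch of the regularized version, assuming L is the identity with a 0 in the top-left so the intercept is not penalized (lam is an illustrative value):
n = X.shape[1]
L = np.eye(n)
L[0, 0] = 0   # do not regularize the intercept term
lam = 1.0     # illustrative regularization strength
theta = np.linalg.inv(X.T @ X + lam * L) @ X.T @ y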
Non-Linear
Prepare data
Split into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=4)
print ('Train set:', X_train.shape, y_train.shape)
print ('Test set:', X_test.shape, y_test.shape)
msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]
Normalize Data
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X.astype(float))
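In practice the scaler is usually fit on the training split only and then applied to both splits; a sketch with the variables from the split above:
scaler = preprocessing.StandardScaler().fit(X_train.astype(float))
X_train = scaler.transform(X_train.astype(float))
X_test = scaler.transform(X_test.astype(float))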
Unsupervised learning
Clustering