Bachelor Thesis: Algorithms of Machine Learning
Logistic Regression
used to predict a discrete dependent variable from a set of independent variables
goal: find the best set of parameters
each feature is multiplied by a weight and then all are added
generates the coefficients to predict a logit transformation of the probability
result is passed to a sigmoid function, which produces the binary output
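The steps above (weighted sum of features, then sigmoid) can be sketched in pure Python; the weights and bias here are hypothetical, not learned:

```python
import math

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias):
    # each feature is multiplied by its weight, then all are summed
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)                 # probability of the positive class
    return 1 if p >= 0.5 else 0    # binary output

# hypothetical weights and bias (in practice these are learned from data)
weights, bias = [0.8, -0.4], -0.1
print(predict([2.0, 1.0], weights, bias))   # prints 1 (z = 1.1, p ≈ 0.75)
```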
Linear Regression
used when the value of a dependent variable is predicted from independent variables
relationship is formed by mapping the dependent and independent variable on a line
line is called regression line which is represented by
Y = a*X + b
Y = dependent variable (e.g. weight)
X = independent variable (e.g. height)
b = Intercept and a = slope
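Fitting the regression line Y = a*X + b can be sketched with the ordinary least-squares closed form; the height/weight numbers are hypothetical toy data:

```python
def fit_line(xs, ys):
    # ordinary least squares for Y = a*X + b
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))   # slope
    b = mean_y - a * mean_x                      # intercept
    return a, b

# hypothetical height (cm) -> weight (kg) data lying exactly on a line
heights = [150, 160, 170, 180]
weights = [55, 62, 69, 76]
a, b = fit_line(heights, weights)
print(a, b)   # slope 0.7, intercept -50
```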
Decision Tree
a supervised learning algorithm
can be used for both classification and regression; has a tree-like structure
the best attribute of the dataset is placed at the root, then the training dataset is split into subsets
splitting of the data depends on the features of the dataset; this process is repeated until the whole data is classified and a leaf node is reached at each branch
information gain is calculated to find which feature gives the highest gain and is therefore the best attribute to split on
are built for making a training model which can be used to predict class or the value of target variable
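The information-gain criterion mentioned above can be sketched directly from its definition (entropy of the labels minus the weighted entropy of the split subsets); the toy labels are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions p
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, splits):
    # reduction in entropy after splitting the labels into subsets
    total = len(labels)
    remainder = sum(len(s) / total * entropy(s) for s in splits)
    return entropy(labels) - remainder

# hypothetical toy labels: a split into two pure subsets gives maximal gain
labels = ["yes", "yes", "no", "no"]
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))   # 1.0
```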
Support vector machine
a binary classifier
raw data is drawn in an n-dimensional space
a separating hyperplane is drawn to differentiate the classes; the line through the centre of the gap between the two closest data points of different categories is taken as the optimal hyperplane
this optimized separating hyperplane maximizes the margin of the training data; through this hyperplane, new data can be categorized
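The decision rule once the hyperplane is found can be sketched as the sign of w·x + b, with the margin width 2/||w||; the hyperplane parameters here are hypothetical, not the result of actual training:

```python
import math

def classify(x, w, b):
    # which side of the separating hyperplane w·x + b = 0 decides the class
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def margin_width(w):
    # the optimal hyperplane is the one that maximizes this margin: 2 / ||w||
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# hypothetical hyperplane x1 + x2 - 3 = 0 separating two categories
w, b = [1.0, 1.0], -3.0
print(classify([2.0, 2.0], w, b))   # 1: above the line x1 + x2 = 3
print(classify([0.5, 0.5], w, b))   # -1: below it
print(margin_width(w))              # 2 / sqrt(2)
```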
Naive-Bayes
technique for constructing classifiers
based on Bayes' theorem; used even for highly sophisticated classification problems
learns the probability of an object with certain features belonging to a particular group or class
probabilistic classifier
in this method, the occurrence of each feature is assumed independent of the occurrence of any other feature
needs only a small amount of training data, and all terms can be precomputed, so classification becomes easy, quick and efficient
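A minimal counting-based sketch of the idea (class prior times a product of per-feature likelihoods, with the independence assumption; no smoothing, for brevity); the weather data is hypothetical:

```python
from collections import Counter, defaultdict

def train(samples):
    # samples: list of (feature_tuple, label); all counts are precomputed here
    label_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
    return label_counts, feature_counts

def predict(features, label_counts, feature_counts):
    total = sum(label_counts.values())
    best_label, best_p = None, -1.0
    for label, count in label_counts.items():
        # P(label) * product of P(feature_i | label); features assumed independent
        p = count / total
        for i, value in enumerate(features):
            p *= feature_counts[(label, i)][value] / count
        if p > best_p:
            best_label, best_p = label, p
    return best_label

# hypothetical toy data: (outlook, windy) -> decision
data = [(("sunny", "no"), "play"), (("sunny", "yes"), "play"),
        (("rainy", "yes"), "stay"), (("rainy", "no"), "play")]
lc, fc = train(data)
print(predict(("sunny", "no"), lc, fc))   # prints "play"
```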
KNN
used for both classification and regression
one of the simplest machine learning algorithms
stores the cases
for new data, it checks the majority class among the k neighbours it resembles most
makes predictions using the training dataset directly
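Because the stored cases themselves are the model, the whole method fits in a few lines: rank the training points by distance and take a majority vote among the k nearest. The 2-D points are hypothetical:

```python
import math
from collections import Counter

def knn_predict(query, training, k=3):
    # the stored cases are the model: rank them by distance to the query
    by_distance = sorted(training,
                         key=lambda item: math.dist(query, item[0]))
    # majority class among the k nearest neighbours
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# hypothetical 2-D points belonging to two classes
training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict((1, 1), training))   # "a": nearest cases are all class a
print(knn_predict((5, 4), training))   # "b"
```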
K- means Clustering
unsupervised learning algorithm
used to group unlabelled data into clusters
to group the data into k clusters, an initial partition is made using Euclidean distance
assuming k clusters, a centre is defined for each cluster; these centres should be placed far from each other
each point is then examined and added to the nearest cluster, measured by Euclidean distance to the cluster mean, until no point remains pending
the mean vector of each cluster is re-calculated after every reassignment; this iterative relocation repeats until the clusters stabilize
the loop thereby minimizes the squared-error objective function
final results are
The centroids of the K-clusters, which are used to label new entered data
labels for the training data
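The iterative relocation loop described above can be sketched in pure Python; the points and the initial centres are hypothetical, and the final outputs are exactly the two results listed (centroids plus labels for the training data):

```python
import math

def kmeans(points, centres, iterations=10):
    # iterative relocation: assign each point to its nearest centre,
    # then recompute each centre as the mean of its cluster
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        centres = [tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centre
                   for cluster, centre in zip(clusters, centres)]
    labels = [min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
              for p in points]
    return centres, labels

# hypothetical 2-D points forming two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, labels = kmeans(points, centres=[(0, 0), (10, 10)])
print(centres)   # roughly the two group means
print(labels)    # [0, 0, 0, 1, 1, 1]
```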
Random Forest
supervised classification algorithm
multiple decision trees taken together form a random forest, i.e. a collection of many classification trees
can be used for classification as well as regression
each decision tree includes a rule-based system: for the given training dataset with targets and features, the decision tree algorithm derives a set of rules
in contrast to decision trees: no need to calculate information gain to find root node
each randomly created decision tree predicts an outcome and stores the predicted outcome
further, the votes for each predicted target are counted
the highest-voted prediction is taken as the final prediction of the random forest algorithm
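The bootstrap-and-vote scheme can be sketched with the simplest possible trees, depth-1 stumps on one feature; the data, the stump rule, and the forest size are all hypothetical simplifications:

```python
import random
from collections import Counter

def train_stump(sample):
    # a depth-1 decision tree trained on one bootstrap sample
    zeros = [x[0] for x, y in sample if y == 0]
    ones = [x[0] for x, y in sample if y == 1]
    if not zeros or not ones:
        return ("const", 0 if zeros else 1)   # degenerate single-class sample
    # split halfway between the two classes seen in this sample
    return ("split", (max(zeros) + min(ones)) / 2)

def stump_predict(x, stump):
    kind, value = stump
    return value if kind == "const" else (1 if x[0] >= value else 0)

def forest_predict(x, stumps):
    # every tree votes; the highest-voted class is the final prediction
    votes = Counter(stump_predict(x, s) for s in stumps)
    return votes.most_common(1)[0][0]

random.seed(0)
data = [((0.0,), 0), ((1.0,), 0), ((2.0,), 0),
        ((8.0,), 1), ((9.0,), 1), ((10.0,), 1)]
# each tree is trained on its own bootstrap sample (drawn with replacement)
stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(7)]
print(forest_predict((9.5,), stumps), forest_predict((0.5,), stumps))
```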
Dimensionality Reduction Algorithms
used to reduce the number of random variables by obtaining some principal variables
types
feature extraction
feature selection
can be done by PCA
a method of extracting important variables from a large set of variables
extracts a low-dimensional set of features from high-dimensional data
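For 2-D data, the first principal component that PCA extracts can be sketched by centring the data, forming the covariance matrix, and running power iteration to find its dominant eigenvector; the data points are hypothetical:

```python
import math

def first_component(data, steps=200):
    # centre the data, build the 2x2 covariance matrix, then use power
    # iteration to find its dominant eigenvector (the first principal axis)
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centred = [(x - mx, y - my) for x, y in data]
    cxx = sum(x * x for x, _ in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    v = (1.0, 0.0)
    for _ in range(steps):
        # multiply by the covariance matrix, then renormalize
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

# hypothetical data lying close to the line y = x
data = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8)]
v = first_component(data)
print(v)   # a unit vector roughly along (0.707, 0.707)
```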
Gradient boosting and Ada Boost Algorithms
clustering (k-means), unsupervised learning, supervised learning, reinforcement learning