Unsupervised Learning
ALVIN IMRAN AirBnB
Hierarchical Techniques
- The original data are separated into a few general classes, each of which is divided into still smaller groups, until finally only the individual objects remain.
- This technique is very popular because its application produces a dendrogram, a two-dimensional diagram that depicts the clustering process and the final result.
Dimensionality Reduction
Dimensionality reduction methods aim to reduce the number of variables or features in a data set while preserving its essential structure.
Choice of Variable
Standardization is worthwhile, but it can mask differences between objects that are needed for classification.
Principal component analysis (PCA) produces a set of new, mutually uncorrelated variables (the principal components).
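A minimal sketch of both ideas in this node, restricted to two variables so that only the Python standard library is needed. The function names (`standardize`, `covariance`, `pca_2d`) and the sample values are hypothetical, chosen for illustration. For two variables, the rotation that diagonalizes the covariance matrix has a closed form. Real work with many variables would normally use a linear algebra library instead.

```python
import math

def standardize(column):
    """Autoscale one variable to zero mean and unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def covariance(a, b):
    """Sample covariance of two equally long columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def pca_2d(x, y):
    """Return the scores on the two principal components of variables x and y."""
    sxx, syy, sxy = covariance(x, x), covariance(y, y), covariance(x, y)
    # Rotation angle that makes the covariance between the new variables zero.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    pc1 = [c * (a - mx) + s * (b - my) for a, b in zip(x, y)]
    pc2 = [-s * (a - mx) + c * (b - my) for a, b in zip(x, y)]
    return pc1, pc2

# Hypothetical measurements on very different scales.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [210.0, 390.0, 620.0, 780.0, 1010.0]

zx, zy = standardize(x), standardize(y)   # both now have unit variance
pc1, pc2 = pca_2d(zx, zy)
print("covariance between the new variables:", covariance(pc1, pc2))
print("variance carried by PC1 and PC2:", covariance(pc1, pc1), covariance(pc2, pc2))
```

The two new variables have covariance zero (up to rounding), and the first one carries most of the total variance, which is why later components can often be dropped.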
hazim
- In the case of unsupervised pattern recognition, often referred to as cluster analysis or numerical taxonomy, no class knowledge is available and no assumptions need be made regarding the class to which a sample may belong.
Supervised pattern recognition: a training set is identified for which the parent class or group of each sample is known.
Danish
The aim is to ensure that similar objects are clustered together, with minimal separation between objects within a class or cluster, whilst maximizing the separation between different clusters.
Measures between objects
- A matrix of similarities or dissimilarities between the objects is calculated.
- Both the number of discrete clusters observed and the cluster membership may depend on the similarity metric used.
- Similarity and distance between objects are complementary concepts for which there is no single formal definition.
- In practice, distance as a measure of dissimilarity is a much more clearly defined quantity and is more extensively used in cluster analysis.
Distance Measure
The correlation coefficient, by contrast, is a measure only of collinearity between variates and takes no account of non-linear relationships or the absolute magnitude of variates.
E.g. Euclidean distance, Mahalanobis distance
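The two distances named here can be sketched as follows, using only the standard library. The function names and the four sample objects are hypothetical. The Mahalanobis version is limited to two variables so that the covariance matrix can be inverted by hand.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two objects."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mahalanobis_2d(a, b, data):
    """Mahalanobis distance between two 2-variable objects.

    The sample covariance matrix S is estimated from `data`, and the squared
    distance is diff^T S^-1 diff, written out for the 2x2 case.
    """
    n = len(data)
    mx = sum(r[0] for r in data) / n
    my = sum(r[1] for r in data) / n
    sxx = sum((r[0] - mx) ** 2 for r in data) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in data) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in data) / (n - 1)
    det = sxx * syy - sxy ** 2          # must be non-zero for S to be invertible
    dx, dy = a[0] - b[0], a[1] - b[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

def distance_matrix(objects, dist=euclidean):
    """Symmetric matrix of pairwise distances, the usual input to cluster analysis."""
    return [[dist(a, b) for b in objects] for a in objects]

# Hypothetical objects described by two variables each.
objects = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

for row in distance_matrix(objects):
    print([round(v, 3) for v in row])
print("Mahalanobis:", mahalanobis_2d(objects[0], objects[1], objects))
```

Unlike the Euclidean distance, the Mahalanobis distance rescales each direction by the spread and correlation of the data, so its value depends on the data set used to estimate the covariance.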
Similarity measures
- Of the similarity or association coefficients associated with cluster analysis, the correlation coefficient is the most commonly used.
- Other similarity measures are seldom employed due to their poor definition and lack of mathematical analysis.
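As an illustration of the correlation coefficient used as a similarity measure, here is a small sketch of the Pearson coefficient in plain Python. The function name `pearson` and the profiles are hypothetical. The two examples show the limitations noted elsewhere in these notes: the coefficient ignores absolute magnitude and sees only linear association.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Same shape, very different magnitude: correlation is still essentially 1.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [10.0 * v + 5.0 for v in profile_a]
print("scaled copy:", pearson(profile_a, profile_b))

# Perfect but non-linear relationship (y = x squared on a symmetric range):
# correlation is essentially 0.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [v ** 2 for v in xs]
print("non-linear:", pearson(xs, ys))
```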
The data set consisting of the original, or suitably processed, analytical data characterizing our samples is first converted into some corresponding set of similarity, or dissimilarity, measures between each sample.
One method of unsupervised learning is clustering. It groups the data based on their similarities or inherent patterns. The most well-known example is the k-means algorithm, which partitions the data into a predefined number of clusters. - Azmin
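A minimal k-means sketch in plain Python (Lloyd's iteration). The function name, the sample points, and the choice to start from the first k points are assumptions made here for reproducibility. Practical implementations usually pick starting centroids at random or with a seeding heuristic.

```python
import math

def kmeans(points, k, iterations=100):
    """Partition `points` into k clusters; returns (labels, centroids)."""
    # Assumption: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break                      # assignments stable, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical data: two visibly separate groups of three points.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The number of clusters k must be supplied in advance, which is the main practical difference from the hierarchical methods described in these notes.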
Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is an unsupervised clustering algorithm that comes in two forms: agglomerative or divisive. Agglomerative clustering is considered a "bottom-up" approach: the data points start as separate groupings and are then merged iteratively on the basis of similarity until a single cluster remains.
In contrast, divisive clustering starts with a single cluster, containing all samples, which is successively divided into smaller partitions.
Agglomerative methods begin with the computation of a similarity or distance matrix between the objects, and result in a dendrogram illustrating the successive fusion of objects and groups until the stage is reached when all objects are fused into one large set.
Step 1. Calculate the distance matrix between the objects
Step 2. Find the smallest element in the distance matrix and join the corresponding objects into a single cluster
Step 3. Calculate a new distance matrix, taking into account that the clusters produced in the second step will have formed new objects and taken the place of the original data points
Step 4. Return to Step 2, or stop if the final two clusters have been fused into the final, single cluster
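The agglomerative procedure (the initial distance matrix followed by Steps 2 to 4) can be sketched in plain Python as below. The notes do not say how the distance between two clusters is defined, so single linkage (nearest members) is an assumption here. The function name and sample data are also hypothetical. For brevity, the sketch recomputes cluster distances from the original matrix on each pass instead of storing an updated matrix, which gives the same result for single linkage.

```python
import math

def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, distance), where clusters are lists
    of indices into `points`.
    """
    # Initial between-object distance matrix.
    d = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]   # every object starts alone
    merges = []
    while len(clusters) > 1:                       # Step 4: repeat until one cluster
        # Step 2: find the closest pair of current clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Step 3 (folded in): single-linkage distance between clusters.
                dist = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        # The merged cluster takes the place of its two parents.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical one-variable data: two tight pairs far apart.
points = [(0.0,), (1.0,), (5.0,), (6.5,)]
for left, right, dist in agglomerate(points):
    print(left, "+", right, "at distance", dist)
```

The merge history is exactly the information a dendrogram displays: which groups fuse, and at what distance.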
- Used widely in analytical science for tasks such as model fitting and hypothesis testing, data exploration, and data reduction.