Learning for Parallel and Distributed Clustering Algorithms
Platforms for Parallel and Distributed Computing
GPU, MapReduce, Hadoop, Spark, CloudStack, MPI, OpenMP
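As a small illustration of running clustering on one of these platforms, the sketch below uses Spark MLlib's KMeans; the toy data, application name, and parameter values are assumptions for illustration only.

    from pyspark.sql import SparkSession
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.linalg import Vectors

    # Toy in-memory data; in practice the DataFrame would be read from
    # distributed storage (e.g., HDFS) so the work spreads across executors.
    spark = SparkSession.builder.appName("distributed-kmeans-sketch").getOrCreate()
    rows = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
            (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
    df = spark.createDataFrame(rows, ["features"])

    # Assignment and update steps are distributed by Spark.
    model = KMeans(k=2, seed=1).fit(df)
    print(model.clusterCenters())
    spark.stop()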
Parallel & Distributed System Validations
Speedup, Scale-up, Number of clusters vs. number of machines, Runtime, Time comparison for varying numbers of objects, Distance threshold & number of nodes
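A minimal sketch of how speedup and scale-up are typically computed from measured runtimes; the timing values below are made-up placeholders, not measurements from the source.

    # T_1: runtime on one machine, T_p: runtime on p machines (placeholder values).
    T_1, T_p, p = 120.0, 18.0, 8
    speedup = T_1 / T_p              # ideally close to p
    efficiency = speedup / p         # ideally close to 1.0

    # Scale-up: 1x workload on 1 machine vs. p-times workload on p machines.
    T_small_1, T_large_p = 120.0, 135.0
    scaleup = T_small_1 / T_large_p  # ideally close to 1.0
    print(speedup, efficiency, scaleup)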
Apt Distance and Similarity Measures for Parallel and Distributed Clustering
Distance Measures
Standardized Euclidean, Cosine Distance, Pearson Correlation Distance
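These three distance measures can be sketched with SciPy as follows; the vectors and per-feature variances are illustrative assumptions.

    import numpy as np
    from scipy.spatial.distance import seuclidean, cosine, correlation

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 1.0])
    V = np.array([0.5, 1.0, 2.0])           # per-feature variances (assumed known)

    d_std_euclid = seuclidean(x, y, V)       # sqrt(sum((x_i - y_i)^2 / V_i))
    d_cosine = cosine(x, y)                  # 1 - cosine similarity
    d_pearson = correlation(x, y)            # 1 - Pearson correlation
    print(d_std_euclid, d_cosine, d_pearson)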
Similarity Measures
Jaccard Similarity, Measures for data of mixed type
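A minimal sketch of Jaccard similarity between two set-valued records; the attribute values are made up for illustration.

    # Jaccard similarity: |A intersection B| / |A union B|
    a = {"red", "suv", "automatic"}
    b = {"red", "sedan", "automatic"}
    jaccard = len(a & b) / len(a | b)   # 2 / 4 = 0.5
    print(jaccard)
    # For mixed-type data, one common approach averages per-feature similarities
    # (match/no-match for categorical features, range-normalized difference for numeric ones).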
Apt Evaluation Indicators for Parallel & Distributed algorithms
Internal
Dunn Index, Silhouette Coefficient
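A sketch of both internal indicators on a toy clustering, with the silhouette coefficient from scikit-learn and the Dunn index computed directly; the toy points and labels are assumptions.

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.metrics import silhouette_score

    X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
    labels = np.array([0, 0, 1, 1, 2, 2])

    sil = silhouette_score(X, labels)           # in [-1, 1], larger is better

    # Dunn index: smallest inter-cluster distance / largest intra-cluster diameter.
    clusters = [X[labels == c] for c in np.unique(labels)]
    min_between = min(cdist(a, b).min()
                      for i, a in enumerate(clusters)
                      for b in clusters[i + 1:])
    max_diameter = max(cdist(c, c).max() for c in clusters)
    dunn = min_between / max_diameter           # larger is better
    print(sil, dunn)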
External
Rand Index, F-measure, Jaccard Index, Confusion Matrix
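A sketch of the external indicators with scikit-learn (rand_score requires scikit-learn >= 0.24); the labels are toy values, and the F-measure and Jaccard index below assume the predicted cluster IDs are already aligned with the ground-truth labels, which a real evaluation would first establish with a label-matching step.

    from sklearn.metrics import rand_score, f1_score, jaccard_score, confusion_matrix

    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]

    print(rand_score(y_true, y_pred))                    # Rand index
    print(f1_score(y_true, y_pred, average="macro"))     # F-measure
    print(jaccard_score(y_true, y_pred, average="macro"))
    print(confusion_matrix(y_true, y_pred))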
Popular Algorithms Implemented Parallel and Distributed for Large Data
K-means, BIRCH, CLARA, CURE, DBSCAN, and WaveCluster
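As one concrete example, a distributed K-means iteration can be sketched with mpi4py: each rank holds a chunk of the data, and per-cluster sums and counts are combined with an allreduce before the centroid update. The synthetic data, rank layout, and iteration count are assumptions, not the source's implementation.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    k, dim = 3, 2

    # Each rank holds its own local chunk of the data (synthetic here).
    rng = np.random.default_rng(rank)
    local_points = rng.normal(size=(1000, dim))

    # All ranks start from the same centroids, broadcast from rank 0.
    centroids = rng.normal(size=(k, dim)) if rank == 0 else np.empty((k, dim))
    comm.Bcast(centroids, root=0)

    for _ in range(10):
        # Assignment step: each rank labels only its local points.
        d = np.linalg.norm(local_points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)

        # Update step: combine per-cluster sums and counts across all ranks.
        sums = np.zeros((k, dim))
        counts = np.zeros(k)
        for j in range(k):
            mask = labels == j
            sums[j] = local_points[mask].sum(axis=0)
            counts[j] = mask.sum()
        comm.Allreduce(MPI.IN_PLACE, sums, op=MPI.SUM)
        comm.Allreduce(MPI.IN_PLACE, counts, op=MPI.SUM)
        centroids = sums / np.maximum(counts, 1)[:, None]

Such a script would typically be launched with, e.g., mpiexec -n 4 python kmeans_mpi.py (the file name is hypothetical).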
Algorithms Meeting All Criteria for Parallel and Distributed Implementation
CURE, STING
Characteristics Possessed by Clustering Algorithm for Parallel & Distributed Implementation
Low time complexity, high scalability, support for large & high-dimensional data,
suitability for arbitrary data sets, insensitivity to noise & outliers, insensitivity to input order
Challenges for Parallel and Distributed Implementation
Work-efficient algorithms often lack massively parallel or distributed behavior; algorithms with ample parallel or distributed behavior can have questionable numerical stability; highly non-uniform data distribution causes catastrophic load imbalance