BI Topic Overview, Block 2
Large-scale analytics
Map-reduce
Spark (General)
Spark Ecosystem
Spark Streaming
Discretized Streams
Structured Streaming
Distributed Machine Learning
Machine Learning
supervised
k-NN
SVM
Decision trees
Evaluation
Performance Evaluation
Statistical Significance Testing
unsupervised
Algorithms
k-means
hierarchical clustering
agglomerative vs divisive
min
max
group average
Ward's method
self-organizing maps
types of clusters
well-separated
center-based
contiguous
density-based
property or conceptual
described by an objective function
distinctions between sets of clusters
inter-cluster similarity
fuzzy/non-fuzzy
partial/complete
heterogeneous/homogeneous
exclusive/non-exclusive
Reproducibility
Data analytics process
Fayyad's KDD Process
SEMMA
CRISP-DM
ASUM-DM
Preprocessing
Coding
Distances
Missing Values
Scaling
Sparsity