Data Mining Final Exam (Spring 2017)
Unsupervised Learning
Clustering
Density-based
Hierarchical
Grid-based
Partitioning
:star: K-Means
1. Partition objects into k initial subsets
2. Compute mean points
3. Assign objects to closest mean point
4. Go to step 2 until there are no changes
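The four steps above can be sketched in Python (Lloyd's algorithm; function and parameter names here are my own, not from the exam material):

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """K-Means: partition, compute means, reassign, repeat until stable."""
    rng = random.Random(seed)
    # Step 1: pick k initial mean points (here: k random objects).
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign each object to the closest mean point.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Step 2: recompute the mean point of each cluster.
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Step 4: stop when there are no changes.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

For example, `k_means([(1.0,1.0),(1.0,2.0),(8.0,8.0),(9.0,8.0)], 2)` converges to mean points near (1, 1.5) and (8.5, 8).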
:star: K-Medoid
:red_cross: CLARANS
Cluster Distances
:star: Centroid
:star: Complete-link
:star: Medoid
:star: Single-link
:star: Average
:star: Clustering Feature
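The inter-cluster distance measures above (single-link, complete-link, average, centroid) can be sketched for clusters given as lists of numeric tuples; function names are illustrative:

```python
import math

def single_link(c1, c2):
    # Minimum distance between any pair of points across the two clusters.
    return min(math.dist(p, q) for p in c1 for q in c2)

def complete_link(c1, c2):
    # Maximum distance between any pair of points across the two clusters.
    return max(math.dist(p, q) for p in c1 for q in c2)

def average_link(c1, c2):
    # Mean of all pairwise distances across the two clusters.
    return sum(math.dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

def centroid_dist(c1, c2):
    # Distance between the cluster mean points (centroids).
    m1 = tuple(sum(x) / len(c1) for x in zip(*c1))
    m2 = tuple(sum(x) / len(c2) for x in zip(*c2))
    return math.dist(m1, m2)
```

For `c1 = [(0,0),(0,2)]` and `c2 = [(3,0),(3,2)]`: single-link and centroid distance are both 3, complete-link is √13 ≈ 3.61 (the medoid distance would use actual cluster objects instead of means).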
Supervised Learning
Classification
Model Construction
Decision Tree
:star: ID3
Attribute Selection
Entropy
Conditional Entropy
Information Gain
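Entropy, conditional entropy, and information gain as ID3 uses them can be computed as follows (a sketch; records are assumed to be dicts of attribute values):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) = -sum p * log2(p) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(A) = H(Y) - H(Y | A): entropy reduction from splitting on attr.
    (C4.5 would further divide this by the split info of attr.)"""
    n = len(labels)
    cond = 0.0  # conditional entropy H(Y | A)
    for value in set(row[attr] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - cond
```

A split that separates the classes perfectly yields a gain equal to the full class entropy (1.0 for a balanced two-class set).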
Split
:red_cross: CART: Gini Index
:star: C4.5
Gain/Split Info
Mathematical Formula
:red_cross: Bayes Theorem
:star: Naïve Bayes
Classification Rule
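The classification rule picks the class maximizing P(class) · ∏ P(attribute | class) under the naive independence assumption. A toy count-based sketch (no Laplace smoothing; names are my own):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(class) and P(attr=value | class) from raw counts."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, attr) -> Counter of values
    for row, y in zip(rows, labels):
        for attr, value in row.items():
            cond[(y, attr)][value] += 1
    return priors, cond

def classify_nb(priors, cond, row):
    """argmax over classes of P(c) * prod P(a=v | c)."""
    n = sum(priors.values())
    best, best_score = None, -1.0
    for y, count in priors.items():
        score = count / n  # prior P(c)
        for attr, value in row.items():
            score *= cond[(y, attr)][value] / count  # P(a=v | c)
        if score > best_score:
            best, best_score = y, score
    return best
```

Note the zero-probability problem: an unseen (value, class) pair zeroes the whole product, which smoothing would avoid.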
Model Usage
Classify new data
Estimate Accuracy
Cross-validation
:!: Bootstrap
Random Sampling
:star: Confusion Matrix
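A confusion matrix tallies (actual, predicted) pairs, from which accuracy and the TP/FP/FN/TN counts fall out; a quick sketch:

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair, e.g. ('pos','neg') = false negative."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    # Fraction of predictions that match the true label.
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

For `actual = ['pos','pos','neg','neg']` and `predicted = ['pos','neg','neg','neg']`, the matrix holds 1 TP, 1 FN, 2 TN, and accuracy is 0.75.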
Preprocessing
Cleaning
Incomplete
Ignore
Fill Automatically
Mean of attr
Mean of cluster
Global constant
Fill Manually
Noisy
Binning
By median
By boundaries
By mean
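The three binning smoothers above (equal-depth bins, then replacing each value by the bin's mean, median, or nearer boundary) might look like this sketch:

```python
def smooth_by_bins(values, bin_size, method="mean"):
    """Sort values into equal-depth bins, then smooth each bin."""
    data = sorted(values)
    out = []
    for i in range(0, len(data), bin_size):
        bin_ = data[i:i + bin_size]
        if method == "mean":
            out += [sum(bin_) / len(bin_)] * len(bin_)
        elif method == "median":
            out += [bin_[len(bin_) // 2]] * len(bin_)
        else:  # "boundaries": snap each value to the nearer bin edge
            lo, hi = bin_[0], bin_[-1]
            out += [lo if v - lo <= hi - v else hi for v in bin_]
    return out
```

For the bin [4, 8, 9, 15]: smoothing by mean gives 9.0 for all four values, while smoothing by boundaries gives [4, 4, 4, 15].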
Regression
Clustering to remove outliers
Human inspection
Inconsistent
Intentional
Integration
:star: Covariance Analysis
Correlation Analysis
:star: Nominal: Chi-square
:star: Numerical: Correlation Coefficient
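The two correlation tests above, sketched for a contingency table of two nominal attributes (chi-square) and for two numeric attributes (Pearson's coefficient):

```python
from math import sqrt

def chi_square(table):
    """Chi-square statistic over observed vs. expected counts;
    large values suggest the two nominal attributes are correlated."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
        / (row_tot[i] * col_tot[j] / n)
        for i in range(len(table)) for j in range(len(table[0]))
    )

def pearson(xs, ys):
    """Correlation coefficient r in [-1, 1] for numeric attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```

For the table [[250, 200], [50, 1000]] the statistic is about 507.9, far above typical critical values, i.e. strongly correlated.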
Reduction
Numerosity
Parametric
Regression
Non-parametric
Histograms
Clustering
Sampling
Without Replacement
With Replacement
Random
Stratified
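The sampling variants above can be sketched on Python's random module (helper names are my own):

```python
import random

def sample_without_replacement(data, n, seed=0):
    # Each object can be drawn at most once.
    return random.Random(seed).sample(data, n)

def sample_with_replacement(data, n, seed=0):
    # The same object may be drawn more than once (as in the bootstrap).
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(n)]

def stratified_sample(data, key, frac, seed=0):
    """Draw the same fraction from each stratum so rare groups stay represented."""
    rng = random.Random(seed)
    strata = {}
    for item in data:
        strata.setdefault(key(item), []).append(item)
    out = []
    for group in strata.values():
        k = max(1, round(len(group) * frac))
        out += rng.sample(group, k)
    return out
```

On a 90/10 split of two groups, a 20% stratified sample keeps the 9:1 ratio (18 vs. 2 objects) instead of risking the minority group vanishing.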
Data Compression
Lossless
Lossy
Dimensionality
Principal Component Analysis
Feature Subset Selection
Information Gain
Decision Tree
Wavelet Transform
Data Transformation
Feature Construction
Aggregation (Data Cube Const.)
Smoothing
Normalization
:star: Min-Max
:star: Z-Score
:star: Decimal Scaling
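The three normalization methods above, as short functions over a list of values:

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """v' = (v - mean) / std_dev; result has mean 0 and std dev 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """v' = v / 10^j for the smallest j that makes every |v'| < 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]
```

For example, with min 12000 and max 98000, min-max maps 73600 to about 0.716, and decimal scaling divides values ranging over [-986, 917] by 10³.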
Intervals (e.g., age)
Discretization
Binning
Histogram Analysis
Clustering Analysis
Decision Tree
Correlation