DATA MINING
DATA MINING >> KNOWLEDGE DISCOVERY IN DATABASES
KDD
6 STEPS
EVALUATION / INTERPRETATION
SELECTION
DATA
DATA MINING (20%)
PATTERN
TRANSFORMATION
REDUCE VOLUME
PCA
CLASSIFICATION
PRE-PROCESSING (80%)
TARGET DATA
5 STEPS
TRANSFORMATION
NORMALIZATION
SCALE IN RANGE (MIN-MAX)
SCALE BY MEAN + STD (Z-SCORE)
USEFUL WITH OUTLIERS OR WHEN MIN / MAX ARE UNKNOWN
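The two normalization options above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the default target range of 0 to 1, the use of the population standard deviation, and the sample values are all assumptions made for the example.

```python
from statistics import mean, pstdev

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into [new_min, new_max] (scale in range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]

def z_score_scale(values):
    """Scale by mean and standard deviation (population std assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread: every value sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

data = [10, 20, 30, 40, 50]  # made-up sample values
scaled = min_max_scale(data)
standardized = z_score_scale(data)
```

Min-max scaling needs the true minimum and maximum, so a single extreme value compresses everything else; scaling by mean and standard deviation avoids that dependency.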
GENERALIZATION
AGGREGATION
VARIABLE CONSTRUCTION / FEATURE EXTRACTION
REPLACE EXISTING VARIABLES
ADD NEW VARIABLES FROM EXISTING ONES
REDUCTION
REDUCE NO. OF VALUES
BINNING: GROUP INTO INTERVALS
CLUSTERING
AGGREGATION / GENERALIZATION
REDUCE NO. OF OBSERVATIONS
SAMPLING
REDUCE NO. OF VARIABLES
AGGREGATION: QUARTER >> YEAR
REMOVE IRRELEVANT ATTRIBUTES
PCA
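The QUARTER >> YEAR aggregation in the reduction branch above might look like this in practice. It is a sketch only: the `(year, quarter)` key layout, the choice of summing as the aggregate, and the sales figures are hypothetical.

```python
from collections import defaultdict

def aggregate_to_year(quarterly):
    """Collapse {(year, quarter): value} records into {year: total}."""
    totals = defaultdict(float)
    for (year, _quarter), value in quarterly.items():
        totals[year] += value
    return dict(totals)

# Made-up quarterly figures, keyed by (year, quarter).
quarterly_sales = {
    (2023, 1): 100.0, (2023, 2): 120.0, (2023, 3): 90.0, (2023, 4): 110.0,
    (2024, 1): 130.0, (2024, 2): 125.0,
}
yearly_sales = aggregate_to_year(quarterly_sales)
```

Six quarterly values become two yearly ones, at the cost of losing the within-year detail.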
INTEGRATION
DISCRETIZATION
REPLACE NUMERICAL VARIABLES WITH NOMINAL ONES
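One simple way to replace a numerical variable with labels is equal-width binning, sketched below. Equal-width intervals are an assumption here (the map does not say which binning scheme is meant), and the function name, labels, and sample ages are illustrative only.

```python
def equal_width_bins(values, labels):
    """Map each numeric value to one of len(labels) equal-width intervals."""
    lo, hi = min(values), max(values)
    k = len(labels)
    width = (hi - lo) / k
    result = []
    for v in values:
        if width == 0:
            # No spread in the data: everything falls in the first bin.
            idx = 0
        else:
            # The maximum value would land in bin k, so clamp it to the last bin.
            idx = min(int((v - lo) / width), k - 1)
        result.append(labels[idx])
    return result

ages = [3, 17, 25, 42, 58, 63]  # made-up sample values
groups = equal_width_bins(ages, ["LOW", "MEDIUM", "HIGH"])
```

Labels produced from ordered intervals keep their order, so the output is strictly ordinal; treating it as nominal simply ignores that order.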
CLEANING
MISSING VALUES
IGNORE
IMPUTATION METHODS
FILL IN
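The IGNORE and FILL IN options for missing values can be sketched as below. Filling with the mean of the observed values is only one possible imputation method and is an assumption of this example, as are the function names, the use of `None` to mark a missing value, and the sample readings.

```python
from statistics import mean

def drop_missing(values):
    """IGNORE: keep only the observed values (None marks a missing value)."""
    return [v for v in values if v is not None]

def fill_with_mean(values):
    """FILL IN: replace each missing value with the mean of the observed ones."""
    observed = drop_missing(values)
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

readings = [4.0, None, 6.0, 8.0, None]  # made-up sample values
kept = drop_missing(readings)
filled = fill_with_mean(readings)
```

Dropping shrinks the data set, while mean filling keeps its size but understates the variance.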
OUTLIERS
IDENTIFY + SMOOTH OUT NOISY DATA
CLUSTERING
REGRESSION
BINNING
DISTRIBUTION BASED METHODS
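A distribution-based check for outliers can be sketched as flagging values that lie more than a chosen number of standard deviations from the mean. The threshold of 2.0, the function name, and the sample data are assumptions for illustration; a threshold of 3 is also common, but it can never trigger on very small samples.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold * std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread, so nothing can be called an outlier.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

measurements = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up sample values
outliers = find_outliers(measurements)
```

Because the outlier itself inflates the mean and standard deviation, this rule is conservative; domain knowledge should decide whether a flagged value is an error or a genuine observation.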
CORRECT INCONSISTENT DATA
DOMAIN KNOWLEDGE
EXPERTISE
WHY?
INCOMPLETE
NOISY
INCONSISTENT
BIASED
OVERALL PROCESS OF DISCOVERING USEFUL KNOWLEDGE
DM
MEASUREMENT SCALES
RATIO
NOMINAL
LABELS WITHOUT ORDER
BLOOD TYPES, GENDER, COLOR
INTERVAL
ORDINAL
ORDERED, BUT DIFFERENCES NOT CALCULABLE
RATING
DATA TYPES
QUALITATIVE
QUANTITATIVE
COUNT
CONTINUOUS
DISCRETE
ALGORITHM >> PATTERNS
BIG DATA
EDX