Intell. Systems (Data Mining)
Phases of CRISP-DM Life Cycle
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Make use of the models created.
Evaluate the one or more models for quality and effectiveness.
Determine whether the model in fact achieves the objectives set for it in the business understanding phase.
Come to a decision regarding use of the data mining results.
Select and apply appropriate modeling techniques.
Calibrate parameters to optimize results
Several different techniques may be used for the same problem.
If necessary, loop back to the data preparation phase
Prepare the final data set
Select the records and variables you want to analyze
Perform transformations on certain variables
Clean the raw data
Initial data collection
Exploratory data analysis
Identify data quality problems.
Understand the project objectives and requirements.
Data mining problem definition.
Prepare strategy for achieving these objectives.
Data Mining Tasks
Description
Clustering
Estimation
Prediction
Classification
Association
Type of Variables
Categorical (Qualitative)
Nominal
Ordinal
Numerical (Quantitative)
Interval
Ratio
Data Preprocessing
Data Integration
Possible Problems
Same person, different spellings
Same person, different addresses
Homonyms
Synonyms
Different metrics
Schema Errors
Redundancy
Obtain and collect data from various sources
Data Cleaning
Tasks in data cleaning
Detect errors
Fill in missing values
Smooth noise
Identify outliers
Correct inconsistencies
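Two of the cleaning tasks above, filling in missing values and identifying outliers, can be sketched as follows. This is only an illustration under assumptions the notes do not state: missing values are encoded as `None`, the fill value is the median of the observed values, and outliers are flagged with the common 1.5 × IQR rule. The function names and the `ages` data are hypothetical.

```python
from statistics import median

def fill_missing(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical raw column with one missing entry and one suspicious value.
ages = [23, 25, None, 27, 24, 26, 22, 95]
observed = [v for v in ages if v is not None]

filled = fill_missing(ages)
outliers = iqr_outliers(observed)
```

The median is used for filling because, unlike the mean, it is not pulled upward by the extreme value 95.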
Data Transformation
Smoothing
Binning, Clustering, Regression
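Smoothing by binning can be illustrated with a minimal sketch. It assumes equal-frequency bins and smoothing by bin means (one of several binning variants); the function name and the `prices` data are illustrative, not taken from the notes.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and
    replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        bin_mean = sum(current_bin) / len(current_bin)
        smoothed.extend([bin_mean] * len(current_bin))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
# Bins (4, 8, 15), (21, 21, 24), (25, 28, 34) have means 9, 22 and 29.
```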
Generalization
Data Selection
Normalization
Min-max normalization
Z-score normalization
Normalization by decimal scaling
Example
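The three normalization methods listed above can be sketched as follows. The formulas are the standard textbook ones; the function names and the `incomes` data are hypothetical, the z-score uses the population standard deviation as an assumption, and the code does not guard against constant input (which would divide by zero).

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min"""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """v' = (v - mean) / standard deviation (population form assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """v' = v / 10**j, with j the smallest integer such that max(|v'|) < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

incomes = [200, 300, 400, 600, 1000]
scaled = min_max(incomes)             # values mapped onto [0, 1]
standardized = z_score(incomes)       # mean 0, standard deviation 1
shifted = decimal_scaling(incomes)    # divided by 10**4 here
```

Min-max keeps the original spacing of the values inside a fixed range, while the z-score is the usual choice when the minimum and maximum are unknown or dominated by outliers.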
Data Reduction
Sampling
Data cube aggregation
Dimension Reduction
Data compression
Numerosity Reduction
Discretization and Concept Hierarchy Generation
Exploring Data
Descriptive Statistics
Summary Statistics
Frequencies and Mode
Quartiles, Percentiles
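The summary statistics above can be computed with Python's standard library, as in this minimal sketch. The `grades` data is hypothetical, and the `inclusive` quantile method is an assumption: other quartile conventions give slightly different cut points.

```python
from collections import Counter
from statistics import mode, quantiles

grades = [3, 5, 7, 7, 8, 9, 9, 9, 10]

# Frequencies: how often each distinct value occurs.
frequencies = Counter(grades)

# Mode: the most frequent value.
most_common_grade = mode(grades)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = quantiles(grades, n=4, method="inclusive")

# Percentiles are the same idea with n=100 (99 cut points).
percentiles = quantiles(grades, n=100, method="inclusive")
p90 = percentiles[89]  # 90th percentile
```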
Measures of Location (Central Tendency)
Mean
Types
Arithmetic
Weighted
Trimmed
Geometric
Harmonic
Median
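The measures of location listed above can be compared on one small sample. This is a sketch only: `weighted_mean` and `trimmed_mean` are hypothetical helpers written for this illustration, the data and weights are invented, and `geometric_mean` requires Python 3.8 or later.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, proportion):
    """Drop the given proportion of values from each end, then average the rest."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])

values = [2, 4, 4, 8, 32]

arithmetic = mean(values)                         # pulled upward by the value 32
weighted = weighted_mean(values, [1, 1, 1, 1, 0]) # weight 0 ignores the value 32
trimmed = trimmed_mean(values, 0.2)               # drops 2 and 32 before averaging
geometric = geometric_mean(values)                # n-th root of the product
harmonic = harmonic_mean(values)                  # n / sum of reciprocals
middle = median(values)                           # middle value of the sorted data
```

On this sample the arithmetic mean (10) is far above the median (4), which shows how a single large value affects the mean while the median and trimmed mean stay close to the bulk of the data.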
Measures of Spread (Dispersion)
Range
Standard deviation
Variance
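A short sketch of the three measures of spread, using the standard library. The `data` sample is hypothetical; both the population form (divide by n) and the sample form (divide by n - 1) are shown because the notes do not say which one is intended.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)   # largest value minus smallest value

population_variance = pvariance(data) # mean squared deviation, divides by n
population_sd = pstdev(data)          # square root of the population variance

sample_variance = variance(data)      # divides by n - 1
sample_sd = stdev(data)               # square root of the sample variance
```

For this sample the mean is 5, the range is 7, the population variance is 4 and the population standard deviation is 2; the sample versions are slightly larger because of the n - 1 divisor.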
Multivariate
Covariance
Correlation
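Covariance and correlation can be written out directly from their definitions. The sketch below assumes the sample covariance (divide by n - 1) and the Pearson correlation coefficient; the function names and the `hours`/`scores` data are illustrative.

```python
from math import sqrt
from statistics import mean

def covariance(xs, ys):
    """Sample covariance: average product of deviations, dividing by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]          # exactly 2 * hours
reversed_scores = scores[::-1]     # perfectly decreasing in hours

cov = covariance(hours, scores)
r_up = correlation(hours, scores)
r_down = correlation(hours, reversed_scores)
```

Covariance depends on the units of the variables, whereas correlation is unit-free and always lies between -1 and 1, which is why the two perfectly linear cases here give values of 1 and -1.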
Skewness and Kurtosis
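Skewness and kurtosis can be sketched from central moments. The definitions assumed here are the moment-based population forms: skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2 (non-excess, so a normal distribution would give about 3). Other textbooks use bias-corrected or excess variants. Function names and data are hypothetical.

```python
from statistics import mean

def central_moment(values, order):
    """Average of (value - mean) ** order over all values."""
    mu = mean(values)
    return sum((v - mu) ** order for v in values) / len(values)

def skewness(values):
    """Third central moment standardized by the variance: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth central moment standardized by the variance: m4 / m2 ** 2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

skew_symmetric = skewness(symmetric)   # zero: the data is symmetric around 3
skew_right = skewness(right_skewed)    # positive: long tail to the right
kurt_symmetric = kurtosis(symmetric)
```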
Data visualization
General Concepts
Representation
Arrangement
Selection
Techniques
Histograms
Box Plots
Scatter Plots
Contour Plots
Star Graph
Chernoff Faces
Stem-and-Leaf Plots
Clustering
Partition Based
K-means
DBSCAN
Example
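As one possible clustering example, here is a minimal K-means sketch. It assumes Euclidean distance and takes the initial centroids as an argument so that the run is deterministic; the function name, the `points` data and the starting centroids are all hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
```

With these starting centroids the algorithm converges after two passes, grouping the three points near (1, 1) and the three points near (8, 8). K-means results generally depend on the initial centroids, so real runs are usually repeated with several random starts.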
Supervised – Unsupervised Learning
Classification
Classification Techniques
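One classification technique named in the question list of these notes is KNN (k-nearest neighbours). The sketch below is a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name, the training data and the class labels are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Classify query by majority vote among its k nearest training records.

    training is a list of (features, label) pairs.
    """
    neighbours = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
    ((6, 6), "B"), ((7, 7), "B"), ((6, 7), "B"),
]

label_near_a = knn_predict(training, (2, 1))
label_near_b = knn_predict(training, (6, 5))
```

KNN is supervised: it needs labelled training records, unlike K-means, which groups unlabelled data. An odd `k` is a common choice for two-class problems because it avoids tied votes.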
OLAP vs Data Mining
OLAP
The OLAP analyst generates a series of hypothetical patterns and relationships and uses queries against the database to verify or disprove them.
OLAP analysis is essentially a deductive process.
Data Mining
Data mining differs from OLAP because, rather than verifying hypothetical patterns, it uses the data itself to uncover such patterns.
It is essentially an inductive process.
Questions
Phases of CRISP-DM Life Cycle
Data Mining Tasks
OLAP vs Data Mining
Normalization
KNN example
Bayesian Classifiers example
K-means example