INTRODUCTION TO DATA MINING - Coggle Diagram
INTRODUCTION TO DATA MINING
PATTERN RECOGNITION vs. DATA MINING
PATTERN RECOGNITION BY HUMAN
PERCEPTUAL (EMOTIONS, FEELINGS)
SPECIALIZED - DECISION MAKING
PATTERN RECOGNITION BY COMPUTERS
BENEFIT OF AUTOMATED PATTERN RECOGNITION
ADVANTAGE IN COMPLEX CALCULATIONS
PATTERN RECOGNITION FROM DATA (DATA MINING)
A PROCESS OF LEARNING FROM, OR OBSERVING, PAST DATA BY STUDYING ITS DEPENDENCIES AND EXTRACTING KNOWLEDGE FROM IT
WHAT IS DATA MINING?
EXTRACTION OF KNOWLEDGE FROM DATA
EXPLORATION AND ANALYSIS OF LARGE QUANTITIES OF DATA TO DISCOVER MEANINGFUL PATTERNS
DISCOVER KNOWLEDGE
DATA MINING: MOTIVATION
DATA EXPLOSION PROBLEM
HUGE AMOUNTS OF DATA
IMPORTANT NEED FOR TURNING DATA INTO USEFUL INFORMATION
THE FAST-GROWING AMOUNT OF DATA COLLECTED AND STORED IN LARGE AND NUMEROUS DATABASES HAS EXCEEDED THE HUMAN ABILITY TO COMPREHEND IT WITHOUT POWERFUL TOOLS
ORIGIN OF DATA MINING
TRADITIONAL TECHNIQUES MAY BE UNSUITABLE DUE TO:
ENORMITY OF DATA
HIGH DIMENSIONALITY OF DATA
HETEROGENEOUS, DISTRIBUTED NATURE OF DATA
THE EVOLUTION OF DATA MINING
DATA MINING IS A NATURAL DEVELOPMENT OF THE INCREASED USE OF COMPUTERIZED DATABASES TO STORE DATA AND PROVIDE ANSWERS TO BUSINESS ANALYSTS
EVOLUTIONARY STEPS
DATA COLLECTION (1960s)
DATA ACCESS (1980s)
DATA WAREHOUSING AND DECISION SUPPORT
DATA MINING
DATA MINING CAN BE USED TO GENERATE HYPOTHESES
WHAT IS KDD?
KNOWLEDGE DISCOVERY IN DATABASES (KDD)
A COMPREHENSIVE PROCESS OF USING DATA MINING METHODS TO FIND USEFUL INFORMATION AND PATTERNS IN DATA
DATA PREPROCESSING
HANDLING INCOMPLETE DATA, NOISY DATA, UNCERTAIN DATA
DATA DISCRETIZATION / REPRESENTATION
TRANSFORM DATA INTO VALUES SUITABLE FOR THE MINING ALGORITHM TO FIND PATTERNS
DATA SELECTION
SELECT THE DATA SUITABLE FOR THE MINING TASK
DATA REDUCTION
REDUCE THE STORAGE CAPACITY REQUIRED FOR THE DATA
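The four KDD steps above can be sketched as small functions over a list of records. This is a minimal illustration under stated assumptions, not a standard implementation: the records, attribute names (`age`, `income`) and helper names are hypothetical; preprocessing covers only the incomplete-data case, using mean imputation; discretization uses equal-width binning; and reduction uses simple random sampling, which is one of several possible reduction strategies.

```python
import random
from statistics import mean

def fill_missing(records, attr):
    """Preprocessing (assumed strategy): replace missing (None) values of attr with the mean of observed values."""
    fill = mean(r[attr] for r in records if r[attr] is not None)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in records]

def equal_width_bins(records, attr, k):
    """Discretization (assumed strategy): add attr + '_bin' holding an interval label 0..k-1 of equal width."""
    values = [r[attr] for r in records]
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # all values equal: put everything in bin 0
    # min(..., k - 1) keeps the maximum value inside the last bin
    return [{**r, attr + "_bin": min(int((r[attr] - lo) / width), k - 1)} for r in records]

def select_attributes(records, attrs):
    """Selection: keep only the attributes relevant to the mining task."""
    return [{a: r[a] for a in attrs} for r in records]

def sample_records(records, n, seed=0):
    """Reduction (assumed strategy): simple random sample of n records without replacement."""
    return random.Random(seed).sample(records, min(n, len(records)))

# Hypothetical input data with missing values
data = [
    {"id": 1, "age": 23, "income": 30.0},
    {"id": 2, "age": None, "income": 45.0},
    {"id": 3, "age": 41, "income": 52.0},
    {"id": 4, "age": 35, "income": None},
    {"id": 5, "age": 60, "income": 80.0},
]
clean = fill_missing(fill_missing(data, "age"), "income")
binned = equal_width_bins(clean, "age", 3)
selected = select_attributes(binned, ["age_bin", "income"])
reduced = sample_records(selected, 3)
print(reduced)
```

Each step returns new records instead of mutating its input, so the stages can be reordered or swapped for other strategies (for example median imputation or equal-frequency binning) without side effects.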
DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES
MACHINE LEARNING
PATTERN RECOGNITION
STATISTICS
APPLICATIONS
VISUALIZATION
ALGORITHMS
DATABASE TECHNOLOGY
HIGH-PERFORMANCE COMPUTING
WHY A CONFLUENCE OF MULTIPLE DISCIPLINES?
TREMENDOUS AMOUNT OF DATA
ALGORITHMS MUST BE HIGHLY SCALABLE TO HANDLE TERABYTES OF DATA
HIGH-DIMENSIONALITY OF DATA
MICROARRAY DATA MAY HAVE TENS OF THOUSANDS OF DIMENSIONS
HIGH COMPLEXITY OF DATA
DATA STREAMS AND SENSOR DATA
TIME-SERIES DATA, TEMPORAL DATA, SEQUENCE DATA
STRUCTURED DATA, GRAPHS, SOCIAL NETWORKS AND MULTI-LINKED DATA
HETEROGENEOUS DATABASES AND LEGACY DATABASES
SPATIAL, SPATIOTEMPORAL, MULTIMEDIA, TEXT AND WEB DATA
SOFTWARE PROGRAMS, SCIENTIFIC SIMULATIONS
NEW AND SOPHISTICATED APPLICATIONS
SOME DATA MINING TECHNIQUES
DECISION TREES
NEURAL NETWORKS
ASSOCIATION RULE
NAIVE BAYES
K-NEAREST NEIGHBOUR
ROUGH SET THEORY
STATISTICAL METHODS
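Two of the techniques listed above are simple enough to sketch directly. The code below is a minimal illustration, not a reference implementation: the k-nearest-neighbour classifier assumes Euclidean distance and a plain majority vote, and the association-rule part computes the standard support and confidence measures for a rule A -> B. The training points, transactions and function names are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """k-nearest neighbour: majority label among the k training points closest to query.
    Assumes Euclidean distance; train is a list of (point, label) pairs."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = support(A and B together) / support(A); 0.0 if A never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical labelled points for k-NN
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.5), "B")]
label_near_a = knn_predict(train, (1.2, 1.4))
label_near_b = knn_predict(train, (5.2, 5.8))

# Hypothetical market-basket transactions for association rules
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
sup = support(transactions, {"bread", "milk"})
conf = confidence(transactions, {"bread"}, {"milk"})
print(label_near_a, label_near_b, sup, conf)
```

An odd `k` avoids most voting ties in the two-class case; in practice the features would usually be scaled first, since Euclidean distance is sensitive to attribute ranges.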