Data Mining - Explain what is meant by data mining.
What is data mining?
Definition
The automatic analysis or sorting of large data sets (big data) held in a data warehouse. Pattern recognition algorithms are used to identify patterns and correlations and to predict trends and relationships. Data is combined from multiple sources.
Data mining combines techniques from AI, statistics and database systems to analyse structured and unstructured data sets that are difficult to analyse by traditional means.
The main aim of data mining is to extract information from a data set and transform it into an appropriate format for future use.
Approaches to Data Mining
Associations
Aim
To find a correlation between two events.
Example
Discovering that customers who buy certain types of cars tend to have holidays in certain places.
Sequences
Aim
To discover where one event tends to lead to another event.
Example
Finding that people who buy one type of kitchen appliance tend to upgrade one of their other appliances within six months.
Clustering
Aim
To find new ways to group and classify data.
Example
Defining social groups of consumers in non-traditional ways, e.g. rather than using age, social class, gender etc., using patterns of purchases to define consumer types, e.g. 'trendsetters'.
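Worked sketch (associations)
A correlation between two events, such as car type and holiday destination, can be measured by counting how often the two occur together. The sketch below is a minimal illustration, not a full association-rule miner. The customer records and the function name pair_stats are invented for the example. Support is the share of all records containing both items; confidence is the share of records containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each set lists what is known about one customer.
transactions = [
    {"sports car", "ski holiday"},
    {"sports car", "ski holiday"},
    {"sports car", "beach holiday"},
    {"family car", "beach holiday"},
    {"family car", "beach holiday"},
    {"family car", "ski holiday"},
]


def pair_stats(transactions):
    """Return {(a, b): (support, confidence of a -> b)} for every ordered pair."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(t), 2):
            # Count the pair in both directions so either item can be the premise.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        pair: (count / n, count / item_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


stats = pair_stats(transactions)
support, confidence = stats[("sports car", "ski holiday")]
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this made-up data, two of the six customers have both a sports car and a ski holiday, and two of the three sports car owners take a ski holiday, so the pair has a higher confidence than support.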
Data Warehouse
Definition
A data warehouse comprises a quantity of data stored on a database server.
The data may be drawn from other systems, but stored in a consistent form for processing.
The data is non-volatile (once stored, it cannot be changed).
The data is used to support organisational decision-making.
What is involved in creating a data warehouse?
The data is extracted from various transaction processing systems and other sources.
The data is processed, so that all data types are consistent and all data conforms to designed standards. The data is then 'cleaned' to ensure that there are no errors, inconsistencies or redundancies.
The data is loaded into and logically stored within the data warehouse.
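Worked sketch (extract, process, load)
The three steps above can be sketched with Python's built-in sqlite3 module standing in for the warehouse. This is an illustration under stated assumptions only: the two source systems, their field names, the date formats and the table layout are all invented, and treating identical rows as redundant is a simplification that a real warehouse would need to check.

```python
import sqlite3
from datetime import datetime

# Extract: hypothetical rows from two transaction processing systems.
shop_rows = [
    {"customer": "A. Jones", "amount": "19.99", "date": "03/01/2024"},
    {"customer": "B. Patel", "amount": "5.00", "date": "04/01/2024"},
    {"customer": "B. Patel", "amount": "5.00", "date": "04/01/2024"},  # redundant copy
]
web_rows = [
    {"customer": "c. smith", "amount": 42, "date": "2024-01-05"},
    {"customer": "D. Lee", "amount": "n/a", "date": "2024-01-06"},  # invalid amount
]


def transform(row, date_format):
    """Convert one source row to the warehouse standard, or None if it is invalid."""
    try:
        amount = round(float(row["amount"]), 2)
        date = datetime.strptime(row["date"], date_format).date().isoformat()
    except ValueError:
        return None  # cleaning: reject rows that cannot be made consistent
    return (row["customer"].title(), amount, date)


def build_warehouse(sources):
    """Load cleaned rows from every source into one in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT, amount REAL, sale_date TEXT, "
        "UNIQUE (customer, amount, sale_date))"
    )
    for rows, date_format in sources:
        for row in rows:
            record = transform(row, date_format)
            if record is not None:
                # OR IGNORE skips rows that duplicate one already loaded.
                conn.execute("INSERT OR IGNORE INTO sales VALUES (?, ?, ?)", record)
    conn.commit()
    return conn


warehouse = build_warehouse([(shop_rows, "%d/%m/%Y"), (web_rows, "%Y-%m-%d")])
loaded = warehouse.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY sale_date"
).fetchall()
for record in loaded:
    print(record)
```

Of the five source rows, three reach the table: the repeated shop row is loaded once and the row with an unreadable amount is rejected.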
Phases of Data Mining
1) Problem Definition - This initial phase of a data mining project focuses on understanding the project objectives and requirements.
2) Data Gathering and Preparation - This phase involves data collection and exploration. As you take a closer look at the data, you can determine how well it addresses the business problem. You might decide to remove some of the data or add additional data. This is also the time to identify data quality problems and to scan for patterns in the data.
3) Model Building and Evaluation - In this phase, you select and apply various modelling techniques and calibrate the parameters to optimal values.
4) Knowledge Deployment - Knowledge deployment is the use of data mining within a target environment.
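Worked sketch (phases 3 and 4)
Calibrating a parameter, evaluating the model and then deploying it can be shown with a deliberately tiny model: a single spending threshold used to predict whether a customer will upgrade. The data, the threshold model and the function names are all assumptions made for this illustration; real projects would use richer modelling techniques.

```python
# Hypothetical labelled data: (monthly spend, whether the customer later upgraded).
training = [(10, False), (25, False), (40, False), (55, True), (70, True), (90, True)]
held_out = [(20, False), (35, False), (60, True), (80, True)]


def accuracy(threshold, data):
    """Share of records where 'spend >= threshold' matches the recorded label."""
    correct = sum((spend >= threshold) == label for spend, label in data)
    return correct / len(data)


def calibrate(candidates, data):
    """Model building: pick the candidate threshold with the best accuracy."""
    return max(candidates, key=lambda t: accuracy(t, data))


# Phase 3: build the model on the training data, then evaluate it on unseen data.
best_threshold = calibrate(range(0, 101, 5), training)
held_out_accuracy = accuracy(best_threshold, held_out)


# Phase 4: deployment means applying the calibrated model to new customers.
def predict_upgrade(spend):
    return spend >= best_threshold


print(best_threshold, held_out_accuracy, predict_upgrade(75))
```

Evaluating on records that were not used for calibration matters because a threshold that fits the training data perfectly may still perform poorly on new customers.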