EDA
Exploratory Data Analysis
Introduction
An approach to analyzing data sets to summarize their main characteristics.
EDA was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments.
EDA is different from initial data analysis (IDA).
Tukey's EDA was related to robust statistics and nonparametric statistics. Tukey promoted the use of the five-number summary of numerical data: the two extremes (minimum and maximum), the median, and the quartiles.
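A minimal sketch of the five-number summary, using only the Python standard library. The function name `five_number_summary` and the sample values are illustrative assumptions, and the quartiles are computed as Tukey-style hinges (each half includes the median when the count is odd), which is one of several common quartile conventions.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Hypothetical helper for illustration; quartiles are Tukey-style hinges.
    """
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    n = len(data)
    # Each half includes the middle value when n is odd.
    half = (n + 1) // 2
    lower_half = data[:half]
    upper_half = data[n - half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


# Made-up sample data, not taken from any real study.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(sample)
print(summary)
```

Other quartile conventions (for example, interpolated percentiles) can give slightly different hinge values on small samples, so the convention should be stated when results are reported.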
Typical languages
R
SAS
Python
Matlab
Objectives
support the selection of appropriate statistical tools and techniques
suggest hypotheses about the causes of observed phenomena
determine relationships among variables
maximize insight into a data set
detect outliers and anomalies
extract important variables
test underlying assumptions
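One common way to approach the "detect outliers and anomalies" objective is Tukey's fence rule, sketched below under stated assumptions: the function name `tukey_outliers`, the multiplier `k=1.5`, and the sample values are illustrative choices, and the quartiles come from `statistics.quantiles` with the inclusive method. It is a rough screening heuristic, not a formal test.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional default.
    """
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Made-up sample with one obviously extreme value.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15, 40]
outliers = tukey_outliers(sample)
print(outliers)
```

Flagged points are candidates for inspection rather than automatic removal; whether a value is an error or a genuine observation depends on the data source.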
Classification
non-graphical or graphical
univariate or multivariate (usually bivariate)
Typical graphical techniques
Box plot
Scatter plot
Histogram
Dimensionality reduction
Principal component analysis
Typical quantitative techniques
Median polish
Trimean
Ordination
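Two of the quantitative techniques listed above can be sketched with the standard library alone. The trimean is the weighted average (Q1 + 2 × median + Q3) / 4, and median polish fits an additive model (overall + row effect + column effect + residual) to a two-way table by repeatedly sweeping out row and column medians. The function names, the fixed iteration count, and the sample data are assumptions made for illustration; the quartiles are Tukey-style hinges.

```python
from statistics import median


def trimean(values):
    """Tukey's trimean: (Q1 + 2*median + Q3) / 4, with hinges as quartiles."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # each half includes the middle value when n is odd
    q1 = median(data[:half])
    q3 = median(data[n - half:])
    return (q1 + 2 * median(data) + q3) / 4


def median_polish(table, iterations=10):
    """Fit overall + row + column effects to a two-way table using medians.

    Returns (overall, row_effects, col_effects, residuals).
    The fixed iteration count is an illustrative simplification; a fuller
    version would stop once the residuals no longer change.
    """
    rows, cols = len(table), len(table[0])
    resid = [[float(x) for x in row] for row in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(iterations):
        # Sweep the median out of each row into the row effects.
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
        # Move the median of the column effects into the overall term.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the median out of each column into the column effects.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        # Move the median of the row effects into the overall term.
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


# Made-up data for illustration.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(trimean(sample))

table = [[7, 9, 11],
         [8, 10, 12],
         [9, 11, 13]]
overall, row_eff, col_eff, resid = median_polish(table)
print(overall, row_eff, col_eff)
```

Because both techniques rely on medians rather than means, a single extreme value has limited influence on the result, which is in keeping with the robust-statistics flavor of EDA noted in the introduction.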
History
Francis Galton emphasized order statistics and quantiles.
Arthur Lyon Bowley used precursors of the stemplot and five-number summary.
Andrew Ehrenberg articulated a philosophy of data reduction.
John W. Tukey wrote the book Exploratory Data Analysis in 1977.