Exploratory Data Analysis (EDA)
An approach to analyzing data sets to summarize their main characteristics.
John Tukey promoted EDA to encourage statisticians to explore the data and possibly formulate hypotheses that could lead to new data collection and experiments.
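EDA differs from initial data analysis (IDA), which focuses more narrowly on checking the assumptions required for model fitting and hypothesis testing, handling missing values, and transforming variables as needed.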
Tukey's EDA was closely related to robust statistics and nonparametric statistics. Tukey promoted the five-number summary of numerical data: the minimum, lower quartile, median, upper quartile, and maximum.
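To make the five-number summary concrete, here is a minimal Python sketch. It assumes Tukey's hinges as the quartile convention (the medians of the lower and upper halves of the sorted data, with the median counted in both halves when n is odd). `five_number_summary` is a hypothetical helper name, and other software may use interpolated quartiles that give slightly different middle values.

```python
from statistics import median
from typing import Sequence, Tuple


def five_number_summary(data: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Assumes Tukey's hinges: medians of the lower and upper halves of the
    sorted data, with the overall median included in both halves when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("five-number summary of empty data is undefined")
    half = (n + 1) // 2  # size of each half; includes the median when n is odd
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    return xs[0], lower_hinge, median(xs), upper_hinge, xs[-1]
```

For example, `five_number_summary(range(1, 10))` returns `(1, 3, 5, 7, 9)`, and for the even-length data 1 through 8 the hinges fall between observations: `(1, 2.5, 4.5, 6.5, 8)`.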
Objectives
support the selection of appropriate statistical tools and techniques
suggest hypotheses about the causes of observed phenomena
determine relationships among variables
maximize insight into a data set
detect outliers and anomalies
extract important variables
test underlying assumptions
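One common way to detect outliers, in the spirit of Tukey's box plot, is to flag values lying beyond "fences" placed a multiple of the hinge spread outside the hinges. The sketch below assumes that rule with Tukey's hinge convention; the function names and the default `k = 1.5` are illustrative choices rather than a fixed standard.

```python
from statistics import median
from typing import List, Sequence, Tuple


def hinges(data: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper hinges: medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # the median lands in both halves when n is odd
    return median(xs[:half]), median(xs[n - half:])


def tukey_outliers(data: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values outside [lower - k*spread, upper + k*spread]."""
    if len(data) == 0:
        return []
    lower, upper = hinges(data)
    spread = upper - lower  # hinge spread, analogous to the interquartile range
    low_fence = lower - k * spread
    high_fence = upper + k * spread
    return [x for x in data if x < low_fence or x > high_fence]
```

For example, `tukey_outliers([2, 3, 3, 4, 5, 5, 6, 40])` returns `[40]`. Flagged values are candidates for closer inspection, not automatically errors.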
EDA techniques can be cross-classified in two ways:
non-graphical or graphical
univariate or multivariate (usually bivariate)
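As an example of a non-graphical bivariate technique, the following sketch computes the Pearson correlation coefficient, which measures the strength of a linear relationship between two variables. `pearson_r` is a hypothetical name; Python 3.10+ also provides `statistics.correlation`, but the arithmetic is written out here to show what is being summarized.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return sxy / sqrt(sxx * syy)
```

Perfectly linear data such as `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` gives 1.0. Because Pearson's r is sensitive to outliers, a rank-based measure such as Spearman's correlation is a more robust choice when the bivariate data contain extreme values.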
Typical graphical techniques
Principal component analysis
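Principal component analysis finds the direction along which the data vary most. The minimal sketch below assumes pure Python and uses power iteration on the sample covariance matrix (rather than the eigendecomposition or SVD that numerical libraries typically use), so it recovers only the first principal component. `first_principal_component` and its `iters` and `seed` parameters are hypothetical.

```python
import random
from math import sqrt
from typing import List, Sequence


def first_principal_component(
    rows: Sequence[Sequence[float]], iters: int = 500, seed: int = 0
) -> List[float]:
    """Approximate the leading principal axis by power iteration.

    rows: n >= 2 observations, each a sequence of d numeric features.
    Returns a unit vector of length d; its sign is arbitrary.
    Assumes the largest covariance eigenvalue is strictly larger than the next.
    """
    n, d = len(rows), len(rows[0])
    # Center each feature (column) on its mean
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix, d x d
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Random unit start vector, almost surely not orthogonal to the leading eigenvector
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # Multiply by the covariance matrix, then rescale to unit length
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero-variance data: no direction is preferred
            break
        v = [x / norm for x in w]
    return v
```

Projecting each centered observation onto the returned vector gives first-component scores, which are often plotted to look for clusters, trends, or outliers.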
Typical quantitative techniques
Many EDA ideas can be traced back to other authors:
Francis Galton emphasized order statistics and quantiles.
Arthur Lyon Bowley used precursors of the stemplot and the five-number summary.
Andrew Ehrenberg articulated a philosophy of data reduction.
John W. Tukey wrote the book Exploratory Data Analysis in 1977.