Statistics
Descriptive Statistics
- Presentation of data using tables and graphs
- Characterizing the data using a few but powerful measures
Basic statistical concepts
- Real-world problem --> statistical analysis
- The complete set of objects that are the subject of the analysis
is called the population.
- We are interested less in the population itself than in
the properties of the population, measured by one or several
quantities of interest X
Classification of Characteristics
Probability Theory
Inferential Statistics
- Inference about the population (complete data set) on the basis of
a sample
- Testing statistical hypotheses, building confidence intervals,
measuring the reliability of tests and procedures
Additional advanced components
- Theory of point estimation
- Nonparametric statistics
- Large sample theory
- Bayesian statistics
ratio scale
income, price, turnover, age
absolute scale
quantity, number of students enrolled at a university
qualitative if it has a finite set of possible
realizations
quantitative attributes (for
example, age, income, price)
discrete, if the set of possible realizations is a
countable set.
continuous, if it has
uncountably many possible realizations.
interval scale
temperature values in Celsius, year of birth
ordinal scale
can be naturally ordered
ranks, grades
Parameters of the variables
location measures
- arithmetic mean
- mode
- median
- quantile
dispersion measures
- range (extremely sensitive to outliers)
- interquartile range (robust to outliers)
- mean absolute deviation
- variance and standard deviation
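The measures listed above can be computed with Python's standard `statistics` module. This is only an illustrative sketch: the sample values are made up, the quartiles use the module's default "exclusive" method, and the mean absolute deviation is taken about the arithmetic mean (an assumption, since the notes do not say which centre is meant).

```python
import statistics

# Illustrative sample (made up); the value 40 acts as an outlier.
data = [2, 4, 4, 5, 7, 9, 40]

# Location measures
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default method

# Dispersion measures
data_range = max(data) - min(data)            # driven entirely by the extremes
iqr = q3 - q1                                 # ignores the outer quarters
mad = sum(abs(x - mean) for x in data) / len(data)  # about the mean (assumed)
variance = statistics.variance(data)          # sample variance, divisor n - 1
std_dev = statistics.stdev(data)              # square root of the variance

print(mean, median, mode, q1, q3)
print(data_range, iqr, mad, variance, std_dev)
```

The single large value stretches the range and pulls the mean upward, while the median and the interquartile range depend only on the middle of the ordered sample.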
Sample Variance
Empirical Covariance
s_XY = s_YX
|s_XY| <= s_X s_Y
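The two properties above can be checked numerically. The sketch below assumes the usual definition of the empirical covariance with divisor n - 1 (the notes do not show the formula); the helper name `sample_covariance` and the data are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Empirical covariance of two equally long samples (divisor n - 1, assumed)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

s_xy = sample_covariance(x, y)
s_yx = sample_covariance(y, x)
s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself = variance
s_y = math.sqrt(sample_covariance(y, y))

print(s_xy == s_yx)             # symmetry
print(abs(s_xy) <= s_x * s_y)   # bound by the product of standard deviations
```

Both properties hold regardless of whether the divisor is n or n - 1, because the same factor appears on both sides.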
Correlation: measures the linear
dependence between two variables.
r_XY = r_YX
|r_XY| <= 1
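A minimal sketch of the Bravais-Pearson correlation coefficient, assuming the standard definition r_XY = s_XY / (s_X s_Y). The function name `pearson_r` and the data are hypothetical. The last example shows why the coefficient captures only linear dependence: Y is fully determined by X, yet r is zero.

```python
import math

def pearson_r(x, y):
    """Bravais-Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    # The divisors (n - 1) cancel, so raw sums of products are enough.
    return s_xy / math.sqrt(s_xx * s_yy)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_increasing = pearson_r(x, [2 * v + 1 for v in x])  # exact increasing line
r_decreasing = pearson_r(x, [-v for v in x])         # exact decreasing line
r_quadratic = pearson_r(x, [v ** 2 for v in x])      # dependent, but not linearly

print(r_increasing, r_decreasing, r_quadratic)
```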
nominal scale
Values can only be compared for equality (equal or not equal)
Correlation measures for ordinal data
Idea of the ranks: assign to each observation of the sample x_1, ..., x_n
its position in the ordered sample x_(1), ..., x_(n).
Rank correlation coefficient of Spearman
where R(hat) = (n + 1)/2 is the mean of the ranks 1, ..., n.
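The rank idea can be sketched as follows. The formula itself is not shown in these notes, so the code assumes the common definition: the Bravais-Pearson coefficient applied to the ranks, with tied values sharing the average of their positions. The function names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math

def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks (assumed form)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_rank = (len(x) + 1) / 2  # mean of the ranks 1, ..., n
    num = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mean_rank) ** 2 for a in rx)
                    * sum((b - mean_rank) ** 2 for b in ry))
    return num / den

x = [10, 20, 30, 40, 50]
y = [1, 8, 27, 64, 125]  # increasing in x, but not linear
print(average_ranks([5, 5, 1]))
print(spearman_rho(x, y), spearman_rho(x, y[::-1]))
```

Because only the ordering enters the calculation, any strictly increasing relationship gives the maximal value, which is why the measure suits ordinal data.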
Correlation measures for nominal variables
A large chi-square statistic (χ²) indicates that X and Y are dependent.
Contingency coefficient of Pearson
Corrected contingency coefficient of Pearson
The smaller CKorr is, the weaker the dependence. CKorr = 0 only if X and Y are independent.
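The formulas are missing from these notes, so the sketch below assumes the standard definitions: the chi-square statistic compares observed counts with the counts expected under independence, C = sqrt(χ² / (χ² + n)), and CKorr = C * sqrt(M / (M - 1)) with M the smaller of the number of rows and columns. The function name `contingency_measures` and both tables are hypothetical, and every row and column total is assumed to be positive.

```python
import math

def contingency_measures(table):
    """Return (chi2, C, C_korr) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if X and Y were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    c = math.sqrt(chi2 / (chi2 + n))
    m = min(len(table), len(table[0]))
    c_korr = c * math.sqrt(m / (m - 1))  # rescaled so that the maximum is 1
    return chi2, c, c_korr

independent = [[10, 20], [20, 40]]  # rows are proportional to each other
dependent = [[50, 0], [0, 50]]      # X determines Y completely

print(contingency_measures(independent))
print(contingency_measures(dependent))
```

The correction matters because the uncorrected coefficient cannot reach 1; its maximum depends on the table size, which makes values from tables of different dimensions hard to compare.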
ordinal
- Worse: Spearman
- Better: Kendall's tau
metric
- covariance
- Bravais-Pearson
nominal
- Worse: chi-square (χ²)
- Better: CKorr
- Poisson
- binomial
- hypergeometric