Statistics

Descriptive Statistics

  • Presentation of data using tables and graphs
  • Characterizing the data using a few but powerful measures

Basic statistical concepts

  • real world problem --> statistical analysis
  • The complete set of objects under analysis is called the
    population.
  • We are interested less in the population itself than in its
    properties, measured by one or several quantities of interest X.

Classification of Characteristics


Probability Theory

Inferential Statistics

  • Inference about the population (complete data set) on the basis of
    a sample
  • Testing statistical hypotheses, building confidence intervals,
    measuring the reliability of tests and procedures

Additional advanced components

  • Theory of point estimation
  • Nonparametric statistics
  • Large sample theory
  • Bayesian statistics

ratio scale
income, price, turnover, age

absolute scale
quantity, number of students enrolled at a university

qualitative if it has a finite set of possible
realizations

quantitative attributes (for
example, age, income, price)

discrete, if the set of possible realizations is a
countable set.

continuous, if it has
uncountably many possible realizations.

interval scale
temperature values in Celsius, year of birth

ordinal scale
values can be naturally ordered
ranks, grades

Parameters of the variables

location

  • arithmetic mean
  • mode
  • median
  • quantile

dispersion measure

  • range (extremely sensitive to outliers)
  • interquartile range (robust to outliers)
  • mean absolute deviation
  • variance and standard deviation
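
The measures listed above can be computed with Python's standard `statistics` module. This is only a sketch under stated assumptions: the sample is made up for illustration, the quartiles use the module's default "exclusive" method, the mean absolute deviation is taken about the arithmetic mean, and the variance divides by n - 1 (the notes do not say which normalization they use).

```python
import statistics

def describe(data):
    """Return the location and dispersion measures for one sample."""
    mean = statistics.mean(data)
    # Quartiles with the default "exclusive" method (an assumption).
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        # location
        "mean": mean,
        "mode": statistics.mode(data),
        "median": statistics.median(data),
        "quartiles": (q1, q2, q3),
        # dispersion
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        # mean absolute deviation about the mean (an assumption)
        "mad": sum(abs(x - mean) for x in data) / len(data),
        # assumption: sample variance with divisor n - 1
        "variance": statistics.variance(data),
        "std_dev": statistics.stdev(data),
    }

# Hypothetical sample; the last value acts as an outlier.
sample = [2, 3, 3, 5, 7, 8, 40]
with_outlier = describe(sample)
without_outlier = describe(sample[:-1])

print("range:", with_outlier["range"], "vs", without_outlier["range"])
print("IQR:  ", with_outlier["iqr"], "vs", without_outlier["iqr"])
```

Comparing the two printed lines shows why the range is called sensitive and the interquartile range robust: dropping the single outlier changes the range drastically, while the interquartile range barely moves.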

Sample Variance


Empirical Covariance
sXY = sYX
|sXY|<= sXsY

Correlation - measures the linear
dependence between two variables.
rXY = rYX
|rXY| <= 1
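
A minimal sketch of the empirical covariance and the Bravais-Pearson correlation, checking the symmetry and bound properties stated above. The divisor n - 1 is an assumption (the notes do not show the formula), and the data and function names are hypothetical.

```python
import math

def covariance(x, y):
    """Empirical covariance s_XY (assumption: divisor n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation r_XY = s_XY / (s_X * s_Y)."""
    s_x = math.sqrt(covariance(x, x))  # standard deviation of x
    s_y = math.sqrt(covariance(y, y))  # standard deviation of y
    return covariance(x, y) / (s_x * s_y)

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

s_xy = covariance(x, y)
r_xy = correlation(x, y)

print("s_XY =", s_xy, " s_YX =", covariance(y, x))
print("r_XY =", r_xy)
```

Because the correlation divides the covariance by both standard deviations, it is unit-free, and the bound |sXY| <= sX sY is exactly what guarantees |rXY| <= 1.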

nominal scale
Values can only be compared for equality or inequality

Correlation measures for ordinal data
Idea of the ranks: assign to each observation of the sample x1, ..., xn
its position in the ordered sample x(1), ..., x(n).


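Rank correlation coefficient of Spearman

rS = sum_i (R(xi) - R̄)(R(yi) - R̄) / sqrt( sum_i (R(xi) - R̄)^2 * sum_i (R(yi) - R̄)^2 )

where R(xi) and R(yi) are the ranks of xi and yi, and R̄ = (n + 1)/2 is the mean rank.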
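
The rank idea and Spearman's coefficient can be sketched as follows. Spearman's coefficient is taken here as the usual correlation formula applied to the ranks, with mean rank (n + 1)/2. Giving tied observations the average of their positions is an assumption, since the notes do not say how ties are handled. The function names and data are hypothetical.

```python
import math

def ranks(values):
    """1-based position of each observation in the ordered sample.

    Assumption: tied observations share the average of their positions.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next ordered value is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: correlation of the ranks of x and y."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    mean_rank = (n + 1) / 2
    numerator = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rank_x, rank_y))
    denominator = math.sqrt(
        sum((a - mean_rank) ** 2 for a in rank_x)
        * sum((b - mean_rank) ** 2 for b in rank_y)
    )
    return numerator / denominator

# Hypothetical data: y grows monotonically but not linearly with x.
x = [10, 20, 30, 40, 50]
y = [1, 8, 27, 64, 125]
print(ranks([30, 10, 20]))
print(spearman(x, y))
```

Since only the ordering enters the computation, any strictly increasing relationship between the two variables gives the same coefficient as a perfectly linear one.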

Correlation measures for nominal variables

If χ^2 is large, X and Y are dependent.


the contingency coefficient of Pearson




corrected contingency coefficient of Pearson




The smaller CKorr is, the weaker the dependence. CKorr = 0 only if X and Y are independent.
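
A sketch of these measures for a contingency table of observed counts. The formulas are the common textbook definitions and are assumed here, since the notes do not show them: χ^2 sums (observed - expected)^2 / expected over all cells, C = sqrt(χ^2 / (χ^2 + n)), and CKorr = C * sqrt(M / (M - 1)) with M = min(number of rows, number of columns). The tables and function names are hypothetical, and every row and column total is assumed to be positive.

```python
import math

def chi_square(table):
    """Chi-square statistic of a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if X and Y were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficients(table):
    """Return (chi2, C, CKorr) under the definitions assumed above."""
    chi2, n = chi_square(table)
    c = math.sqrt(chi2 / (chi2 + n))
    m = min(len(table), len(table[0]))
    c_korr = c * math.sqrt(m / (m - 1))
    return chi2, c, c_korr

# Hypothetical tables: proportional rows versus a perfect association.
independent = [[10, 20], [20, 40]]
dependent = [[30, 0], [0, 30]]

print(contingency_coefficients(independent))
print(contingency_coefficients(dependent))
```

χ^2 grows with the sample size, and C can never reach 1, so neither is comparable across tables of different sizes. Under these definitions the correction rescales C so that a perfect association in a square table gives CKorr = 1.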

ordinal

  • Worse: Spearman
  • Better: Kendall's tau

metric

  • covariance
  • Bravais-Pearson

nominal

  • Worse: χ^2
  • Better: CKorr

Discrete distributions

  • Poisson
  • binomial
  • hypergeometric

Continuous distributions


  • normal
  • uniform
  • Student's t
  • chi-square
  • exponential
