STATS I

NOMINAL + ORDINAL: categorical (no decimal numbers)

INTERVAL-RATIO: continuous

NOMINAL (categorical)

DEFINITION: 2 or more exclusive categories; no natural order (e.g. a yes-or-no question)

ORDINAL (categorical)

DEFINITION: clear order of values, however the spacing between the values is not the same (completely agree, partially agree...)

INTERVAL (continuous)

DEFINITION: continuous, the difference between values is meaningful but the zero point is arbitrary or meaningless (e.g. how much you like pasta)

RATIO (continuous)

DEFINITION: continuous variable; zero has a meaning and the difference between the intervals is meaningful; numerical data (e.g. salary)

WEEK 1
inter-quartile range: range of the middle 50% of the data (split the dataset into two equal halves, take the median of the lower half and the median of the upper half, then compute median2 - median1)
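A minimal sketch of this computation in Python (numpy assumed; the data values are invented for illustration):

```python
import numpy as np

data = np.array([4, 7, 1, 9, 3, 6, 8, 2, 5, 10])  # invented example values

# Q1 = 25th percentile (median of the lower half), Q3 = 75th percentile (median of the upper half)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # range of the middle 50% of the data
print(q1, q3, iqr)
```

(np.percentile's default interpolation can differ slightly from the "median of each half" rule, but the idea is the same.)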

  • z-score: standardized value used to compare against critical values
  • large vs small portion: used to work out the percentage of observations to the right or to the left of your cut-off line; the line is drawn at the z-score
  • you can also compute the percentage between two values (see the sketch below):
    . if both are on the same side of the mean: take the smaller portion of each and compute the bigger minus the smaller
    . if the mean lies between them: larger - smaller, OR 1 - smaller portion 1 - smaller portion 2
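A sketch of the same "portion" logic using scipy.stats.norm (the z-values are invented examples):

```python
from scipy.stats import norm

# Portions to the left/right of a single z-score
z = 1.5
larger = norm.cdf(z)        # larger portion (area to the left of a positive z)
smaller = 1 - norm.cdf(z)   # smaller portion (area in the tail)

# Percentage between two z-values on the same side of the mean:
z1, z2 = 0.5, 1.5
between_same_side = (1 - norm.cdf(z1)) - (1 - norm.cdf(z2))  # bigger smaller-portion minus the smaller one

# Percentage between two z-values on opposite sides of the mean:
z_neg, z_pos = -1.0, 1.5
between_across_mean = 1 - norm.cdf(z_neg) - (1 - norm.cdf(z_pos))  # 1 - smaller portion 1 - smaller portion 2
print(larger, smaller, between_same_side, between_across_mean)
```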

NOMINAL: measure of central tendency: mode; measure of dispersion: not possible (no numbers)

ORDINAL: measure of central tendency: median and mode; measure of dispersion: range and interquartile range

INTERVAL-RATIO: measure of central tendency: mean, median, mode; measure of dispersion: range, interquartile range, variance and standard deviation

DISTRIBUTIONS

NORMAL DISTRIBUTION: symmetrical distribution; the standard normal has a mean of 0 and an SD of 1
critical z-values (two-tailed):
90% confidence (α = 0.10) --> 1.645
95% confidence (α = 0.05) --> 1.96
99% confidence (α = 0.01) --> 2.576
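These critical values can be checked with scipy (a quick sketch, not part of the course material):

```python
from scipy.stats import norm

# Two-tailed critical z-values for 90%, 95% and 99% confidence
for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    print(conf, round(norm.ppf(1 - alpha / 2), 3))  # 1.645, 1.96, 2.576
```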

WEEK 2
error: the part of the outcome that our model cannot explain

  • outcome = model + error
  • variables (measured constructs) vs parameters (hypothetical, estimated from the data)
  • if the observed score is below the mean --> the model overestimates it; if above --> it underestimates it
  • how much error is there? the deviance (= error) of each observation = x_i - mean; total deviance (= total error) = the sum of the deviances
  • deviance vs error
  • confidence intervals: we can estimate a range of values which is likely to include the unknown population parameter
  • Central limit theorem: the sampling distribution of sample means is approximately normally distributed; this applies even when the population is not normally distributed, but it does not work with samples smaller than about 30 (see the simulation sketch below)
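A small simulation of the central limit theorem (a sketch; the skewed population and the sample sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # clearly non-normal population

for n in (5, 30, 100):
    means = [rng.choice(population, size=n).mean() for _ in range(2000)]
    print(n, round(np.mean(means), 2), round(np.std(means), 2))
# As n grows the sample means pile up in a roughly normal shape around the population mean,
# with a spread of about population SD / sqrt(n) (the standard error).
```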


DO NOT CONFUSE

  • standard deviation of the population
  • standard deviation of the sample
  • standard deviation of the sampling distribution (the standard error)

SAMPLING DISTRIBUTION: distribution of the sample means

  • the distribution you would get by drawing an infinite number of samples and plotting each sample's mean
  • it is abstract and theoretical


WEEK 3

  • null hypothesis significance test: the null hypothesis (H0) is a statement of no difference, no association, or no treatment effect; it is rejected when the p-value < α
    • alternative hypothesis (H1): statement that there is a difference, association or treatment effect
    • Fisher: tests of significance
    • Neyman, Pearson: tests of acceptance
    • a test statistic is a statistic for which we know how frequently different values occur; it compares the probability of the data with what is expected under the null hypothesis

TYPE I error:

  • incorrect rejection of a true null hypothesis

TYPE II error:

  • incorrect acceptance of a false null hypothesis; more likely to happen with small samples and small effects (see the simulation sketch below)
  • problems with null hypothesis significance testing: a smaller p-value means a stronger effect/difference -- wrong; statistical significance is synonymous with theoretical or practical significance -- wrong; a non-significant effect means that the null hypothesis is true -- wrong
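A rough simulation of the point about small samples and small effects (the effect size, sample sizes and α = 0.05 are invented assumptions):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
effect = 0.3  # small true difference between the group means

for n in (20, 200):
    misses = 0
    for _ in range(1000):
        a = rng.normal(0, 1, n)
        b = rng.normal(effect, 1, n)
        if ttest_ind(a, b).pvalue >= 0.05:  # failing to reject a false H0 = Type II error
            misses += 1
    print(n, misses / 1000)  # the Type II error rate shrinks as n grows
```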

T-DISTRIBUTION: used when the sample is smaller than 100; with more than 100 df it is practically equal to the normal distribution. T-TEST: without an expectation about the direction, two-sided;
with an expectation, one-sided. Once you look at the rejection region, you use a different table column: e.g. for a one-tailed α of 5%, we look at the two-tailed 10% column.

  • the t-test is a univariate analysis (see the sketch below)
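A sketch of an independent-samples t-test with scipy (the two groups are invented data):

```python
import numpy as np
from scipy.stats import ttest_ind, t

rng = np.random.default_rng(2)
group1 = rng.normal(10, 2, 25)   # invented data for two independent groups
group2 = rng.normal(11, 2, 25)

# Two-sided test (no expectation about the direction)
res = ttest_ind(group1, group2)
print(res.statistic, res.pvalue)

# The one-tailed critical value at α = 5% equals the two-tailed critical value at 10%
df = len(group1) + len(group2) - 2
print(t.ppf(0.95, df), t.ppf(1 - 0.10 / 2, df))  # identical numbers
```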

WEEK 4 - independent variable: x vs dependent variable: y

CATEGORICAL (both variables): CHI-SQUARE is used to test for statistical significance (see the sketch after this list)

  • doesn't work with percentages
  • (as with the t-distribution): the p-value that you compare with the critical (alpha) value ...
  • chi-square is bivariate
    assumptions: independent observations, expected frequencies must be > 5
  • when the chi-square statistic increases, the result is more likely to be significant
  • if the expected frequencies are too small, we merge categories OR use Fisher's exact test (2x2 tables) OR the likelihood ratio
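A sketch with scipy.stats.chi2_contingency; the 2x2 table of observed frequencies is invented:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

observed = np.array([[30, 10],   # invented 2x2 table of raw frequencies (not percentages)
                     [20, 40]])

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, expected)  # check that every expected frequency is > 5

# Fallback for a 2x2 table with small expected frequencies: Fisher's exact test
print(fisher_exact(observed))
```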


nominal: Phi - Cramer's V & Lambda

ordinal: Gamma (can also be used with a dichotomous nominal variable); ranges from -1 to +1 (-1: every pair of observations is in disagreement; +1 is the opposite)

PHI: 0-1; only 2x2 tables

Cramer's V: 0-1; for a 2x2 table it is equal to Phi. Up to about 0.10 = small; up to 0.30 = medium; up to 0.50 = medium-large; > 0.50 = large

Lambda: 0-1; can be interpreted as a percentage: how much the error in predicting the values of the dependent variable is reduced when you know the values of the independent variable. It is asymmetrical (the value depends on which variable is treated as dependent), so there must be a clear order of the variables. Lambda does not give you a direction of association: it simply suggests an association between two variables and its strength. If you have one nominal and one ordinal variable, use LAMBDA. When E1 equals E2 (so Lambda = 0) we do not use Lambda; we use Phi or Cramer's V instead. (See the sketch below.)
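A sketch of how Cramer's V (and Phi, for a 2x2 table) can be computed from a contingency table; the table is invented:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[25, 15],   # invented 2x2 table
                     [10, 30]])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
n = observed.sum()
k = min(observed.shape)                    # the smaller of (rows, columns)
cramers_v = np.sqrt(chi2 / (n * (k - 1)))  # V = sqrt(chi2 / (n * (k - 1)))
print(cramers_v)                           # for a 2x2 table this equals Phi
```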


WEEK 5 - interval-ratio variables. COVARIANCE (and Pearson's correlation): measures whether a change in one variable goes together with a change in the other.

  • > 0: positive association between variables, both variables increase or decrease together
  • = 0: no association between variables
  • < 0: negative association between variables, if one variable increases, the other decreases (and vice versa)
  • Pearson's r is standardized between -1 and +1
  • Pearson's r follows its own sampling distribution (not normal)
  • to determine a confidence interval, use bootstrapping (see the sketch after this list)
  • when Pearson's r is bigger than the critical value, we reject (the opposite of the p-value rule)
  • when n < 30, we must ensure that the population variables are normally distributed, e.g. with a P-P plot: the points must fall on a straight line facing upwards
  • CONTINUOUS VARIABLES: when we have ordinal scales, we treat them as continuous when they have more than 10 categories
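A sketch of Pearson's r with a bootstrapped confidence interval (the data, the 95% level and the 2000 resamples are invented choices):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(scale=0.8, size=50)  # invented correlated data

r, p = pearsonr(x, y)

# Bootstrap: resample the (x, y) pairs with replacement and recompute r each time
boots = []
for _ in range(2000):
    idx = rng.integers(0, len(x), len(x))
    boots.append(pearsonr(x[idx], y[idx])[0])
ci_low, ci_high = np.percentile(boots, [2.5, 97.5])
print(r, p, ci_low, ci_high)
```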


SPEARMAN'S RHO: measures the strength of the association between 2 variables; used for discrete ordinal variables.
It can only be used with discrete data, meaning the values must be whole numbers.
It can be used when Pearson's assumptions are violated.
To test for a significant relationship, we can substitute rho for Pearson's r and use the same formula as for Pearson's r.

TAU-B (SPSS only): used when we have many tied ranks or a small sample size. Tied ranks occur when observations have the same value, making it impossible to assign unique rank numbers.
It is usually lower than Pearson's r and Spearman's rho (see the sketch below).
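A sketch comparing the three coefficients on the same invented ordinal data with tied ranks:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# Invented discrete ordinal scores containing tied ranks
x = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
y = np.array([2, 1, 3, 3, 4, 3, 5, 4, 5, 4])

print(pearsonr(x, y)[0])
print(spearmanr(x, y)[0])
print(kendalltau(x, y)[0])   # kendalltau computes tau-b by default, handling the ties
```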
Correlation does not mean causality; look at the 4 reasons in the notes.