RMA

General

Statistics

Population - Parameter

Sample - Statistic

Descriptive: done first, e.g. means, outliers
Inferential: after, to answer research question

Variables

Continuous variables -> measurement / quant data -> means, variance, SD

Discrete variables -> categorical data -> percentages, frequencies

IV (independent variable): manipulated by researcher
DV (dependent variable): measured

Scales

Nominal: categories, no sequence

Ordinal: categories, sequence

Interval: categories, sequence, same size interval, NO absolute/true zero, 0 doesn't mean absence

Ratio: categories, sequence, same size interval, absolute/true zero - no negative numbers

Distributions

Standard normal distribution

Bimodal / multimodal

Positive skew (tail to the right) VS negative skew (tail to the left)

Kurtosis: Leptokurtic = high peak, platykurtic = flat

Frequency distributions & histograms: score (x) + frequency (f)

Central tendency

Represents 'centre' of distribution

Mode: most common score
(average the two if adjacent scores tie)

Median: mid-point;
the score at position (N + 1)/2

Mean: average

  • Used algebraically
  • Influenced by extreme scores

Variability

Degree to which scores vary from the average

Average deviation: deviations from the mean, averaged; always 0, since positive and negative deviations cancel

Mean absolute deviation (m.a.d.): absolute deviations from the mean, averaged

Variance s² | σ² = sum of squared deviations from mean / (N - 1)

  • Mean squared deviation score
  • Dividing by N would make sample variance smaller than pop variance, hence N - 1

Standard deviation s | σ = square root of variance

Normal: unimodal, symmetrical, highest frequency in the middle, relative frequency decreases towards the tails

Standard normal: mean = 0, SD = 1

z-score: convert score into SD units
z = (X - X̄) / s
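
A minimal Python sketch of the definitions above, using made-up scores:

```python
import numpy as np

scores = np.array([4, 7, 8, 5, 6, 9, 3])  # hypothetical sample

mean = scores.mean()
var = scores.var(ddof=1)   # sample variance: squared deviations / (N - 1)
sd = np.sqrt(var)          # standard deviation = square root of variance
z = (scores - mean) / sd   # z-scores: each score in SD units

print(mean, var, sd)
print(z.round(2))
```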

Hypothesis testing

Sampling error

  • Difference between pop & sample mean
  • Variability due to chance

Sampling distribution

  • Degree of variability between samples expected by chance
  • Frequency distribution of sampling means
  • Sampling distrib. mean = population mean

Standard error

  • SD of the set of sample means (the sampling distribution)
  • How much sample means vary around the pop mean
  • Estimate of sampling error
  • Large samples -> more info -> lower standard error
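
A quick simulation sketch (hypothetical population values) showing that the SD of the sample means matches SD / SQRT(n):

```python
import numpy as np

rng = np.random.default_rng(0)
pop_sd, n = 15.0, 25                      # hypothetical population SD and sample size

# Draw many samples and keep each sample's mean
means = rng.normal(100, pop_sd, size=(10_000, n)).mean(axis=1)

print(means.std(ddof=1))     # empirical SD of the sampling distribution
print(pop_sd / np.sqrt(n))   # theoretical standard error = SD / sqrt(n)
```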

Null hypothesis H0

  • No difference / relationship
  • From same population

Given null hypothesis is true, what is probability of results?
Check against rejection / significance level

  • p < .05 -> reject null hypothesis
  • p >= .05 -> fail to reject null hypothesis (not proof that there is no difference)
  • If χ²/t/F > critical value -> reject null hypothesis
  • If χ²/t/F <= critical value -> fail to reject null hypothesis

Symbols

μ = population mean
σ = population SD
X̄ = sample mean
s = sample SD

Type I error - α

  • False positive, say difference when none
  • α = probability of rejecting H0 when it is true
  • Under researcher's control

Type II error - β

  • False negative, say no difference when there is
  • H0 false, but don't reject

One & two-tailed tests

  • Identify before collecting data
  • One-tail: predict direction
  • Two-tail: no direction (there will be a difference)
    ** Split α between the two tails

Sample | descriptive statistic: characteristic of sample

  • e.g. SD, mean

Test | inferential statistic: associated with a statistical procedure

  • e.g. t-statistic, z-score

Test families

  • z-score: compare a score or sample mean to a known mean (pop SD known)
  • Chi-square: frequency / categorical data
  • Linear relationships: correlation & regression
  • Mean differences: t-tests & ANOVA

z-score

Compare sample mean to population mean
(pop mean & pop SD known)
z = mean diff / (pop SD / SQRT n)

Compare individual score to mean
z = (X - X̄) / s
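
A sketch of the sample-mean z-test with hypothetical numbers:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical values: sample mean 104 from n = 36, pop mean 100, pop SD 15
mean_diff = 104 - 100
se = 15 / np.sqrt(36)          # pop SD / sqrt(n)
z = mean_diff / se
p = 2 * norm.sf(abs(z))        # two-tailed p-value

print(z, p)                    # z = 1.6, p ≈ .11 -> fail to reject H0
```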

Two-way Test for Independence

  • Two categorical variables
  • e.g. does lemonade preference depend on gender?
  • Expected = row total * column total / total participants
  • df = (R - 1)(C - 1)
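
A sketch of the test with a hypothetical gender x preference table:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 2x3 table: gender (rows) x lemonade preference (columns)
obs = np.array([[20, 15, 5],
                [10, 25, 25]])

row = obs.sum(axis=1, keepdims=True)
col = obs.sum(axis=0, keepdims=True)
exp = row * col / obs.sum()              # expected = row total * column total / N

chi_sq = ((obs - exp) ** 2 / exp).sum()
df = (obs.shape[0] - 1) * (obs.shape[1] - 1)   # (R - 1)(C - 1)
p = chi2.sf(chi_sq, df)
print(chi_sq, df, p)
```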

Correlation

Scatterplot

  • Visual representation
  • Must be linear

t-statistic

  • Ratio of mean difference to mean difference expected by chance (standard error)

Matched-Samples t-test

  • Sample means of 1 group tested on 2 occasions
  • 1 categorical IV (pre/post); 1 continuous DV
  • e.g. does treatment reduce impulsiveness?
  • t = mean diff / (SD diff / SQRT N)
  • d = (pre mean - post mean) / pre SD
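
A sketch with hypothetical pre/post scores:

```python
import numpy as np
from scipy.stats import t

pre = np.array([12, 15, 11, 14, 13, 16])    # hypothetical pre-treatment scores
post = np.array([10, 13, 11, 12, 11, 14])

diff = pre - post
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))  # mean diff / (SD diff / sqrt(N))
p = 2 * t.sf(abs(t_stat), df=len(diff) - 1)
d = (pre.mean() - post.mean()) / pre.std(ddof=1)                # (pre mean - post mean) / pre SD

print(t_stat, p, d)
```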

Independent Samples t-test

  • Sample mean of 2 groups
  • 1 categorical IV (2 levels); 1 continuous DV
  • e.g. are men more aggressive than women?
  • t = mean diff / SQRT(s₁² / n₁ + s₂² / n₂)
  • Use pooled variance if group sizes are unequal
  • 95% CI = mean diff +- critical t * SE of mean diff (the denominator of t)
  • d = mean diff / SQRT(pooled variance)
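
A sketch with hypothetical groups of unequal size, using the pooled-variance formulas above:

```python
import numpy as np
from scipy.stats import t

g1 = np.array([14, 18, 16, 20, 15, 17])    # hypothetical group scores
g2 = np.array([11, 13, 12, 15, 10])

n1, n2 = len(g1), len(g2)
# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 / n1 + sp2 / n2)
t_stat = (g1.mean() - g2.mean()) / se

df = n1 + n2 - 2
t_crit = t.ppf(0.975, df)
ci = (g1.mean() - g2.mean() - t_crit * se,
      g1.mean() - g2.mean() + t_crit * se)   # 95% CI for the mean difference
d = (g1.mean() - g2.mean()) / np.sqrt(sp2)   # Cohen's d via pooled SD

print(t_stat, ci, d)
```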

ANOVA

  • Difference between means with 3+ groups
  • One-way: 1 categorical IV (3+ levels); 1 continuous DV
  • Two-way factorial: 2 categorical IV; 1 continuous DV
  • Use post-hoc tests afterwards if significant

Confidence limits on z - 95%

  • 2.5% from each end
  • Mean +- 1.96*SD

Is between-treatments variance (treatment + chance) > within-treatments / error variance (chance alone)?
F = 1 means no treatment effect

Conditions

  • Homogeneity of variance: scores spread equally around mean
  • Normality: scores normal distribution
  • Independence of observation: scores unrelated
    If violated, be cautious & use larger sample size

Sum of squares

  • SS treat: n x (treatment mean - grand mean)², summed over treatments
  • SS error: (individual score - group mean)², summed over every data point
  • SS total: SS treat + SS error
    Degrees of freedom
  • df treat = k - 1
  • df error = k(n - 1)
  • df total = N - 1
    k = no. treatments, n = in each group, N = total participants
    Other
  • MS = SS / df (two of them: treat & error)
  • F = MS treat / MS error
  • MS = variance
  • If F > critical F value in table, then reject null hypothesis - there is difference somewhere between groups
  • Effect of treatment is F times greater than error within groups

Effect size

  • How much of variability due to difference between treatments
  • η² = SS treat / SS total
  • Ranges from 0 to 1
  • η² = proportion of variation in the DV attributable to differences between treatments
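
A sketch computing the one-way ANOVA quantities above on hypothetical data (k = 3, n = 5):

```python
import numpy as np
from scipy.stats import f

# Hypothetical scores for k = 3 treatments, n = 5 per group
groups = [np.array([3, 5, 4, 6, 5]),
          np.array([7, 8, 6, 9, 7]),
          np.array([5, 6, 5, 7, 6])]

allscores = np.concatenate(groups)
grand = allscores.mean()
n, k = len(groups[0]), len(groups)

ss_treat = sum(n * (g.mean() - grand) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ss_treat + ss_error

df_treat, df_error = k - 1, k * (n - 1)
F = (ss_treat / df_treat) / (ss_error / df_error)   # MS treat / MS error
p = f.sf(F, df_treat, df_error)
eta_sq = ss_treat / ss_total                        # effect size η²

print(F, p, eta_sq)
```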

Error rate per comparison (EC) vs Family-wise (FW)

  • EC = α = .05
  • FW = 1 - (1 - α)^c = prob. of making at least 1 Type I error
    More analyses -> more chance of Type I error
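    e.g. c = 5 comparisons at α = .05: FW = 1 - (1 - .05)^5 = 1 - .95^5 ≈ .23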

A Priori vs Post Hoc Tests

  • A priori = set up before the ANOVA, tests a specific hypothesis
  • Post hoc = conducted after significant ANOVA, with 3+ groups - which groups differ

Studentized Range Statistic (q)

  • Use diff between largest & smallest mean to calc q
  • If q > critical q, reject null hypothesis
    OR
  • Calc minimum mean difference for significance (adjust r to test different means)
  • Look up q table with r (no. of means) and df error

Tukey's Honestly Significant Difference Test

  • Use max r -> max q -> larger mean diff required -> harder to reach significance -> more conservative
  • Each mean diff compared using the highest r, keeping FW error constant over all comparisons
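
SciPy (≥ 1.8) ships a Tukey HSD implementation; a sketch on hypothetical groups:

```python
import numpy as np
from scipy.stats import tukey_hsd

a = np.array([3, 5, 4, 6, 5])      # hypothetical group scores
b = np.array([7, 8, 6, 9, 7])
c = np.array([5, 6, 5, 7, 6])

res = tukey_hsd(a, b, c)           # all pairwise comparisons, FW error held constant
print(res)                         # mean differences with adjusted p-values and CIs
```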

Two-way ANOVAs

  • IV = factors, factors have levels
  • e.g. age (3 levels), gender (2 levels) = 3x2 factorial OR two-way ANOVA

Effects

  • Main effect 1: ignore factor 2 (average over it); effect of factor 1 regardless of factor 2
  • Main effect 2: ignore factor 1
  • Interaction: effect of 1 factor on DV is not the same at all levels of the other factor (depends on levels of 2nd IV)
    ** Lines parallel -> no interaction
    ** Be cautious about interpreting main effects if there is a significant interaction

Sum of squares

  • SS total: (every score - grand mean)², summed
  • SS A: (mean of each A level - grand mean)² x n x number of B levels, summed
  • SS cells: (cell mean - grand mean)² x n, summed
  • SS AB = SS cells - SS A - SS B
  • SS error = SS total - SS cells
    Degrees of freedom
  • df A = levels - 1
  • df AB = df A * df B
  • df error = df total - df A - df B - df AB
  • df total = N - 1
    MS = SS / df (four: A, B, AB, error)
    F = MS effect / MS error (three: A, B, AB)
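
A sketch of the two-way sums of squares on a hypothetical 2x3 design (n = 4 per cell):

```python
import numpy as np
from scipy.stats import f

# Hypothetical scores, shape (A levels, B levels, n per cell)
cells = np.array([[[5, 6, 7, 6], [8, 9, 7, 8], [4, 5, 5, 6]],
                  [[6, 7, 8, 7], [6, 5, 6, 7], [9, 8, 9, 10]]])

a_lv, b_lv, n = cells.shape
N = cells.size
grand = cells.mean()

ss_total = ((cells - grand) ** 2).sum()
ss_A = (n * b_lv * (cells.mean(axis=(1, 2)) - grand) ** 2).sum()
ss_B = (n * a_lv * (cells.mean(axis=(0, 2)) - grand) ** 2).sum()
ss_cells = (n * (cells.mean(axis=2) - grand) ** 2).sum()
ss_AB = ss_cells - ss_A - ss_B
ss_error = ss_total - ss_cells

df_A, df_B = a_lv - 1, b_lv - 1
df_AB = df_A * df_B
df_error = N - a_lv * b_lv            # = df total - df A - df B - df AB

ms_error = ss_error / df_error
for name, ss, df_ in [("A", ss_A, df_A), ("B", ss_B, df_B), ("AB", ss_AB, df_AB)]:
    F = (ss / df_) / ms_error
    print(name, round(F, 2), round(f.sf(F, df_, df_error), 4))
```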

One-Sample t-test

  • Compare single sample mean to pop mean (pop SD unknown)
  • Continuous
  • e.g. does mean impulsiveness differ from pop mean?
  • t = mean diff / (sample SD / SQRT n)
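
A sketch with hypothetical scores (scipy.stats.ttest_1samp gives the same result):

```python
import numpy as np
from scipy.stats import t

scores = np.array([52, 48, 55, 50, 53, 49, 51, 54])   # hypothetical sample
pop_mean = 50

se = scores.std(ddof=1) / np.sqrt(len(scores))   # sample SD / sqrt(n)
t_stat = (scores.mean() - pop_mean) / se
p = 2 * t.sf(abs(t_stat), df=len(scores) - 1)

print(t_stat, p)
```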

Central limit theorem

  • Standard error = SD / SQRT n
  • Larger sample -> less error -> lower standard error
  • Changes from z to t when estimating SE using sample SD

Cohen's d

  • Difference in SD units
  • .20 = small, .50 = medium, .80 = large

Degrees of freedom

  • Once the counts for all categories except 1 are set, the count for the last category is automatically determined

General

  • For counts (e.g. no. males vs no. females)
  • Null hypothesis = no difference between categories
  • Higher chi-squared => greater discrepancy between O & E
  • Chi-squared = SUM [ (O - E)² / E ]
  • If > critical value => reject null hypothesis

One-way Goodness of Fit

  • One categorical variable
  • e.g. lemonade preference
  • df = C - 1
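
A sketch with hypothetical preference counts, assuming equal expected frequencies under H0:

```python
import numpy as np
from scipy.stats import chi2

obs = np.array([30, 14, 16])              # hypothetical lemonade preference counts
exp = np.full(3, obs.sum() / 3)           # H0: no difference between categories

chi_sq = ((obs - exp) ** 2 / exp).sum()   # SUM[(O - E)² / E]
df = len(obs) - 1                         # C - 1
print(chi_sq, chi2.sf(chi_sq, df))
```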

r-family effect size

  • Magnitude of the effect (how large, meaningful, practically significant?)
  • Magnitude of relationship between variables

Phi (correlation coefficient)

  • 2x2 tables; use the obtained chi-square
  • = SQRT(chi-square / N)
  • .10 = small, .30 = medium, .50 = large

Cramer's V

  • Larger tables
  • = SQRT(chi-squared / (N x (k - 1)))
  • k = smaller of R & C
    Larger sample -> same effect size, more significant chi
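
A sketch of both effect sizes (the chi-square values plugged in are made up):

```python
import numpy as np

def phi(chi_sq, n):
    """Effect size for 2x2 tables."""
    return np.sqrt(chi_sq / n)

def cramers_v(chi_sq, n, rows, cols):
    """Effect size for larger tables; k = smaller of R & C."""
    k = min(rows, cols)
    return np.sqrt(chi_sq / (n * (k - 1)))

print(phi(5.2, 80), cramers_v(12.3, 100, 3, 4))   # hypothetical values
```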

Variables: continuous (can be categorical)

Degree of linear association / relationship between 2 variables
Scores, not groups

Correlation coefficient

  • Strength / Degree: -1 to 1, amount of scatter, 0 = random, 1 or -1 = perfect
  • Direction: - or +
  • Pearson's correlation coefficient r = covariance (extent X & Y vary together) / (SD of X x SD of Y) (extent they vary separately)
  • Adjusted r: because smaller samples overestimate the population correlation
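
A sketch of Pearson's r from the covariance definition, on made-up pairs:

```python
import numpy as np

x = np.array([2, 4, 5, 7, 9])        # hypothetical paired scores
y = np.array([1, 3, 6, 8, 8])

cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)   # covariance
r = cov / (x.std(ddof=1) * y.std(ddof=1))                      # r = cov / (SDx * SDy)
print(r, r ** 2)                     # r and coefficient of determination r²
```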

Coefficient of determination r²

  • Proportion of variance accounted for in 1 variable by other
  • x% of variability in Y can be explained by X

Significance test - t-statistic

  • df = N - 2
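  • t = r x SQRT(N - 2) / SQRT(1 - r²)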

Factors affecting correlation

  • Large N -> trivial correlations may be statistically significant
  • Outliers
  • Range restriction
  • Heterogeneous samples (between groups)

Regression

Regression line

  • Line of best fit ('centre' of relationship)
  • Used for prediction - predict Y based on X
  • X = predictor = IV = e.g. anxiety = x-axis
  • Y = criterion = predicted = DV = e.g. negative mood = y-axis
  • Stronger correlation -> more reliable prediction
  • Bivariate (simple): 1 predictor variable VS multiple regression: 2+ predictors

Equation
Y hat = bX + a

  • b = slope = cov / var x = difference in Y associated with 1 unit difference in X = as X increases by 1, Y increases by slope = degree of association
  • a = intercept = mean Y - b * mean X = score on DV when IV is 0 = where the line crosses the Y axis (X = 0)

Standard error of estimate
SE = SDy x SQRT(1 - r²)

Confidence intervals
CI(Y) = Y hat +- critical t * SE of estimate
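
A sketch tying the regression formulas together on made-up data (the CI here applies the estimate's SE uniformly, per the formula above):

```python
import numpy as np
from scipy.stats import t

x = np.array([2, 4, 5, 7, 9])        # hypothetical predictor (IV)
y = np.array([1, 3, 6, 8, 8])        # criterion (DV)

n = len(x)
cov = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
b = cov / x.var(ddof=1)              # slope = cov / var(x)
a = y.mean() - b * x.mean()          # intercept
y_hat = b * x + a                    # predicted Y

r = b * x.std(ddof=1) / y.std(ddof=1)
se_est = y.std(ddof=1) * np.sqrt(1 - r ** 2)        # standard error of estimate
t_crit = t.ppf(0.975, df=n - 2)
ci = (y_hat - t_crit * se_est, y_hat + t_crit * se_est)   # 95% CI around predictions

print(b, a, se_est)
```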