INFERENTIAL STATISTICS
Foundations of Inferential Statistics
Purpose
Test hypotheses
Make conclusions about population using sample data
Nature
Always probabilistic, never certain
Cannot “prove” theories—only reject null hypotheses (Popper)
Why uncertainty exists
Extraneous variables influence dependent variable
Sample ≠ population → sampling error
Evidence supports but never proves
Hypothesis Logic & Significance Testing
Popper’s Logic
Cannot accept alternative hypothesis
Can only reject null hypothesis based on contrary evidence
Fisher’s Significance Rules
p-value = probability of obtaining a result at least as extreme as the observed one if the null hypothesis is true
α (significance level) = 0.05 (5%)
Condition for significance: p ≤ 0.05
Type I error = rejecting null when true
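A minimal Python sketch (synthetic data; NumPy and SciPy assumed) illustrating α as the long-run Type I error rate:

```python
# When H0 is true, about alpha = 5% of tests still come out "significant"
# purely by chance; each such rejection is a Type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
false_rejections = 0
n_trials = 10_000

for _ in range(n_trials):
    # Both samples come from the SAME population, so H0 is true.
    a = rng.normal(loc=0, scale=1, size=30)
    b = rng.normal(loc=0, scale=1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p <= alpha:
        false_rejections += 1  # Type I error: rejecting a true null

print(f"Observed Type I error rate: {false_rejections / n_trials:.3f}")  # ~0.05
```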
Key Statistical Concepts
Sampling distribution = distribution a statistic would have across all possible samples
Standard error = standard deviation of the sampling distribution (typical size of sampling error)
Confidence interval (CI) = range estimated to contain the population parameter at a stated confidence level (e.g., 95%)
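A minimal sketch computing the standard error and a 95% confidence interval for a sample mean (made-up data; SciPy assumed):

```python
import numpy as np
from scipy import stats

sample = np.array([4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 5.8])
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean

# 95% CI built from the t distribution with df = n - 1
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)
ci = (mean - t_crit * se, mean + t_crit * se)
print(f"mean={mean:.2f}, SE={se:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```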
General Linear Model (GLM)
Definition
A family of statistical models
Represents linear relationships between variables
Two-Variable Linear Model
Formula: y = β₀ + β₁x + ε
β₁ = slope (effect of x on y)
β₀ = intercept
ε = error (deviation from predicted value)
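A minimal sketch fitting the two-variable model by ordinary least squares (synthetic data; statsmodels assumed):

```python
# Fit y = b0 + b1*x + e and recover the slope, intercept, and residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)   # true b0 = 2.0, b1 = 0.5

X = sm.add_constant(x)          # adds the intercept column (b0)
model = sm.OLS(y, X).fit()
print(model.params)             # [b0_hat, b1_hat], close to [2.0, 0.5]
print(model.resid[:5])          # e = deviations from predicted values
```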
Multi-Predictor GLM
Formula: y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε
Predictors may include
Independent variables
Covariates (control variables)
Dummy-coded nominal variables
Dummy Variables
Represent nominal categories (0 or 1)
Example: gender (0 = male, 1 = female)
For n categories → use n−1 dummy variables
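A minimal sketch of n−1 dummy coding with pandas (hypothetical "region" variable):

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "west", "north", "west"]})

# drop_first=True keeps n-1 dummies; the dropped category ("north")
# becomes the reference level absorbed by the model intercept.
dummies = pd.get_dummies(df["region"], drop_first=True)
print(dummies)   # columns: south, west; "north" rows are all zeros
```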
GLM Family of Methods
ANOVA → compare means using dummy predictors
ANCOVA → ANOVA controlling covariates
Multivariate regression → multiple outcomes
MANOVA → ANOVA with multiple outcomes
Structural equation modeling → interconnected regression equations
Importance of Model Specification
Based on theory, not data fitting
Data validates model, does not define it
Two-Group Comparison (t-Test)
Scenario
Treatment vs control
Predictor = dummy (1 = treatment, 0 = control)
Outcome = ratio scale (e.g., test scores)
Hypotheses
H₀: μ₁ ≤ μ₂
H₁: μ₁ > μ₂ (one-tailed)
Understanding Mean Differences
Sample means vary due to sampling error
Use standard error to understand variability
Overlapping confidence intervals → difference may not be significant (rough heuristic, not a formal test)
Non-overlapping intervals → difference is significant
t-Statistic
Formula: t = (X̄₁ − X̄₂) / SE
SE = √((s₁²/n₁) + (s₂²/n₂))
Interpreting Results
Compute p-value from t
df ≈ n₁ + n₂ − 2
If p < 0.05 → reject H₀
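A minimal sketch of the full two-group t-test, computing t from the formula above and the one-tailed p-value (synthetic scores; SciPy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(75, 10, size=40)   # dummy x = 1
control = rng.normal(70, 10, size=40)     # dummy x = 0

m1, m2 = treatment.mean(), control.mean()
se = np.sqrt(treatment.var(ddof=1) / 40 + control.var(ddof=1) / 40)
t_stat = (m1 - m2) / se
df = 40 + 40 - 2

# One-tailed p-value for H1: mu1 > mu2
p_one_tailed = stats.t.sf(t_stat, df)
print(f"t={t_stat:.2f}, df={df}, one-tailed p={p_one_tailed:.4f}")
if p_one_tailed < 0.05:
    print("Reject H0: treatment mean is significantly higher")
```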
Effect Size (ES)
Use regression: y = β₀ + β₁x
β₁ = effect size
For a dummy predictor, ES (β₁) = difference in group means
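A minimal sketch confirming that the slope on a 0/1 dummy equals the group-mean difference (synthetic data; statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(70, 10, 40),    # control   (x = 0)
                    rng.normal(75, 10, 40)])   # treatment (x = 1)
x = np.concatenate([np.zeros(40), np.ones(40)])

fit = sm.OLS(y, sm.add_constant(x)).fit()
b0, b1 = fit.params
print(f"b1 (slope)          = {b1:.3f}")
print(f"mean diff (X1 - X0) = {y[x == 1].mean() - y[x == 0].mean():.3f}")  # identical
```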
Factorial Designs (Two-Way ANOVA)
Example
Curriculum type (special/traditional)
Instruction time (3 or 6 hours)
2 × 2 factorial design
GLM Equation
y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε
Effects
Main effects: β₁, β₂
Interaction effect: β₃
Interpretation Rules
If β₃ is significant → interaction exists
Main effects (β₁, β₂) cannot be interpreted on their own when the interaction is significant
Adding Covariates
New predictors (β₄, β₅…)
Interpreted the same way as other predictors
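A minimal sketch of the 2 × 2 factorial model with an interaction term and a covariate, using the statsmodels formula API (all data synthetic):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "curriculum": rng.integers(0, 2, n),   # 0 = traditional, 1 = special
    "hours": rng.integers(0, 2, n),        # 0 = 3 hours, 1 = 6 hours
    "pretest": rng.normal(50, 8, n),       # covariate
})
df["score"] = (60 + 4 * df["curriculum"] + 3 * df["hours"]
               + 5 * df["curriculum"] * df["hours"]   # true b3 interaction
               + 0.3 * df["pretest"] + rng.normal(0, 5, n))

# "curriculum * hours" expands to both main effects plus the b3 interaction;
# the pretest covariate enters like any other predictor.
fit = smf.ols("score ~ curriculum * hours + pretest", data=df).fit()
print(fit.summary().tables[1])   # check whether the interaction term is significant
```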
Other Inferential Techniques (Brief Overview)
Factor Analysis
Reduces many observed items → fewer latent factors
Used to test convergent & discriminant validity
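A minimal sketch of exploratory factor analysis with scikit-learn, recovering two latent factors from six synthetic items:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)   # two latent factors

# Items 1-3 load on factor 1, items 4-6 on factor 2, plus noise.
X = np.column_stack([f1 + rng.normal(0, 0.4, n) for _ in range(3)] +
                    [f2 + rng.normal(0, 0.4, n) for _ in range(3)])

fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_.round(2))   # loadings: items should split cleanly by factor
```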
Discriminant Analysis
Classifies cases into nominal groups
Similar to regression but DV is categorical
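A minimal sketch of linear discriminant analysis with scikit-learn (two synthetic groups):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # group 0
               rng.normal(2, 1, (50, 2))])   # group 1
groups = np.array([0] * 50 + [1] * 50)       # categorical DV

lda = LinearDiscriminantAnalysis().fit(X, groups)
print(lda.predict(X[:5]))        # predicted group membership
print(lda.score(X, groups))      # classification accuracy
```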
Logistic Regression
Outcome = binary (0/1)
Predicts probability of event
Effect size measured by odds ratio
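A minimal sketch of logistic regression with statsmodels; exponentiating the coefficients gives odds ratios (synthetic data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))      # true log-odds: -0.5 + 1.2x
y = rng.binomial(1, p)                        # binary 0/1 outcome

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(fit.params)                             # coefficients on the log-odds scale
print(np.exp(fit.params))                     # odds ratios (effect size)
print(fit.predict(sm.add_constant(x))[:5])    # predicted event probabilities
```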
Probit Regression
Outcome = probability between 0 and 1
Models it with the cumulative normal distribution as the link
Used in finance, insurance, credit scoring
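A minimal probit sketch with statsmodels, mirroring the logit example but with a normal-CDF link (synthetic data):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(7)
x = rng.normal(size=500)
y = rng.binomial(1, norm.cdf(-0.3 + 0.8 * x))   # normal-CDF link generates y

fit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(fit.params)                               # probit coefficients
print(fit.predict(sm.add_constant(x))[:5])      # predicted probabilities in (0, 1)
```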
Path Analysis
Examines directional relationships among variables
Uses interconnected regression equations
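A minimal sketch of a simple x → m → y path model estimated as two chained regressions, one simplified way to express interconnected equations (synthetic data; statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.normal(size=300)
m = 0.6 * x + rng.normal(0, 1, 300)     # path a: x -> m
y = 0.5 * m + rng.normal(0, 1, 300)     # path b: m -> y

a = sm.OLS(m, sm.add_constant(x)).fit().params[1]   # effect of x on m
b = sm.OLS(y, sm.add_constant(m)).fit().params[1]   # effect of m on y
print(f"path a = {a:.2f}, path b = {b:.2f}, indirect effect = {a * b:.2f}")
```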
Time Series Analysis
For variables changing over time
Used for forecasting (markets, crime, etc.)
Corrects for autocorrelation
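A minimal sketch of an AR(1) forecast, a basic time-series model that accounts for autocorrelation (synthetic series; statsmodels assumed):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
y = np.zeros(200)
for t in range(1, 200):                  # build an autocorrelated series
    y[t] = 0.7 * y[t - 1] + rng.normal()

fit = ARIMA(y, order=(1, 0, 0)).fit()    # AR(1): one autoregressive lag
print(fit.params)                        # AR coefficient near 0.7
print(fit.forecast(steps=5))             # next five forecast values
```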