PhD Thesis: Addressing the Inferential Limitations of Regression Models by using Path Analytic Models with Mediated Associations
Write three papers exploring and demonstrating that using regression for inference where even weak/moderate collinearity is present is flawed.
Demonstrate and discuss how path analytic models with mediated associations, which take into account the bidirectional effects of associated predictor variables, address the problems associated with collinearity and yield more accurate and robust estimates of model parameters.
Key Messages Across 3 x Papers
Using regression models for inference in the presence of even weak or moderate collinearity is flawed.
As collinearity between predictors increases, parameter estimates diverge further from their true values.
Bridge the gap between regression and path analytic models for applied researchers
Multicollinearity
Detecting Multicollinearity using Bivariate Correlations
The |r| = 0.7 threshold for removing correlated variables is a flawed approach to dealing with multicollinearity.
The general approach of removing variables after evaluating pairs of bivariate correlations (irrespective of threshold) is a flawed approach.
Variable Transformation
Linearly Combining Variables Together
Leads to not being able to easily interpret effects
Data Reduction Techniques
Such as Principal Components Analysis
What problems does multicollinearity cause in regression and why?
Detecting Collinearity with Variance Inflation Factor (VIF)
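The contrast between bivariate-correlation screening and VIF could be illustrated with a minimal sketch like the one below. It assumes simulated data in which a fifth predictor is a near-linear combination of four independent ones; the seed, sample size, and noise level are hypothetical choices, not values from the thesis. It uses the standard identity that VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal element of the inverse predictor correlation matrix.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_predictors).

    Uses VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element
    of the inverse of the predictor correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)          # fixed seed, illustrative only
n = 5000                                # hypothetical sample size
base = rng.standard_normal((n, 4))      # four independent predictors
noise = 0.2 * rng.standard_normal(n)    # hypothetical noise level
x5 = base.sum(axis=1) + noise           # near-linear combination of the other four
X = np.column_stack([base, x5])

# Largest absolute pairwise correlation among the five predictors
corr = np.corrcoef(X, rowvar=False)
max_abs_r = np.abs(corr[np.triu_indices(5, k=1)]).max()
vifs = vif(X)

print(f"largest pairwise |r|: {max_abs_r:.2f}")
print("VIFs:", np.round(vifs, 1))
```

In this constructed example no pair of predictors reaches |r| = 0.7, yet every VIF is large, which is the kind of case a pairwise screen cannot detect.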
Inference with Regression
Partial Regression Coefficients
How do we determine if our parameter estimates are good estimates?
True Value contained within 95% Confidence Interval
Deviation from True Value: Beta - b
True Score Variance
Bias
Variance
Efficiency
Consistency
MSE
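The criteria listed above (deviation from the true value, bias, variance, MSE, and confidence-interval coverage) could be computed in a simulation along the following lines. This is a sketch under stated assumptions: the true effects, the predictor correlation, the sample size, and the number of replications are all hypothetical, and the 95% interval uses the normal approximation (1.96) rather than the exact t quantile.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, illustrative only
beta = np.array([0.5, 0.3])      # hypothetical true effects of x1 and x2
r = 0.6                          # hypothetical correlation between x1 and x2
n, reps = 100, 2000              # hypothetical sample size and replications

cov = np.array([[1.0, r], [r, 1.0]])
estimates = np.empty(reps)
covered = np.empty(reps, dtype=bool)

for i in range(reps):
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = X @ beta + rng.standard_normal(n)
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # OLS fit
    resid = y - Xd @ b
    sigma2 = resid @ resid / (n - Xd.shape[1])   # residual variance estimate
    se = np.sqrt(sigma2 * np.linalg.inv(Xd.T @ Xd)[1, 1])
    estimates[i] = b[1]
    # Does the approximate 95% interval for b1 contain the true value?
    covered[i] = abs(b[1] - beta[0]) <= 1.96 * se

bias = estimates.mean() - beta[0]            # mean deviation from the true value
variance = estimates.var()                   # population form (ddof=0)
mse = np.mean((estimates - beta[0]) ** 2)    # equals bias**2 + variance
coverage = covered.mean()                    # share of intervals containing beta1

print(f"bias={bias:.4f} variance={variance:.4f} mse={mse:.4f} coverage={coverage:.3f}")
```

Repeating the loop over a grid of values for `r`, `n`, and `beta` would show how each criterion responds to the severity of collinearity, sample size, and effect size.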
Standardisation of Model Variables
Covariance between Covariates
Path Analytic Models
with Mediated Associations
Multiplicative Path Tracing
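As a small worked example of multiplicative path tracing, the sketch below applies Wright's standard tracing rule to a standardised model with two correlated predictors of one outcome. The path coefficients and the correlation are hypothetical values, and this illustrates the classical rule only, not the mediated-associations approach proposed in the thesis.

```python
import numpy as np

p = np.array([0.5, 0.3])   # hypothetical standardised paths x1 -> y and x2 -> y
r12 = 0.6                  # hypothetical correlation between x1 and x2

# Tracing rule: implied correlation = direct path
# + (correlation with the other predictor) * (that predictor's path)
r_y1 = p[0] + r12 * p[1]
r_y2 = p[1] + r12 * p[0]

# Going the other way: recover the paths from the correlations
R_xx = np.array([[1.0, r12], [r12, 1.0]])
r_xy = np.array([r_y1, r_y2])
p_recovered = np.linalg.solve(R_xx, r_xy)

print("implied correlations:", r_y1, r_y2)
print("recovered paths:", p_recovered)
```

The second half shows that standardised partial regression coefficients are the solution of the same tracing equations, which is one way to connect regression and path notation.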
Mediated Associations
Using mediated associations deals with the problems associated with collinearity for all sample sizes, effect sizes, and all values of collinearity.
Demonstrate the scalability and generalisability of the mediated associations approach using complex simulated and real world path diagrams
What happens to the intercept?
What happens to errors?
What happens to prediction?
Effect Size and Sample Size
Why is the problem worse for larger effects?
Explain mathematically why this is the case.
Explain Up-tick and Down-tick Behaviour
Compounding Errors
Explore compounding errors in path analytic models when we have more than one endogenous variable. Are compounding errors impacted by collinearity? Can mediated associations improve parameter estimates and reduce compounding errors?
Moderation and Mediation
How do we fit models to include moderation and mediation?
Moderation (or interaction terms) alone does not address the issues associated with collinearity
Explore interaction effects in regression and moderation in path analysis and critically examine the effects of collinearity and mediated associations on parameter estimates
Papers
Paper 1: Inference with Regression in the Presence of Multicollinearity
Paper 2: Inference with Path Analytic Models in the Presence of Multicollinearity
Paper 3: Reproducible Research for Regression and Path Analytic Models in the Presence of Multicollinearity
Reproducible Research
Statistical Power
Sample Sizes
Use simulated examples to demonstrate the sample size each approach (regression vs mediated associations) would recommend to achieve 80% statistical power.
My expectation is that regression would suggest that 80% power is achieved with a smaller sample size, when in reality a much larger sample size is required.
Producing non-reproducible research outcomes.
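The regression side of the power comparison described above could be simulated roughly as follows. This is a sketch, not the thesis's procedure: the function name `simulated_power`, the effect sizes, the correlations, and the sample sizes are all hypothetical, and the test uses the normal approximation (1.96) to the t critical value. The mediated-associations side is not shown because its estimation steps are not specified here.

```python
import numpy as np

def simulated_power(n, beta1, r, reps=1000, seed=0):
    """Share of replications in which the OLS test of beta1 = 0 rejects
    at roughly the 5% level, for two predictors with correlation r."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])
    rejections = 0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        # Hypothetical second effect of 0.3 and unit error variance
        y = beta1 * X[:, 0] + 0.3 * X[:, 1] + rng.standard_normal(n)
        Xd = np.column_stack([np.ones(n), X])
        b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ b
        sigma2 = resid @ resid / (n - Xd.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(Xd.T @ Xd)[1, 1])
        rejections += int(abs(b[1] / se) > 1.96)
    return rejections / reps

# Estimated power across a small grid of collinearity levels and sample sizes
for r in (0.0, 0.4, 0.8):
    for n in (50, 100, 200, 400):
        print(f"r={r:.1f} n={n:3d} power={simulated_power(n, 0.3, r):.2f}")
```

Reading down each block of the printed grid gives the smallest sample size at which the estimated power passes 0.80 for that level of collinearity.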
Could these ideas link to the reproducibility crisis?
How does the choice of estimation approach (regression versus path analysis with mediated associations) affect how reproducible studies are?
What are the widespread impacts of using regression for inference? Are we underestimating sample sizes and reducing the statistical power of research studies by relying on regression for inference?
Start the Story with Multiple Regression
and Path Models. What are they used for?
Prediction
Inference
Predictive vs Inferential Models
Goal of prediction is to...
Goal of inference is to...
Path Diagrams and Notation
Use Simulated Examples to demonstrate the approach and the improved results
Multiple Regression
Path Analytic Models
Use Real World Examples to demonstrate the improved results and the misleading implications of flawed or inaccurate estimates of model parameters.
Multiple Regression
Path Analytic Models
Common ways that parameter estimate accuracy and reliability are measured
Sample size
Effect Size
All impact the accuracy of parameter estimates in regression and path models
Severity of Collinearity
Biasedness
Consistency
Precision
Efficiency
Robustness to deviations from assumptions
Type I and II Error Rates
Mediation and Moderation
What are the criticisms of mediation? Who out there is criticising it?
Bi-directional mediation is NEW
GAP
There is a belief that regression handles collinearity well; we should identify sources that state this and cite them in the literature review.
GAP
On the contrary, we show that regression does not handle collinearity well.
Previous work recommends |r| = 0.7 as the threshold for removing a variable; we show that problems occur far earlier and that, under some circumstances, collinearity can be problematic from as low as |r| = 0.2.
Collinearity
Methods for Addressing
Disadvantages
GAP
We introduce a new method with none of these drawbacks
new method is path analytic models with mediated associations
Methods for Detecting
History of Multiple Regression and Path Analytic Models
Laplace 1783
Legendre 1805
Gauss 1809
Galton 1886
Pearson and Yule 1900?
Fisher 1930?
Foundational work on the CLT, and the observation that measurement errors follow a normal distribution, led to Gauss's theory of errors (motion of Ceres)
Multiple Regression
What is it?
Prediction
Inference
Path Analytic Models
An extension of multiple regression
Many Interrelated regressions being evaluated simultaneously
Introduction
Methods
Results
Discussion
Allison, P. D. (1999). Multiple regression: A primer. Pine Forge Press.
Literature
Arif, S., & MacNeil, M. A. (2022). Predictive models aren't for causal inference. Ecology Letters, 25(8), 1741-1745.
Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., ... & Lautenbach, S. (2013). Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27-46.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20(7), 557.
Land, K. C. (1969). Principles of path analysis. Sociological Methodology, 1, 3-37.
Wright, S. (1934). The method of path coefficients. The Annals of Mathematical Statistics, 5(3), 161-215.
Harter, W. L. (1974). The method of least squares and some alternatives: Part I. International Statistical Review/Revue Internationale de Statistique, 147-174.
Topping, J. (1972). Theory of Errors. In Errors of Observation and their Treatment (pp. 72-114). Dordrecht: Springer Netherlands.
Belsley, D. A., Kuh, E., & Welsch, R. E. (2005). Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley & Sons.
Schisterman, E. F., et al. (2017). Collinearity and causal diagrams: A lesson on the importance of model specification. Epidemiology, 28(1), 47-53.
Van den Bos, A. (2007). Parameter estimation for scientists and engineers. John Wiley & Sons.
Literature Review
Sewall Wright (Path Analysis) 1918
People generally know what regression is, don't dwell on it too much
Least Squares Methods
The term 'regression' was first coined by Sir Francis Galton as 'regression to the mean' in human measurement studies.
Linked Least Squares Methods to Correlation
Modern Multiple Regression as we know it today
An important part of the literature review is to critically review the relevant literature in order to identify a 'gap', and to position your own research findings in relation to the existing literature.
What are my research questions?
These questions should help guide the literature review and should be directly addressed through the research I am proposing to undertake in my thesis.
A good review of the literature comprehensively and critically analyses and evaluates existing knowledge within a particular domain. In relation to the research process, the literature review can identify trends and gaps in the literature that assist in directing the research and refining the research question(s). Ultimately, the literature review should demonstrate a thorough understanding of the topic, and provide justification for the research problem, design and methodology.
Reproducibility crisis
Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246-263.
Kuusela, V. (2012). Laplace-a pioneer of statistical inference. J. Électron. Hist. Probab. Stat, 8, 1-24.
Stigler, S. M. (1990). The history of statistics: The measurement of uncertainty before 1900. Harvard University Press.
Bullock, J. G., & Green, D. P. (2021). The failings of conventional mediation analysis and a design-based alternative. Advances in Methods and Practices in Psychological Science, 4(4), 25152459211047227.
Farrar, D. E., & Glauber, R. R. (1967). Multicollinearity in regression analysis: The problem revisited. The Review of Economics and Statistics, 92-107.