Statistical testing - Year 1 recap
T-tests
Tests whether two group means differ significantly from one another
If confidence intervals do not overlap, there is a significant difference between two group means
Opposite is not necessarily true - CIs of two groups can overlap but the means can be significantly different from each other
Two main types of t-test - between groups (independent design) and dependent t-test (within subject)
One sample t-test - when we have just one group and want to compare the participants' mean to a specific score
Rationale for t-test -
If manipulation has had no effect, mean number will be similar in both groups
Two sample means taken would be similar
The larger the observed difference between two sample means, the more likely they come from different populations
We compare the observed difference between our two sample means to the difference expected by chance
Assumptions of t-tests - parametric test
We have independent scores
Data is interval
Data is normally distributed
Assumption of homogeneity of variance for independent t-test
Jamovi - report descriptives first - mean and SD for t-test; homogeneity of variance (Levene's) for independent
p < .05 = significant
df = N - 1 for one sample and dependent t-tests; N - 2 for an independent t-test
Effect size (Cohen's d) - .2 = small effect, .5 = medium effect, .8 = large effect
Same for dependent minus Levene's
Italicise letters
Report no 0 before the decimal point for statistics that cannot exceed 1 (e.g. p, r)
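The jamovi workflow above (descriptives, Levene's, t-test, effect size) can be sketched in Python, assuming SciPy and NumPy are available; the group scores are invented example data:

```python
# Independent t-test workflow: descriptives, Levene's, t-test, Cohen's d.
# The scores below are made-up example data.
import numpy as np
from scipy import stats

group_a = np.array([12, 15, 14, 10, 13, 16, 11, 14])
group_b = np.array([18, 17, 20, 16, 19, 15, 21, 18])

# Descriptives first: mean and sample SD (ddof=1) per group
mean_a, sd_a = group_a.mean(), group_a.std(ddof=1)
mean_b, sd_b = group_b.mean(), group_b.std(ddof=1)

# Levene's test for homogeneity of variance (independent design only)
lev_stat, lev_p = stats.levene(group_a, group_b)

# Independent t-test; df = n1 + n2 - 2
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d from the pooled SD (.2 small, .5 medium, .8 large)
n1, n2 = len(group_a), len(group_b)
pooled_var = ((n1 - 1) * sd_a**2 + (n2 - 1) * sd_b**2) / (n1 + n2 - 2)
d = (mean_b - mean_a) / np.sqrt(pooled_var)
```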
Non-parametric alternatives to the t-test
Assumptions - continuous or ordinal DV, randomly sampled data, used when data are non-normally distributed
Within-subjects -
Parametric - dependent
Non-parametric - Wilcoxon
Between-groups -
Parametric - independent
Non-parametric - Mann-Whitney
Association / correlation -
Parametric - Pearson's R
Non-parametric - Spearman's Rho
Pros and cons of using non-parametric tests -
Advantages - make fewer assumptions, can use with small datasets and easy to do
Disadvantages - typically have lower power than parametric tests when the data are normally distributed, increased chance of Type II error, not always an alternative available for a given parametric test
Analysis steps - Check for outliers, check distribution, decide on test, run test, report results
Ranking - non parametric tests use ranks rather than actual data collected
Work on distributions of ranks
Tests are between sums of ranks, not between mean scores
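A minimal SciPy sketch of the two rank-based alternatives; the scores are invented for illustration:

```python
# Rank-based alternatives to the t-test in SciPy; scores are invented.
import numpy as np
from scipy import stats

# Between groups (independent design): Mann-Whitney U
control = np.array([3, 5, 4, 2, 6, 3, 4])
treatment = np.array([7, 9, 6, 8, 10, 7, 9])
u_stat, u_p = stats.mannwhitneyu(control, treatment)

# Within subjects (dependent design): Wilcoxon signed-rank on paired scores
before = np.array([3, 5, 4, 2, 6, 3, 4])
after = np.array([5, 7, 6, 5, 8, 6, 7])
w_stat, w_p = stats.wilcoxon(before, after)
# Both tests compare sums of ranks, not mean scores
```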
Correlations
Examines relationships - whether variables are associated and how they are associated
We do not usually put participants in different experimental conditions / groups and do not manipulate any variables
Measure two variables and look for association between them
Cross sectional designs - both variables measured at the same time point
Scatterplots - we plot a person's score on two different variables as one point on a scatterplot
Regression line - line of best fit through data points, and linear regression is just a way to fit the best line to the data
positive correlation - as one variable's score increases, so does the other
Negative correlation - as one increases, the other decreases
No correlation - data points irregularly scattered
Perfect correlation - all data points fit on the line of regression
Strong correlation - most data points close to or on the line
Moderate correlation - data points are further away from the line
The correlation coefficient - determines the strength and direction of the relationship
Pearson's R - for data that meets the assumptions of normality
Spearman's rho - for data that violates the assumption of normality - the non-parametric correlation coefficient (Pearson's is parametric, assuming normality)
Information from the coefficient -
-> Direction - positive is a +, negative is a -
-> Strength - indicated by number -1 to +1
-> Cohen's recommendations - r = .1 = small, r = .3 = medium, r = .5 = large
Interpreting correlations -
Effect size for Pearson's r - r itself is an effect size
However, effect size is dependent on the research design and the variables
Correlation is not causation - we do not know the direction of the effect, could be third variable influencing both
Shared variance - if there is a relationship between two variables, then as scores on one change, scores on the other also change -> shared variance = r squared; e.g. if r = .95, .95 x .95 ≈ .90, i.e. 90% shared variance
Always look at scatterplot
Correlation coefficient - Spearman's Rho - ordinal or skewed data
Correlation coefficient - Kendall's tau - ideal for small datasets
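The three coefficients above can be computed in one SciPy sketch; the hours-studied vs exam-score data are invented:

```python
# Correlation coefficients in SciPy; the data are invented for illustration.
import numpy as np
from scipy import stats

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([40, 45, 50, 52, 60, 62, 68, 75])

r, r_p = stats.pearsonr(hours, score)        # parametric (assumes normality)
rho, rho_p = stats.spearmanr(hours, score)   # non-parametric (uses ranks)
tau, tau_p = stats.kendalltau(hours, score)  # suited to small datasets

shared_variance = r ** 2  # proportion of variance the two variables share
```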
Regression
Simple linear regression
Line of best fit -
Scatterplot - plot a person's scores for two variables
Draw a line of best fit through data points
Called a regression line
Regression is a method of fitting the best line to the data - can be used as a way to predict scores
Line models the relationship - can go up to line on one variable and go across to predict a score
line of best fit - best model of the data - model with least error, doesn't have to go through all data points, line will be closer to some than others
How we decide where to draw it -
Similar to when we use the mean as a model of the data
Mean goes through the middle of all data points - the smaller the SD, the better the mean as a model of the data
We can use the mean as a model line and measure the differences between it and the observed values - this gives the SD
Residuals - difference between observed values (data points) and values predicted by the model
Residual = observed value - predicted value
Calculate residual sum of squares
Small SSr is a good model of the data
Large SSr - poor model
ANOVA test of model fit - whether the regression line is a better data fit than by chance using the mean
The bigger the F value, the better the line
Regression coefficients - model parameters -
Used to make regression equation - explains relationship
Gradients and y intercepts
Y = b0 + b1X (cf. y = mx + c)
b0 is the constant and the y intercept
b1 is the gradient
Y is the outcome variable
X is predictor
Gradient - change in Y / change in X (rise over run)
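The regression steps above (fit the line, take residuals, sum the squares, compare with the mean as a model) can be sketched with SciPy; the data are invented (y roughly doubles x):

```python
# Simple linear regression: fit the line, then compute residuals and the
# residual sum of squares. Data are invented (y roughly doubles x).
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1, 12.0])

res = stats.linregress(x, y)            # res.intercept = b0, res.slope = b1
predicted = res.intercept + res.slope * x
residuals = y - predicted               # observed - predicted
ss_r = np.sum(residuals ** 2)           # small SSr = good model

# The mean as a model, for comparison (total sum of squares)
ss_t = np.sum((y - y.mean()) ** 2)
```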
Multiple regressions
Rationale -
Cross-sectional - when we have measured more than two variables
More predictors can increase the variance explained in the outcome
We have predictors and outcomes
Model consists of predictor variables
We can determine - how well the model explains the outcome, how much variance in the outcome our model explains and the importance of each individual predictor
Main types -
-> Forced entry - report R squared, ANOVA F model fit, regression equation and coefficient
-> Hierarchical - report same
-> Stepwise - report above
Assumptions checked post-data collection -
Linearity - relationship between predictor and outcome should be linear
Homoscedasticity / no heteroscedasticity - similar to homogeneity of variance; the variance of the error terms (residuals) should be constant across all predicted values of the model
-> Look to see data points are spread for predicted values
-> Heteroscedasticity - when the variance in the error term is not constant for all predicted values
-> Funnel / cone shape indicates this
Normal distribution of residuals - check Q-Q plot
No multicollinearity - problems can occur when predictors correlate too strongly - correlations between predictors should not exceed around .80-.90
-> Problem - a good predictor might be rejected because a second, highly correlated predictor explains little additional unique variance in the outcome
-> Leads to errors in estimation
Solutions - combine predictors or remove variables
Look at tolerance or VIF statistic - if VIF is greater than 10 there is an issue, if tolerance is less than 0.2 there is concern
Also, a high R squared with non-significant beta coefficients is an issue
Cook's distance - individual cases that overly influence model -
Checks for outlier cases in set of predictors
Measures the influence of each case on the model
Values greater than 1 may be cause for concern
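The VIF check above can be computed by hand with NumPy; the predictors here are simulated, with x1 and x2 deliberately made collinear (x3 is independent):

```python
# Hand-rolled VIF check with NumPy; predictors are simulated, with x1 and
# x2 deliberately collinear and x3 independent.
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=100)  # strongly correlated with x1
x3 = rng.normal(size=100)                          # unrelated predictor

def vif(target, others):
    # Regress one predictor on the rest; VIF = 1 / (1 - R squared)
    X = np.column_stack([np.ones(len(target))] + others)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    ss_tot = np.sum((target - target.mean()) ** 2)
    r_sq = 1 - np.sum(resid ** 2) / ss_tot
    return 1 / (1 - r_sq)

vif_x1 = vif(x1, [x2, x3])  # large: collinearity (rule of thumb: > 10 is an issue)
vif_x3 = vif(x3, [x1, x2])  # near 1: no problem (tolerance = 1 / VIF)
```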
ANOVA
Limitations of the t-test - can only compare two means - we cannot use it to compare each of three means to each other individually, as this inflates the familywise error rate
Probability per test that we falsely find an effect (Type 1 error) = .05
Each additional t-test increases the chance of this
What is ANOVA - used like the t-test to compare 3 or more condition means
Does not increase Type 1 error
ANOVA hypothesis tested - null and experimental
ANOVA is an omnibus test - looks for an overall effect, is not specific about which means differ, it simply states that there is a significant difference somewhere
One way independent ANOVA - used to compare 3 or more group means (independent t-test equivalent)
Can be used with more than one IV
One IV is a one way independent ANOVA
Used in a between groups design with normally distributed data that has an IV with 3 or more levels
Parametric test based on normal distribution - data needs to be interval, for independent - homogeneity of variance
Effect size - partial eta squared
Repeated measures ANOVA - compares three or more within subjects conditions
Test of sphericity - Mauchly's test (Mauchly's W)
If violated, use a Greenhouse-Geisser or Huynh-Feldt correction
Use post-hoc testing
Post-hoc testing - we use follow up tests to determine where the difference is
Planned comparisons - used when we have directional hypotheses
Post-hoc testing - done when we have nonspecific hypotheses
Bonferroni correction is normally used; where corrections differ, the most conservative is prioritised
Independent two-way ANOVA - analysing studies with multiple IVs, not just multiple IV levels
Between group design with two IVs, with two levels - 2 x 2 independent design
Two way ANOVA has 3 effects - 2 x main effects for each IV, and one interaction effect
Run Levene's, Main ANOVA output, descriptives and post-hoc testing
Significant interactions have crossing / non-parallel lines, non significant are parallel lines that do not cross
Main effects - interpret only those that are significant in the ANOVA output
Two-way repeated measures ANOVA: same concept as two way independent, but with a within-subjects design
All participants go through all conditions
For within-subject variables with three or more levels, assumption of sphericity needs to be met
Gives output for three effects
Mixed ANOVA design - one or more IVs in a repeated measures (within subjects design) and one or more IVs in an independent measures design (between groups design)
Mixed designs - measure before -> group 1 manipulation and group 2 control -> measure after -> a 2 x 2 mixed design
Two way - 1 x within subject, 1 x between groups
Three way - 1 x within subjects, 2 x between subjects OR 1 x between groups, 2 x within subjects
ANOVA rationale - two sources of variation in experimental designs -
Systematic variation - due to purposeful manipulation
Unsystematic variation - random factors
Want to know how much of the overall variation is systematic and how much is unsystematic
ANOVA - breaks down the total variance in a set of scores by calculating - how much is systematic, how much is unsystematic and compares two sources of variation
Our manipulation has had an effect on our DV if it has created a lot of variation in scores compared to random variation that we'd find anyway
Manipulation has had no effect if there is not much difference
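The ANOVA rationale above can be sketched with SciPy's omnibus F test; the three condition groups are invented data:

```python
# One-way independent ANOVA sketch with SciPy; groups are invented data.
import numpy as np
from scipy import stats

g1 = np.array([5, 6, 7, 5, 6])
g2 = np.array([8, 9, 10, 9, 8])
g3 = np.array([12, 13, 11, 12, 14])

# Check homogeneity of variance first (Levene's), then the omnibus F test
lev_stat, lev_p = stats.levene(g1, g2, g3)
f_stat, p_value = stats.f_oneway(g1, g2, g3)
# A significant F says the means differ somewhere; post-hoc tests locate where
```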
Nonparametric alternatives to the ANOVA
If data are non-normally distributed (with 300+ participants, judge skewness from the raw value rather than the z-score)
For repeated measures ANOVA - use Friedman's test
For independent measures ANOVA - use Kruskal-Wallis
Advantages -
Make fewer assumptions
Can use with small datasets
Easy to calculate and interpret
Disadvantages -
Can have lower power than parametric tests - increased chance of Type II error
Uses ranking - report the chi-square statistic and significance, then report Dwass-Steel-Critchlow-Fligner pairwise comparisons
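Both non-parametric alternatives are available in SciPy; the scores below are invented:

```python
# Non-parametric ANOVA alternatives in SciPy, on invented scores.
from scipy import stats

# Between groups (independent measures): Kruskal-Wallis
g1 = [3, 4, 2, 5, 3]
g2 = [6, 7, 5, 8, 6]
g3 = [10, 9, 11, 12, 10]
h_stat, h_p = stats.kruskal(g1, g2, g3)  # reported as a chi-square statistic

# Within subjects (repeated measures): Friedman's test across 3 conditions
c1 = [3, 4, 2, 5, 3]
c2 = [5, 6, 4, 7, 5]
c3 = [8, 9, 7, 10, 8]
chi_sq, f_p = stats.friedmanchisquare(c1, c2, c3)
```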
Power analysis - G*Power - hypothesis testing; power and error -
Type I - false positive
Type II - false negative
Power - ability to detect an effect
-> Impacted by effect size, alpha level and sample size
Knowing three of these four factors allows us to estimate the fourth
This allows us to -
Calculate the power of a test post-experiment
Estimate the sample size needed to achieve adequate power pre-experiment - power conventionally set to .80, effect size estimated, alpha at .05; solve for sample size
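G*Power solves this exactly; as a rough sketch, the standard normal-approximation formula for a two-sided independent t-test gives a similar answer (the function name here is ours):

```python
# Rough pre-experiment sample-size estimate for a two-sided independent
# t-test via the normal approximation (G*Power computes this more exactly).
import math
from scipy.stats import norm

def sample_size_per_group(d, alpha=0.05, power=0.80):
    # n per group ~= 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Conventional inputs: medium effect (d = .5), alpha = .05, power = .80
n_medium = sample_size_per_group(0.5)
```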
Chi-Square test
Interval and ratio data - Pearson's r, regression, ANOVA and t-test
Ordinal data - Spearman's Rho, Mann Whitney, Wilcoxon, Kruskal-Wallis and Friedman's test
Nominal data - yes/no decisions; we cannot analyse categorical data directly
We instead use frequencies or the number of participants associated with a category
Chi-square test 1: Goodness of fit test -
Chi-square with one variable (one sample)
Compares observed frequencies and expected frequencies (number of participants / number of categories)
In jamovi -
Two ways to enter the same data
Both will give you the same jamovi output
You can enter the data either -
-> By participant - each cell indicates the category they belong to
-> Using a total frequency count - one column for the category variable and one for the count
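The goodness-of-fit comparison of observed vs expected frequencies can be sketched with SciPy; the counts are invented (60 participants across 3 categories, so expected = 60 / 3 = 20 each):

```python
# Goodness-of-fit sketch: 60 participants across 3 categories; expected
# frequency per category = 60 / 3 = 20. Counts are invented.
from scipy import stats

observed = [30, 18, 12]
expected = [20, 20, 20]
chi_sq, p_value = stats.chisquare(observed, f_exp=expected)
```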
Chi-square test 2: Chi-square test of association - chi-square with two variables
This is a test where we have two categorical variables
We can determine if there is an association between the two variables
Make a 2 x 3 contingency table - or for however many levels there are of each categorical variable
How it works -
-> Null hypothesis - there is no association between gender and degree chosen
-> Again, we calculate the expected frequencies if the null hypothesis were true
-> This time, it is more complex because there are two variables
-> Again, there is a comparison of observed frequencies to the expected frequencies
In jamovi - same as the first test, but with Cramer's V as the effect size - between 0 and 1 for strength of association
Odds ratios - an additional method of helping to report the results
First calculate odds then the ratio - makes sense with a 2 x 2 contingency table
Assumptions of chi squared -
Observations must be independent - each participant only in one cell
There should be an adequate expected frequency in each cell - in general, no more than 20% of the expected frequencies should have value of less than 5, it is not important what the actual observed frequencies are
If it is a 2 x 2 contingency table, all cells need an expected frequency greater than 5, as 1 cell is 25% of the cells
If larger than 2x2, collapse variables
Fisher's exact test - for a 2 x 2 contingency table - can be used instead of Pearson's chi-square when the sample size is small
Therefore, when we have one or more expected frequencies less than 5 in a 2 x 2 contingency table, report Fisher's exact
For a 2 x 2 contingency table -> are any of the expected values < 5 - no = Pearson's, yes = Fisher's
Larger than a 2 x 2 contingency table - are more than 20% of the expected values < 5 - no = Pearson's chi square, yes = collapse
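The decision rule above, plus the odds ratio for reporting, can be sketched with SciPy on a hypothetical 2 x 2 table:

```python
# Chi-square test of association on a hypothetical 2 x 2 table, applying
# the rule: Fisher's exact if any expected frequency < 5. Counts invented.
import numpy as np
from scipy import stats

table = np.array([[30, 10],
                  [15, 25]])  # rows = groups, columns = yes / no counts

chi_sq, p_value, df, expected = stats.chi2_contingency(table)
if (expected < 5).any():
    # Small expected counts in a 2 x 2: report Fisher's exact instead
    _, p_value = stats.fisher_exact(table)

# Odds ratio for reporting: (a / b) / (c / d) from the 2 x 2 cells
(a, b), (c, d) = table
odds_ratio = (a / b) / (c / d)
```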