Ch 5: test power and effect size

sample size and test requirements

t-test: 30 or more participants

chi-square: at least 5 expected cases per category

regression analysis: 20 or more cases per independent variable

anova: equal variances across groups; as a rule of thumb, the # of cases should be (roughly) equal among groups (equal group sizes)

a test on a larger sample is more likely to be statistically significant

small standard error= larger test statistic values= small p-values

more precision = more likely to reject the null bc there is less spread; the rejection region (in raw units) starts closer to the hypothesized mean
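a quick python sketch of this (assuming a made-up standard deviation of 10 and a difference of 2): the standard error shrinks as n grows, which pushes the t statistic up and the p-value down:

```python
import numpy as np
from scipy import stats

s = 10.0     # assumed sample standard deviation
diff = 2.0   # assumed difference between sample mean and hypothesized value

for n in (20, 80, 320):
    se = s / np.sqrt(n)                    # standard error of the mean
    t = diff / se                          # one-sample t statistic for the same difference
    p = 2 * stats.t.sf(abs(t), df=n - 1)   # two-sided p-value
    print(f"n={n:4d}  SE={se:.2f}  t={t:.2f}  p={p:.4f}")
```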

effect size: size of difference between sample outcome and hypothesized population value

the more the sample outcome deviates from the hypothesized value, the more likely we are to reject the null

metal detector analogy: a large effect size is like a large amount of metal --> the statistical test picks it up more easily

probability of rejecting null hypothesis depends on sample size and effect size

large sample size= statistical test more sensitive

the test will be significant even for small effect sizes

large effect size= easily picked up by statistical test

large effect size more easily statistically significant

with a large sample size, even a small effect size is enough for significance

with a small sample size, a large effect size is needed for significance
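a small simulation sketch of this point (one-sample t-test, assumed true effect of d = 0.2 in the population): the same small effect is rarely significant at n = 30 but almost always significant at n = 400:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d = 0.2        # assumed small standardized effect in the population
alpha = 0.05

for n in (30, 400):
    rejections = 0
    for _ in range(2000):
        # draw a sample from a population where the null (mean 0) is false by a small amount
        sample = rng.normal(loc=d, scale=1.0, size=n)
        t, p = stats.ttest_1samp(sample, popmean=0.0)
        rejections += p < alpha
    print(f"n={n:3d}: rejected the null in {rejections / 2000:.0%} of samples")
```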

practical significance

the minimum effect size that makes the result worthwhile in practice

statistical significance (metal detector) is used to signal practical significance

we have the original effect (the hypothesized mean) and the new effect

new effect should be in rejection region

statistical significance will beep (signal it)

null (of no improvement) is rejected-- we want this bc we want to see an improvement

should be in rejection region bc we want to reject null of no improvement in population (after exposure to campaign for example)

unstandardized effect size: the difference between sample outcome and hypothesized value

no general rule of thumb for interpreting it, bc it depends on the measurement scale: changing the scale changes the number but not the actual effect (only the units differ)

whether it ends up statistically significant depends on the sample size, but if the effect size matters a great deal to us, it is practically significant

variation in scores matters: with a lot of variation, a small diff between sample outcome and hypothesized population value is not relevant; with nearly constant scores, even a small diff is relevant

standardized effect size takes variation into account and tells us whether the effect is relevant (its strength)

direction is not relevant here

cohen's d = (sample mean - hypothesized population value) / standard deviation in the sample

can be applied to one sample, paired sample, and independent sample t-test (look in notebook for formulas)

0.2: weak effect; 0.5: moderate effect; 0.8: strong effect
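a minimal sketch of cohen's d for the one-sample case (made-up data; the function name is mine): the same raw difference of about 3 points gives a moderate-to-strong d when scores vary little and a weak d when they vary a lot, which is the point about variation above:

```python
import numpy as np

def cohens_d_one_sample(x, hypothesized_value):
    """cohen's d = (sample mean - hypothesized value) / sample standard deviation"""
    x = np.asarray(x, dtype=float)
    return (x.mean() - hypothesized_value) / x.std(ddof=1)

rng = np.random.default_rng(1)
mu0 = 50                                  # hypothesized population value
narrow = rng.normal(53, 5, size=40)       # ~3-point difference, small spread
wide = rng.normal(53, 20, size=40)        # ~3-point difference, large spread

print(cohens_d_one_sample(narrow, mu0))   # roughly 0.6: moderate to strong
print(cohens_d_one_sample(wide, mu0))     # roughly 0.15: weak
```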

calculating cohen's d from spss output (take the values from the table)

for the independent samples t-test, if levene's test is not significant, use the first row (equal variances assumed) to plug in the values
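a sketch of the pooled-SD version of cohen's d for independent samples, using the kind of group means, SDs, and ns you would read off the spss group statistics table (the numbers are invented):

```python
import math

def cohens_d_independent(m1, s1, n1, m2, s2, n2):
    """cohen's d with a pooled standard deviation for two independent groups"""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# invented values, as read from the spss group statistics table
print(cohens_d_independent(m1=5.8, s1=1.2, n1=45, m2=5.1, s2=1.4, n2=47))
```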

measures of association as effect sizes (for the regression coefficient, the regression model, and ANOVA)

if the null hypothesis expects no correlation (0 correlation expected), pearson's product-moment correlation coefficient / spearman's rank correlation coefficient express the effect size

SPSS: tests on the regression coefficient (b), the regression model (R2), and ANOVA (eta2) all test against an effect size of 0 in the null hypothesis

use the standardized coefficient b* (labelled Beta in SPSS), R2, and eta2 as effect sizes in SPSS

0-.1 = no to weak association; .1-.3 = weak; .3-.5 = moderate; .5-.8 = strong; .8-1 = very strong; 1 = perfect
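a sketch with invented data of the three association measures mentioned above: pearson's r for a correlation, R2 for a simple regression (r squared when there is one predictor), and eta2 for a one-way ANOVA (between-groups SS / total SS):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# pearson's r for two interval variables
x = rng.normal(size=100)
y = 0.4 * x + rng.normal(size=100)
r, _ = stats.pearsonr(x, y)

# R^2 for a simple regression equals r^2 with a single predictor
r_squared = r**2

# eta^2 for a one-way ANOVA: between-groups SS / total SS
groups = [rng.normal(loc, 1, size=30) for loc in (0.0, 0.3, 0.8)]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((all_scores - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(f"r={r:.2f}  R^2={r_squared:.2f}  eta^2={eta_squared:.2f}")
```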

use previous research to decide sample size

standardized effect sizes are used to express the effects we are interested in

go for the sample size that picks up those effects more frequently

with larger significance levels, rejection regions are bigger, so we are more likely to reject a false null (higher power of the test)

test power: probability of rejecting a false null

higher sig level --> larger risk of rejecting the null: type 1 error increases bc we are more likely to reject a true null (type 2 error decreases)

type 2 error: not rejecting null when it is false

the world of a researcher

two graphs (the hypothesized population on top, the true, imaginary population on the bottom): researchers take the critical values from the top graph and reject the null based on where the sample lands under the bottom graph

if there is a small effect in the population, the null hypothesis is not true (bc the true value differs from the expected one); however, we still might fail to reject it bc the sample outcome does not land in the rejection region

prob of making this error is the area in the middle (not the rejection region)

test power = 1 - type II error probability: the area of the true distribution that falls inside the rejection regions
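a sketch using statsmodels (assumed: independent-samples t-test, d = 0.5, 50 per group) of how power and the type II error probability relate, and how a stricter significance level lowers the power:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# assumed: moderate effect (d = 0.5), 50 participants per group
for alpha in (0.05, 0.01):
    power = analysis.power(effect_size=0.5, nobs1=50, alpha=alpha)
    beta = 1 - power   # type II error probability
    print(f"alpha={alpha}  power={power:.2f}  type II error={beta:.2f}")
```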

when wondering which sample size to use, think of sig level, effect size, test power

smaller test power= larger risk of not rejecting false null

keep the significance level fixed, or else the test power changes (smaller sig level = smaller test power)

effect size--> use previous knowledge (small effect sizes need large samples); if there are standardized effect sizes, use those

power: rule of thumb is an 80% chance of rejecting a false null hypothesis

prob of not rejecting a true null = 95% (1 - type 1 error, i.e. 1 - the sig level of .05)
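a sketch of solving for the required sample size with statsmodels (assumed inputs: d = 0.5 taken from previous research, sig level .05, power .80 as in the rule of thumb):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# assumed inputs: d = 0.5 from previous research, alpha = .05, power = .80
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))   # roughly 64 participants per group
```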

determining sample size

previous research

large sample just in case

effect size more important than stat sig bc it relates to practical sig

test power and type 2 error are important when we don't reject the null hyp (confidence in the results)