BGY 3701 Biostatistics (All 10 Tests)
PARAMETRIC
NON-PARAMETRIC
One-way ANOVA
Two-way ANOVA
Correlation
Wilcoxon Signed-Rank
Kruskal-Wallis
Non-normally Distributed
To test for differences between two conditions with the same participants
For dependent samples
Equivalent to the dependent t-test
Null hypothesis
The medians of two samples are equal.
Assumptions
Dependent variable
Ordinal (ranked)
Continuous
Independent variable
Two categorical, "related groups" or "matched pairs"
Same subjects are present in both groups
Distribution
Distribution of the differences between the two related groups is symmetrical in shape.
Used for
Ordered (ranked) categorical variables without a numerical scale
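A minimal sketch of the Wilcoxon signed-rank test in Python (SciPy; the pre/post scores for the same subjects are illustrative):

```python
from scipy.stats import wilcoxon

# Pain scores for the same 8 subjects before and after treatment (illustrative)
before = [7, 5, 6, 4, 8, 6, 5, 7]
after  = [5, 4, 5, 2, 7, 5, 3, 6]

# Wilcoxon signed-rank test on the paired differences
stat, p = wilcoxon(before, after)
print(f"W = {stat}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the medians of the two conditions differ")
else:
    print("Fail to reject H0")
```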
T-TEST
DEPENDENT T-TEST (PAIRED)
INDEPENDENT T-TEST
data are normally distributed
same subjects took part in both conditions of the experiment
Example of variables
2 experimental conditions
Significant value (Sig/Sig-2-tailed/P-value)
LEVENE'S TEST
Independent variable (Categorical; can manipulate)
Dependent variable (Continuous; numerical, can take an infinite number of values)
Effect of coffee intake
Reaction time
indicates strong evidence against the null hypothesis: if the null hypothesis were true, a result this extreme would occur less than 5% of the time by chance.
P more than 0.05 = fail to reject null hypothesis
not statistically significant
high probability to occur by chance
P less than 0.05 = reject null hypothesis
statistically significant
low probability to occur by chance
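The coffee/reaction-time example above can be sketched with SciPy's paired t-test (all data values are made up for illustration):

```python
from scipy.stats import ttest_rel

# Reaction times (ms) for the same 6 subjects, without and with coffee (illustrative)
no_coffee = [310, 295, 320, 305, 315, 300]
coffee    = [290, 280, 300, 295, 298, 285]

t, p = ttest_rel(no_coffee, coffee)
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: coffee intake affects reaction time")
```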
data is normally distributed
Example of variables
different subjects assigned to each condition
Significant value (Sig/Sig-2-tailed/P-value)
2 experimental conditions
Independent variable (Categorical; can manipulate)
Dependent variable (Continuous; numerical, can take an infinite number of values)
Educational level of students
Height of students
P more than 0.05 = fail to reject null hypothesis
P less than 0.05 = reject null hypothesis
indicates strong evidence against the null hypothesis: if the null hypothesis were true, a result this extreme would occur less than 5% of the time by chance.
not statistically significant
high probability to occur by chance
statistically significant
low probability to occur by chance
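A minimal sketch of the independent t-test in SciPy (illustrative height data for two separate groups of students):

```python
from scipy.stats import ttest_ind

# Heights (cm) of students in two independent groups (illustrative)
group_a = [160, 162, 158, 165, 161]
group_b = [170, 172, 169, 175, 171]

# equal_var=True assumes Levene's test did not reject equal variances
t, p = ttest_ind(group_a, group_b, equal_var=True)
print(f"t = {t:.2f}, p = {p:.4f}")
```

If Levene's test rejects equal variances, pass `equal_var=False` to use Welch's correction instead.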
If Sig/P more than 0.05 = variances are equal
If Sig/P less than 0.05 = variances are not equal
Null hypothesis: Variances are equal/same for all groups
Why do variances need to be equal?
To test whether the variances in different groups are the same (homogeneity of variance)
To fulfil the assumptions of parametric tests
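Levene's test can be run in SciPy before choosing the t-test variant (the two groups below are illustrative data with similar spread):

```python
from scipy.stats import levene

# Measurements from two groups with similar spread (illustrative)
g1 = [4.1, 4.3, 4.0, 4.2, 4.4]
g2 = [4.0, 4.5, 3.9, 4.3, 4.1]

stat, p = levene(g1, g2)  # center='median' by default (robust version)
print(f"W = {stat:.3f}, p = {p:.4f}")
if p > 0.05:
    print("Variances can be treated as equal -> standard t-test")
else:
    print("Variances differ -> use Welch's correction (equal_var=False)")
```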
Chi square test of independence
used to test whether two categorical variables are significantly associated; it cannot provide inference about causation
Data requirement
Characteristics
have 2 categorical variables
compares the means of 2 or more groups
one independent variable
different participants in each condition
two or more categories (group) for each variable
F-ratio
independence of observations
relatively large sample size
between-groups mean square divided by within-groups mean square
can only compare categorical variables
the larger the F-ratio, the less likely Ho is to be true
Significant Value
Dependent variable: Height of plants
Null hypothesis: There is no effect of concentration of gibberellins on height of plants
Independent variable: Concentration of gibberellins
p<0.05
Null hypothesis rejected
null hypothesis
there is no relationship between the two variables (i.e. the two variables are independent)
p>0.05
Fail to reject null hypothesis
Highly significant
Post-hoc test
to compare groups pairwise
Multiple comparison procedure
Multiple range tests
Tukey HSD
LSD
Bonferroni
R-E-G-W Q
Gabriel
Tukey HSD
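The gibberellin example can be sketched with SciPy's one-way ANOVA (the plant heights are illustrative):

```python
from scipy.stats import f_oneway

# Plant heights (cm) under three gibberellin concentrations (illustrative)
low    = [10, 12, 11, 13]
medium = [15, 16, 14, 17]
high   = [20, 22, 21, 19]

F, p = f_oneway(low, medium, high)
print(f"F = {F:.2f}, p = {p:.4f}")
# If p < 0.05, a post hoc test (e.g. Tukey HSD) identifies which pairs differ
```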
data can be displayed in a contingency table
data values are a simple random sample from the population of interest
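A minimal chi-square test of independence in SciPy (the contingency-table counts are illustrative):

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 contingency table (illustrative counts):
#                disease   no disease
# smokers           30         10
# non-smokers       15         45
table = np.array([[30, 10],
                  [15, 45]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```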
Characteristics
compares the means of more than 2 groups
2 independent variables
different subjects in the various groups
use when
data have 3 or more levels (rank-based)
observations are independent (i.e. different participants in each group)
Significant value
Independent variable: educational level & smoking status
Dependent variable: Lung function
comparison of mean ranks
to determine if there are statistically significant differences between 2 or more groups of an independent variable on a continuous or ordinal dependent variable
Null hypothesis:
- The educational level has no effect on lung function.
- The smoking status has no effect on lung function.
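The education x smoking example can be sketched as a balanced two-way ANOVA, computing the sums of squares directly with NumPy (all data values are illustrative):

```python
import numpy as np
from scipy.stats import f

# Balanced 2x2 design: lung function scores (illustrative data)
# Factor A: educational level (low / high); Factor B: smoking (non-smoker / smoker)
data = np.array([
    [[10, 11, 12], [10, 11, 12]],   # A = low
    [[20, 21, 22], [20, 21, 22]],   # A = high
], dtype=float)                     # shape: (a, b, n replicates)

a, b, n = data.shape
gm = data.mean()
mean_a = data.mean(axis=(1, 2))     # marginal means of factor A
mean_b = data.mean(axis=(0, 2))     # marginal means of factor B
cell   = data.mean(axis=2)          # cell means

ss_a  = b * n * ((mean_a - gm) ** 2).sum()
ss_b  = a * n * ((mean_b - gm) ** 2).sum()
ss_ab = n * ((cell - mean_a[:, None] - mean_b[None, :] + gm) ** 2).sum()
ss_w  = ((data - cell[:, :, None]) ** 2).sum()

df_a, df_b, df_ab, df_w = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
ms_w = ss_w / df_w

results = {}
for name, ss, df_e in [("A (education)", ss_a, df_a),
                       ("B (smoking)", ss_b, df_b),
                       ("A x B interaction", ss_ab, df_ab)]:
    F = (ss / df_e) / ms_w
    p = f.sf(F, df_e, df_w)        # p-value from the F distribution
    results[name] = (F, p)
    print(f"{name}: F = {F:.2f}, p = {p:.4f}")
```

In practice a library routine (e.g. statsmodels' `anova_lm`) would be used; the manual version just makes the F-ratio construction visible.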
p<0.05
p>0.05
also called the "one-way ANOVA on ranks"
Fail to reject null hypothesis
No interaction effect
data are not normally distributed
Assumptions
1) Dependent variable is measured at ordinal or continuous level (interval or ratio)
Reject null hypothesis
Have interaction effect
F-ratio not significant
Example of continuous level data: revision time (hours), intelligence (IQ), exam performance (0-100%), weight (kg)
F-ratio significant
Example of ordinal level data: Likert scales (e.g., a 7-point scale from "strongly agree" through to "strongly disagree") and ranking categories
Post Hoc Test
2) Independent variable consists of 2 or more categorical, independent groups (usually three or more groups)
Examples: ethnicity (Caucasian, African American and Hispanic), physical activity level (low, moderate, high), profession (nurse, doctor, dentist, ..)
3) Should have independence of observations (no relationship between the observations in each group or between the groups themselves)
For example, there must be different participants in each group with no participant being in more than one group
additional assumption
4) To interpret the results from this test, determine whether the distributions in each group (i.e., the distribution of scores for each group of the independent variable) have the same shape (which also implies the same variability).
Post hoc test (to determine which groups are different from others)
Mann-Whitney tests, but the problem of multiple testing must be taken into account
Dunn or Dunn-Bonferroni correction, which adjusts to ensure that the Type I error rate doesn't exceed 0.05 (by dividing α = 0.05 by the number of tests conducted)
similar to the Mann–Whitney U test, but can be applied to one-way data with more than two groups.
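A sketch of Kruskal-Wallis followed by pairwise Mann-Whitney tests with a Bonferroni-adjusted α (the group names and scores are illustrative):

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Exam scores (0-100%) for three teaching methods (illustrative data)
groups = {
    "lecture": [55, 60, 58, 62, 57],
    "online":  [65, 70, 68, 72, 66],
    "hybrid":  [80, 85, 83, 88, 81],
}

H, p = kruskal(*groups.values())
print(f"H = {H:.2f}, p = {p:.4f}")

# Post hoc: pairwise Mann-Whitney with Bonferroni-adjusted alpha
pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)            # divide alpha by the number of tests
for g1, g2 in pairs:
    u, p_pair = mannwhitneyu(groups[g1], groups[g2])
    verdict = "significant" if p_pair < alpha else "ns"
    print(f"{g1} vs {g2}: U = {u}, p = {p_pair:.4f} ({verdict} at alpha = {alpha:.4f})")
```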
Positive-correlation
Manipulated variable: Increases
Observed variable: Increases
Negative-correlation
Manipulated variable: Increases
Observed variable: Decreases
Output (Correlation Table)
Pearson Correlation (r): Correlation coefficient.
Sig. (2-tailed): p-value.
N: Individuals/number of subjects
Assumptions/Must haves
Pearson's Correlation Coefficient (r)
Value: between -1 and +1 only. The closer |r| is to 1, the closer the points fall to a straight line on the scatterplot.
r = 0 means there is no linear relationship between variables.
When to use (r)?
Data: Interval level
Distribution: both variables are normally distributed in the population.
Scatterplot: Reveals possible linear relationship
At least 5 pairs of measurements
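A minimal Pearson correlation in SciPy (at least 5 pairs of measurements, as required; the data are illustrative):

```python
from scipy.stats import pearsonr

# Study hours vs exam score (illustrative; 6 pairs of measurements)
hours  = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]

r, p = pearsonr(hours, scores)
print(f"r = {r:.3f}, p = {p:.4f}")  # r near +1: strong positive correlation
```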
Variables
Ordinal
Nominal
Mann-Whitney Test
compares the medians (or scale parameters) of two populations
Null hypothesis: the two samples come from the same population (i.e. have the same median); the alternative is that observations in one sample tend to be larger than observations in the other. Example: there is no statistically significant difference between the median hemoglobin level of rats unexposed and rats exposed to cadmium oxide
Conclusion: 1) Report whether there is a statistically significant difference in the dependent variable between the two groups of the independent variable
Conclusion: 2) Determine which group of the independent variable has the higher or lower score
Two table of output
Test statistics table: after determining which group scored higher from the mean ranks, check the p-value to decide whether to reject the null hypothesis. Example: p = 0.02 < 0.05, so the result is statistically significant.
Ranks: to identify which group had the highest score based on the mean rank. Lower mean rank means lower score.
equivalent to independent t-test
Example: Dependent variable - hemoglobin level. Independent variable - two conditions: 15 rats exposed and 10 rats unexposed to cadmium oxide
Testing differences between groups - 2 experimental conditions and different subjects
ranked data (ordinal data)
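The cadmium-oxide example can be sketched with SciPy's Mann-Whitney U test (the hemoglobin values and group sizes below are illustrative, smaller than in the example above):

```python
from scipy.stats import mannwhitneyu

# Hemoglobin levels (g/dL): rats unexposed vs exposed to cadmium oxide (illustrative)
unexposed = [14.2, 15.1, 14.8, 15.5, 14.9]
exposed   = [12.1, 11.8, 12.5, 13.0, 12.2, 11.5]

u, p = mannwhitneyu(unexposed, exposed)  # two-sided by default
print(f"U = {u}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the two groups differ in median hemoglobin level")
```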
Regression
Model Assumptions
Variables
Dependent (response): Continuous
Independent (predictor): Continuous or dichotomous
No multicollinearity
Variance Inflation Factor (VIF): <10; ideally <4
Tolerance: >0.2
Linearity: Linear relationship between independent (predictor) & dependent (response).
Homoscedasticity: Scatterplot should be random (no pattern).
Distribution: the residuals (differences between observed & predicted values of the dependent variable) are normally distributed.
Durbin-Watson value: between 1 and 3 only; ideal is 2.
Outliers: extreme outliers should be deleted.
Output
Model Summary
R = Correlation coefficient
R squared = Coefficient of determination
S.E. = Standard Error
Durbin-Watson
ANOVA
p = Number of predictors
Coefficients
B = The change in outcome associated with a unit change with the predictor
Std. Error = Standard Error of the regression coefficient
Beta = Standardized coefficient: the change in the dependent variable (in standard deviations) produced by an increase of one standard deviation in the independent variable
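A minimal simple-regression sketch with SciPy's `linregress`, mapping the output names above onto its attributes (the dose/response data are illustrative):

```python
from scipy.stats import linregress

# Drug dose (mg) vs drop in blood pressure (mmHg) -- illustrative data
dose = [1, 2, 3, 4, 5, 6, 7]
drop = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9]

res = linregress(dose, drop)
print(f"B (slope)      = {res.slope:.3f}")       # change in outcome per unit predictor
print(f"R squared      = {res.rvalue ** 2:.3f}") # coefficient of determination
print(f"Std. Error (B) = {res.stderr:.3f}")      # standard error of the slope
print(f"p-value        = {res.pvalue:.4f}")
```

Multiple regression (several predictors, VIF, Durbin-Watson) needs a fuller library such as statsmodels; `linregress` covers only the one-predictor case.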