BGY 3701 Biostatistics (All 10 Tests)

PARAMETRIC

NON-PARAMETRIC

One-way ANOVA

Two-way ANOVA

Correlation

Wilcoxon Signed-Rank

Kruskal-Wallis

Non-normally Distributed

To test differences when there are two conditions with the same participants

For dependent samples

Equivalent to the dependent t-test

Null hypothesis

The medians of two samples are equal.

Assumptions

Dependent variables

Ordinal (ranked)

Continuous

Independent variable

Two categorical, "related groups" or "matched pairs"

Same subjects are present in both groups

Distribution

Distribution of the differences between the two related groups is symmetrical in shape.
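The mechanics of the test can be sketched in a few lines: rank the absolute paired differences (dropping zeros, averaging ranks across ties), then sum the ranks of the positive and the negative differences separately. The before/after scores below are made-up illustration data, not from these notes.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W+, W-): rank sums for positive and negative paired differences."""
    # Paired differences; zero differences are discarded by convention.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Rank the absolute differences (1-indexed), averaging ranks across ties.
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank(v):
        idxs = [i + 1 for i, x in enumerate(abs_sorted) if x == v]
        return sum(idxs) / len(idxs)

    w_plus = sum(rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(rank(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus

# Hypothetical scores for ten participants measured in both conditions.
before = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
after = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]
w_plus, w_minus = wilcoxon_signed_rank(before, after)
```

The test statistic is min(W+, W-), which is then compared against a critical-value table (or a normal approximation for larger samples).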

Used for

Ordered (ranked) categorical variables without a numerical scale

T-TEST

DEPENDENT T-TEST (PAIRED)

INDEPENDENT T-TEST

data is normally distributed

same subjects took part in both conditions of the experiment

Example of variables

2 experimental conditions

Significance value (Sig/Sig-2-tailed/P-value)

LEVENE'S TEST

Independent variable (Categorical; can manipulate)

Dependent variable (Continuous; numerical, can take an infinite number of values)

Effect of coffee intake

Reaction time

indicates strong evidence against the null hypothesis: if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time.

P more than 0.05 = fail to reject null hypothesis

not statistically significant

high probability to occur by chance

P less than 0.05 = reject null hypothesis

statistically significant

low probability to occur by chance
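The paired t statistic itself is just the mean of the paired differences divided by the standard error of those differences. A minimal sketch, using hypothetical scores for five participants measured under both conditions (not data from these notes):

```python
import math
import statistics as st

def paired_t(before, after):
    """Dependent (paired) t statistic: mean difference over its standard error."""
    d = [a - b for b, a in zip(before, after)]
    return st.mean(d) / (st.stdev(d) / math.sqrt(len(d)))

# Hypothetical scores for the same five participants in two conditions.
t = paired_t([10, 12, 14, 16, 18], [11, 14, 15, 18, 19])
```

The resulting t is compared against the t distribution with n - 1 degrees of freedom to obtain the p-value.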

data is normally distributed

Example of variables

different subjects assigned to each condition

Significance value (Sig/Sig-2-tailed/P-value)

2 experimental conditions

Independent variable (Categorical; can manipulate)

Dependent variable (Continuous; numerical, can take an infinite number of values)

Educational level of students

Height of students

P more than 0.05 = fail to reject null hypothesis

P less than 0.05 = reject null hypothesis

indicates strong evidence against the null hypothesis: if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time.

not statistically significant

high probability to occur by chance

statistically significant

low probability to occur by chance
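For the independent t-test, the statistic divides the difference between the two group means by a standard error built from a pooled variance estimate. A sketch with hypothetical scores from two different groups of participants:

```python
import math
import statistics as st

def independent_t(x, y):
    """Independent t statistic using a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * st.variance(x) + (n2 - 1) * st.variance(y)) / (
        n1 + n2 - 2)
    return (st.mean(x) - st.mean(y)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))

# Hypothetical scores from two groups of different participants.
t = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
```

Pooling assumes equal variances, which is exactly what Levene's test (below) checks.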

If Sig/P more than 0.05 = variances are equal

If Sig/P less than 0.05 = variances are not equal

Null hypothesis: Variances are equal/same for all groups

Why do variances need to be equal?

To test whether the variances in the different groups are the same (homogeneity of variance)

To fulfil the assumptions of parametric tests
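Levene's test can be sketched as a one-way ANOVA carried out on the absolute deviations of each value from its group mean (the classic mean-based version; the Brown-Forsythe variant uses medians instead). The groups below are made-up illustration data.

```python
import statistics as st

def levene_w(*groups):
    """Classic Levene statistic: ANOVA F computed on |x - group mean|."""
    z = [[abs(x - st.mean(g)) for x in g] for g in groups]
    k = len(groups)
    n = sum(len(g) for g in groups)
    z_grand = sum(sum(gz) for gz in z) / n
    ms_between = sum(
        len(gz) * (st.mean(gz) - z_grand) ** 2 for gz in z) / (k - 1)
    ms_within = sum((x - st.mean(gz)) ** 2 for gz in z for x in gz) / (n - k)
    return ms_between / ms_within

# Identical spread in both groups -> W near 0 (variances look equal);
# very different spread -> large W (variances look unequal).
w_equal = levene_w([1, 2, 3], [11, 12, 13])
w_unequal = levene_w([1, 2, 3], [0, 10, 20])
```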

Chi square test of independence

used to test whether two categorical variables are significantly associated; it cannot establish causation

Data requirement

Characteristics

have 2 categorical variables

compare 2 or more mean groups

one independent variable

different participants in each condition

two or more categories (group) for each variable

F-ratio

independence of observations

relatively large sample size

between groups mean square variance divided by within groups mean square variance

can only compare categorical variables

the larger the F-ratio, the less likely H0 is to be true
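The F-ratio definition above (between-groups mean square divided by within-groups mean square) can be sketched directly. The plant heights below are hypothetical values for three gibberellin concentrations, invented for illustration:

```python
import statistics as st

def f_ratio(*groups):
    """One-way ANOVA F: between-groups MS divided by within-groups MS."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ms_between = sum(
        len(g) * (st.mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum(
        (x - st.mean(g)) ** 2 for g in groups for x in g) / (n - k)
    return ms_between / ms_within

# Hypothetical plant heights (cm) at three gibberellin concentrations.
f = f_ratio([10, 12, 14], [14, 16, 18], [18, 20, 22])
```

A significant F only says some group means differ; a post-hoc test (e.g. Tukey HSD) is then needed to find which pairs differ.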

Significance Value

Dependent variable: Height of plants

Null hypothesis: There is no effect of concentration of gibberellins on height of plants

Independent variable: Concentration of gibberellins

p<0.05

Null hypothesis rejected

null hypothesis

there is no relationship between the two variables (i.e. the two variables are independent)

p>0.05

Fail to reject null hypothesis

Highly significant

Post-hoc test

to compare groups pairwise

Multiple comparison procedure

Multiple range tests

Tukey HSD

LSD

Bonferroni

R-E-G-W Q

Gabriel

Tukey HSD

data can be displayed in contingency table

data values are a simple random sample from the population of interest
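The chi-square statistic compares the observed counts in the contingency table with the counts expected under independence (expected = row total × column total ÷ grand total). A sketch with a hypothetical 2×2 table:

```python
def chi_square_stat(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the independence null hypothesis.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table of counts for two categorical variables.
stat = chi_square_stat([[20, 30], [30, 20]])
```

The statistic is compared against the chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom.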

Characteristics

compare more than 2 mean groups

2 independent variables

different subjects in various groups

use when

data has 3 or more levels (rank-based)

observations are independent (i.e. different participants in each group)

Significance value

Independent variable: educational level & smoking status

Dependent variable: Lung function

comparison of mean ranks

to determine if there are statistically significant differences between 2 or more groups of an independent variable on a continuous or ordinal dependent variable

Null hypothesis:

  1. The educational level has no effect on lung function.
  2. The smoking status has no effect on lung function.

p<0.05

p>0.05

also called the "one-way ANOVA on ranks"

Fail to reject null hypothesis

No interaction effect

data are not normally distributed

Assumptions

1) Dependent variable is measured at ordinal or continuous level (interval or ratio)

Reject null hypothesis

Have interaction effect

F-ratio not significant

Example of continuous level data: revision time (hours), intelligence (IQ), exam performance (0-100%), weight (kg)

F-ratio significant

Example of ordinal level data: Likert scales (e.g., a 7-point scale from "strongly agree" through to "strongly disagree") and ranking categories

Post Hoc Test

2) Independent variable consists of 2 or more categorical, independent groups (usually three or more groups)

Examples: ethnicity (Caucasian, African American and Hispanic), physical activity level (low, moderate, high), profession (nurse, doctor, dentist, ..)

3) Should have independence of observations (no relationship between the observations in each group or between the groups themselves)

For example, there must be different participants in each group with no participant being in more than one group

additional assumption

4) To interpret the results of this test, determine whether the distributions in each group (i.e., the distribution of scores for each group of the independent variable) have the same shape (which also implies the same variability).

Post hoc test (to determine which groups are different from others)

Mann-Whitney test, but need to take into account the problem of multiple testing

Dunn or Dunn-Bonferroni correction, which adjusts α so that the Type I error rate doesn't exceed 0.05 (done by dividing α of 0.05 by the number of tests conducted)

similar to the Mann–Whitney U test, but can be applied to one-way data with more than two groups.
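The Kruskal-Wallis H statistic ranks all observations from every group together, then compares the groups' rank sums: H = 12/(N(N+1)) · Σ Rᵢ²/nᵢ − 3(N+1). A sketch on made-up data, including the Dunn-Bonferroni adjusted α mentioned above:

```python
def kruskal_wallis_h(*groups):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), R_i = rank sum of group i."""
    pooled = sorted(x for g in groups for x in g)

    def rank(v):
        # Average rank across ties (1-indexed).
        idxs = [i + 1 for i, x in enumerate(pooled) if x == v]
        return sum(idxs) / len(idxs)

    n = len(pooled)
    s = sum(sum(rank(x) for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

# Three hypothetical non-overlapping groups: H comes out large.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])

# Dunn-Bonferroni adjusted alpha for the 3 pairwise post-hoc comparisons.
alpha_adjusted = 0.05 / 3
```

H is compared against the chi-square distribution with k - 1 degrees of freedom (k = number of groups).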

Positive-correlation

Manipulated variable: Increases

Observed variable: Increases

Negative-correlation

Manipulated variable: Increases

Observed variable: Decreases

Output (Correlation Table)

Pearson Correlation (r): Correlation coefficient.

Sig. (2-tailed): p-value.

N: Individuals/number of subjects

Assumptions/Must haves

Pearson's Correlation Coefficient (r)

Value: between -1 and +1 only. The closer |r| is to 1, the closer the points fall to a straight line on the scatterplot.

r = 0 means there is no linear relationship between variables.

When to use (r)?

Data: Interval level

Distribution: values come from normally distributed populations.

Scatterplot: Reveals possible linear relationship

At least 5 pairs of measurements
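Pearson's r is the covariance of the two variables scaled by the product of their standard deviations. A sketch on hypothetical pairs of measurements:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of std. deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical pairs: a perfectly positive and a perfectly negative relationship.
r_pos = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_neg = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
```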

Variables

Ordinal

Nominal

Mann-Whitney Test

compare medians or the scale parameters of two populations

Null hypothesis: the two samples come from the same population (i.e. have the same median); the alternative is that observations in one sample tend to be larger than observations in the other. Example: there is no statistically significant difference between the median hemoglobin level of rats unexposed and rats exposed to cadmium oxide

Conclusion: 1) Report whether there is a statistically significant difference in the dependent variable between the two groups of the independent variable

Conclusion: 2) Determine which group has the higher or lower score

Two table of output

Test statistics table: after determining which group has the higher score based on mean rank, check the p-value to decide whether to reject the null hypothesis. Example: p = 0.02, so the result is statistically significant.

Ranks: to identify which group had the highest score based on the mean rank. Lower mean rank means lower score.

equivalent to the independent t-test

Example: Dependent variable - hemoglobin level. Independent variable - two conditions: 15 rats exposed and 10 rats unexposed to cadmium oxide

Testing differences between groups - 2 experimental conditions and different subjects

ranked data (ordinal data)
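The U statistic can be sketched from pooled ranks: U₁ = n₁n₂ + n₁(n₁+1)/2 − R₁, where R₁ is the rank sum of the first group, and the reported statistic is min(U₁, U₂). The groups below are made-up illustration data.

```python
def mann_whitney_u(x, y):
    """U = min(U1, U2); U1 = n1*n2 + n1(n1+1)/2 - R1, R1 = rank sum of x."""
    pooled = sorted(x + y)

    def rank(v):
        # Average rank across ties (1-indexed).
        idxs = [i + 1 for i, p in enumerate(pooled) if p == v]
        return sum(idxs) / len(idxs)

    n1, n2 = len(x), len(y)
    r1 = sum(rank(v) for v in x)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Complete separation between groups gives U = 0; interleaved groups give larger U.
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_mixed = mann_whitney_u([1, 3, 5], [2, 4, 6])
```

The smaller U is, the stronger the evidence that one group tends to score higher than the other.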

Regression

Model Assumptions

Variables

Dependent (response): Continuous

Independent (predictor): Continuous or dichotomous

No multicollinearity

Variance inflation factor (VIF): <10; ideally <4

Tolerance: >0.2

Linearity: Linear relationship between independent (predictor) & dependent (response).

Homoscedasticity: Scatterplot should be random (no pattern).

Distribution: residuals (differences between observed & predicted values of the dependent variable) are normally distributed.

Durbin-Watson value: between 1 and 3; ideal is 2.

Outliers: extreme outliers should be removed.

Output

Model Summary

R = Correlation coefficient

R squared = Coefficient of determination

S.E. = Standard Error

Durbin-Watson

ANOVA

p = Number of predictors

Coefficients

B = The change in the outcome associated with a unit change in the predictor

Std. Error = Standard Error of the regression coefficient

Beta = Standardized coefficient: the change in the dependent variable (in standard deviations) produced by a one-standard-deviation increase in the independent variable
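For a single predictor, the pieces of this output can be sketched by hand: the least-squares slope is the covariance over the predictor's variance, the intercept follows from the means, and R² (the coefficient of determination) is 1 minus residual over total sum of squares. The data below are hypothetical points lying exactly on y = 2x + 1.

```python
def simple_regression(x, y):
    """Least-squares fit y = intercept + slope*x, plus R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    predicted = [intercept + slope * a for a in x]
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_tot = sum((b - my) ** 2 for b in y)
    # R^2 = coefficient of determination (proportion of variance explained).
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical data lying exactly on y = 2x + 1, so R^2 = 1.
intercept, slope, r2 = simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

Here `slope` plays the role of B in the coefficients table; R in the model summary is the square root of R².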