BGY 3701 Biostatistics (All 10 Tests) - Coggle Diagram
PARAMETRIC
One-way ANOVA
Characteristics
compares the means of 2 or more groups
one independent variable
different participants in each condition
F-ratio
between groups mean square variance
divided by
within groups mean square variance
the larger the F-ratio, the less likely H0 is to be true
Significant Value
Dependent variable: Height of plants
Null hypothesis: There is no effect of concentration of gibberellins on height of plants
Independent variable: Concentration of gibberellins
p<0.05
Null hypothesis rejected
Statistically significant
Post-hoc test
to compare groups pairwise
Multiple comparison procedure
Tukey HSD
LSD
Bonferroni
Multiple range tests
R-E-G-W Q
Gabriel
Tukey HSD
p>0.05
Fail to reject null hypothesis
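The one-way ANOVA above can be sketched in Python with scipy.stats (an assumption; the diagram's "Sig-2-tailed" wording suggests SPSS was used in the course). The gibberellin data below are fabricated for illustration.

```python
# One-way ANOVA: effect of gibberellin concentration on plant height.
# Three concentration groups; all values are fabricated illustrative data.
from scipy import stats

low    = [10.2, 11.1, 9.8, 10.5, 10.9]   # plant height (cm)
medium = [12.4, 13.0, 12.1, 12.8, 13.3]
high   = [15.1, 14.6, 15.8, 15.2, 14.9]

f_stat, p_value = stats.f_oneway(low, medium, high)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    # Reject H0; follow up with a pairwise post-hoc test (e.g. Tukey HSD)
    print("Reject H0: concentration affects plant height")
else:
    print("Fail to reject H0")
```

A significant F only says that at least one group mean differs; the post-hoc procedures listed above identify which pairs differ.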
Two-way ANOVA
Characteristics
compares group means across the levels of two factors
2 independent variables
different subjects in various groups
Significant value
Independent variable: educational level & smoking status
Dependent variable: Lung function
Null hypothesis:
The educational level has no effect on lung function.
The smoking status has no effect on lung function.
p<0.05
Reject null hypothesis
Have interaction effect
F-ratio significant
p>0.05
Failed to reject null hypothesis
No interaction effect
F-ratio not significant
Post Hoc Test
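A balanced two-way ANOVA can be computed by hand with NumPy, which makes the main-effect and interaction F-ratios explicit (statsmodels' `anova_lm` is the usual shortcut, but the manual sums of squares show where each F comes from). The lung-function data and the 2x2 design with 4 replicates per cell are fabricated assumptions.

```python
# Two-way ANOVA (balanced design) computed manually with NumPy.
# Fabricated data: lung function by smoking status (2 levels) x education (2 levels).
import numpy as np
from scipy.stats import f as f_dist

# data[i][j] = replicates for smoking level i, education level j
data = np.array([
    [[3.9, 4.1, 4.0, 4.2], [4.3, 4.5, 4.4, 4.6]],   # non-smokers
    [[3.1, 3.3, 3.2, 3.0], [3.5, 3.6, 3.4, 3.7]],   # smokers
])
A, B, n = data.shape
grand = data.mean()

ss_a = n * B * ((data.mean(axis=(1, 2)) - grand) ** 2).sum()  # smoking main effect
ss_b = n * A * ((data.mean(axis=(0, 2)) - grand) ** 2).sum()  # education main effect
cell_means = data.mean(axis=2)
ss_ab = n * ((cell_means - grand) ** 2).sum() - ss_a - ss_b   # interaction
ss_within = ((data - cell_means[:, :, None]) ** 2).sum()

df_a, df_b, df_ab, df_w = A - 1, B - 1, (A - 1) * (B - 1), A * B * (n - 1)
ms_w = ss_within / df_w
for name, ss, df in [("smoking", ss_a, df_a), ("education", ss_b, df_b),
                     ("interaction", ss_ab, df_ab)]:
    F = (ss / df) / ms_w                 # each effect's mean square over MS-within
    p = f_dist.sf(F, df, df_w)
    print(f"{name}: F = {F:.2f}, p = {p:.4f}")
```

A significant interaction F means the effect of one factor depends on the level of the other, matching the "interaction effect" branches above.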
T-TEST
DEPENDENT T-TEST (PAIRED)
data is
normally distributed
same subjects
took part in
both conditions
of experiments
Example of variables
Independent variable
(Categorical; can manipulate)
Effect of
coffee intake
Dependent variable
(Continuous; numerical, have infinite number of values)
Reaction
time
2
experimental conditions
Significant value (Sig/Sig-2-tailed/P-value)
indicates strong evidence against the null hypothesis: if the null were true, results this extreme would occur less than 5% of the time by chance.
P more than 0.05 = fail to reject null hypothesis
not statistically significant
high probability to occur by chance
P less than 0.05 = reject null hypothesis
statistically significant
low probability to occur by chance
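The paired design above (same subjects, two conditions) maps directly onto a dependent t-test. A sketch with scipy.stats, using fabricated reaction-time data for the coffee example:

```python
# Dependent (paired) t-test: reaction time before vs after coffee intake,
# with the same subjects in both conditions. Values are fabricated.
from scipy import stats

before = [0.52, 0.48, 0.61, 0.55, 0.49, 0.58, 0.53, 0.50]  # seconds
after  = [0.45, 0.42, 0.55, 0.50, 0.44, 0.51, 0.47, 0.46]

t_stat, p_value = stats.ttest_rel(before, after)  # tests mean of paired differences = 0
print(f"t = {t_stat:.2f}, p = {p_value:.4f} (two-tailed)")
```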
INDEPENDENT T-TEST
data is
normally distributed
Example of variables
Independent variable
(Categorical; can manipulate)
Educational level
of students
Dependent variable
(Continuous; numerical, has an infinite number of values)
Height
of students
different subjects
assigned to
each condition
Significant value (Sig/Sig-2-tailed/P-value)
P more than 0.05 = fail to reject null hypothesis
not statistically significant
high probability to occur by chance
P less than 0.05 = reject null hypothesis
statistically significant
low probability to occur by chance
indicates strong evidence against the null hypothesis: if the null were true, results this extreme would occur less than 5% of the time by chance.
2
experimental conditions
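With different subjects in each condition, the independent t-test applies. A minimal sketch with scipy.stats; the student-height data are fabricated:

```python
# Independent t-test: height of students from two educational levels,
# different subjects in each group. Fabricated illustrative data.
from scipy import stats

group_a = [165, 170, 168, 172, 169, 171, 167, 173]  # height (cm)
group_b = [160, 158, 163, 161, 159, 162, 164, 157]

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # assumes equal variances by default
print(f"t = {t_stat:.2f}, p = {p_value:.4f} (two-tailed)")
```

The `equal_var=True` default is exactly why Levene's test (next branch) is checked first.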
LEVENE'S TEST
If Sig/P more than 0.05 = variances are equal
If Sig/P less than 0.05 = variances are not equal
Null hypothesis: Variances are equal/same for all groups
Why do variances need to be equal?
To fulfil the assumptions of parametric test
To test the variances in different groups are the same or not (Homogeneity of variance)
NON-PARAMETRIC
Wilcoxon Signed-Rank
Non-normally
Distributed
To test differences when there are
two conditions
with the
same participants
For
dependent
samples
Equivalent to the dependent t-test
Null hypothesis
The medians of two samples are equal.
Assumptions
Dependent variables
Ordinal (ranked)
Continuous
Independent variable
Two
categorical
, "related groups" or "matched pairs"
Same subjects
are
present in both groups
Distribution
Distribution of the differences between the two related groups is
symmetrical
in shape.
Used for
Ordered (ranked) categorical variables
without a numerical scale
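As the non-parametric counterpart of the paired t-test, the Wilcoxon signed-rank test is one call in scipy.stats (an assumed tool; data below are fabricated paired scores):

```python
# Wilcoxon signed-rank test: two conditions, same participants,
# non-normally distributed differences. Fabricated paired scores.
from scipy import stats

before = [12, 15, 11, 18, 14, 13, 16, 17, 10, 19]
after  = [10, 14, 12, 15, 11, 12, 13, 15, 9, 16]

w_stat, p_value = stats.wilcoxon(before, after)  # ranks the paired differences
print(f"W = {w_stat}, p = {p_value:.4f}")
```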
Kruskal-Wallis
use when
data has
3 or more levels
(rank-based)
observations are independent
(i.e. different participants in each group)
comparison of mean ranks
data are
not normally distributed
to determine if there are
statistically significant differences between 2 or more groups of an independent variable on a continuous or ordinal dependent variable
also called the
"one way ANOVA on ranks"
Assumptions
1)
Dependent variable
is measured at
ordinal or continuous level
(interval or ratio)
Example of
continuous level data
:
revision time
(hours),
intelligence
(IQ),
exam performance
(0-100%),
weight
(kg)
Example of
ordinal level data
:
Likert scales
(e.g., a 7-point scale from "strongly agree" through to "strongly disagree") and
ranking categories
2)
Independent variable
consist of
2 or more categorical, independent groups
(usually three or more groups)
Examples
:
ethnicity
(Caucasian, African American and Hispanic),
physical activity level
(low, moderate, high),
profession
(nurse, doctor, dentist,..)
3) Should have
independence of observations
(no relationship between the observations in each group or between the groups themselves)
For example, there must be
different participants in each group
with
no participant being in more than one group
additional assumption
4) To know how to interpret the results from this test,
determine whether the distributions in each group
(i.e., the distribution of scores for each group of the
independent variable
)
have the same shape
(which also means the
same variability
).
Post hoc test
(to determine which groups are different from others)
Mann-Whitney test
, but need to take into account the problem of multiple testing
Dunn or Dunn-Bonferroni correction
which makes adjustment to
ensure that Type I error doesn’t exceed 0.05 (done by dividing α of 0.05 by the number of tests conducted)
similar
to the
Mann–Whitney U test
,
but can be applied to one-way data with more than two groups
.
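The Kruskal-Wallis test sketched in scipy.stats (an assumption), with fabricated scores for three independent groups of an ordinal/continuous dependent variable:

```python
# Kruskal-Wallis H test: "one-way ANOVA on ranks" for 3+ independent groups.
# Fabricated scores; each group has different participants.
from scipy import stats

low      = [4, 5, 6, 5, 4, 6]
moderate = [7, 8, 7, 9, 8, 7]
high     = [11, 12, 10, 13, 12, 11]

h_stat, p_value = stats.kruskal(low, moderate, high)  # compares mean ranks
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

If H is significant, pairwise Mann-Whitney tests with a Bonferroni-adjusted alpha (0.05 divided by the number of comparisons) identify which groups differ, as the post-hoc branch above describes.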
Chi square test of independence
used to test if
two categorical variables are significantly associated
, but cannot provide inference for causation
Data requirement
have
2 categorical variables
two or more categories (group)
for each variable
independence of observation
relatively
large sample size
data values are a simple
random sample
from the population of interest
can
only compare categorical variables
null hypothesis
there is
no relationship between the two variables
(i.e. the two variables are independent)
data can be displayed in
contingency table
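From a contingency table, the chi-square test of independence is a single call. The 2x2 counts below (smoking status vs disease) are fabricated; scipy.stats is an assumed tool:

```python
# Chi-square test of independence on a 2x2 contingency table.
# Rows: smoker / non-smoker; columns: disease / no disease. Fabricated counts.
import numpy as np
from scipy import stats

table = np.array([[30, 10],
                  [20, 40]])

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
print("expected counts under independence:\n", expected)
```

A significant result indicates association only; as noted above, it cannot establish causation.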
Mann-Whitney Test
compare medians or the scale parameters of two populations
Null hypothesis: the two samples come from the same population (i.e. have the same median); the alternative is that observations in one sample tend to be larger than observations in the other. Example: There is no statistically significant difference between the median hemoglobin level of rats unexposed and rats exposed to cadmium oxide
Conclusion: 1) Report whether there is a statistically significant difference in the dependent variable between the two groups of the independent variable
Conclusion: 2) Determine which group has the higher or lower score
Two table of output
Test statistics table: After determining which group has the higher score from the mean ranks, check the p-value to decide whether to reject the null hypothesis. Example: p = 0.02 < 0.05, so the result is statistically significant.
Ranks: to identify which group had the highest score based on the mean rank. Lower mean rank means lower score.
equivalent to independent t-test
Example: Dependent variable - Hemoglobin level Independent variables- Two conditions: 15 rats that are exposed and 10 rats that are unexposed to cadmium oxide
Testing differences between groups - 2 experimental conditions and different subjects
ranked data (ordinal data)
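The cadmium-oxide example above, sketched with scipy.stats (the hemoglobin values are fabricated; only the sample sizes, 15 exposed and 10 unexposed, follow the example):

```python
# Mann-Whitney U test: hemoglobin level in rats exposed vs unexposed
# to cadmium oxide. Values fabricated; group sizes follow the example (15 vs 10).
from scipy import stats

exposed   = [9.1, 8.7, 9.5, 8.9, 9.3, 8.5, 9.0, 8.8,
             9.2, 8.6, 9.4, 8.4, 9.1, 8.9, 9.0]
unexposed = [11.2, 10.8, 11.5, 10.9, 11.1, 11.4, 10.7, 11.0, 11.3, 10.6]

u_stat, p_value = stats.mannwhitneyu(exposed, unexposed)
print(f"U = {u_stat}, p = {p_value:.4f}")
```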
Correlation
Positive-correlation
Manipulated variable
: Increases
Observed variable:
Increases
Negative-correlation
Manipulated variable
: Increases
Observed variable:
Decreases
Output (Correlation Table)
Pearson Correlation (r)
: Correlation coefficient.
Sig. (2-tailed):
p-value.
N:
Individuals/number of subjects
Assumptions/Must haves
Pearson's Correlation Coefficient (r)
Value:
Between -1 and +1 only. The closer |r| is to 1, the closer the points fall to a straight line on the scatterplot.
r = 0
means there is no linear relationship between variables.
When to use (r)?
Data:
Interval level
Distribution:
Values are normally distributed in the population.
Scatterplot
: Reveals possible linear relationship
At least 5 pairs of measurements
Variables
Ordinal
Nominal
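Pearson's r and its p-value come from one call in scipy.stats (an assumption; SPSS reports the same values in its Correlations table). The study-time and exam-score pairs are fabricated:

```python
# Pearson correlation between two continuous (interval-level) variables.
# Fabricated data showing a positive linear relationship.
from scipy import stats

study_hours = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score  = [52, 55, 61, 64, 70, 74, 79, 85]

r, p_value = stats.pearsonr(study_hours, exam_score)
print(f"r = {r:.3f}, p = {p_value:.4f}, N = {len(study_hours)}")
```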
Regression
Model Assumptions
Variables
Dependent (response)
: Continuous
Independent (predictor):
Continuous or dichotomous
No multicollinearity
Variance-inflation Factor (VIF)
: <10 or ideal is <4
Tolerance
: >0.2
Linearity:
Linear relationship between independent (predictor) & dependent (response).
Homoscedasticity:
Scatterplot should be random (no pattern).
Distribution:
Residuals (observed minus predicted values of the dependent variable) are normally distributed.
Durbin-Watsons value
Between 1 and 3 only; ideally 2 (residuals uncorrelated).
Outliers:
Extreme outliers should be removed.
Output
Model Summary
R = Correlation coefficient
R squared = Coefficient of determination
S.E. = Standard Error
Durbin Watsons
ANOVA
p = Number of predictors
Coefficients
B
= The change in outcome associated with a unit change in the predictor
Std. Error
= Standard Error of the regression coefficient
Beta
= Standardized coefficient: the change (in standard deviations) in the dependent variable produced by a one-standard-deviation increase in the independent variable
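For a single predictor, scipy.stats.linregress (an assumed tool; the diagram's output table suggests SPSS) returns the slope B, intercept, R, and p-value in one call. Data below are fabricated:

```python
# Simple linear regression: slope (B), intercept, R and R-squared.
# Fabricated data with an approximately linear relationship.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]                       # predictor
y = [2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 14.2, 15.8]  # response

res = stats.linregress(x, y)
print(f"B (slope) = {res.slope:.2f}, intercept = {res.intercept:.2f}")
print(f"R = {res.rvalue:.3f}, R^2 = {res.rvalue**2:.3f}, p = {res.pvalue:.4f}")
```

Multiple regression with several predictors (and VIF, tolerance, Durbin-Watson diagnostics) needs statsmodels rather than scipy, but the interpretation of B, R, and R-squared is the same.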