L8 - Biostatistics Continued


  1. Describe the types of variables and samples in research, including independent, related, categorical and continuous.
  2. Describe, apply and interpret key hypothesis tests, including the independent t-test, paired t-test, Mann-Whitney U test, Wilcoxon signed-rank test, chi-square test and Fisher's exact test.
  3. Describe, apply and interpret tests of association with a focus on the three main types of correlation.
  4. Describe, apply and interpret tests of prediction with a focus on simple linear regression.
  5. Describe, apply and interpret survival analysis.
  6. Describe and recognise common problems in research, including: intervention effect; restricted ranges; violating the independence of observations; mistaking correlation for causation; unequal groups; intra-group dependency; and external validity issues.

Using Statistics

Simple Regression

Simple linear regression involves estimating an equation that describes the relationship between two variables: a dependent variable and an explanatory (independent) variable.

Associations

Conceptually, if X and Y are correlated then it is easy to think of them as being related, i.e. moving in the same direction (positively correlated) or in opposite directions (negatively correlated).

Selecting Tests for evaluating Hypotheses of difference

Learn the most common tests

Learn how to use the statistical flowchart to determine which test to use.

Variable Types

Dependent variables:

Variables whose values change as the independent variable is manipulated (e.g. a rating scale score)

Independent variables:

The condition that is manipulated (e.g. giving a drug)

Independent or Related Samples

Independent samples yield results which are not influenced by results from other samples involved in the same experiment. Data derived during the experiment are not related to one another.

The data do not depend on, or relate to, each other.

Examples of Data/Results which are NOT independent

Categorical and Continuous Variables

Scale/Continuous

Ratio variables

Are a subtype of interval variables, but with the added condition that 0 (zero) of the measurement indicates that there is none of that variable, e.g. height.

Interval variables

Have the central characteristic that they can be measured along a continuum and have a numerical value, e.g. temperature (Celsius).

Categorical/Discrete/Qualitative

Nominal variables

Have two or more categories that do not have an intrinsic order, e.g. houses, condos, bungalows.

Ordinal variables

Have two or more categories, just like nominal variables, but the categories can also be ordered or ranked, e.g. "Not very much", "OK" and "Yes, a lot".

Example

Do people have higher blood pressure after they complete 15 minutes of aerobic exercise compared to their resting blood pressure rate?

...people...

Related data/sample

... blood pressure...

Dependent variable = 1


Continuous Variable

Measured along a continuum

...after they complete...

Number of samples/groups = 2

...aerobic exercise...
Independent variable = 1

Any covariates (confounders)? These are characteristics of the participants that may affect the outcome and that you haven't controlled for in your study design, e.g. are some participants smokers, overweight, children, hungry etc?
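
As a rough illustration of the example above (one continuous outcome, two related samples from the same people), the paired t statistic can be sketched as below. The blood pressure values are hypothetical and invented purely for illustration, and the function name is an assumption. The sketch stops at the t statistic and degrees of freedom; a p-value needs the t distribution, which statistical packages provide (for instance `scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The paired test works on the within-person differences.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical systolic blood pressures (mmHg) for the same five people.
resting = [118, 122, 130, 115, 127]
post_exercise = [126, 131, 135, 124, 138]

t, df = paired_t_statistic(resting, post_exercise)
print(f"t = {t:.3f}, df = {df}")
```

Working on the differences is what makes the test "paired": each person acts as their own control, so between-person variation drops out.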

Independent samples t-test

Paired t-test

Mann-Whitney U Test

Wilcoxon signed-rank test

Chi-square test (and Fisher's exact test)

Used to assess the distribution of a categorical variable between two, or more, groups

Two independent categorical groups

A continuous outcome (e.g. age, Hb)

Observations are independent

Outcome is approximately Normally distributed

No massive outliers

Two related groups

A continuous outcome

Non-parametric tests (don't rely upon the underlying assumption of normality)

A continuous or ordinal outcome

Two related groups

A continuous or ordinal outcome

Observations are independent within the groups

Assumes the expected cell frequency is at least 5 in each cell (if this assumption is not valid, we can use Fisher's exact test)

Non-parametric test

The null hypothesis is that the distribution of observations between columns is independent of the rows
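
A minimal sketch of how the chi-square statistic and the "expected cell frequency of at least 5" check fit together. The 2 x 2 counts are hypothetical, the function name is an assumption, and no continuity correction is applied. Under the null hypothesis each expected count is (row total x column total) / grand total.

```python
def chi_square(table):
    """Return (statistic, expected_counts, df) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence of rows and columns.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, expected, df


# Hypothetical counts: rows = two groups, columns = outcome yes / no.
observed = [[30, 10], [20, 20]]

statistic, expected, df = chi_square(observed)
# If any expected count is below 5, Fisher's exact test is the safer choice.
small_cells = any(e < 5 for row in expected for e in row)
print(statistic, expected, df, small_cells)
```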

Correlation

Correlation quantifies the degree to which two variables are related/associated.


BUT it doesn’t say anything about the relative rates at which the variables change

Kendall, and Spearman's Correlation

  1. Both rank correlations (i.e. is the highest ranking X variable correlated with the highest ranking Y variable)
  2. Do not assume that X and Y are linearly related
  3. Often used for ordinal variables
  4. Both non-parametric correlations
  5. Spearman's is typically more common
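
To make the "rank correlation" idea concrete, here is a small sketch that computes Spearman's coefficient as Pearson's correlation applied to the ranks. The helper names and the data are assumptions chosen for illustration. The data are perfectly monotonic but not linear (y = x cubed), so the rank correlation is 1 while Pearson's r is below 1.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic in x, but not linear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```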

Pearson's Correlation

  • Likely the most widely known correlation
  • Measures whether X and Y are linearly related

Does NOT assume normal distribution

Two independent categorical groups

Does NOT assume normal distribution

Observations are independent within groups

Outcome is approximately normally distributed

No massive outliers

Assumes

  1. Both variables are continuous
  2. Both variables are normally distributed
  3. Errors are normally distributed about the regression line

Confers Predictive Value

It is more likely that data will not fit on this line than that they do


It measures the "Goodness of Fit"=> line of best fit


R^2 measures the degree to which the model explains the relationship (the unexplained remainder is generally due to other confounders/covariates that are difficult to measure)

Using R^2 to determine whether or not the gradient is Statistically Significant


R^2 = the degree to which the model explains the relationship
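
A minimal least-squares sketch of the line of best fit and R^2, using hypothetical height and weight values invented for illustration (the function name is also an assumption). The last line uses the fact that, for simple linear regression with one predictor, the F statistic for the gradient can be written as R^2 (n - 2) / (1 - R^2); turning that into a p-value needs the F distribution, which is not included here.

```python
def simple_linear_regression(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the proportion explained.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data: height (cm) as the explanatory variable, weight (kg) as the outcome.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 74, 87]

slope, intercept, r_squared = simple_linear_regression(heights, weights)
n = len(heights)
f_statistic = r_squared * (n - 2) / (1 - r_squared)
print(f"weight = {intercept:.1f} + {slope:.2f} * height, R^2 = {r_squared:.3f}")
```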

Survival Analysis

The Kaplan-Meier method is used to estimate the probability of survival past given time points.


The survival distributions of two or more groups can be compared for equality.

Example

In a study on the effect of drug dose on cancer survival in rats, you could use the Kaplan-Meier method to understand the survival distribution (based on time until death) for rats receiving one of four different drug doses and then compare the survival distributions (experiences) between the four doses to determine if they are equal.
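
A rough sketch of the Kaplan-Meier (product-limit) estimate for a single group. The survival times are hypothetical, the function name is an assumption, and `False` marks a censored animal (still alive when last observed). At each time with at least one death, the running survival probability is multiplied by (1 - deaths / number at risk). Comparing the curves of several groups needs a separate test (commonly the log-rank test), which is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with a death.

    events[i] is True if the death/failure was observed at times[i],
    and False if that subject was censored at times[i].
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths + censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone whose time is t leaves the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical days until death for six rats on one dose; False = censored.
times = [5, 8, 8, 12, 15, 20]
events = [True, True, False, True, False, True]

curve = kaplan_meier(times, events)
for day, prob in curve:
    print(f"day {day}: S(t) = {prob:.3f}")
```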


The outcome being measured doesn't have to be literal "survival", e.g. it could be the time until failure of a knee replacement

Slide 50 - 52

e.g. if we try to predict the weight of humans and also collect their height: weight is related to height, as taller people tend to be heavier.

Examples of Independent Data

The data are independent if the variables aren’t related.

Calculating the average recovery time for ACL injuries, comparing individuals who have received synthetic grafts vs hamstring grafts.

e.g. when students sit an in-semester test and a final exam, the final grade is likely to be related to their in-semester grade, not just because of a general relationship between the two grades, but because it is the same person.

Is the Outcome Variable Continuous or Dichotomous?

Dichotomous = Chi-square test

Or Fisher's exact test if it is a small sample or a rare event

Continuous

Are the groups independent, or are the data paired?

Paired

Independent

Are the Assumptions of the Paired Test violated?


  1. No massive Outliers
  2. Approximately Normally Distributed

No

Yes

Paired t-test

Wilcoxon Signed Rank Test

Are the Assumptions of the Independent Samples T-test violated?

  • No Massive Outliers
  • Approximately Normally Distributed

No

Yes

Mann-Whitney U Test

Independent Samples t-test
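
The flowchart above can be written out as a small decision function. This is only a sketch of the branches listed in these notes: the function and parameter names are assumptions, and "assumptions violated" stands for massive outliers or a clearly non-Normal outcome.

```python
def choose_difference_test(
    outcome,
    paired=False,
    assumptions_violated=False,
    small_sample_or_rare_event=False,
):
    """Pick a test of difference following the flowchart in these notes.

    outcome: "dichotomous" or "continuous".
    paired: True for related samples, False for independent groups.
    assumptions_violated: True if there are massive outliers or the
        outcome is not approximately Normally distributed.
    small_sample_or_rare_event: only used for dichotomous outcomes.
    """
    if outcome == "dichotomous":
        if small_sample_or_rare_event:
            return "Fisher's exact test"
        return "Chi-square test"
    if outcome != "continuous":
        raise ValueError("outcome must be 'dichotomous' or 'continuous'")
    if paired:
        if assumptions_violated:
            return "Wilcoxon signed-rank test"
        return "Paired t-test"
    if assumptions_violated:
        return "Mann-Whitney U test"
    return "Independent samples t-test"


# Blood pressure before vs after exercise in the same people, roughly Normal.
print(choose_difference_test("continuous", paired=True))
```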

Observations are independent


Simple Regression (36 - 47)