Statistics for the Behavioral Sciences

Introduction to the t statistic

Sample variance = s^2 = SS / (n-1) = SS / df

Sample standard deviation = s = √(SS / (n-1)) = √(SS / df)

Estimated Standard Error - is used as an estimate of the real standard error σM when the value of σ is unknown. It is computed from the sample variance or sample standard deviation and provides an estimate of the standard distance between a sample mean M and the population mean μ.

estimated standard error = sM = s / √n or sM = √(s^2 / n)
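The definitions above can be sketched in Python. This is a minimal illustration, not library code; the function name and the sample of scores are hypothetical.

```python
import math

def estimated_standard_error(scores):
    """Estimate the standard error of M: sM = sqrt(s^2 / n)."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)   # sum of squared deviations
    s2 = ss / (n - 1)                        # sample variance, df = n - 1
    return math.sqrt(s2 / n)

# Hypothetical sample of n = 4 scores
print(estimated_standard_error([2, 4, 6, 8]))
```

Note that dividing SS by n - 1 (rather than n) is what makes the sample variance an unbiased estimate of the population variance.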

The estimated standard error of M typically is presented and computed using variance

2 reasons for making this shift from standard deviation to variance:

  1. The sample variance is an unbiased statistic and provides an accurate and unbiased estimate of the population variance.
  2. Variance is used consistently for all of the different t statistics. Thus, estimated standard error = √(sample variance / sample size)

t statistic - is used to test hypotheses about an unknown population mean when the value of σ is unknown: t = (M - μ) / sM
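A minimal Python sketch of the t statistic formula, combining the definitions above (the function name and data are hypothetical):

```python
import math

def t_statistic(scores, mu):
    """Compute t = (M - mu) / sM, with sM = sqrt(s^2 / n)."""
    n = len(scores)
    m = sum(scores) / n
    s2 = sum((x - m) ** 2 for x in scores) / (n - 1)  # sample variance
    sm = math.sqrt(s2 / n)                            # estimated standard error
    return (m - mu) / sm

# Hypothetical sample, testing against a hypothesized mu = 3
print(t_statistic([2, 4, 6, 8], 3))
```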

Degrees of freedom - describes the # of scores in a sample that are independent and free to vary: df = n-1

t distribution is the complete set of t values computed for every possible random sample for a specific sample size (n) or a specific degrees of freedom (df). The t distribution approximates the shape of a normal distribution

The greater the sample size (n) is, the larger the degrees of freedom are, and the better the t distribution approximates the normal distribution.

The shape of a t distribution: as df gets larger, the t distribution gets closer in shape to a normal z-score distribution. The t distribution has more variability than a normal z distribution; therefore, the t distribution tends to be flatter and more spread out, whereas the normal z distribution has more of a central peak

t distribution table- the numbers in the table are the values of t that separate the tail from the main body of the distribution. Proportions for one or two tails are listed at the top of the table and the df values are listed in the first column

Hypothesis Test with t Statistic

t = (sample mean from the data - population mean hypothesized by the null hypothesis) / estimated standard error (computed from the sample data)

Step 1: State the Hypotheses and Select an Alpha Level

Step 2: Locate the Critical Region

Step 3: Calculate the t statistic t = (M - μ) / sM

Step 4: Make a Decision Regarding the Null Hypothesis
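The four steps can be sketched as one small Python function. The data are hypothetical, and the critical value is assumed to come from a t distribution table (for df = 8, α = .05, two tails, the table gives 2.306).

```python
import math

def one_sample_t_test(scores, mu, t_critical):
    """Two-tailed one-sample t test.

    t_critical is looked up in a t table for df = n - 1
    at the chosen alpha level (Steps 1 and 2).
    """
    n = len(scores)
    m = sum(scores) / n
    s2 = sum((x - m) ** 2 for x in scores) / (n - 1)
    sm = math.sqrt(s2 / n)
    t = (m - mu) / sm                 # Step 3: calculate t
    reject = abs(t) > t_critical      # Step 4: decide about H0
    return t, reject

# Hypothetical sample of n = 9 scores, H0: mu = 10, alpha = .05
t, reject = one_sample_t_test([12, 11, 13, 9, 14, 10, 12, 13, 11], 10, 2.306)
print(t, reject)
```

Because |t| exceeds the critical value here, the decision is to reject the null hypothesis.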

Two assumptions of the t test:
1- The values in the sample must consist of independent observations
2- The population from which the sample is selected must be normal

Estimated Cohen's d: mean difference / sample standard deviation = (M - μ) / s
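A short Python sketch of estimated Cohen's d (hypothetical data; note that d divides by the standard deviation, not the standard error, so it is not affected by sample size):

```python
import math

def estimated_cohens_d(scores, mu):
    """Estimated Cohen's d = (M - mu) / s, using the sample standard deviation."""
    n = len(scores)
    m = sum(scores) / n
    s = math.sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))
    return (m - mu) / s

# Hypothetical sample with hypothesized mu = 10
print(estimated_cohens_d([12, 11, 13, 9, 14, 10, 12, 13, 11], 10))
```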

Percentage of variance accounted for by the treatment - a measure of effect size that determines what portion of the variability in the scores can be accounted for by the treatment effect

Confidence Interval - an interval, or range of values centered around a sample statistic. The logic behind a confidence interval is that a sample statistic, such as a sample mean, should be relatively near to the corresponding population parameter
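A minimal sketch of a confidence interval built around a sample mean: M ± t·sM, where t is taken from a t table for the chosen level of confidence. The data are hypothetical, and the assumed table value 3.182 is the two-tailed critical t for df = 3 at 95% confidence.

```python
import math

def confidence_interval(scores, t_critical):
    """Return (lower, upper) bounds of M +/- t * sM."""
    n = len(scores)
    m = sum(scores) / n
    s2 = sum((x - m) ** 2 for x in scores) / (n - 1)
    sm = math.sqrt(s2 / n)
    return m - t_critical * sm, m + t_critical * sm

# Hypothetical sample of n = 4 scores, 95% confidence (t = 3.182 for df = 3)
lo, hi = confidence_interval([2, 4, 6, 8], 3.182)
print(lo, hi)
```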

Correlation - a statistical technique that is used to measure and describe the relationship between two variables

A correlation requires two scores for each individual (one score from each of the two variables). These scores normally are identified as X and Y

The direction of the relationship

In a positive correlation, the two variables tend to change in the same direction: as the value of the X variable increases from one individual to another, the Y variable also tends to increase; when the X variable decreases, the Y variable also decreases

In a negative correlation, the two variables tend to go in opposite directions. As the X variable increases, the Y variable decreases. That is, it is an inverse relationship

The form of the relationship

The most common use of correlation is to measure straight-line relationships

Strength or Consistency of the Relationship

The closer the correlation is to + or - 1.00 the stronger the correlation is. A perfect correlation always is identified by a correlation of 1.00 and indicates a perfectly consistent relationship

The Pearson Correlation measures the degree and the direction of the linear relationship between two variables

r = covariability of X and Y / variability of X and Y separately

The sum of products of deviations (SP) - a measure of the degree of covariability between two variables; the degree to which they vary together

Definitional Formula: SP = Σ(X-Mx)(Y-My)

Computational Formula: SP = ΣXY - (ΣX)(ΣY) / n

SS Formula for X variable : SS = ΣX^2 - (ΣX)^2 / n

SS formula for Y variable: SS = ΣY^2 - (ΣY)^2 / n

Pearson r correlation formula: r = SP / √(SSx · SSy)
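The Pearson formula, using the definitional versions of SP, SSx, and SSy, can be sketched directly (function name and data are hypothetical):

```python
import math

def pearson_r(xs, ys):
    """r = SP / sqrt(SSx * SSy), via the definitional formulas."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # sum of products
    ssx = sum((x - mx) ** 2 for x in xs)                    # SS for X
    ssy = sum((y - my) ** 2 for y in ys)                    # SS for Y
    return sp / math.sqrt(ssx * ssy)

# Hypothetical pairs of X and Y scores
print(pearson_r([1, 2, 3], [1, 3, 2]))
```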

The value of a correlation can be affected greatly by the range of scores represented in the data.
One or two extreme data points, outliers, can have a dramatic effect on the value of a correlation

The value of r^2 is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable

Partial Correlation measures the relationship between two variables while controlling the influence of a third variable by holding it constant

Hypothesis Test with Pearson Correlation

The null hypothesis states either that there is no population correlation or, for a directional (one-tailed) test, that the population correlation is not positive (or not negative)

The alternative hypothesis states either that there is a real population correlation or, for a directional (one-tailed) test, that the population correlation is positive (or negative)

Standard error for r = sr = √((1-r^2) / (n-2))

t statistic: t = (r - ρ) / √((1-r^2) / (n-2))

Degrees of Freedom: df = n - 2
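Because the null hypothesis sets ρ = 0, the t statistic for a correlation reduces to r divided by its standard error. A minimal sketch (hypothetical values for r and n):

```python
import math

def t_for_correlation(r, n):
    """t = (r - 0) / sqrt((1 - r^2) / (n - 2)), testing H0: rho = 0."""
    sr = math.sqrt((1 - r ** 2) / (n - 2))   # standard error for r
    return r / sr

# Hypothetical sample correlation r = .50 with n = 27 (df = 25)
print(t_for_correlation(0.5, 27))
```

The resulting t is evaluated against the t table with df = n - 2.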

Spearman Correlation: a correlation calculated for ordinal data. Also used to measure the consistency of direction for a relationship
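Since the Spearman correlation is simply the Pearson correlation computed on ranks, it can be sketched by ranking both variables first. This sketch assumes no tied scores (ties need averaged ranks, which this illustration omits); the names and data are hypothetical.

```python
import math

def ranks(values):
    """Assign rank 1 to the smallest score (ties not handled in this sketch)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman correlation = Pearson r computed on the ranked scores."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return sp / math.sqrt(ssx * ssy)

# Hypothetical X and Y scores; only their rank order matters
print(spearman_rs([10, 20, 30], [3, 9, 5]))
```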

Point-biserial correlation - a correlation between two variables where one of the variables is dichotomous

The phi-coefficient - a correlation between two variables both of which are dichotomous