Chapter 14: Correlation and Regression
Correlation. A statistical technique used to describe the relationship between two variables.
Scatter plot. A plot of the relationship of two variables with X values on the horizontal axis and Y values on the vertical axis.
Direction of the relationship. A positive or negative correlation between two variables.
Positive correlation. The two variables change in the same direction: as one increases, the other tends to increase. Ex. The more one studies, the better one's grades.
Negative correlation. The two variables change in opposite directions: as one increases, the other tends to decrease. Ex. The more one studies, the less one watches TV.
The form of a relationship. Typically, the relationship between two variables is viewed as a straight line, called a linear form.
Strength or consistency of a relationship. The degree of the relationship is measured by correlation.
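A perfect correlation, written as +1.00 or -1.00, is one in which every change in X is accompanied by a perfectly consistent change in Y. With a correlation of +1.00, each time the X variable goes up, the Y variable goes up by a consistent amount.
A correlation always falls between -1.00 and +1.00. A value of 0 indicates no consistent relationship.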
Envelope. A relationship can also be drawn using an envelope that encircles data points.
A football-shaped envelope corresponds to a correlation of roughly 0.7 in magnitude. Fatter envelopes indicate correlations closer to 0, while narrower envelopes indicate correlations closer to 1.00.
Pearson correlation. Measures the degree and direction of a linear relationship between two variables.
r = (covariability of X and Y) / (variability of X and Y separately). r always falls between -1.00 and +1.00.
The sum of products. Measures the covariability between two variables. SP = ∑(X - Mx)*(Y - My).
Alternative (computational) sum of products formula. SP = ∑XY - (∑X)(∑Y)/n
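As a quick check that the definitional and computational formulas for SP give the same answer, here is a minimal Python sketch. The function names and the five pairs of scores are made up for illustration and are not taken from the chapter.

```python
def sp_definitional(xs, ys):
    """SP = sum of (X - Mx)(Y - My)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP = sum(XY) - (sum X)(sum Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Hypothetical scores for five individuals (illustration only).
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

print(sp_definitional(xs, ys))   # 28.0
print(sp_computational(xs, ys))  # 28.0
```

The computational form avoids computing the means first, which is convenient when the means are not whole numbers.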
Pearson formula. r = SP / √(SSx · SSy), where SSx and SSy are the sums of squared deviations for the X scores and the Y scores.
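The Pearson formula can be sketched in a few lines of Python. This is only an illustration: the helper names (`sum_of_squares`, `sum_of_products`, `pearson_r`) and the data are hypothetical, chosen so the arithmetic comes out cleanly.

```python
import math


def sum_of_squares(values):
    """SS = sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def sum_of_products(xs, ys):
    """SP = sum of (X - Mx)(Y - My)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """r = SP / sqrt(SSx * SSy)."""
    return sum_of_products(xs, ys) / math.sqrt(
        sum_of_squares(xs) * sum_of_squares(ys)
    )


# Hypothetical scores (illustration only): SP = 28, SSx = 64, SSy = 16.
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]
print(pearson_r(xs, ys))  # 0.875
```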
z-scores. Each X and Y value can be transformed into a z-score that identifies its exact location within its own distribution.
Pearson formula for a sample. r = ∑(zx · zy)/(n - 1)
Pearson formula for a population. ρ = ∑(zx · zy)/N
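The z-score versions of the formula can be sketched as follows. The sketch assumes that sample z-scores use the sample standard deviation (dividing by n - 1) and population z-scores use the population standard deviation (dividing by N). The function name and data are hypothetical.

```python
import statistics


def r_from_z_scores(xs, ys, population=False):
    """Average the products of z-scores: divide by N for a population,
    by n - 1 for a sample."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    if population:
        sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
        divisor = n
    else:
        sx, sy = statistics.stdev(xs), statistics.stdev(ys)
        divisor = n - 1
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / divisor


# Hypothetical scores (illustration only).
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]
print(round(r_from_z_scores(xs, ys), 3))                   # 0.875
print(round(r_from_z_scores(xs, ys, population=True), 3))  # 0.875
```

Both versions give the same value for the same scores, because the choice of divisor is matched by the choice of standard deviation.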
Why are correlations used?
Predictions. If two variables are correlated and one knows the value of variable X, one can predict the likely value of variable Y.
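Validity. If one designs a new test for intelligence, scores on the new test should correlate with other established measures of intelligence.
Reliability. A reliable test gives consistent results: if one takes a test this week, the score should be about the same as the score on the same test next week.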
Theory verification. If one theorizes that this class leads to higher grades in statistics, then time spent in class and grades should be positively correlated.
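Correlation does not demonstrate cause and effect. A correlation shows that two variables are related, not that changes in one cause changes in the other.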
Outliers are extreme data points that can skew the correlation between two variables.
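To see how much a single extreme point can change a correlation, consider the following sketch. The data are invented for illustration: five points with a weak negative correlation, plus one hypothetical outlier at (15, 15).

```python
import math


def pearson_r(xs, ys):
    """r = SP / sqrt(SSx * SSy)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 1, 4, 2]

r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [15], ys + [15])  # add one extreme point

print(round(r_without, 3))  # -0.3
print(round(r_with, 3))     # 0.9
```

One added point moves the correlation from weakly negative to strongly positive, which is why a scatter plot is worth inspecting before interpreting r.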
Squaring the correlation gives r-squared, which measures how much of the variability in one variable is accounted for by its relationship with the other.
Coefficient of determination, r-squared. Measures the proportion of variability in one variable that can be determined from its relationship with the other variable. Ex. r-squared = .64 means that 64% of the variability in Y can be explained by the relationship with X.
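The "proportion of variability explained" reading of r-squared can be checked numerically. The sketch below compares r-squared with 1 - SSresidual/SSy for a least-squares line. It assumes the standard least-squares slope b = SP/SSx and intercept a = My - b·Mx, which are not stated in these notes, and it uses hypothetical data.

```python
def r_squared_two_ways(xs, ys):
    """Return (r squared, proportion of Y variability explained by the line)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)

    r_squared = sp ** 2 / (ssx * ssy)

    # Least-squares line (assumed standard formulas, see lead-in).
    b = sp / ssx
    a = my - b * mx
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    explained = 1 - ss_residual / ssy

    return r_squared, explained


# Hypothetical scores (illustration only).
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]
print(r_squared_two_ways(xs, ys))  # (0.765625, 0.765625)
```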
Hypothesis testing with Pearson correlation. For a two-tailed test, the null hypothesis is ρ = 0 (there is no correlation in the population).
Complete t statistic. t = (r - ρ) / √((1 - r²)/(n - 2)), with df = n - 2.
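A small sketch of the t statistic for a sample correlation follows. The function name and the example values (r = 0.6, n = 18) are hypothetical. The resulting t would still need to be compared against a critical value from a t table for the chosen alpha level.

```python
import math


def t_for_pearson(r, n, rho=0.0):
    """t = (r - rho) / sqrt((1 - r^2) / (n - 2)), with df = n - 2."""
    df = n - 2
    standard_error = math.sqrt((1 - r ** 2) / df)
    return (r - rho) / standard_error, df


# Hypothetical sample result (illustration only).
t, df = t_for_pearson(r=0.6, n=18)
print(round(t, 3), df)  # 3.0 16
```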
Spearman correlation. Used when the variables are ordinal (ranks).
rs = 1 - 6∑D² / (n(n² - 1)). D is the difference between the X rank and the Y rank for each individual, and n is the number of individuals.
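The Spearman formula can be sketched as below. The sketch assumes the scores have already been converted to ranks and that there are no tied ranks, since the shortcut formula is exact only in that case. The function name and ranks are hypothetical.

```python
def spearman_from_ranks(x_ranks, y_ranks):
    """rs = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    n = len(x_ranks)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for five individuals (illustration only).
x_ranks = [1, 2, 3, 4, 5]
y_ranks = [3, 1, 2, 5, 4]

# D = -2, 1, 1, -1, 1, so sum(D^2) = 8 and rs = 1 - 48/120.
print(round(spearman_from_ranks(x_ranks, y_ranks), 3))  # 0.6
```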
Phi-coefficient. Used when both X and Y are dichotomous (each variable has only two values). Ex. Birth order and personality (extrovert, introvert).
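One common way to compute phi is to code each dichotomous variable as 0 or 1 and then apply the Pearson formula to the codes. The sketch below assumes that approach. The 0/1 coding, the variable names, and the eight data points are all hypothetical, and swapping which category is coded 1 only flips the sign of the result.

```python
import math


def phi_coefficient(xs, ys):
    """Pearson r computed on 0/1 codes for two dichotomous variables."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical coding (illustration only):
# birth_order: 0 = one category, 1 = the other category
# personality: 0 = introvert, 1 = extrovert
birth_order = [0, 0, 0, 0, 1, 1, 1, 1]
personality = [0, 0, 0, 1, 0, 1, 1, 1]

print(phi_coefficient(birth_order, personality))  # 0.5
```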
Regression line. Regression is a statistical technique for finding the best-fitting straight line for a set of data; the resulting line is the regression line, and it is used to predict Y from X.
Standard error of estimate. Measures the standard (roughly average) distance between the predicted Y values on the regression line and the actual Y values.
Standard error of estimate formula. Standard error of estimate = √(∑(Y - Ŷ)² / (n - 2))
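The following sketch fits a regression line and computes the standard error of estimate. It assumes the standard least-squares solution Ŷ = bX + a with b = SP/SSx and a = My - b·Mx, which these notes do not state. The function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = b*X + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    return a, b


def standard_error_of_estimate(xs, ys):
    """sqrt( sum (Y - Y-hat)^2 / (n - 2) )."""
    a, b = fit_line(xs, ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (len(xs) - 2))


# Hypothetical scores (illustration only).
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

a, b = fit_line(xs, ys)
print(a, b)                                         # 1.375 0.4375
print(round(standard_error_of_estimate(xs, ys), 3)) # 1.118
```

Here the residual sum of squares is 3.75, so the standard error of estimate is √(3.75 / 3), or about 1.118.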