Correlation and Regression - Coggle Diagram
Direction of the Relationship: The sign of the correlation, positive or negative, describes the direction of the relationship.
Positive correlation: The two variables tend to change in the same direction. For example, as X increases from one measurement to the next, Y tends to increase as well.
Likewise, as X decreases, Y tends to decrease.
Negative correlation: The two variables tend to change in opposite directions. As the X value increases, the Y value decreases. This is also called an inverse relationship.
The Form of the Relationship: The most common use of correlation is to measure a straight-line (linear) relationship.
The Strength or Consistency of the Relationship: In a perfectly consistent relationship, every time X increases by one point, Y changes by a consistent and predictable amount.
The degree of relationship is measured by the numerical value of the correlation.
A perfect correlation is always identified by a correlation of 1.00 (or −1.00): each change in X is accompanied by a perfectly predictable change in Y.
The Pearson Correlation
Pearson Correlation: Also called the Pearson product-moment correlation, it measures how well the data fit a straight line.
Measures the degree and the direction of the linear relationship between two variables.
Notation is r
The Sum of Products of Deviations
Measures the amount of covariability between two variables.
Here is the formula: SP = Σ(X − M_X)(Y − M_Y)
Pearson correlation formula: r = SP / √(SS_X · SS_Y)
Any factor that does not change the pattern formed by the data points also does not change the correlation (for example, adding a constant to every score, or multiplying every score by a positive constant).
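A minimal sketch of the calculation (assuming Python, with made-up scores): compute SP, SS_X, and SS_Y from the deviations, then combine them into r.

```python
import math

# Hypothetical data: five pairs of X and Y scores
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

mx = sum(X) / len(X)
my = sum(Y) / len(Y)

# Sum of products of deviations: SP = sum((X - Mx)(Y - My))
SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))

# Sum of squared deviations for each variable
SS_X = sum((x - mx) ** 2 for x in X)
SS_Y = sum((y - my) ** 2 for y in Y)

# Pearson correlation: r = SP / sqrt(SS_X * SS_Y)
r = SP / math.sqrt(SS_X * SS_Y)
print(round(r, 3))  # 0.775
```

Here SP = 6, SS_X = 10, SS_Y = 6, so r = 6/√60 ≈ .775, a fairly strong positive correlation.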
Interpreting the Pearson Correlation
Where and Why Correlations are Used
Prediction: If two variables are known to be related in some systematic way, it is possible to use one of the variables to make accurate predictions about the other.
Validity: Demonstrating validity of tests or experiments is done by observing correlation.
Reliability: Like validity, the reliability of a test or experiment is measured by observing correlation. A reliable measurement procedure produces consistent scores, so repeated measurements of the same individuals should be strongly correlated.
Theory Verification: The prediction of the theory could be tested by determining the correlation between two variables.
Outliers: One or two extreme data points that can have a dramatic effect on the value of a correlation.
Be careful not to assume that one variable causes an effect on the other.
The simple existence of a correlation does not prove a cause and effect relationship
Restricted range: A correlation computed from scores that cover only part of the full range of values. For example, looking only at the data for the dogs in a study that contains four other animals can produce a correlation different from the one that would have been observed using all of the data.
Coefficient of determination: r², so called because it measures the proportion of variability in one variable that can be determined from its relationship with the other variable.
Hypothesis testing with the Pearson Correlation
The t statistic for a correlation has this formula: t = r / √((1 − r²) / (n − 2)), with df = n − 2.
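A quick sketch of the test (the values of r and n are hypothetical; the critical value is the standard two-tailed t-table entry for α = .05, df = 28):

```python
import math

# Hypothetical sample: r = .775 observed with n = 30 pairs of scores
r = 0.775
n = 30

# t = r / sqrt((1 - r^2) / (n - 2)), with df = n - 2
t = r / math.sqrt((1 - r ** 2) / (n - 2))
df = n - 2

# Two-tailed critical value for alpha = .05, df = 28 (from a t table)
t_crit = 2.048

print(df, round(t, 2), abs(t) > t_crit)  # 28 6.49 True
```

Since |t| exceeds the critical value, the correlation would be judged significantly different from zero.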
When the Pearson correlation formula is used with data from an ordinal scale (ranks), the result is called the Spearman correlation.
In a situation like this, the Spearman correlation can be used to measure the degree to which the relationship between the two variables is consistently one-directional.
A one-directional relationship between two variables is described as monotonic.
Ranking Tied Scores
List the scores in order from smallest to largest
Assign a rank to each position
When two scores are tied, compute the mean of their ranked positions, then assign this mean value as the final rank for each tied score.
Formula for ranked scores (usable when there are no tied ranks): r_s = 1 − 6ΣD² / (n(n² − 1)), where D is the difference between the two ranks for each individual.
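The tie-handling steps above can be sketched in Python (a hypothetical helper, not a library function): tied scores occupy a run of positions and each receives the mean of those positions.

```python
def rank_with_ties(scores):
    """Rank scores from smallest to largest; tied scores each
    receive the mean of the rank positions they occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Mean of the rank positions i+1 .. j+1
        mean_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# The two 5s occupy positions 2 and 3, so each gets rank 2.5
print(rank_with_ties([3, 5, 5, 8]))  # [1.0, 2.5, 2.5, 4.0]
```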
The point-biserial correlation is used to measure the relationship between two variables when one consists of regular, numerical scores and the second has only two values.
A variable with only two values is called a dichotomous or binomial variable.
When both variables measured for each individual are dichotomous, the correlation is called the phi-coefficient.
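A sketch of the point-biserial calculation (hypothetical data): code the dichotomous variable as 0/1 and apply the ordinary Pearson formula.

```python
import math

# Hypothetical data: X is group membership (0 = control,
# 1 = treatment), Y is a numerical test score.
X = [0, 0, 0, 1, 1, 1]
Y = [4, 5, 6, 8, 9, 10]

mx = sum(X) / len(X)
my = sum(Y) / len(Y)
SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))
SS_X = sum((x - mx) ** 2 for x in X)
SS_Y = sum((y - my) ** 2 for y in Y)

# Point-biserial correlation is just Pearson r on the 0/1 coding
r_pb = SP / math.sqrt(SS_X * SS_Y)
print(round(r_pb, 3))  # 0.926
```

The phi-coefficient works the same way, with both variables coded 0/1.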
Introduction to Linear Equations and Regression
A linear equation can be notated as Y = bX + a
The value of b is the slope: how much the Y variable changes when X increases by one point.
The value of a is the y-intercept because it determines the value of Y when X = 0
Regression: The statistical technique for finding the best-fitting straight line for a set of data.
The resulting straight line is called regression line.
Formula for total squared error: total squared error = Σ(Y − Ŷ)²
This is the total error between the line and the data.
The least-squared-error solution: finds the values of b and a that minimize the total squared error, so that the point on the line (Ŷ) gives the best prediction of Y.
Here is the formula:
Ŷ = bX + a, where b = SP / SS_X and a = M_Y − b·M_X
Also defined as the regression equation for Y
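The regression coefficients can be computed directly from SP and SS_X; a minimal sketch with made-up data:

```python
# Least-squares regression line for hypothetical data,
# using b = SP / SS_X and a = My - b * Mx
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

mx = sum(X) / len(X)
my = sum(Y) / len(Y)
SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))
SS_X = sum((x - mx) ** 2 for x in X)

b = SP / SS_X       # slope
a = my - b * mx     # Y-intercept

# Predicted values on the regression line
Y_hat = [b * x + a for x in X]
print(round(b, 2), round(a, 2))  # 0.6 2.2
```

So the best-fitting line for these scores is Ŷ = 0.6X + 2.2.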
The standardized form of the regression equation (transforming X and Y into z-scores before finding the regression equation): ẑ_Y = β·z_X, where the standardized slope β equals r when there is a single predictor.
Standard error of estimate:
Gives a measure of the standard distance between predicted Y values on the regression line and actual Y values in the data.
Here is the formula: standard error of estimate = √(SS_residual / (n − 2)) = √(Σ(Y − Ŷ)² / (n − 2))
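A small worked example (assuming hypothetical values of SS_residual and n):

```python
import math

# Suppose SS_residual = 2.4 from a data set of n = 5 points
SS_residual = 2.4
n = 5

# Standard error of estimate = sqrt(SS_residual / (n - 2))
se = math.sqrt(SS_residual / (n - 2))
print(round(se, 3))  # 0.894
```

On average, actual Y values fall about 0.894 points from the regression line.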
Formula for predicted variability:
SS_regression = r²·SS_Y
Formula for unpredicted variability:
SS_residual = (1 − r²)·SS_Y
Analysis of regression:
The process of testing the significance of a regression equation.
t² = F = MS_regression / MS_residual, where MS_regression = SS_regression / 1 (df = 1) and MS_residual = SS_residual / (n − 2).
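The whole analysis can be sketched end to end (hypothetical data): partition SS_Y into predicted and unpredicted portions, form the mean squares, and verify that F equals the squared t statistic.

```python
import math

# Analysis of regression for hypothetical data (n = 5)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mx = sum(X) / n
my = sum(Y) / n
SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))
SS_X = sum((x - mx) ** 2 for x in X)
SS_Y = sum((y - my) ** 2 for y in Y)
r = SP / math.sqrt(SS_X * SS_Y)

b = SP / SS_X
a = my - b * mx
Y_hat = [b * x + a for x in X]

# Partition SS_Y into predicted and unpredicted portions
SS_regression = r ** 2 * SS_Y        # equals sum((Y_hat - My)^2)
SS_residual = (1 - r ** 2) * SS_Y    # equals sum((Y - Y_hat)^2)

# F = MS_regression / MS_residual, df = 1 and n - 2
MS_regression = SS_regression / 1
MS_residual = SS_residual / (n - 2)
F = MS_regression / MS_residual

# With a single predictor, F equals the squared t statistic
t = r / math.sqrt((1 - r ** 2) / (n - 2))
print(round(F, 2), round(t ** 2, 2))  # 4.5 4.5
```

For these scores r² = .60, so SS_regression = 3.6 and SS_residual = 2.4, giving F = 3.6 / 0.8 = 4.5 = t².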