Correlation and Regression - Coggle Diagram
Correlation and Regression
Introduction
The Characteristics of a Relationship
scatter plot: a graph in which the values for the X variable are listed on the horizontal axis and the values for the Y variable are listed on the vertical axis.
positive correlation: the two variables tend to change in the same direction: As the value of the X variable increases from one individual to another, the Y variable also tends to increase; when the X variable decreases, the Y variable also decreases.
correlation: a statistical technique that is used to measure and describe the relationship between two variables.
Three characteristics of correlation include:
The Form of the Relationship.
The Strength or Consistency of the Relationship
The Direction of the Relationship
In a negative correlation, the two variables tend to go in opposite directions. As the X variable increases, the Y variable decreases. That is, it is an inverse relationship.
The Pearson Correlation
Calculation of the Pearson Correlation
In the formula for the Pearson r, we use SP to measure the covariability of X and Y.
The variability of X is measured by computing SS for the X scores and the variability of Y is measured by SS for the Y scores.
The Pearson correlation consists of a ratio comparing the covariability of X and Y (the numerator) with the variability of X and Y separately (the denominator).
With these definitions, the formula for the Pearson correlation becomes:
r = SP / √(SSx · SSy)
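The ratio above can be checked numerically. This is a minimal sketch with a small made-up data set (the X and Y values are assumptions for illustration, not from the text):

```python
# Pearson r as the ratio SP / sqrt(SSx * SSy), with assumed example data
from math import sqrt

X = [1, 3, 5, 7]
Y = [2, 4, 8, 10]
n = len(X)

Mx = sum(X) / n  # mean of X
My = sum(Y) / n  # mean of Y

SSx = sum((x - Mx) ** 2 for x in X)                   # variability of X
SSy = sum((y - My) ** 2 for y in Y)                   # variability of Y
SP = sum((x - Mx) * (y - My) for x, y in zip(X, Y))   # covariability of X and Y

r = SP / sqrt(SSx * SSy)
print(round(r, 4))
```

Here SP = 28, SSx = 20, and SSy = 40, so r = 28/√800, a strong positive correlation.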
The Pearson Correlation and z-Scores
For example, a positive correlation means that individuals who score high on X also tend to score high on Y.
Similarly, a negative correlation indicates that individuals with high X scores tend to have low Y scores.
The Pearson correlation measures the relationship between an individual’s location in the X distribution and his or her location in the Y distribution.
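This location-based description corresponds to the z-score form of the Pearson correlation. The sketch below assumes the sample version, r = Σ(zX·zY)/(n − 1), with z-scores computed from the sample standard deviation; the data are made up for illustration:

```python
# Pearson r computed from z-scores (sample form assumed: divide by n - 1)
from math import sqrt

X = [1, 3, 5, 7]
Y = [2, 4, 8, 10]
n = len(X)
Mx, My = sum(X) / n, sum(Y) / n

sx = sqrt(sum((x - Mx) ** 2 for x in X) / (n - 1))  # sample SD of X
sy = sqrt(sum((y - My) ** 2 for y in Y) / (n - 1))  # sample SD of Y

zx = [(x - Mx) / sx for x in X]  # location of each X in its distribution
zy = [(y - My) / sy for y in Y]  # location of each Y in its distribution

r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(round(r, 4))
```

The result matches the SP/√(SSx·SSy) computation on the same data, since the two forms are algebraically equivalent.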
The Sum of Products of Deviations
sum of products of deviations (SP): a measure of the amount of covariability between two variables.
definitional formula: SP = ∑(X - Mx)(Y - My)
Find the X deviation and the Y deviation for each individual.
Find the product of the deviations for each individual.
Add the products.
Pearson correlation: measures the degree and the direction of the linear relationship between two variables.
computational formula: SP = ∑XY - (∑X)(∑Y)/n
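The definitional and computational formulas give the same SP; a quick sketch with assumed data confirms this:

```python
# Definitional vs. computational formula for SP (example values assumed)
X = [1, 2, 4, 5]
Y = [3, 6, 7, 8]
n = len(X)
Mx, My = sum(X) / n, sum(Y) / n

# definitional: sum of products of deviations
SP_def = sum((x - Mx) * (y - My) for x, y in zip(X, Y))

# computational: sum of XY products minus (sum X)(sum Y)/n
SP_comp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

print(SP_def, SP_comp)  # both formulas produce the same value
```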
Correlation and the Pattern of Data Points
Multiplying (or dividing) each X or each Y value by a positive constant does not change the pattern and does not change the value of the correlation.
Multiplying by a negative constant, however, produces a mirror image of the pattern and, therefore, changes the sign of the correlation.
Adding a constant to (or subtracting a constant from) each X and/or Y value does not change the pattern of data points and does not change the correlation.
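These three invariance rules are easy to verify numerically. The sketch below uses an assumed data set and a small helper function for the Pearson formula:

```python
# Effect of constant transformations on the Pearson correlation (data assumed)
from math import sqrt

def pearson(X, Y):
    n = len(X)
    Mx, My = sum(X) / n, sum(Y) / n
    SP = sum((x - Mx) * (y - My) for x, y in zip(X, Y))
    SSx = sum((x - Mx) ** 2 for x in X)
    SSy = sum((y - My) ** 2 for y in Y)
    return SP / sqrt(SSx * SSy)

X = [1, 2, 3, 5]
Y = [2, 5, 6, 9]

r0 = pearson(X, Y)
r_shift = pearson([x + 10 for x in X], Y)  # add a constant: r unchanged
r_scale = pearson([3 * x for x in X], Y)   # positive multiplier: r unchanged
r_neg = pearson([-x for x in X], Y)        # negative multiplier: sign flips

print(r0, r_shift, r_scale, r_neg)
```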
Using and Interpreting the Pearson Correlation
Correlation and Causation
Even if two variables are related, the relationship alone does not prove that one causes the other.
True cause-and-effect conclusions require controlled experiments where variables are manipulated and alternative explanations are ruled out.
Correlation does not imply causation.
Outliers
outlier: an individual with X and/or Y values that are substantially different (larger or smaller) from the values obtained for the other individuals in the data set.
Interpreting Correlations
When you encounter correlations, there are four additional considerations that you should bear in mind:
The value of a correlation can be affected greatly by the range of scores represented in the data.
One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.
When judging how “good” a relationship is, it is tempting to focus on the numerical value of the correlation. For example, a correlation of +0.50 is halfway between 0 and 1.00 and therefore appears to represent a moderate degree of relationship. However, a correlation should not be interpreted as a proportion.
Correlation and Restricted Range
restricted range: a situation in which the values of one or both variables in a correlation are limited to a narrow span, instead of representing the full range that normally exists.
Where and Why Correlations Are Used
Although correlations have a number of different applications, a few specific examples are presented next to give an indication of the value of this statistical measure.
Validity
Reliability
Prediction
Theory Verification
Correlation and the Strength of the Relationship
The value r^2 is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.
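For example, a correlation of r = 0.50 determines only 25% of the variability, which is why r itself should not be read as a proportion:

```python
# Coefficient of determination: r squared, not r, measures predicted variability
r = 0.50
r_squared = r ** 2
print(r_squared)  # 0.25, i.e. 25% of the variability is determined
```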
Hypothesis Tests with the Pearson Correlation
The Hypothesis Test
t = (sample statistic - population parameter) / standard error
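For the Pearson correlation, this general form reduces to the standard statistic t = r / √((1 − r²)/(n − 2)) with df = n − 2, since the null hypothesis sets the population correlation to zero. A sketch with assumed sample values:

```python
# t statistic for testing H0: population correlation = 0 (r and n assumed)
from math import sqrt

r = 0.60  # sample correlation (assumed)
n = 27    # sample size (assumed)

# (sample statistic - population parameter) / standard error, with parameter 0
t = (r - 0) / sqrt((1 - r ** 2) / (n - 2))
df = n - 2
print(round(t, 3), df)
```

The obtained t would then be compared with the critical t value for df = n − 2.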
The Hypotheses
When you obtain a nonzero correlation for a sample, the purpose of the hypothesis test is to decide between the following two interpretations:
There is no correlation in the population (ρ = 0) and the sample value is the result of sampling error. Remember, a sample is not expected to be identical to the population. There always is some error between a sample statistic and the corresponding population parameter. This is the situation specified by H0.
The nonzero sample correlation accurately represents a real, nonzero correlation in the population. This is the alternative stated in H1.
Alternatives to the Pearson Correlation
Special Formula for the Spearman Correlation
When working with ranks for Spearman’s correlation, the calculations become much simpler because ranks are just the integers 1 through n.
This allows easy formulas for the mean and SS of the ranks, and leads to a simplified Spearman correlation formula that uses only the differences between paired ranks (D), provided there are no tied ranks.
SS = n(n^2 - 1)/12 (the SS of the ranks 1 through n)
rs = 1 - 6∑D^2 / (n(n^2 - 1))
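The simplified formula is easy to sketch once both variables are already in rank form; the ranks below are assumed and contain no ties:

```python
# Spearman correlation from paired rank differences (no ties; ranks assumed)
X_ranks = [1, 2, 3, 4, 5]
Y_ranks = [2, 1, 4, 3, 5]
n = len(X_ranks)

# sum of squared differences between paired ranks
sum_D2 = sum((rx - ry) ** 2 for rx, ry in zip(X_ranks, Y_ranks))

rs = 1 - 6 * sum_D2 / (n * (n ** 2 - 1))
print(rs)
```

Here ΣD² = 4, so rs = 1 − 24/120 = 0.8.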
Ranking Tied Scores
When you are converting scores into ranks for the Spearman correlation, you may encounter two (or more) identical scores. Whenever two scores have exactly the same value, their ranks should also be the same. This is accomplished by the following procedure:
List the scores in order from smallest to largest. Include tied values in the list.
Assign a rank (first, second, and so on) to each position in the ordered list.
When two (or more) scores are tied, compute the mean of their ranked positions, and assign this mean value as the final rank for each score.
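The tie-handling procedure can be sketched directly; the scores below are an assumed example with one tied pair:

```python
# Converting scores to ranks, averaging the positions of tied scores
scores = [3, 5, 5, 8]  # assumed data with a tie at 5

ordered = sorted(scores)  # step 1: list scores in order, including ties
mean_rank = {}
i = 0
while i < len(ordered):
    j = i
    while j < len(ordered) and ordered[j] == ordered[i]:
        j += 1  # j ends one past the last tied position
    # steps 2-3: positions i+1 .. j get the mean of those ranked positions
    mean_rank[ordered[i]] = (i + 1 + j) / 2
    i = j

final_ranks = [mean_rank[s] for s in scores]
print(final_ranks)  # [1.0, 2.5, 2.5, 4.0]
```

The two scores of 5 occupy positions 2 and 3, so each receives the mean rank 2.5.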
The Point-Biserial Correlation and Measuring Effect Size with r^2
point-biserial correlation: used to measure the relationship between two variables in situations in which one variable consists of regular, numerical scores, but the second variable has only two values.
dichotomous variable (or binomial variable): a variable with only two values.
The Spearman Correlation
When the Pearson correlation formula is used with data from an ordinal scale (ranks), the result is called the Spearman correlation.
The word monotonic describes a sequence that is consistently increasing (or decreasing). Like the word monotonous, it means constant and unchanging.
The Phi-Coefficient
When both variables (X and Y) measured for each individual are dichotomous, the correlation between the two variables is called the phi-coefficient
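One common way to compute the phi-coefficient is to code each dichotomous variable as 0/1 and then apply the ordinary Pearson formula; the 0/1 data below are assumed for illustration:

```python
# Phi-coefficient: Pearson r applied to two 0/1-coded dichotomous variables
from math import sqrt

X = [0, 0, 1, 1, 0, 1]  # e.g. group membership (assumed)
Y = [0, 1, 1, 1, 0, 1]  # e.g. yes/no response (assumed)
n = len(X)
Mx, My = sum(X) / n, sum(Y) / n

SP = sum((x - Mx) * (y - My) for x, y in zip(X, Y))
SSx = sum((x - Mx) ** 2 for x in X)
SSy = sum((y - My) ** 2 for y in Y)

phi = SP / sqrt(SSx * SSy)
print(round(phi, 4))
```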
Introduction to Linear Equations and Regression
Regression
The statistical technique for finding the best-fitting straight line for a set of data is called regression, and the resulting straight line is called the regression line.
The Standard Error of Estimate
standard error of estimate
The standard error of estimate gives a measure of the standard distance between the predicted Y values on the regression line and the actual Y values in the data.
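A minimal sketch of this measure, assuming the usual form √(SS_residual/(n − 2)) and a made-up data set, with the regression line computed by least squares:

```python
# Standard error of estimate: standard distance between Y and predicted Y-hat
from math import sqrt

X = [1, 2, 3, 4, 5]  # assumed data
Y = [2, 3, 5, 4, 6]
n = len(X)
Mx, My = sum(X) / n, sum(Y) / n

SP = sum((x - Mx) * (y - My) for x, y in zip(X, Y))
SSx = sum((x - Mx) ** 2 for x in X)

b = SP / SSx       # least-squares slope
a = My - b * Mx    # Y-intercept
Y_hat = [b * x + a for x in X]  # predicted Y values on the regression line

SS_residual = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))
se_estimate = sqrt(SS_residual / (n - 2))  # df = n - 2 assumed
print(round(se_estimate, 4))
```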
Linear Equations
In general, a linear relationship between two variables X and Y can be expressed by the equation: Y = bX + a
In the general linear equation, the value of b is called the slope
The value of a in the general equation is called the Y-intercept
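A tiny sketch of the slope and intercept at work, with assumed values b = 2 and a = 1:

```python
# Linear equation Y = bX + a: slope b is the change in Y per one-unit
# increase in X; intercept a is the value of Y when X = 0 (values assumed)
b, a = 2, 1
for X in [0, 1, 2]:
    print(X, b * X + a)  # Y increases by b for each 1-point increase in X
```

At X = 0 the equation gives Y = a = 1, and each step of 1 in X raises Y by b = 2.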
Analysis of Regression: The Significance of the Regression Equation
For a single predictor (X) and outcome (Y), this test is equivalent to testing the significance of the Pearson correlation.
The regression analysis partitions the total variability in Y into the portion explained by the regression (predicted) and the unexplained portion (residual), and uses an F-ratio to assess whether the predicted variance is significantly greater than what would be expected by chance.
Testing the significance of a regression equation determines whether the equation meaningfully predicts variation in the Y variable or if the apparent relationship is just due to sampling error.
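The partitioning described above can be sketched with assumed summary values, using the single-predictor identities SS_regression = r²·SS_Y and SS_residual = (1 − r²)·SS_Y with df = 1 and n − 2:

```python
# Analysis of regression: F-ratio from the partitioned variability of Y
r2 = 0.36    # r squared for the sample (assumed)
SSy = 200.0  # total variability of Y (assumed)
n = 22       # sample size (assumed)

SS_regression = r2 * SSy          # portion predicted by the regression
SS_residual = (1 - r2) * SSy      # unexplained (residual) portion

MS_regression = SS_regression / 1        # df_regression = 1 (one predictor)
MS_residual = SS_residual / (n - 2)      # df_residual = n - 2

F = MS_regression / MS_residual
print(round(F, 2))
```

With one predictor, this F equals the square of the t statistic for the Pearson correlation on the same data.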