Correlation and Regression - Coggle Diagram
Correlation and Regression
Correlation is a statistical technique that is used to measure and describe the relationship between two variables.
In a positive correlation, the two variables tend to change in the same direction: as the value of the X variable increases from one individual to another, the Y variable also tends to increase; when the X variable decreases, the Y variable also tends to decrease.
In a negative correlation, the two variables tend to go in opposite directions: as the X variable increases, the Y variable tends to decrease. That is, it is an inverse relationship.
The Direction of the Relationship. The sign of the correlation, positive or negative, describes the direction of the relationship.
The most common use of correlation is to measure straight-line relationships. However, other forms of relationships do exist and there are special correlations used to measure them.
The correlation measures the consistency of the relationship.
For correlations, there are four additional considerations that you should bear in mind.
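When judging how “good” a relationship is, it is tempting to focus on the numerical value of the correlation. However, a correlation is not a proportion: the squared correlation, not the correlation itself, describes how much of the variability in one variable can be predicted from the other.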
One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
The value of a correlation can be affected greatly by the range of scores represented in the data.
A correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.
An envelope drawn around the data points in a scatter plot often helps you see the overall trend in the data.
The value r squared is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.
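As a rough check of this idea, the sketch below computes r squared for a small made-up data set and compares it with the proportion of the variability in Y that is reproduced by the best-fitting straight line. The scores and all variable names are illustrative assumptions, not part of the original notes.

```python
import math

# Hypothetical scores for five individuals.
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sum of products of deviations (SP) and sums of squared deviations (SS).
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)

r = sp / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Least-squares line, then the variability of its predictions around the mean of Y.
slope = sp / ss_x
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]
ss_predicted = sum((p - mean_y) ** 2 for p in predicted)

proportion_predicted = ss_predicted / ss_y

print(f"r = {r:.3f}, r squared = {r_squared:.4f}")
print(f"predicted share of Y variability = {proportion_predicted:.4f}")
```

For these scores r = 0.875, so r squared is about 0.77: roughly 77% of the variability in Y can be determined from X, and the predicted share computed from the line matches that value.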
An outlier is an individual with X and/or Y values that are substantially different (larger or smaller) from the values obtained for the other individuals in the data set.
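A small sketch of how a single outlier can change a correlation. The scores are made up for illustration, and `pearson_r` is a hypothetical helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed as SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores with almost no linear trend.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = pearson_r(xs, ys)

# The same scores plus one individual far from the rest on both X and Y.
r_with = pearson_r(xs + [15], ys + [12])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

With these numbers the correlation moves from about 0.14 to about 0.94 when the one extreme individual is added, which is why a scatter plot should be inspected before a correlation is interpreted.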
A correlation measures the degree of relationship between two variables on a scale from 0 to 1.00 in absolute value; the sign indicates only the direction. Although this number provides a measure of the degree of relationship, the squared correlation provides a better measure of the strength of the relationship.
The hypothesis test evaluating the significance of a correlation can be conducted using either a t statistic or an F-ratio.
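The notes do not give the formulas, so the sketch below assumes the standard textbook versions: t = r / sqrt((1 - r²) / (n - 2)) with df = n - 2, and an F-ratio with df = (1, n - 2) that equals t squared. The function names and the sample values of r and n are hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for testing a zero population correlation, df = n - 2."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def correlation_f(r, n):
    """Equivalent F-ratio with df = (1, n - 2): predicted over unpredicted variance."""
    return (r ** 2) / ((1 - r ** 2) / (n - 2))

# Hypothetical sample: r = 0.875 from n = 5 pairs of scores.
r, n = 0.875, 5
t_stat = correlation_t(r, n)
f_ratio = correlation_f(r, n)

print(f"t({n - 2}) = {t_stat:.3f}")
print(f"F(1, {n - 2}) = {f_ratio:.3f}")
```

The two statistics carry the same information because F = t², so either one can be compared against its critical value from the corresponding table.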
The Pearson correlation measures the degree and the direction of the linear relationship between two variables.
The calculation of the Pearson correlation requires one new concept: the sum of products of deviations, or SP.
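A minimal sketch of the calculation, assuming the definitional formulas SP = Σ(X - M_X)(Y - M_Y) and r = SP / sqrt(SS_X · SS_Y). The function names and the scores are illustrative, not taken from the notes.

```python
import math

def sum_of_products(xs, ys):
    """SP: sum over individuals of (X deviation) * (Y deviation)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def pearson_r(xs, ys):
    """Pearson correlation: covariability of X and Y relative to their separate variability."""
    sp = sum_of_products(xs, ys)
    ss_x = sum_of_products(xs, xs)  # SP of a variable with itself is its SS
    ss_y = sum_of_products(ys, ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five individuals.
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

sp = sum_of_products(xs, ys)
r = pearson_r(xs, ys)
print(f"SP = {sp}, r = {r}")
```

Here SP = 28, SS_X = 64, and SS_Y = 16, so r = 28 / sqrt(64 · 16) = 0.875.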
One common technique for demonstrating the validity of a measurement procedure is to use a correlation.
A measurement procedure is considered reliable to the extent that it produces stable, consistent measurements.
If two variables are known to be related in some systematic way, it is possible to use one of the variables to make accurate predictions about the other.
Many psychological theories make specific predictions about the relationship between two variables.
One of the most common errors in interpreting correlations is to assume that a correlation necessarily implies a cause-and-effect relationship between the two variables.
The Pearson correlation measures the degree of linear relationship between two variables when the data (X and Y values) consist of numerical scores from an interval or ratio scale of measurement.
When the Pearson correlation formula is used with data from an ordinal scale (ranks), the result is called the Spearman correlation.
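A sketch of that idea: convert each variable to ranks, then apply the Pearson formula to the ranks. Giving tied scores the average of the ranks they occupy is a common convention that is assumed here; the helper names and the data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def spearman_r(xs, ys):
    """Spearman correlation: Pearson formula applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical scores: Y always increases with X, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]

print(f"Pearson  r = {pearson_r(xs, ys):.3f}")
print(f"Spearman r = {spearman_r(xs, ys):.3f}")
```

Because the ranks of X and Y agree perfectly, the Spearman correlation is 1.00 even though the Pearson correlation on the raw scores is below 1.00; the Spearman value reflects the consistency of direction rather than straight-line form.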
When both variables (X and Y) measured for each individual are dichotomous, the correlation between the two variables is called the phi-coefficient.
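One way to compute it, sketched under the assumption that each dichotomous variable is coded 0 or 1 and the Pearson formula is then applied to the codes. The data are made up, and the cross-check uses the usual 2×2 frequency-table formula (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)).

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical dichotomous scores, each coded 0 or 1 for eight individuals.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

phi = pearson_r(xs, ys)

# Cross-check from the 2x2 table of frequencies.
a = sum(1 for x, y in zip(xs, ys) if (x, y) == (0, 0))
b = sum(1 for x, y in zip(xs, ys) if (x, y) == (0, 1))
c = sum(1 for x, y in zip(xs, ys) if (x, y) == (1, 0))
d = sum(1 for x, y in zip(xs, ys) if (x, y) == (1, 1))
phi_from_table = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(f"phi = {phi}, from table = {phi_from_table}")
```

Both routes give phi = 0.5 for these codes. Which category is labeled 0 and which 1 is arbitrary, so the sign of phi has no fixed meaning on its own.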
Beyond the Pearson correlation, there are three additional correlations: the Spearman correlation, the point-biserial correlation, and the phi-coefficient.
The statistical technique for finding the best-fitting straight line for a set of data is called regression, and the resulting straight line is called the regression line.
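A minimal sketch of finding the regression line Ŷ = bX + a, assuming the least-squares solution b = SP / SS_X and a = M_Y - b·M_X. The function name `fit_line` and the scores are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scores for five individuals.
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

slope, intercept = fit_line(xs, ys)
print(f"Y-hat = {slope} * X + {intercept}")

# Predicted Y for a new individual with X = 6 (the mean of X).
prediction = slope * 6 + intercept
print(f"predicted Y at X = 6: {prediction}")
```

For these scores the line is Ŷ = 0.4375X + 1.375, and the prediction at X = 6 is 4.0, the mean of Y, since the regression line always passes through the point (M_X, M_Y).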
The standard error of estimate gives a measure of the standard distance between the predicted Y values on the regression line and the actual Y values in the data.
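A sketch of the computation, assuming the usual definition: the square root of SS_residual / df, with df = n - 2 because two values (slope and intercept) are estimated from the data. The function name and the scores are illustrative.

```python
import math

def standard_error_of_estimate(xs, ys):
    """sqrt(SS_residual / (n - 2)) around the least-squares line for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    # Squared distances between actual Y values and predicted Y values on the line.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (n - 2))

# Hypothetical scores for five individuals.
xs = [0, 10, 4, 8, 8]
ys = [2, 6, 2, 4, 6]

se = standard_error_of_estimate(xs, ys)
print(f"standard error of estimate = {se:.3f}")
```

Here SS_residual = 3.75 and df = 3, so the standard error of estimate is sqrt(1.25), about 1.118: on average the actual Y values fall a little more than one point from the line.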