Correlation - a measure that describes the strength and direction of a relationship between two variables.
Usually the two variables in a correlational study are simply observed as they exist naturally in the environment—there is no attempt to control or manipulate the variables.
A correlation requires two scores for each individual, one from each of the two variables. These scores normally are identified as X and Y. The pairs of scores can be listed in a table, or they can be presented graphically in a scatter plot.
Envelope - a line sketched around the data points in a scatter plot. Because it encloses the data, it often helps you to see the overall trend.
3 Characteristics of a relationship measured by a correlation:
- The Direction of the Relationship. The sign of the correlation, positive or negative, describes the direction of the relationship.
Positive Correlation - the two variables tend to change in the same direction: As the value of the X variable increases from one individual to another, the Y variable also tends to increase; when the X variable decreases, the Y variable also decreases.
Negative Correlation - the two variables tend to go in opposite directions. As the X variable increases, the Y variable decreases. That is, it is an inverse relationship.
- The Form of the Relationship. The relationships tend to have a linear form; that is, the points in the scatter plot tend to cluster around a straight line. We have drawn a line through the middle of the data points in each figure to help show the relationship. The most common use of correlation is to measure straight-line relationships. However, other forms of relationships do exist and there are special correlations used to measure them.
- The Strength or Consistency of the Relationship. Finally, the correlation measures the consistency of the relationship. For a linear relationship, for example, the data points could fit perfectly on a straight line: every time X increases by one point, the value of Y also changes by a consistent and predictable amount. However, relationships are usually not perfect. The degree of relationship is measured by the numerical value of the correlation. A perfect correlation always is identified by a correlation of 1.00 or −1.00 and indicates a perfectly consistent relationship: each change in X is accompanied by a perfectly predictable change in Y. At the other extreme, a correlation of 0 indicates no consistency at all; the data points are scattered randomly with no clear trend. Intermediate values between 0 and 1 (ignoring the sign) indicate the degree of consistency.
The value $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable. A correlation of r = 0.80 (or −0.80), for example, means that $r^2 = 0.64$ (or 64%) of the variability in the Y scores can be predicted from the relationship with X.
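As a quick check of the arithmetic, the short sketch below squares a few correlation values. The helper name `coefficient_of_determination` and the chosen r values are illustrative assumptions, not part of the notes.

```python
def coefficient_of_determination(r):
    """Proportion of variability in Y predicted from X (r squared)."""
    return r ** 2

# Illustrative correlation values; the sign does not affect r squared.
for r in (0.10, 0.50, 0.80, -0.80, 1.00):
    r2 = coefficient_of_determination(r)
    print(f"r = {r:+.2f}  ->  r^2 = {r2:.2f}  ({r2:.0%} of variability)")
```

Note that r = 0.80 and r = −0.80 both give 64%, and that halving r from 1.00 to 0.50 cuts the predicted variability to 25%, not 50%.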
Pearson correlation - measures the degree and the direction of the linear relationship between two variables.
r = (degree to which X and Y vary together) / (degree to which X and Y vary separately) = (covariability of X and Y) / (variability of X and Y separately)
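To compute the Pearson correlation, one new quantity is needed: the sum of products of deviations, or SP, which measures the covariability of X and Y.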
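Definitional Formula:
- Find the X deviation and the Y deviation for each individual.
- Find the product of the deviations for each individual.
- Add the products of the deviations: $SP = \sum (X - M_X)(Y - M_Y)$, where $M_X$ and $M_Y$ are the means of the X and Y scores.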
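Computational Formula:
- $SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}$, where n is the number of individuals (pairs of scores).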
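A minimal sketch of both SP formulas, assuming the standard forms SP = Σ(X − M_X)(Y − M_Y) and SP = ΣXY − (ΣX)(ΣY)/n, followed by the Pearson correlation computed as r = SP / √(SS_X · SS_Y). The five pairs of scores are made-up illustrative data, and the variable names are assumptions.

```python
import math

# Hypothetical scores for five individuals (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Definitional formula: sum the products of the paired deviations.
sp_definitional = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Computational formula: SP = sum(XY) - (sum(X) * sum(Y)) / n
sp_computational = sum(x * y for x, y in zip(X, Y)) - (sum(X) * sum(Y)) / n

# Sums of squared deviations for each variable separately.
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)

# Pearson correlation: covariability over separate variability.
r = sp_definitional / math.sqrt(ss_x * ss_y)

print(sp_definitional, sp_computational, r)  # 28.0 28.0 0.875
```

Both formulas give the same SP; the computational version avoids working with deviations, which helps when the means are not whole numbers.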
Because the Pearson correlation describes the pattern formed by the data points, any factor that does not change the pattern also does not change the correlation.
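For example, adding a constant to every score, or multiplying every score by a positive constant, shifts or stretches the scatter plot without changing its pattern. The sketch below checks this on made-up data; `pearson_r` is a hypothetical helper written for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from SP, SS_X and SS_Y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

r_original = pearson_r(X, Y)
r_shifted = pearson_r([x + 5 for x in X], Y)   # add 5 points to every X
r_scaled = pearson_r(X, [3 * y for y in Y])    # multiply every Y by 3

print(math.isclose(r_original, r_shifted), math.isclose(r_original, r_scaled))
# True True
```

Multiplying one variable by a negative constant is a different case: it mirrors the pattern, so the correlation keeps its size but changes sign.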
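Because z-scores identify the exact location of each individual score within a distribution, each X value can be transformed into a z-score, and each Y value can be transformed in the same way. The Pearson correlation can then be expressed entirely in terms of z-scores.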
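A sketch of the z-score version of the Pearson correlation, assuming the scores are treated as a complete population so that r = Σ(z_X · z_Y) / n with standard deviations that divide by n. For a sample, both the standard deviations and the final division would use n − 1 instead. The data and the `z_scores` helper are illustrative assumptions.

```python
import statistics

# Hypothetical scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

def z_scores(values):
    """Convert raw scores to z-scores using the population SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (divides by n)
    return [(v - mean) / sd for v in values]

z_x = z_scores(X)
z_y = z_scores(Y)

# Average product of the paired z-scores.
r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / n
print(round(r, 3))  # 0.875
```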
Prediction. If two variables are known to be related in some systematic way, it is possible to use one of the variables to make accurate predictions about the other.
Reliability. In addition to evaluating the validity of a measurement procedure, correlations are used to determine reliability. A measurement procedure is considered reliable to the extent that it produces stable, consistent measurements. That is, a reliable measurement procedure will produce the same (or nearly the same) scores when the same individuals are measured twice under the same conditions.
Theory Verification. Many psychological theories make specific predictions about the relationship between two variables.
Considerations:
- Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.
- The value of a correlation can be affected greatly by the range of scores represented in the data.
- One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
- When judging how “good” a relationship is, it is tempting to focus on the numerical value of the correlation. For example, a correlation of +0.50 is halfway between 0 and 1.00 and therefore appears to represent a moderate degree of relationship. However, a correlation should not be interpreted as a proportion. Although a correlation of 1.00 does mean that there is a 100% perfectly predictable relationship between X and Y, a correlation of 0.50 does not mean that you can make predictions with 50% accuracy. To describe how accurately one variable predicts the other, you must square the correlation. Thus, a correlation of r = 0.50 means that one variable partially predicts the other, but the predictable portion is only $r^2 = 0.50^2 = 0.25$ (or 25%) of the total variability.
One of the most common errors in interpreting correlations is to assume that a correlation necessarily implies a cause-and-effect relationship between the two variables. A correlation only means that there is a relationship between the two variables; it does not mean that one variable causes changes in the other.
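Restricted Range - occurs whenever a correlation is computed from scores that do not represent the full range of possible values. Such a correlation should not be generalized beyond the range of scores actually represented in the data.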
An outlier is an individual with X and/or Y values that are substantially different (larger or smaller) from the values obtained for the other individuals in the data set.
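To illustrate how much a single outlier can matter, the sketch below computes the correlation for four made-up points with no linear trend, then again after adding one extreme point. The data and the `pearson_r` helper are hypothetical and chosen only to make the effect easy to see.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from SP, SS_X and SS_Y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Four hypothetical individuals arranged in a square: no linear trend.
X = [1, 1, 3, 3]
Y = [1, 3, 1, 3]
r_without = pearson_r(X, Y)

# Add one individual whose X and Y are both far larger than the rest.
r_with = pearson_r(X + [10], Y + [10])

print(round(r_without, 2), round(r_with, 2))  # 0.0 0.93
```

The one added point moves the correlation from zero to a strong positive value, which is why it is worth inspecting the scatter plot before interpreting r.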
Other measures of relationship: the Spearman correlation, the point-biserial correlation, and the phi-coefficient.
Spearman Correlation - when the Pearson correlation formula is used with data from an ordinal scale (ranks), the result is called the Spearman correlation.
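A minimal sketch of that idea: convert each variable to ranks, then apply the ordinary Pearson formula to the ranks. The `ranks` and `pearson_r` helpers and the data are assumptions for illustration; tied scores are given the mean of the ranks they occupy, which is a common convention.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from SP, SS_X and SS_Y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

# Hypothetical scores: Y always increases with X, but not along a line.
X = [1, 2, 3, 4, 5]
Y = [1, 4, 9, 16, 100]

pearson = pearson_r(X, Y)
spearman = pearson_r(ranks(X), ranks(Y))

print(round(pearson, 2), round(spearman, 2))
```

Because the ranks of X and Y match exactly here, the Spearman correlation is 1.00 even though the Pearson correlation on the raw scores is below 1.00.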
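Point-Biserial Correlation - a special version of the Pearson correlation used when one variable consists of regular numerical scores and the other variable has only two values (a dichotomous variable). It is closely related to the independent-measures t test (Chapter 10): both evaluate the same data.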
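The link to the t test can be sketched as follows, assuming the two groups are coded 0 and 1 and the standard identity r² = t² / (t² + df) with df = n₁ + n₂ − 2. The group scores and helper names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from SP, SS_X and SS_Y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

# Hypothetical scores for two separate groups (illustrative only).
group_a = [8, 5, 6, 9]
group_b = [3, 4, 1, 4]

# Point-biserial: code group membership as 0 or 1, then use Pearson.
codes = [0] * len(group_a) + [1] * len(group_b)
scores = group_a + group_b
r_pb = pearson_r(codes, scores)

# Independent-measures t statistic with pooled variance.
df = len(group_a) + len(group_b) - 2
pooled_var = (ss(group_a) + ss(group_b)) / df
mean_diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
t = mean_diff / math.sqrt(pooled_var * (1 / len(group_a) + 1 / len(group_b)))

print(math.isclose(r_pb ** 2, t ** 2 / (t ** 2 + df)))  # True
```

The sign of the point-biserial correlation depends only on which group is coded 0 and which is coded 1, so it is usually the squared value that is interpreted.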
Regression - a powerful modeling technique used to analyze and estimate the relationship between a dependent variable (the outcome) and one or more independent variables (the predictors).
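Linear Equation - in general, a linear relationship between two variables X and Y can be expressed by the equation $Y = bX + a$, where b (the slope) and a (the Y-intercept) are fixed constants. For example, such an equation can describe the relationship between the total cost (Y) and the number of months (X), with b as the charge per month and a as any one-time charge.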
The statistical technique for finding the best-fitting straight line for a set of data is called regression, and the resulting straight line is called the regression line.
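We can define the best-fitting line as the one that has the smallest total squared error, where the error for each individual is the distance between the actual Y value and the Y value predicted by the line. For obvious reasons, the resulting line is commonly called the least-squared-error solution.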
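A sketch of the least-squared-error solution for one predictor, assuming the standard formulas b = SP / SS_X for the slope and a = M_Y − b · M_X for the Y-intercept. The data are made up, and the final check simply confirms that nudging the slope or intercept increases the total squared error.

```python
# Hypothetical scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)

# Least-squares slope and intercept for the line Y-hat = bX + a.
b = sp / ss_x
a = mean_y - b * mean_x

def total_squared_error(slope, intercept):
    """Sum of squared distances between actual and predicted Y."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(X, Y))

best = total_squared_error(b, a)
print(b, a, best)  # 0.4375 1.375 3.75

# Any other line has a larger total squared error.
print(best < total_squared_error(b + 0.1, a), best < total_squared_error(b, a - 0.5))
# True True
```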
Caution
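- The predicted value is not perfect (unless r = +1.00 or −1.00); the actual Y values usually lie some distance from the regression line.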
- The regression equation should not be used to make predictions for X values that fall outside the range of values covered by the original data.
Researchers occasionally standardize the scores by transforming the X and Y values into z-scores before finding the regression equation. The resulting equation is often called the standardized form of the regression equation and is greatly simplified compared to the raw-score version.
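With a single predictor, the standardized form reduces to predicted z_Y = r · z_X: the slope equals the Pearson correlation and the intercept is zero. The sketch below checks that this matches the raw-score regression line on made-up data, assuming population standard deviations (dividing by n); the variable names are illustrative.

```python
import math

# Hypothetical scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)

sd_x = math.sqrt(ss_x / n)  # population SD of X
sd_y = math.sqrt(ss_y / n)  # population SD of Y
r = sp / math.sqrt(ss_x * ss_y)

# Raw-score regression line: Y-hat = bX + a
b = sp / ss_x
a = mean_y - b * mean_x

# Standardized prediction versus the raw prediction converted to a z-score.
standardized = [r * (x - mean_x) / sd_x for x in X]
from_raw = [((b * x + a) - mean_y) / sd_y for x in X]

matches = all(math.isclose(s, f, abs_tol=1e-12)
              for s, f in zip(standardized, from_raw))
print(matches)  # True
```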
The standard error of estimate gives a measure of the standard distance between the predicted Y values on the regression line and the actual Y values in the data.
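A sketch of the calculation, assuming the usual formula: standard error of estimate = √(SS_residual / df), where SS_residual = Σ(Y − Ŷ)² and df = n − 2. The data are made up and the variable names are illustrative.

```python
import math

# Hypothetical scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)

# Regression line Y-hat = bX + a
b = sp / ss_x
a = mean_y - b * mean_x

# Squared distances between actual and predicted Y values.
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

df = n - 2  # two values (slope and intercept) are estimated from the data
standard_error = math.sqrt(ss_residual / df)

print(ss_residual, round(standard_error, 3))  # 3.75 1.118
```

Here the actual Y values fall, on average, a little over one point from the regression line.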
The process of testing the significance of a regression equation is called analysis of regression and is very similar to the analysis of variance (ANOVA).
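A rough sketch of the F-ratio for a regression with one predictor, assuming the standard partition SS_regression = r² · SS_Y and SS_residual = (1 − r²) · SS_Y, with df = 1 for regression and df = n − 2 for the residual. The data are made up; the sketch stops at the F value, which would then be compared with a critical value from an F table.

```python
import math

# Hypothetical scores (illustrative only).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
r = sp / math.sqrt(ss_x * ss_y)

# Partition the variability of Y into predicted and unpredicted parts.
ss_regression = r ** 2 * ss_y
ss_residual = (1 - r ** 2) * ss_y

df_regression = 1
df_residual = n - 2

# Mean squares and the F-ratio, as in ANOVA.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(round(f_ratio, 2), df_regression, df_residual)  # 9.8 1 3
```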