Correlation and Regression
Correlation: a statistical technique used to measure and describe the relationship between two variables.
Two variables are observed as they exist in their natural environment.
Scores are normally identified as X and Y.
The data can be listed in a table or displayed graphically in a scatterplot.
Characteristics of a Relationship:
The Direction: the sign of the correlation, positive or negative, describes the direction of the relationship.
Positive Correlation: the two variables change in the same direction.
Negative Correlation: the two variables change in opposite directions, creating an inverse relationship.
The Form: the tendency of the points on the scatterplot to cluster, or not cluster, around a straight line.
Lack of form indicates a weak correlation.
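Strength or Consistency of the Relationship: measured by the numerical value of the correlation, from 0 (no consistent relationship) to 1.00 in absolute value (a perfectly consistent relationship).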
The Pearson Correlation: measures the degree and direction of the linear relationship between two variables.
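Computing using formula: r = SP / √(SS_X · SS_Y), where SS_X and SS_Y are the sums of squared deviations for X and Y.
Requires calculating the sum of products of deviations (SP), which has both a definitional and a computational formula.
Definitional Formula: SP = Σ(X - M_X)(Y - M_Y)
Operational Formula: SP = ΣXY - (ΣX)(ΣY) / n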
or:
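A perfect linear relationship is one in which every change in X is accompanied by a perfectly predictable, proportional change in Y, so that all points fall exactly on a straight line (r = +1.00 or r = -1.00).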
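The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the five pairs of scores are made up for the example, and SP is computed both ways (definitional: sum of products of deviations; computational: ΣXY minus ΣXΣY/n) to show that the two agree.

```python
from math import sqrt

def sum_of_products(xs, ys):
    """Return SP from the definitional and the computational formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Definitional: sum of (X - M_X)(Y - M_Y)
    sp_definitional = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Computational: sum of XY minus (sum of X)(sum of Y) / n
    sp_computational = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sp_definitional, sp_computational

def sum_of_squares(values):
    """Sum of squared deviations from the mean (SS)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    sp, _ = sum_of_products(xs, ys)
    return sp / sqrt(sum_of_squares(xs) * sum_of_squares(ys))

# Made-up scores, for illustration only.
x_scores = [0, 10, 4, 8, 8]
y_scores = [2, 6, 2, 4, 6]

sp_def, sp_comp = sum_of_products(x_scores, y_scores)  # both equal 28
r = pearson_r(x_scores, y_scores)                      # 28 / sqrt(64 * 16) = 0.875
```

Here SS_X = 64 and SS_Y = 16, so r = 28 / 32 = 0.875, a strong positive correlation.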
Reasons to use Correlation:
Prediction: if two variables are known to be related in some systematic way, it is possible to use one of the variables to make predictions about the other.
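Validity: a new measurement can be shown to be valid by correlating its scores with previous, related measurements of the same thing.
Reliability: a measurement procedure is considered reliable when it produces stable, consistent measurements; the correlation between repeated measurements of the same individuals describes that consistency.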
Theory Verification: prediction of a theory can be tested by determining the correlation between two variables.
Things to Consider with Correlations:
Correlation describes a relationship; it does not explain it. A correlation by itself is not evidence of a cause-and-effect relationship.
Value of the correlation can be strongly affected by the range of scores found in the data.
Outliers can also have a dramatic effect on the value of the correlation.
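Correlation should not be interpreted as a proportion. A value of r = 0.50 does not mean 50% accuracy; the squared correlation, r², gives the proportion of variability in one variable that can be predicted from the other.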
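Two of these cautions are easy to see numerically. The sketch below uses made-up data and a hypothetical helper, pearson_r, to show how a single extreme point changes the correlation and how r differs from r².

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

r_original = pearson_r(x, y)        # SP = 5, SS_X = SS_Y = 10, so r = 0.5
r_squared = r_original ** 2         # 0.25, not 0.5

# Add one extreme point far from the rest of the data.
r_with_outlier = pearson_r(x + [15], y + [15])  # 125 / 130, roughly 0.96
```

With r = 0.50, only r² = 0.25 (25%) of the variability in Y is predictable from X. Adding the single point (15, 15) raises r from 0.50 to about 0.96, even though the other five points are unchanged.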
Hypothesis Testing with Pearson Correlation:
Sample correlation is often used to answer questions about the corresponding population correlation.
Basic question is whether a correlation exists in the population.
Null Hypothesis (H0: ρ = 0): No. There is no correlation in the population.
Alternative Hypothesis (H1: ρ ≠ 0): Yes. There is a real, nonzero correlation in the population.
When a nonzero sample correlation is obtained, one must decide between two interpretations:
There is no correlation in the population and the sample value is a result of sampling error.
The nonzero sample correlation accurately represents a real, nonzero correlation in the population.
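One common way to choose between these two interpretations is a t test with df = n - 2, where t = r · √((n - 2) / (1 - r²)). The notes do not say which test is intended, so the sketch below is an assumption; the sample values (r = 0.875, n = 5) are made up, and the critical value is the two-tailed, α = .05 entry for df = 3 from a standard t table.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0.

    Assumes -1 < r < 1; the formula is undefined for a perfect correlation.
    """
    df = n - 2
    t = r * sqrt(df / (1 - r ** 2))
    return t, df

# Made-up sample result, for illustration only.
t_stat, df = t_for_correlation(r=0.875, n=5)   # t is roughly 3.13 with df = 3

critical_t = 3.182  # two-tailed, alpha = .05, df = 3 (standard t table)
reject_null = abs(t_stat) > critical_t         # False: 3.13 does not exceed 3.182
```

Even a sample correlation as large as 0.875 is not significant here, because with only five pairs of scores sampling error alone can produce large values of r.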
Alternatives to the Pearson Correlation:
The Spearman Correlation: Used in two situations.
Measures the relationship between X and Y when both variables are measured on an ordinal scale (ranks).
Measures the degree to which the relationship between X and Y is consistently one-directional, whether or not it is linear.
If a relationship is consistently one-directional, it is said to be monotonic.
In the event that tied scores occur:
Step 1. List the scores in order from largest to smallest, including the tied values.
Step 2. Assign a rank to each position in the ordered list.
Step 3. For tied scores, compute the mean of their ranked positions and use that mean as the final rank for each tied score.
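The ranking steps can be sketched as follows. This is an illustrative outline with made-up data and hypothetical function names; it ranks from largest to smallest, gives tied scores the mean of the rank positions they occupy, and then applies the Pearson formula to the ranks.

```python
from math import sqrt

def rank_with_ties(scores):
    """Rank scores (1 = largest); tied scores share the mean of their positions."""
    # Step 1: order the score indices from largest to smallest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score is tied with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Steps 2 and 3: positions pos+1 .. end+1 all receive their mean.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def spearman_r(xs, ys):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(rank_with_ties(xs), rank_with_ties(ys))

tied_ranks = rank_with_ties([3, 3, 5, 6, 6, 12])   # [5.5, 5.5, 4.0, 2.5, 2.5, 1.0]

# A monotonic but nonlinear relationship (made-up data).
r_s = spearman_r([1, 2, 3, 4, 5], [1, 4, 9, 16, 100])  # 1.0
r = pearson_r([1, 2, 3, 4, 5], [1, 4, 9, 16, 100])     # less than 1.0
```

Because Y always increases when X increases, the Spearman correlation is 1.00 even though the points do not fall on a straight line and the Pearson correlation is below 1.00. Ranking from smallest to largest instead gives the same result, as long as both variables are ranked in the same direction.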
Point-Biserial Correlation: used to measure the relationship between two variables when one variable consists of regular numeric scores but the second variable has only two values (a dichotomous variable, also known as a binomial variable).
One category is designated as a zero and one category is designated as a one, then the Pearson correlation formula is used with converted data.
The Phi-Coefficient: used when both variables are dichotomous.
Step 1. Convert each of the dichotomous variables to numerical values by assigning 0 to one category and 1 to the other.
Step 2. Use the regular Pearson formula with the converted scores.
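Both the point-biserial correlation and the phi-coefficient come down to the same recipe: code the two categories as 0 and 1, then apply the Pearson formula. The sketch below uses made-up data and hypothetical helper names (code_dichotomy, pearson_r) to illustrate each case.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def code_dichotomy(labels, category_coded_one):
    """Convert a two-category variable to 0/1 codes."""
    return [1 if label == category_coded_one else 0 for label in labels]

# Point-biserial: numeric scores versus one dichotomous variable (made-up data).
scores = [4, 6, 5, 9, 11, 10]
group = ["control", "control", "control", "treatment", "treatment", "treatment"]
r_pb = pearson_r(scores, code_dichotomy(group, "treatment"))   # roughly 0.95

# Phi-coefficient: two dichotomous variables (made-up data).
passed = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
studied = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
phi = pearson_r(code_dichotomy(passed, "yes"), code_dichotomy(studied, "yes"))  # 0.5
```

Which category receives the 1 is arbitrary: swapping the codes for one variable flips the sign of the result but not its magnitude, so the sign should be interpreted with the coding in mind.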
Linear Equations: expressed by the equation Y = bX + a, where b is the slope and a is the Y-intercept.
Describes relationship between two variables.
Slope: determines how much the Y variable changes when X is increased by one point.
Y-Intercept: the value of Y when X is zero.
Regression: the statistical technique for determining the best-fitting straight line for a set of data. The resulting line is called the regression line.
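Uses the least-squares solution: the best-fitting line is the one that minimizes the sum of squared distances between the actual and predicted Y values, Σ(Y - Ŷ)². For the line Ŷ = bX + a, this gives b = SP / SS_X and a = M_Y - b·M_X.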
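A minimal sketch of the least-squares calculation, using the standard solution for a line Ŷ = bX + a: slope b = SP / SS_X and intercept a = M_Y - b·M_X. The data and function names are made up for the example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x                      # b = SP / SS_X
    intercept = mean_y - slope * mean_x    # a = M_Y - b * M_X
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted Y for a given X on the regression line."""
    return slope * x + intercept

# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, a = least_squares_line(x, y)    # b is 0.6 and a is approximately 2.2
y_hat_at_3 = predict(3, b, a)      # approximately 4.0, the mean of Y
```

Here SP = 6 and SS_X = 10, so the regression line is Ŷ = 0.6X + 2.2. The line always passes through the point (M_X, M_Y), which is why the prediction at X = 3 equals the mean of Y.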
Z-scores are standardized and can be used in the equation: ẑ_Y = r · z_X (the predicted z-score for Y is the correlation multiplied by the z-score for X).
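In standardized form the regression equation is usually written as predicted z_Y = r · z_X. The sketch below, with made-up data and hypothetical function names, checks that predicting through z-scores and converting back to raw scores gives the same answer as a raw-score line with slope SP / SS_X.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides SS by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def predict_via_z(new_x, xs, ys):
    """Predict Y by standardizing X, multiplying by r, and un-standardizing."""
    z_x = (new_x - mean(xs)) / sample_sd(xs)
    z_y_hat = pearson_r(xs, ys) * z_x        # predicted z_Y = r * z_X
    return mean(ys) + z_y_hat * sample_sd(ys)

# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

y_hat_z = predict_via_z(5, x, y)   # approximately 5.2
y_hat_raw = 0.6 * 5 + 2.2          # raw-score line for the same data, also about 5.2
```

The two predictions agree because the raw-score slope equals r multiplied by the ratio of the standard deviations of Y and X.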
Standard Error of Estimate: gives a measure of a standard distance between a predicted Y value on the regression line and the actual Y values in the data.
Standard Error of Estimate: √(SS_residual / df) = √(Σ(Y - Ŷ)² / (n - 2))
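As a worked illustration, the standard error of estimate is commonly computed as the square root of SS_residual / df, where SS_residual = Σ(Y - Ŷ)² and df = n - 2. The sketch below assumes that definition; the data and function name are made up for the example.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys):
    """Return (SS_residual, standard error of estimate) for the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    # Squared distances between each actual Y and its predicted value.
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return ss_residual, sqrt(ss_residual / (n - 2))

# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss_res, se = standard_error_of_estimate(x, y)   # about 2.4 and about 0.894
```

For these data the regression line is Ŷ = 0.6X + 2.2, the residuals are -0.8, 0.6, 1.0, -0.6 and -0.2, and the typical distance between an actual Y and the line is roughly 0.89 points. The same SS_residual can be obtained as (1 - r²) · SS_Y, which here is (1 - 0.6) · 6 = 2.4.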