Chp. 14 - Correlation & Regression - Coggle Diagram

- - - - The pairs of scores can be listed in a table, or they can be presented graphically in a scatter plot. A scatter plot allows you to see any patterns or trends that exist in the data.
        
        X value: independent variable
        Y value: dependent variable
  - - - In a negative correlation, the two variables tend to go in opposite directions: as the X values increase, the Y values tend to decrease.
  - - - Whereas a correlation of 1.00 (or −1.00) indicates a perfectly consistent relationship, a correlation of 0 at the other extreme indicates no consistency at all. For a correlation of 0, the data points are scattered randomly with no clear trend. Intermediate values between 0 and 1 indicate the degree of consistency.
        
        A line sketched around the data points is called an envelope; it helps you see the overall trend in the data. Football shape = correlation around 0.7. Fatter than football = correlation closer to 0. Narrow shape = correlation closer to 1.00.
- - - - Positive sign = line slopes up to the right; High value for the correlation (near 1.00) indicates points are tightly clustered close to the line.
        
        Because the Pearson correlation describes the pattern formed by the data points, any factor that does not change the pattern also does not change the correlation.
        
        In summary, adding a constant to (or subtracting a constant from) each X and/or Y value does not change the pattern of data points and does not change the correlation. Also, multiplying (or dividing) each X or each Y value by a positive constant does not change the pattern and does not change the value of the correlation.
        
        Multiplying by a negative constant, however, produces a mirror image of the pattern and, therefore, changes the sign of the correlation.
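A minimal Python sketch that checks these rules numerically. The helper `pearson_r` is a hypothetical name (not from the text) that computes $r = \frac{SP}{\sqrt{SS_X SS_Y}}$ from deviation scores, and the X and Y values are made-up illustration data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed as r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products of deviations
    ssx = sum((x - mx) ** 2 for x in xs)                    # sum of squared deviations for X
    ssy = sum((y - my) ** 2 for y in ys)                    # sum of squared deviations for Y
    return sp / math.sqrt(ssx * ssy)


# Hypothetical scores, for illustration only
X = [1, 3, 4, 6, 8]
Y = [2, 5, 4, 7, 9]

r = pearson_r(X, Y)
r_shifted = pearson_r([x + 10 for x in X], [y - 3 for y in Y])  # add/subtract constants
r_scaled = pearson_r([x * 2.5 for x in X], [y / 4 for y in Y])  # multiply/divide by positive constants
r_flipped = pearson_r([x * -2 for x in X], Y)                   # multiply X by a negative constant

print(r, r_shifted, r_scaled, r_flipped)
```

The first three values should agree up to floating-point rounding, and the last should be their negative, because the negative multiplier mirrors the scatter plot.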
  - - - Z-scores identify the exact location of each individual score within a distribution. With this in mind, each X value can be transformed into a z-score, $z_X$, using the mean and standard deviation for the set of Xs. Similarly, each Y score can be transformed into $z_Y$. If the X and Y values are viewed as a sample, the transformation uses the sample formula for z: $z_X = \frac{X - M_X}{s_X}$ and $z_Y = \frac{Y - M_Y}{s_Y}$.
        
        If the X and Y values form a complete population, the z-scores are computed with the population formula instead: $z_X = \frac{X - \mu_X}{\sigma_X}$ and $z_Y = \frac{Y - \mu_Y}{\sigma_Y}$.
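Once the scores are in z-score form, the Pearson correlation can be written as the average product of paired z-scores: $r = \frac{\sum z_X z_Y}{n - 1}$ for a sample (using $s = \sqrt{SS/(n-1)}$) and $\rho = \frac{\sum z_X z_Y}{N}$ for a population (using $\sigma = \sqrt{SS/N}$). A minimal Python sketch of both versions follows; the function names and data are hypothetical. Because the $n - 1$ (or $N$) inside the standard deviation cancels the divisor of the average, both versions give the same number for the same scores.

```python
import math


def z_scores(values, sample=True):
    """Convert raw scores to z-scores.

    sample=True uses s = sqrt(SS / (n - 1)); sample=False uses sigma = sqrt(SS / N).
    """
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    sd = math.sqrt(ss / (n - 1)) if sample else math.sqrt(ss / n)
    return [(v - mean) / sd for v in values]


def r_from_z(xs, ys, sample=True):
    """Pearson correlation as the average product of paired z-scores."""
    zx, zy = z_scores(xs, sample), z_scores(ys, sample)
    total = sum(a * b for a, b in zip(zx, zy))
    return total / (len(xs) - 1) if sample else total / len(xs)


# Hypothetical scores, for illustration only
X = [1, 3, 4, 6, 8]
Y = [2, 5, 4, 7, 9]
r_sample = r_from_z(X, Y, sample=True)        # treat the scores as a sample
rho_population = r_from_z(X, Y, sample=False)  # treat the scores as a whole population
print(r_sample, rho_population)
```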
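- - - - Validity: You can demonstrate the validity of a new test by using correlation: if the test really measures what it claims to measure, its scores should correlate with other, established measures of the same variable.
    - - Theory verification: Many theories predict a relationship between two variables; such a prediction can be tested by determining the correlation between the two variables.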
  - - - The value of a correlation can be affected greatly by the range of scores represented in the data.
        
        One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
        
        A correlation should not be interpreted as a proportion. To describe how accurately one variable predicts the other, you must square the correlation. Thus, a correlation of $r = 0.50$ means that one variable partially predicts the other, but the predictable portion is only $r^2 = 0.25$ (or 25%) of the total variability.
  - - - Although there may be a causal relationship, the simple existence of a correlation does not prove it.
        
        To establish a cause-and-effect relationship, it is necessary to conduct a true experiment (see The Experimental Method) in which one variable is manipulated by a researcher and other variables are rigorously controlled.
  - - - The correlation within this restricted range could be completely different from the correlation that would be obtained from a full range of scores.
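A small Python sketch of the range effect, using hypothetical data built so that the points rise across the full range of X but show no trend at all among the pairs with X from 1 to 5. `pearson_r` is an illustrative helper, not a library function. With these numbers, r is about 0.76 for the full range and 0 for the restricted range.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed as r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical data: no trend within each half, but the upper half sits higher overall
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [4, 1, 3, 5, 2, 9, 6, 8, 10, 7]
r_full = pearson_r(X, Y)

# Keep only the pairs whose X value falls in the restricted range 1 to 5
restricted = [(x, y) for x, y in zip(X, Y) if x <= 5]
r_restricted = pearson_r([x for x, _ in restricted], [y for _, y in restricted])

print(f"full range: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```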
  - - - The problem of outliers is a good reason for looking at a scatter plot instead of simply basing your interpretation on the numerical value of the correlation. If you only “go by the numbers,” you might overlook the fact that one extreme data point inflated the size of the correlation.
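A minimal Python sketch of how a single outlier can manufacture a strong correlation. The five original points are hypothetical and have no linear trend (r = 0); adding one extreme point at (20, 20) pushes r to about 0.96, even though the other points are unchanged. `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed as r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical data with no linear trend at all
X = [1, 2, 3, 4, 5]
Y = [4, 1, 3, 5, 2]
r_without = pearson_r(X, Y)

# The same data plus one extreme point (an outlier)
r_with = pearson_r(X + [20], Y + [20])

print(f"without outlier: r = {r_without:.2f}; with outlier: r = {r_with:.2f}")
```

A scatter plot of the second data set would show the five points in one corner and the outlier far away, which is exactly what the single number hides.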
  - - - One of the common uses of correlation is for prediction.
        
        In general, the squared correlation measures the gain in accuracy that is obtained from using the correlation for prediction. The squared correlation, $r^2$, measures the proportion of variability in the data that is explained by the relationship between X and Y. It is sometimes called the coefficient of determination.
        
        The value $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable. A correlation of $r = 0.80$ (or $-0.80$), for example, means that $r^2 = 0.64$ (or 64%) of the variability in the Y scores can be predicted from the relationship with X.
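A minimal Python sketch of why $r^2$ is the proportion of predictable variability: fit the least-squares line (slope $b = SP/SS_X$, intercept $a = M_Y - bM_X$; standard formulas assumed here because these notes do not state them), then compare the variability of the predicted values, $\sum(\hat{Y} - M_Y)^2$, with $SS_Y$. The function name and data are hypothetical.

```python
import math


def explained_proportion(xs, ys):
    """Return (r squared, SS of predicted Y divided by SS_Y) for a least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    b = sp / ssx      # slope of the best-fitting line
    a = my - b * mx   # Y-intercept
    predicted = [b * x + a for x in xs]
    ss_predicted = sum((p - my) ** 2 for p in predicted)  # variability captured by the line
    r = sp / math.sqrt(ssx * ssy)
    return r ** 2, ss_predicted / ssy


# Hypothetical scores, for illustration only
X = [1, 3, 4, 6, 8]
Y = [2, 5, 4, 7, 9]
r_squared, proportion = explained_proportion(X, Y)
print(r_squared, proportion)  # equal up to floating-point rounding

# The example from the notes: r = 0.80 (or -0.80) gives r^2 = 0.64, i.e. 64%
print(round(0.80 ** 2, 2))
```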
- - - - The null hypothesis is “No. There is no correlation in the population,” or “The population correlation is zero.”
        
        The alternative hypothesis is “Yes. There is a real, nonzero correlation in the population.”
        
        Samples are not expected to be identical to the populations from which they come; there will be some discrepancy (sampling error) between a sample statistic and the corresponding population parameter.
        
        Specifically, you should always expect some error between a sample correlation and the population correlation it represents.
      - The purpose of the hypothesis test is to decide between two interpretations:
        
        There is no correlation in the population ($\rho = 0$), and the sample value is the result of sampling error. Remember, a sample is not expected to be identical to the population. There always is some error between a sample statistic and the corresponding population parameter. This is the situation specified by $H_0$.
        
        The nonzero sample correlation accurately represents a real, nonzero correlation in the population. This is the alternative stated in $H_1$.
        
        The correlation from the sample will help to determine which of these two interpretations is more likely. A sample correlation near zero supports the conclusion that the population correlation is also zero. A sample correlation that is substantially different from zero supports the conclusion that there is a real, nonzero correlation in the population.
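One common way to run this test (an assumption here: the notes do not specify the procedure, and many textbooks instead compare r with a table of critical values) converts the sample correlation into a t statistic with $df = n - 2$: $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$. The sketch below uses a hypothetical sample with r = 0.80 and n = 10; the critical value 2.306 is the standard two-tailed t value for df = 8 and α = .05.

```python
import math


def correlation_t(r, n):
    """Convert a sample Pearson r into a t statistic for H0: rho = 0.

    Uses df = n - 2; assumes -1 < r < 1 (r = +/-1 would divide by zero).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Hypothetical sample: r = 0.80 from n = 10 pairs of scores
t, df = correlation_t(0.80, 10)

T_CRITICAL = 2.306  # two-tailed critical t for alpha = .05 and df = 8 (standard t table)
reject_h0 = abs(t) > T_CRITICAL
print(f"t({df}) = {t:.2f}; reject H0: {reject_h0}")
```

A large |t| means the sample correlation is too far from zero to be explained comfortably by sampling error, which favors $H_1$.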
- - - - The Spearman correlation is used to measure the relationship between X and Y when both variables are measured on ordinal scales.
        
        The Spearman correlation can be used as a valuable alternative to the Pearson correlation, even when the original raw scores are on an interval or a ratio scale.
        
        The Spearman correlation can be used to measure the degree to which a relationship is consistently one directional, independent of its form.
        
        When there is a consistently one-directional relationship between two variables, the relationship is said to be monotonic. Thus, the Spearman correlation measures the degree of monotonic relationship between two variables.
      - Two situations in which the Spearman correlation is used:
        
        Spearman is used when the original data are ordinal; that is, when the X and Y values are ranks. In this case, you simply apply the Pearson correlation formula to the set of ranks.
        
        The Spearman correlation is used when a researcher wants to measure the degree to which the relationship between X and Y is consistently one directional, independent of the specific form of the relationship.
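Following the first situation, the Spearman correlation can be sketched in Python as the Pearson formula applied to ranks. The ranking helper gives tied scores the average of the ranks they occupy, a common convention assumed here because the notes do not describe tie handling; function names and data are hypothetical. The example relationship is perfectly monotonic but curved, so the Spearman value is 1.00 while the Pearson value is lower.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed as r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


def ranks(values):
    """Rank scores from smallest (rank 1) to largest; tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every score tied with the score at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j (0-based) hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_r(xs, ys):
    """Spearman correlation: the Pearson formula applied to the ranks of X and Y."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: Y always increases with X, but along a curve (Y = X cubed)
X = [1, 2, 3, 4, 5]
Y = [x ** 3 for x in X]
print(pearson_r(X, Y))   # less than 1.00 because the relationship is not linear
print(spearman_r(X, Y))  # 1.0 because the relationship is perfectly monotonic
```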
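  - - - However, note that the special formula for the Spearman correlation, $r_S = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$ (where D is the difference between an individual's X rank and Y rank), should be used only after the scores have been converted to ranks and when there are no ties among the ranks.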
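A sketch of the special formula, $r_S = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$, where D is the difference between each individual's X rank and Y rank, together with a check that it matches the Pearson formula applied to the same untied ranks. The function names and ranks are hypothetical; the guard rejects tied ranks, in line with the restriction on this formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed as r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


def spearman_special(rank_x, rank_y):
    """r_S = 1 - 6 * sum(D^2) / (n * (n^2 - 1)); assumes untied ranks 1..n."""
    n = len(rank_x)
    if len(set(rank_x)) < n or len(set(rank_y)) < n:
        raise ValueError("tied ranks: apply the Pearson formula to the ranks instead")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # sum of D^2
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for five individuals (no ties)
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
print(spearman_special(rank_x, rank_y))  # special formula
print(pearson_r(rank_x, rank_y))         # Pearson formula on the same ranks
```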
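  - - - The point-biserial correlation measures the relationship between a variable of regular numerical scores and a dichotomous variable. A variable with only two values is called a dichotomous variable or a binomial variable. Examples: success vs. failure; first-born vs. later-born child; older than 30 vs. younger than 30.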
        
        To compute the point-biserial correlation, the dichotomous variable is first converted to numerical values by assigning a value of zero (0) to one category and a value of one (1) to the other category.
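A minimal sketch of the point-biserial correlation: code the two categories as 0 and 1, then apply the ordinary Pearson formula. The group labels, codes, and scores are hypothetical. Which category receives the 1 is arbitrary; swapping the codes only flips the sign of the correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed as r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical data: a dichotomous grouping and a numerical score for each person
groups = ["failure", "failure", "failure", "success", "success", "success"]
scores = [4, 5, 6, 7, 8, 9]

# Step 1: assign 0 to one category and 1 to the other
codes = [1 if g == "success" else 0 for g in groups]
# Step 2: use the regular Pearson formula on the coded values
r_pb = pearson_r(codes, scores)

# Reversing the coding mirrors the pattern, so only the sign changes
r_pb_reversed = pearson_r([1 - c for c in codes], scores)
print(r_pb, r_pb_reversed)
```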
  - - - When both variables are dichotomous (the phi-coefficient), convert each of the dichotomous variables to numerical values by assigning a 0 to one category and a 1 to the other category for each of the variables.
        
        Use the regular Pearson formula with the converted scores.
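A sketch of the phi-coefficient for two dichotomous variables: each variable is coded 0/1, and the regular Pearson formula is applied to the codes. Variable names, categories, and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed as r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)


# Hypothetical data: birth order and pass/fail outcome for eight people
birth_order = ["first", "first", "later", "later", "later", "first", "later", "first"]
outcome = ["fail", "pass", "pass", "pass", "fail", "fail", "pass", "fail"]

# Convert each dichotomous variable: 0 for one category, 1 for the other
x_codes = [1 if b == "later" else 0 for b in birth_order]
y_codes = [1 if o == "pass" else 0 for o in outcome]

# Use the regular Pearson formula with the converted scores
phi = pearson_r(x_codes, y_codes)
print(phi)
```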
- - - - If the correlation is near 1.00 (or −1.00), the data points are clustered close to the line, and the standard error of estimate is small. As the correlation gets nearer to zero, the data points become more widely scattered, the line provides less accurate predictions, and the standard error of estimate grows larger.
        
        The regression equation simply describes the best-fitting line and is used for making predictions. However, the value of $r^2$ and the standard error of estimate indicate how accurate these predictions will be.
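A minimal Python sketch of both pieces, assuming the standard least-squares formulas (slope $b = SP/SS_X$, intercept $a = M_Y - bM_X$, so $\hat{Y} = bX + a$) and the usual definition of the standard error of estimate, $\sqrt{SS_{residual}/df}$ with $df = n - 2$; the notes do not spell these out. For a least-squares line, $SS_{residual} = (1 - r^2)SS_Y$, which ties the size of the error to the strength of the correlation. The function name and data are hypothetical.

```python
import math


def regression_summary(xs, ys):
    """Fit Y-hat = bX + a by least squares; return (b, a, r, SS_residual, standard error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    b = sp / ssx     # slope
    a = my - b * mx  # Y-intercept
    r = sp / math.sqrt(ssx * ssy)
    # Squared distances between the actual Y values and the line's predictions
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(ss_residual / (n - 2))  # standard error of estimate, df = n - 2
    return b, a, r, ss_residual, see


# Hypothetical scores, for illustration only
b, a, r, ss_residual, see = regression_summary([1, 3, 4, 6, 8], [2, 5, 4, 7, 9])
print(f"Y-hat = {b:.3f}X + {a:.3f}, r = {r:.3f}, standard error = {see:.3f}")

# A perfect correlation: every point is on the line, so the standard error is 0
b_p, a_p, r_p, ss_p, see_p = regression_summary([1, 2, 3, 4], [3, 5, 7, 9])
print(f"Y-hat = {b_p:.1f}X + {a_p:.1f}, r = {r_p:.1f}, standard error = {see_p:.1f}")
```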