Unit 2 AP Statistics Mindmap
Linear regression is used to determine the line of best fit for a set of data, providing a visual demonstration of the relationship between the data points. It can be used to:
-Predict the value of a dependent variable based on the value of at least one independent variable.
-Explain the impact of changes in an independent variable on the dependent variable.
Analyzing Departures from Linearity
An outlier is a data point whose response (y) does not follow the general trend of the rest of the data. A data point has high leverage if it has an extreme predictor (x) value. With a single predictor, an extreme x value is simply one that is particularly high or low.
An influential point is an outlier that greatly affects the slope of the regression line. If removing the point significantly changes the slope of the regression line, then the point is considered an influential point.
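One way to check whether a point is influential is to fit the line with and without it and compare the slopes. The sketch below does this on made-up data; the helper name `slope_of_lsrl` and the data values are assumptions chosen only for illustration.

```python
from statistics import mean

def slope_of_lsrl(xs, ys):
    """Slope of the least-squares regression line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up data: five points with an upward trend, plus one point with an
# extreme x value (high leverage) that does not follow the trend.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 3]

slope_with = slope_of_lsrl(xs, ys)
slope_without = slope_of_lsrl(xs[:-1], ys[:-1])

print(f"slope with the point:    {slope_with:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

Here the slope changes sign when the last point is removed, so that point would be considered influential.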
Least Squares Regression
The least-squares regression line is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
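As a minimal sketch of this idea, the code below computes the slope and intercept from the usual formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) and reports the sum of squared vertical distances. The function names and the data are made up for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

print(f"y-hat = {intercept:.2f} + {slope:.2f} x")
print(f"sum of squared residuals = {best:.2f}")
```

Nudging the slope or intercept away from these values in either direction gives a larger sum of squared residuals, which is what "as small as possible" means.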
r2 (coefficient of determination): the variation in the response variable (y) that is explained by the explanatory variable (x).
r2 represents the percentage of the variation in y (its vertical scatter about the mean of y) that can be explained by the linear relationship with x.
Correlation measures the direction and strength of the linear relationship between two quantitative variables.
The calculation of correlation is based on the mean and standard deviation of each variable.
r = correlation
r2 tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset.
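The sketch below illustrates both ideas on made-up data: r is computed from z-scores (so it depends only on means and standard deviations), and r2 is computed as the fraction by which the LSRL reduces the squared prediction error compared with always guessing the mean of y. The function name and the data are assumptions for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r as the average product of z-scores (dividing by n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)

# LSRL from r: slope = r * s_y / s_x, and the line passes through (x_bar, y_bar).
slope = r * stdev(ys) / stdev(xs)
intercept = mean(ys) - slope * mean(xs)

# Squared error when predicting with the LSRL vs. always guessing the mean of y.
sse_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
sse_mean = sum((y - mean(ys)) ** 2 for y in ys)
r_squared = 1 - sse_line / sse_mean

print(f"r = {r:.4f}")
print(f"r squared = {r_squared:.4f}")
```

The two routes agree: squaring r gives the same value as 1 minus the ratio of the two squared errors.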
A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis.
The mean of the residuals from a least-squares regression line is always 0.
It is a scatterplot of the residuals vs the explanatory variable.
The horizontal line at zero helps orient us: the residual = 0 line corresponds to the regression line.
residual = actual value - predicted value
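As a small illustration, the sketch below fits a least-squares line to made-up data, computes each residual as actual minus predicted, and checks that the residuals average to 0. The data values are assumptions; plotting the printed (x, residual) pairs would give the residual plot.

```python
from statistics import mean

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fit the least-squares regression line.
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# residual = actual value - predicted value
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# Each (x, residual) pair is one point on the residual plot.
for x, e in zip(xs, residuals):
    print(f"x = {x}, residual = {e:+.2f}")

print(f"mean of residuals = {mean(residuals):.10f}")
```

A positive residual means the point lies above the line, and a negative residual means it lies below.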
Extrapolation is the use of a regression line for prediction far outside the range of values of x used to obtain the line. Such predictions are often not accurate.
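A simple guard against accidental extrapolation is to flag any prediction whose x lies outside the range of the data used to fit the line. The sketch below assumes a hypothetical line y-hat = 2.2 + 0.6x fitted on x values from 1 to 5; the function name `predict` and all numbers are made up. It flags any x outside the range, which is stricter than "far outside".

```python
def predict(x, slope, intercept, x_min, x_max):
    """Return (predicted y, is_extrapolation) for a fitted line.

    is_extrapolation is True when x lies outside [x_min, x_max],
    the range of x values used to fit the line.
    """
    is_extrapolation = x < x_min or x > x_max
    return intercept + slope * x, is_extrapolation

# Hypothetical fitted line and observed x range (assumed values).
slope, intercept = 0.6, 2.2
x_min, x_max = 1, 5

inside = predict(3, slope, intercept, x_min, x_max)
outside = predict(50, slope, intercept, x_min, x_max)

print(inside)   # prediction within the observed range
print(outside)  # flagged: x = 50 is far beyond the data
```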
A marginal distribution is the distribution of one variable (for example, the column variable) taken separately, expressed in counts or percents.
A conditional distribution is the distribution of one factor for each level of the other factor.
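To make the two kinds of distribution concrete, the sketch below uses a made-up two-way table of counts. It computes the distribution of the column variable on its own (often called the marginal distribution) and the conditional distribution of the column variable within each row. The category names and counts are assumptions for illustration.

```python
# Hypothetical two-way table of counts (made-up data).
# Rows: grade level. Columns: preferred snack.
table = {
    "Junior": {"Chips": 30, "Fruit": 20},
    "Senior": {"Chips": 10, "Fruit": 40},
}
columns = ["Chips", "Fruit"]

grand_total = sum(sum(row.values()) for row in table.values())

# Distribution of the column variable (snack) by itself, as proportions
# of the grand total.
marginal_snack = {
    col: sum(row[col] for row in table.values()) / grand_total
    for col in columns
}

# Conditional distribution of snack for each level of grade:
# each count divided by its own row total.
conditional_snack_given_grade = {
    grade: {col: count / sum(row.values()) for col, count in row.items()}
    for grade, row in table.items()
}

print(marginal_snack)
print(conditional_snack_given_grade)
```

If the conditional distributions differ noticeably from row to row, as they do in this made-up table, that suggests an association between the two variables.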