Descriptive Statistics

Ways to represent two categorical variables

Segmented bar graph

Side-by-side bar graph

Mosaic plot

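Side-by-side bar graphs are used to compare two categorical variables. For each category of one variable, a group of bars shows the counts (or relative frequencies) of the other variable's categories. Unlike the bars in a histogram, the bars in a bar graph do not touch.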

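A segmented bar graph is a stacked bar graph in which each bar represents 100 percent of one category of a variable, divided into segments that show the relative frequencies of the other variable's categories. To make one, you need to convert a two-way table of frequencies into relative frequencies (the conditional relative frequencies within each bar).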

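A mosaic plot shows a part-to-whole relationship between two or more categorical variables. It lets us visualize the data to judge whether the variables are independent. The width of each bar is proportional to that category's share of the total, and the boxes within a bar show conditional relative frequencies. If the variables are independent, the corresponding boxes have the same height across the bars, so the dividing lines line up.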
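The Python sketch below computes the rectangle dimensions of a mosaic plot from a table of counts, assuming the bars are the categories of one variable and the boxes within each bar are the categories of the other. The counts and the names `mosaic_rectangles` and `segments_line_up` are hypothetical, chosen only for illustration. Each box's area (width times height) works out to the joint relative frequency, and the independence check simply compares box heights across bars.

```python
def mosaic_rectangles(table):
    """Map (bar, segment) -> (width, height) for a mosaic plot.

    table[bar][segment] holds frequency counts. A bar's width is its marginal
    relative frequency; a segment's height is its conditional relative
    frequency within that bar, so width * height is the joint relative frequency.
    """
    grand_total = sum(sum(cells.values()) for cells in table.values())
    rects = {}
    for bar, cells in table.items():
        bar_total = sum(cells.values())
        width = bar_total / grand_total
        for segment, count in cells.items():
            rects[(bar, segment)] = (width, count / bar_total)
    return rects


def segments_line_up(table, tol=1e-9):
    """True if each segment has the same height in every bar.

    Assumes every bar lists every segment category. Equal heights mean the
    conditional distributions match, which is what independence looks like.
    """
    rects = mosaic_rectangles(table)
    segments = {seg for cells in table.values() for seg in cells}
    for seg in segments:
        heights = [rects[(bar, seg)][1] for bar in table]
        if max(heights) - min(heights) > tol:
            return False
    return True


# Hypothetical counts: bars are groups, segments are "Yes"/"No" answers.
independent = {"Group A": {"Yes": 30, "No": 10}, "Group B": {"Yes": 15, "No": 5}}
associated = {"Group A": {"Yes": 30, "No": 10}, "Group B": {"Yes": 5, "No": 15}}

print(segments_line_up(independent))  # True: 75% "Yes" in both bars
print(segments_line_up(associated))   # False: 75% vs. 25% "Yes"
```

With real sample data the heights are rarely exactly equal, so in practice you judge whether they are roughly the same; a looser `tol` is one way to express that.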

Frequency table

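A frequency table shows the frequency counts for each category of a categorical variable. For two categorical variables, a two-way table shows the count for every combination of categories. We use this table to make side-by-side bar graphs, segmented bar graphs, and mosaic plots.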
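As a sketch of how a two-way table comes from raw data, the Python below counts every combination of two categorical variables. The survey responses and the function name `two_way_table` are hypothetical, for illustration only.

```python
from collections import Counter


def two_way_table(pairs):
    """Build a two-way frequency table from (row category, column category) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # table[row][col] is the count for that combination (0 if it never occurs).
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical survey responses: (grade level, preferred snack).
responses = [
    ("Junior", "Chips"), ("Junior", "Fruit"), ("Senior", "Chips"),
    ("Senior", "Chips"), ("Junior", "Chips"), ("Senior", "Fruit"),
    ("Senior", "Fruit"),
]

table = two_way_table(responses)
for row, cells in table.items():
    print(row, cells)
# Junior {'Chips': 2, 'Fruit': 1}
# Senior {'Chips': 2, 'Fruit': 2}
```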

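Ways to calculate relative frequencies from a two-way table

Marginal relative frequency: a row or column total divided by the grand total of the table

Conditional relative frequency: a cell count divided by the total of its row (or column), giving the relative frequency within one category of the other variable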
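A minimal Python sketch of both calculations, assuming a table stored as a nested dictionary in which every row has the same columns. The counts and function names are hypothetical. Each conditional distribution sums to 1, which is why it can serve as one bar of a segmented bar graph.

```python
def marginal_relative_frequencies(table):
    """Row totals and column totals, each divided by the grand total."""
    grand = sum(sum(cells.values()) for cells in table.values())
    row_marginal = {r: sum(cells.values()) / grand for r, cells in table.items()}
    cols = next(iter(table.values())).keys()  # assumes every row has the same columns
    col_marginal = {c: sum(table[r][c] for r in table) / grand for c in cols}
    return row_marginal, col_marginal


def conditional_relative_frequencies(table, given_row):
    """Distribution of the column variable within one row: cell count / row total."""
    cells = table[given_row]
    row_total = sum(cells.values())
    return {c: count / row_total for c, count in cells.items()}


# Hypothetical counts: rows = grade level, columns = preferred snack.
table = {
    "Junior": {"Chips": 30, "Fruit": 10},
    "Senior": {"Chips": 20, "Fruit": 20},
}

rows, cols = marginal_relative_frequencies(table)
print(rows)  # {'Junior': 0.5, 'Senior': 0.5}
print(cols)  # {'Chips': 0.625, 'Fruit': 0.375}
for grade in table:
    # Each conditional distribution sums to 1 (one bar of a segmented bar graph).
    print(grade, conditional_relative_frequencies(table, grade))
# Junior {'Chips': 0.75, 'Fruit': 0.25}
# Senior {'Chips': 0.5, 'Fruit': 0.5}
```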

Ways to represent the relationship between two quantitative variables

Scatterplots

For bivariate quantitative data, identify the explanatory variable (plotted on the x-axis) and the response variable (plotted on the y-axis) before making the scatterplot.

Ways to describe a scatterplot:

Direction: Positive/Negative

Outliers: Unusual values that fall outside the overall pattern

Form: Linear/Curved/No form

Strength: Weak/Moderate/Strong

Correlation (r) measures the direction and strength of the linear relationship between two quantitative variables.

r is always between -1 and 1. A perfect positive linear relationship has r = 1, and a perfect negative linear relationship has r = -1.

r has no units, and its value does not change if you switch the x and y variables or change the units of either variable. Correlation does NOT imply causation.

As a rough rule of thumb, |r| greater than 0.7 is considered a strong correlation, |r| between 0.5 and 0.7 is moderate, and |r| less than 0.5 is weak.
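The Python sketch below computes r as the average product of z-scores, r = (1 / (n - 1)) * sum(z_x * z_y), and checks the properties listed above on hypothetical data. The helper names and the `strength_label` thresholds (the rule of thumb from these notes) are illustrative assumptions, not a standard API.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)."""
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    total = sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y))
    return total / (len(x) - 1)


def strength_label(r):
    """Rough rule of thumb from these notes, applied to |r|."""
    if abs(r) > 0.7:
        return "strong"
    if abs(r) >= 0.5:
        return "moderate"
    return "weak"


# Hypothetical bivariate data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 3), strength_label(r))                  # 0.775 strong
print(round(correlation(y, x), 3))                     # 0.775 (switching x and y)
print(round(correlation([v * 100 for v in x], y), 3))  # 0.775 (changing units of x)
```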

Residuals

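A residual is the leftover vertical distance between an observed value of the response variable and the value predicted by the least-squares regression line (LSRL). Plotting the residuals against x (a residual plot) shows whether a linear form is appropriate: a random scatter with no pattern suggests a line fits well, while a curved pattern suggests it does not.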

Equation: residual = actual value - predicted value = y - ŷ
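A short Python sketch of the residual formula, assuming hypothetical data whose LSRL is ŷ = 2.2 + 0.6x (that line is the least-squares fit for these five points). The function name `residuals` is illustrative. Residuals from the LSRL always sum to zero, which makes a handy check.

```python
# Hypothetical data; the line y-hat = 2.2 + 0.6x is assumed to be its LSRL.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # assumed intercept and slope


def residuals(x, y, a, b):
    """Residual = actual y - predicted y, where predicted y = a + b * x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


res = residuals(x, y, a, b)
for xi, e in zip(x, res):
    # These (x, residual) pairs are the points of a residual plot.
    print(f"x = {xi}: residual = {e:+.1f}")
```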

Least squares regression

r^2 (the coefficient of determination) is the proportion of the variation in the values of y that can be explained by the linear relationship with x.

Standard deviation of the residuals (s): gives the typical size of a prediction error, that is, how far the actual y values typically fall from the LSRL.

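Ways to interpret r^2: About ---% of the variation in the response variable is explained by the linear relationship with the explanatory variable.

Ways to interpret s: When using the LSRL to predict the response variable, the typical prediction error is about --- units.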
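The Python sketch below computes r^2 and s for a hypothetical data set and fills in the two interpretation templates. It assumes the line ŷ = 2.2 + 0.6x is the LSRL for these points; the shortcut r^2 = 1 - SSE / SST only holds for the least-squares line. s divides by n - 2, the usual formula for the standard deviation of the residuals in simple linear regression. The function name is illustrative.

```python
from math import sqrt

# Hypothetical data and its assumed LSRL y-hat = 2.2 + 0.6x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6


def r_squared_and_s(x, y, a, b):
    """Return (r^2, s) for the line y-hat = a + b * x.

    r^2 = 1 - SSE / SST holds when (a, b) is the least-squares line.
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation left in the residuals
    sst = sum((yi - y_bar) ** 2 for yi in y)                      # total variation in y around its mean
    r2 = 1 - sse / sst
    s = sqrt(sse / (n - 2))  # standard deviation of the residuals
    return r2, s


r2, s = r_squared_and_s(x, y, a, b)
print(f"About {r2:.0%} of the variation in y is explained by the linear relationship with x.")
print(f"The typical prediction error is about {s:.2f} units.")
```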

For a linear model, the equation of the LSRL is ŷ = a + bx, where a is the y-intercept and b is the slope.
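One way to find a and b, sketched in Python on hypothetical data: the slope is b = r * s_y / s_x and the intercept is a = ȳ - b * x̄, so the line always passes through (x̄, ȳ). The function name `lsrl` is illustrative; `mean` and `stdev` come from Python's standard `statistics` module.

```python
from statistics import mean, stdev


def lsrl(x, y):
    """Least-squares line y-hat = a + b * x.

    Uses b = r * s_y / s_x and a = y_bar - b * x_bar, with sample
    standard deviations s_x and s_y.
    """
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Hypothetical bivariate data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 2.2 + 0.6x
print(f"predicted y at x = 3.5: {a + b * 3.5:.1f}")
```

The prediction uses x = 3.5, inside the range of the data; predicting far outside that range (extrapolation) is unreliable.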

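Influential points (often points with x-values far from the rest of the data) can change the results of a regression dramatically, including its slope, y-intercept, r, and s.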
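To see this effect, the Python sketch below fits the LSRL to hypothetical data with and without one added point whose x-value is far from the others. The data, the added point (15, 0), and the function name `fit_summary` are all illustrative assumptions. With these numbers the single point flips the sign of both the slope and r and increases s.

```python
from math import sqrt


def fit_summary(x, y):
    """Return slope b, intercept a, correlation r, and residual SD s for the LSRL."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    return b, a, r, s


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# Hypothetical point with an x-value far from the others (high leverage).
x_new, y_new = x + [15], y + [0]

for label, xs, ys in [("original", x, y), ("with (15, 0)", x_new, y_new)]:
    b, a, r, s = fit_summary(xs, ys)
    print(f"{label:>14}: slope={b:.3f}, intercept={a:.3f}, r={r:.3f}, s={s:.3f}")
```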