Two Proportions

Type of Plot

Stacked bar chart

Mosaic plot

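```r
library(ggplot2)

# Stacked bar chart: one bar per explanatory group, filled by response
# (qplot() is deprecated since ggplot2 3.4.0, so ggplot() is used instead)
ggplot(dataframe, aes(x = explanatory, fill = response)) +
  geom_bar()  # geom_bar(position = "fill") stacks proportions instead of counts

# Mosaic plot (base R): tile widths follow the explanatory group sizes,
# tile heights follow the proportions of each response within a group
mosaicplot(table(dataframe$explanatory, dataframe$response),
           xlab = "Explanatory",
           ylab = "Response",
           main = "Explanatory vs Response")
```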
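Both plots summarize the same two-way table. The sketch below builds a small made-up data frame (the group labels `A`/`B`, the `yes`/`no` responses, and the counts 30 of 100 and 60 of 150 are all hypothetical) and computes the counts and within-group proportions that the mosaic tiles, and a stacked bar chart drawn with `position = "fill"`, display.

```r
# Hypothetical example data: 'explanatory' is the group, 'response' the outcome
dataframe <- data.frame(
  explanatory = rep(c("A", "B"), times = c(100, 150)),
  response    = c(rep(c("yes", "no"), times = c(30, 70)),   # group A: 30 yes
                  rep(c("yes", "no"), times = c(60, 90)))   # group B: 60 yes
)

# Two-way table of counts: rows = explanatory groups, columns = response
counts <- table(dataframe$explanatory, dataframe$response)
print(counts)

# Row-wise (conditional) proportions: each row sums to 1, giving the
# sample proportion of each response within each group
cond_props <- prop.table(counts, margin = 1)
print(round(cond_props, 3))
```

Here the sample proportions of `yes` are 0.30 in group A and 0.40 in group B; these same hypothetical counts are reused in the sketches further down.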

Example Problem

Parameter: $P_1 - P_2$
Point Estimate: $\hat{P}_1 - \hat{P}_2$

Hypothesis Test

Hypotheses:
$ H_0: P_1 - P_2 = 0 $
1) $ H_a: P_1 - P_2 < 0 $
2) $ H_a: P_1 - P_2 \ne 0 $
3) $ H_a: P_1 - P_2 > 0 $

Test Statistic Random Variable (Assuming $H_0$ is true):
$ Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \overset{\text{approx.}}{\sim} N(0, 1) $

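where $ \hat{P} = \dfrac{\text{total number of successes in both samples} }{ \text{total number of cases in both samples}} = \dfrac{n_1\hat{P}_1 + n_2\hat{P}_2}{n_1 + n_2} $ is the pooled proportion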

Observed Test Statistic:
$ z_{obs} = \dfrac{ \hat{p}_{1, obs} - \hat{p}_{2, obs} }{\sqrt{\dfrac{\hat{p}_{obs}(1 - \hat{p}_{obs})}{n_1} + \dfrac{\hat{p}_{obs}(1 - \hat{p}_{obs})}{n_2} } } $

where $ \hat{p}_{obs}$ is the observed pooled proportion
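A minimal sketch of the observed test statistic, computed from the numbers of successes and the sample sizes. The helper name `two_prop_z` is hypothetical, and the counts (30 of 100 vs 60 of 150) are made up. The cross-check relies on the fact that, without continuity correction, the chi-squared statistic reported by `prop.test()` for two groups equals $ z_{obs}^2$.

```r
# Observed two-proportion z statistic using the pooled proportion in the SE.
# x1, x2 = numbers of successes; n1, n2 = sample sizes (hypothetical helper).
two_prop_z <- function(x1, n1, x2, n2) {
  p1_hat <- x1 / n1
  p2_hat <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)  # total successes / total cases
  se_pool <- sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
  ((p1_hat - p2_hat) - 0) / se_pool  # null value of P1 - P2 is 0
}

z_obs <- two_prop_z(x1 = 30, n1 = 100, x2 = 60, n2 = 150)
print(z_obs)

# Cross-check against base R: prop.test() without continuity correction
# reports a chi-squared statistic equal to z_obs^2
chisq <- unname(prop.test(x = c(30, 60), n = c(100, 150),
                          correct = FALSE)$statistic)
print(all.equal(chisq, z_obs^2))
```

With these counts the pooled proportion is $ 90/250 = 0.36$ and $ z_{obs} \approx -1.61$.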

$ \mathbf{\textit{P}}$-value:
1) $ \mathbb{P}(\hat{P}_1 - \hat{P}_2 \le \hat{p}_{1, obs} - \hat{p}_{2, obs}) = \mathbb{P}(Z \le z_{obs}) $
2) $ \mathbb{P}(\big| \hat{P}_1 - \hat{P}_2 \big| \ge \big| \hat{p}_{1, obs} - \hat{p}_{2, obs} \big|) = 2 \cdot \mathbb{P}(Z \ge \big| z_{obs} \big|) = \mathbb{P}(\big| Z \big| \ge \big| z_{obs} \big|) $
3) $ \mathbb{P}(\hat{P}_1 - \hat{P}_2 \ge \hat{p}_{1, obs} - \hat{p}_{2, obs}) = \mathbb{P}(Z \ge z_{obs}) $
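The three $ \mathbf{\textit{P}}$-value formulas translate directly into standard normal tail areas. The sketch below uses a hypothetical helper `z_p_value`; its option labels `"less"`, `"two.sided"` and `"greater"` are chosen to mirror the `alternative` argument of R test functions such as `prop.test()`, and the input $ z_{obs} = -1.6137$ is the made-up value from the earlier example.

```r
# p-value for an observed z statistic under each alternative hypothesis
# (hypothetical helper; numbering follows the list of alternatives above)
z_p_value <- function(z_obs, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         less      = pnorm(z_obs),                              # 1) P(Z <= z_obs)
         two.sided = 2 * pnorm(abs(z_obs), lower.tail = FALSE), # 2) P(|Z| >= |z_obs|)
         greater   = pnorm(z_obs, lower.tail = FALSE))          # 3) P(Z >= z_obs)
}

print(z_p_value(-1.6137, "less"))
print(z_p_value(-1.6137, "two.sided"))
print(z_p_value(-1.6137, "greater"))
```

The two-sided value doubles the upper tail beyond $ |z_{obs}|$, which is why it equals twice the smaller of the two one-sided values.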

Conditions for Distributional Approximation (To $ Z$):

Independent observations in each sample
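At least 10 expected successes and 10 expected failures in each group using the pooled proportion: $ n_i\hat{p}_{obs} \ge 10$ and $ n_i(1 - \hat{p}_{obs}) \ge 10$ for $ i = 1, 2$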
Independent selection of samples
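The success-failure condition can be checked from the counts; independence cannot, since it comes from how the data were collected. A sketch with a hypothetical helper `pooled_condition_ok`, assuming the usual textbook reading of the condition ($ n_i\hat{p}_{obs}$ and $ n_i(1 - \hat{p}_{obs})$ at least 10 in both groups); the counts are made up.

```r
# Pooled success-failure check for the hypothesis test (hypothetical helper).
# x1, x2 = numbers of successes; n1, n2 = sample sizes.
pooled_condition_ok <- function(x1, n1, x2, n2, min_count = 10) {
  p_pool <- (x1 + x2) / (n1 + n2)
  n <- c(n1, n2)
  expected <- c(n * p_pool, n * (1 - p_pool))  # successes, then failures
  all(expected >= min_count)
}

print(pooled_condition_ok(30, 100, 60, 150))  # expected counts 36, 54, 64, 96
print(pooled_condition_ok(2, 20, 3, 25))      # n1 * p_pool is only about 2.2
```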

Confidence Interval

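Formula for a confidence interval for $ P_1 - P_2$ (unpooled standard error):
$ (\hat{p}_{1, obs} - \hat{p}_{2, obs}) \pm z^* \sqrt{ \dfrac{\hat{p}_{1, obs}(1 - \hat{p}_{1, obs})}{n_1} + \dfrac{\hat{p}_{2, obs}(1 - \hat{p}_{2, obs})}{n_2} }$

where $ z^*$ is the standard normal critical value for the chosen confidence level (for example, $ z^* \approx 1.96$ for 95%)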
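A minimal sketch of the interval, using the same made-up counts as before and a hypothetical helper `two_prop_ci`; $ z^*$ comes from `qnorm()`. Base R's `prop.test()` with `correct = FALSE` computes this same Wald interval (it additionally truncates the limits to $ [-1, 1]$), so it serves as a cross-check.

```r
# Wald CI for P1 - P2 with the unpooled standard error (hypothetical helper)
two_prop_ci <- function(x1, n1, x2, n2, conf_level = 0.95) {
  p1_hat <- x1 / n1
  p2_hat <- x2 / n2
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for 95%
  se <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
  (p1_hat - p2_hat) + c(-1, 1) * z_star * se
}

ci <- two_prop_ci(30, 100, 60, 150)
print(ci)

# Cross-check: prop.test() without continuity correction
ci_r <- as.numeric(prop.test(c(30, 60), c(100, 150), correct = FALSE)$conf.int)
print(all.equal(ci, ci_r))
```

Here the interval is roughly $ (-0.219, 0.019)$. The standard error uses each group's own $ \hat{p}_{i, obs}$ rather than the pooled proportion, because the interval does not assume $ P_1 = P_2$.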

Conditions for Distributional Approximation (To $ Z$):

Independent observations in each sample
At least 10 observed successes and 10 observed failures in each group: $ n_i\hat{p}_{i, obs} \ge 10$ and $ n_i(1 - \hat{p}_{i, obs}) \ge 10$ for $ i = 1, 2$
Independent selection of samples