Two Proportions
Type of Plot
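Stacked bar chart
# qplot() comes from ggplot2 (deprecated since ggplot2 3.4.0 in favor of ggplot())
library(ggplot2)
qplot(x = explanatory,
      data = dataframe,
      fill = response,
      geom = "bar")
Mosaic plot
# mosaicplot() is in base R's graphics package; table() puts the explanatory groups first
mosaicplot(table(dataframe$explanatory, dataframe$response),
           xlab = "Explanatory",
           ylab = "Response",
           main = "Explanatory vs Response")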
Parameter: \( P_1 - P_2 \)
Point Estimate: \( \hat{P}_1 - \hat{P}_2 \)
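A minimal sketch of computing the point estimate from a data frame laid out like the one in the plotting code. The data, the group labels "A" and "B", and the response levels "yes" and "no" are all hypothetical, chosen only for illustration.

```r
# Hypothetical data: 100 cases per group, 45 and 30 successes ("yes")
dataframe <- data.frame(
  explanatory = rep(c("A", "B"), each = 100),
  response    = c(rep(c("yes", "no"), times = c(45, 55)),
                  rep(c("yes", "no"), times = c(30, 70)))
)

# Two-way table of counts: rows are groups, columns are response levels
counts <- table(dataframe$explanatory, dataframe$response)

# Row-wise proportions; keep the "yes" column as the sample proportions
p_hat <- prop.table(counts, margin = 1)[, "yes"]

# Observed difference in sample proportions (group A minus group B)
point_estimate <- p_hat[["A"]] - p_hat[["B"]]
point_estimate
```

Which level counts as a "success" and which group is labeled 1 are choices the analyst makes; the sign of the estimate depends on the order of subtraction.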
Hypothesis Test
Hypotheses:
\( H_0: P_1 - P_2 = 0 \)
1) \( H_a: P_1 - P_2 < 0 \)
2) \( H_a: P_1 - P_2 \ne 0 \)
3) \( H_a: P_1 - P_2 > 0 \)
Test Statistic Random Variable (Assuming \(H_0\) is true):
\( Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \)
where \( \hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}} \)
Observed Test Statistic:
\( z_{obs} = \dfrac{ \hat{p}_{1, obs} - \hat{p}_{2, obs} }{\sqrt{\dfrac{\hat{p}_{obs}(1 - \hat{p}_{obs})}{n_1} + \dfrac{\hat{p}_{obs}(1 - \hat{p}_{obs})}{n_2} } } \)
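The observed test statistic can be computed by hand along these lines. The counts below are hypothetical, and the variable names are illustrative rather than part of any package.

```r
# Hypothetical counts (not from the source): successes and sample sizes per group
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100

# Observed sample proportions
p1_obs <- x1 / n1
p2_obs <- x2 / n2

# Pooled proportion: total successes over total cases
p_pooled <- (x1 + x2) / (n1 + n2)

# Standard error under H0, using the pooled proportion in both terms
se_pooled <- sqrt(p_pooled * (1 - p_pooled) / n1 +
                  p_pooled * (1 - p_pooled) / n2)

# Observed test statistic
z_obs <- (p1_obs - p2_obs) / se_pooled
z_obs
```

As a cross-check, `prop.test(c(x1, x2), c(n1, n2), correct = FALSE)` reports a chi-squared statistic that should equal `z_obs^2`.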
\( \mathbf{\textit{P}}\)-value:
1) \( \mathbb{P}(\hat{P}_1 - \hat{P}_2 \le \hat{p}_{1, obs} - \hat{p}_{2, obs}) = \mathbb{P}(Z \le z_{obs}) \)
2) \( \mathbb{P}(\big| \hat{P}_1 - \hat{P}_2 \big| \ge \big| \hat{p}_{1, obs} - \hat{p}_{2, obs} \big|) = \mathbb{P}(\big| Z \big| \ge \big| z_{obs} \big|) = 2 \cdot \mathbb{P}(Z \ge \big| z_{obs} \big|) \)
3) \( \mathbb{P}(\hat{P}_1 - \hat{P}_2 \ge \hat{p}_{1, obs} - \hat{p}_{2, obs}) = \mathbb{P}(Z \ge z_{obs}) \)
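A short sketch of the three P-value calculations using the standard normal CDF `pnorm()`. The value of `z_obs` is hypothetical.

```r
# Hypothetical observed test statistic
z_obs <- 2.19

# 1) Left-tailed alternative: P(Z <= z_obs)
p_less <- pnorm(z_obs)

# 2) Two-sided alternative: P(|Z| >= |z_obs|)
p_two_sided <- 2 * pnorm(-abs(z_obs))

# 3) Right-tailed alternative: P(Z >= z_obs)
p_greater <- pnorm(z_obs, lower.tail = FALSE)

c(less = p_less, two_sided = p_two_sided, greater = p_greater)
```

Using `-abs(z_obs)` in the two-sided case gives the same answer whether the observed difference is positive or negative.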
Conditions for Distributional Approximation (To \( Z\)):
- Independent observations in each sample
- Expected number of successes \( n_i \hat{p}_{obs} \) and failures \( n_i (1 - \hat{p}_{obs}) \), computed with the pooled proportion, is at least 10 for each group
- Independent selection of samples
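The success-failure condition for the test can be checked numerically. This sketch assumes the usual reading of the condition, that each group's sample size times the pooled proportion (and times its complement) is at least 10. The counts are hypothetical.

```r
# Hypothetical counts (not from the source)
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100

# Pooled proportion: total successes over total cases
p_pooled <- (x1 + x2) / (n1 + n2)

# Expected successes and failures in each group under H0
sizes <- c(group1 = n1, group2 = n2)
expected <- rbind(
  successes = sizes * p_pooled,
  failures  = sizes * (1 - p_pooled)
)
expected

# TRUE when every expected count is at least 10
all(expected >= 10)
```

The two independence conditions depend on how the data were collected and cannot be checked from the counts alone.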
Confidence Interval
Formula for CI:
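\( (\hat{p}_{1, obs} - \hat{p}_{2, obs}) \pm z^* \sqrt{ \dfrac{\hat{p}_{1, obs}(1 - \hat{p}_{1, obs})}{n_1} + \dfrac{\hat{p}_{2, obs}(1 - \hat{p}_{2, obs})}{n_2} }\)
where \( z^* \) is the standard normal critical value for the chosen confidence level (for example, \( z^* \approx 1.96 \) for 95% confidence)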
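A sketch of the interval computed by hand, with hypothetical counts and an assumed 95% confidence level. Unlike the test statistic, the standard error here uses each group's own observed proportion rather than the pooled one.

```r
# Hypothetical counts and confidence level (not from the source)
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100
conf_level <- 0.95

# Observed sample proportions
p1_obs <- x1 / n1
p2_obs <- x2 / n2

# Critical value z* from the standard normal distribution
z_star <- qnorm(1 - (1 - conf_level) / 2)

# Unpooled standard error of the difference in proportions
se_unpooled <- sqrt(p1_obs * (1 - p1_obs) / n1 +
                    p2_obs * (1 - p2_obs) / n2)

# Lower and upper limits of the confidence interval
ci <- (p1_obs - p2_obs) + c(-1, 1) * z_star * se_unpooled
ci
```

With these numbers the interval runs from roughly 0.017 to 0.283, so it excludes 0.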
Conditions for Distributional Approximation (To \( Z\)):
- Independent observations in each sample
- Number of observed successes and observed failures is at least 10 for each group
- Independent selection of samples