Statistics

Random Variable
\(X: W \mapsto \mathbb{R}\)

Discrete Random Variable

Event (E: X = a)
Each sample \(\omega\) is mapped to a real number \(a\), and we call the collection of all samples that share the same number (\(X = a\)) an 'event'

Probability of an event

Expected value

\(E(X)=\sum_{j=1}^{n}p(x_j)x_j\)

Properties

\(E(X+Y)=E(X)+E(Y)\)
\(E(aX+b)=aE(X)+b\)
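A minimal Python sketch of these definitions, assuming a fair six-sided die as a hypothetical discrete random variable:

```python
# Hypothetical example: X = outcome of a fair six-sided die.
xs = [1, 2, 3, 4, 5, 6]
ps = [1/6] * 6

# E(X) = sum of p(x_j) * x_j
ex = sum(p * x for p, x in zip(ps, xs))                # 3.5

# Linearity check: E(2X + 1) = 2*E(X) + 1
ex_lin = sum(p * (2 * x + 1) for p, x in zip(ps, xs))  # 8.0
```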

Variance

\(\sigma^2=E((X-\mu)^2)\)

Standard Deviation

\(\sigma=\sqrt{\sigma^2}\)

Properties

\(Var(X+Y)=Var(X)+Var(Y) \Leftarrow X \perp Y\)
\(Var(aX+b)=a^2Var(X)\)
\(Var(X)=E(X^2)-E(X)^2\)
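The shortcut formula can be verified numerically; a sketch using the same hypothetical die:

```python
# Verify Var(X) = E(X^2) - E(X)^2 for a fair six-sided die.
xs = [1, 2, 3, 4, 5, 6]
ps = [1/6] * 6
mu = sum(p * x for p, x in zip(ps, xs))

var_def = sum(p * (x - mu) ** 2 for p, x in zip(ps, xs))  # E((X - mu)^2)
ex2 = sum(p * x * x for p, x in zip(ps, xs))              # E(X^2)
var_shortcut = ex2 - mu ** 2                              # 35/12 ≈ 2.9167
```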

Continuous Random Variable

Probability of an event

Probability Density Function (PDF): \(f(x)\)

\(f(x)\geq 0\)

  • \(f(x)\) is not probability, so doesn't have to be \(\leq 1\)

\(\int^{\infty}_{-\infty}f(x)dx=1\)

Cumulative Distribution Function (CDF): \(F(b)\)

Calculation

properties

\(F(b)\) is non-decreasing

\(\lim_{x\to \infty}F(x)=1\)
\(\lim_{x\to -\infty}F(x)=0\)

\(P(a \leq X \leq b)=\int_a^b f(x) dx=F(b)-F(a)\)

\(F'(x)=f(x)\)
Fundamental Theorem of Calculus, Part 1

Common Continuous Distributions

Uniform R.V: \(X\sim U(a,b)\)

Params: a, b
Range: [a,b]
Density Fn: \(f(x)=\frac{1}{b-a}\) for \(a \leq x \leq b\)
CDF: \(F(x)=\frac{x - a}{b - a}\) for \(a \leq x \leq b\)

Exponential R.V: \(X\sim Exp(\lambda)\)
\(\lambda\): \(\frac{events}{time}\)
models the waiting time between successive events

Params: \(\lambda\)
Range: \([0, \infty)\)
Density Fn: \(f(x)=\lambda e^{-\lambda x}\)
CDF: \(F(x)=1-e^{-\lambda x}\) for \(x \geq 0\)
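The closed-form CDF agrees with integrating the PDF; a sketch assuming a hypothetical rate \(\lambda = 2\), using a simple trapezoid rule:

```python
import math

# Hypothetical rate: lam = 2 events per unit time.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)   # PDF
F = lambda x: 1 - math.exp(-lam * x)     # CDF

# F(b) should equal the integral of f from 0 to b (trapezoid rule).
b, n = 1.5, 100_000
h = b / n
integral = sum((f(i * h) + f((i + 1) * h)) * h / 2 for i in range(n))
```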

Normal R.V: \(X\sim N(\mu, \sigma^2)\)

Params: \(\mu, \sigma\)
Range: \((-\infty, \infty)\)
Density Fn: \(f(x)=\frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)
CDF: no closed form; evaluate via the standard normal table as \(\Phi(\frac{x-\mu}{\sigma})\)

Expected Value

\(E(X)=\int^{b}_{a}xf(x)dx\)

\(E(Y)=E(h(X))=\int^{\infty}_{-\infty}h(x)f_X(x)dx\)

Quantile

Median

\(p^{th}\) quantile

Any \(q_p\) for which \(P(X \leq q_p)=p\)
or \(F(q_p)=p\)

median is \(q_{0.5}\)

Percentile

The \(x^{th}\) percentile is \(q_{x/100}\)

Quartiles

The \(q^{th}\) quartile is \(q_{q/4}\)

Deciles

The \(d^{th}\) decile is \(q_{d/10}\)

Let \(\bar{X}_n=\frac{1}{n}\sum^{n}_{i=1}X_i\)
and \(S_n =\sum^{n}_{i=1}X_i\)
where the \(X_i\) are i.i.d. random variables

The Law of Large Numbers

As n grows, \(\lim_{n \to \infty}P(|\bar{X}_n-\mu | < a)=1\)
where a is an arbitrarily small threshold (say 0.0001)

Central Limit Theorem

As n grows, the distribution of \(\bar{X}_n\) converges to \(N(\mu, \frac{\sigma^2}{n})\)
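The CLT can be seen in simulation; a sketch assuming a hypothetical Uniform(0, 1) population, whose mean is 0.5 and variance is 1/12:

```python
import random
import statistics

random.seed(0)  # reproducible sketch

# Hypothetical population: Uniform(0, 1), so mu = 0.5, sigma^2 = 1/12.
n, reps = 50, 2000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

m = statistics.fmean(means)      # ≈ mu = 0.5
v = statistics.pvariance(means)  # ≈ sigma^2 / n = 1/600
```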

Standardization

\(Z=\frac{X-\mu}{\sigma}\)

\(Z \sim N(0,1)\)

Histogram

Frequency Histogram

Height = # of \(x_i\)

Density Histogram

  • Area = fraction of all data points that lie in the bin
  • Sum of area must equal 1
  • Closely related to the PDF of a continuous random variable

Properties

\(E(\bar{X}_n)=\mu, Var(\bar{X}_n)=\frac{\sigma^2}{n}\)
\(E(S_n)=n \mu, Var(S_n)=n \sigma^2\)

Discrete Joint

Continuous Joint

Joint PMF of \(X, Y, ...\):
\(p(x_1, y_1, ...), p(x_2, y_1, ...), \dots\)

\(0 \leq p(x_i, y_j) \leq 1\)

\(\sum_{i=1}^n \sum_{j=1}^m p(x_i, y_j) = 1\)

Joint CDF: \(F(x, y)\)

Joint PDF of \(X, Y\): \(f(x, y)\)

\(P(a < X < b, c < Y < d)=\int_a^b \int_c^d f(x,y)\, dy\, dx\)

Joint CDF: \(F(x, y)=P(X \leq x, Y \leq y)\)

\(f(x,y)=\frac{\partial^2 F}{\partial y \partial x}\)

Marginal Distribution

PMF

\(p_X(x_i)=\sum_j p(x_i, y_j)\)

\(p_Y(y_j)=\sum_i p(x_i, y_j)\)

PDF

\(f_X(x)=\int_c^d f(x,y) dy\)

\(f_Y(y)=\int_a^bf(x,y)dx\)

CDF
X and Y take values in \([a, b] \times [c, d]\)

\(F_X(x)=F(x, d)\)

\(F_Y(y)=F(b,y)\)

Independence

PMF

PDF

CDF

\(p(x_i,y_j)=p(x_i)p(y_j)\)

\(f(x, y)=f_X(x)f_Y(y)\)

\(F(x,y)=F_X(x)F_Y(y)\)

Discrete Variables

P of each cell in the table
=
product of the marginal P of its row and column

Covariance and Correlation:
Measure of how two random variables vary together

Covariance
\(Cov(X,Y)=E((X-\mu_X)(Y-\mu_Y))\)

Properties (Bilinear Map)

\(Cov(aX+b, cY+d)=ac\,Cov(X, Y)\)

\(Cov(X_1+X_2,Y)=Cov(X_1,Y)+Cov(X_2,Y)\)

\(Cov(X,X)=Var(X)\)

\(Cov(X,Y)=E(XY)-\mu_X\mu_Y\)

\(Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)\)

\(Cov(X,Y)=0 \Leftarrow X \perp Y\)

Discrete

Continuous

\(\sum_i^n \sum_j^m p(x_i,y_j)(x_i-\mu_X)(y_j-\mu_Y)\)


\(\int_a^b \int_c^d (x-\mu_X)(y-\mu_Y)f(x,y) dy dx\)

Correlation \(\rho\)
\(=Cor(X,Y)=\frac{Cov(X,Y)}{\sigma_X \sigma_Y}\)

\(-1 \leq \rho \leq 1\)

\(\rho > 0\): positive corr
\(\rho < 0\): negative corr
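A sketch computing covariance and correlation from paired samples, assuming hypothetical data with an exact linear relation \(y = 2x\) (so the correlation should be exactly 1):

```python
import math

def cov(xs, ys):
    # population covariance: mean of (x - mu_X)(y - mu_Y) over paired samples
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical paired data with y = 2x exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

rho = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))  # 1.0
```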

Rules/Theorems

Conditional Probability
\(P(A|B)\) = P(A "given" B)

Independence
\(A \perp B\) iff
\(P(A|B)=P(A)\)
or \(P(A \cap B) = P(A)P(B)\)
or \(P(A \cup B) = P(A) + P(B) - P(A)P(B) \)

Bayes' Theorem
\(P(B|A)=\frac{P(A|B)P(B)}{P(A)}\)

Multiplication Rule
\(P(A|B)=\frac{P(A \cap B)}{P(B)}\)
\(P(A \cap B)=P(A|B)*P(B)\)

Law of Total Probability
\(P(A)=P(A \cap B_1)+P(A \cap B_2)+P(A \cap B_3)+...\)
\(P(A)=P(A|B_1)P(B_1)+P(A|B_2)P(B_2)+P(A|B_3)P(B_3)+...\)
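The law combines naturally with Bayes' theorem; a sketch with hypothetical numbers for a three-set partition \(B_1, B_2, B_3\):

```python
# Hypothetical partition B1, B2, B3 of the sample space.
p_b = [0.5, 0.3, 0.2]          # P(B_i), sums to 1
p_a_given_b = [0.1, 0.2, 0.4]  # P(A|B_i)

# Law of Total Probability
p_a = sum(pb * pa for pb, pa in zip(p_b, p_a_given_b))  # 0.19

# Bayes' theorem: P(B_1|A) = P(A|B_1)P(B_1) / P(A)
p_b1_given_a = p_a_given_b[0] * p_b[0] / p_a            # 0.05/0.19
```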

Axiom of Probability

'\(X = a\)' is an event where \(X(w)=a\)

\(E \subseteq W\)

Sample Space (W)
(or universal set)

\(P(E) = \sum_{w\in E}P(w)\)

\(P(c\leq X \leq d)=\int^{d}_{c}f(x)dx\)

Probability Mass Function (PMF): \(p(a)\)

Cumulative Distribution Function (CDF): \(F(a)\)

\(p(a)=P(X=a)\)
\(0 \leq p(a) \leq 1\)

\(F(a)=P(X \leq a)\)

  • sum of all \(p(b)\) for which \(b \leq a\)

\(F(b) = P(X \leq b) = \int^{b}_{-\infty}f(x)dx\)

  • \(f(x)\) is PDF of X

\(0 \leq F(x) \leq 1\)

R.V with functions
If \(h(x)\) is a function and X is a random variable,
then \(h(X)\) is also a random variable

Any x for which \(P(X \leq x) = 0.5\)
or \(F(x) = 0.5\)

Use Table (X\Y\Z\...)

\(f(x,y) \geq 0\)

\(F(x, y)=\sum_{x_i\leq x} \sum_{y_j\leq y} p(x_i, y_j)\)

Volume under surface

\(P(X \leq x, Y \leq y) = \int_c^y \int_a^x f(u, v) dudv\)

\((\int_a^b \int_c^d xyf(x,y)\, dy\, dx) - \mu_X \mu_Y\)

\((\sum_i^n \sum_j^m p(x_i,y_j)x_i y_j) - \mu_X\mu_Y\)

Maximum Likelihood Estimation
Find the hypothesis(H) which maximizes the likelihood

Likelihood
\(P(D|H)\)

Log Likelihood
Take the log of the likelihood function
\(ln(P(D|H))\)

Odds
\(O(E) = \frac{P(E)}{P(E^c)}\)

Bayes Factor
\(\frac{P(D|H)}{P(D|H^c)}\)

Posterior odds = Bayes factor × prior odds
\(O(H|D) = \frac{P(D|H)}{P(D|H^c)} * O(H)\)

\(BF > 1\)
Data provides evidence for the hypothesis
\(BF < 1\)
Data provides evidence against the hypothesis
\(BF = 1\)
Data provides no evidence
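The odds update is plain arithmetic; a sketch assuming hypothetical values \(P(H) = 0.1\), \(P(D|H) = 0.8\), \(P(D|H^c) = 0.2\):

```python
# Hypothetical numbers: P(H) = 0.1, P(D|H) = 0.8, P(D|H^c) = 0.2.
p_h, p_d_h, p_d_hc = 0.1, 0.8, 0.2

prior_odds = p_h / (1 - p_h)             # 1/9
bf = p_d_h / p_d_hc                      # 4 > 1: evidence for H
post_odds = bf * prior_odds              # 4/9
post_prob = post_odds / (1 + post_odds)  # 4/13, matches Bayes' theorem directly
```

As a check, Bayes' theorem gives \(P(H|D) = \frac{0.8 \cdot 0.1}{0.8 \cdot 0.1 + 0.2 \cdot 0.9} = \frac{4}{13}\), the same value.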

Types of Data

Qualitative

Quantitative

Binary
True/False

Ordinal
Day1/Day2/Day3/...
Low/Medium/High/...

Nominal
Red/Green/Blue/...

Discrete
0, 1, 2, ...

Continuous
0.01203 ~ 0.30402 ...

Trimmed Mean

  • Data must be sorted before trimming

Let \(n = 9\) and \( \alpha = 0.25\); then we need to trim \(9 \times 0.25 = 2.25\) elements from each end of the sorted data.


Since 2.25 has a fractional part, we take floor(2.25) = 2 and ceil(2.25) = 3, and compute two trimmed means: one removing 2 elements from each end, and one removing 3.


The two means are then interpolated by the fractional part 0.25; in other words,


\(0.75 \cdot mean(data, trim=2) + 0.25 \cdot mean(data, trim=3)\)
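A sketch of this interpolated trimmed mean; the function name and the interpolation-between-floor-and-ceil convention follow the description above:

```python
import math

def trimmed_mean(data, alpha):
    # Interpolated trimmed mean: when n*alpha is fractional, interpolate
    # between trimming floor(n*alpha) and ceil(n*alpha) elements per end.
    xs = sorted(data)
    n = len(xs)
    k = n * alpha
    lo, hi = math.floor(k), math.ceil(k)

    def tmean(t):
        core = xs[t:n - t] if t else xs
        return sum(core) / len(core)

    if lo == hi:
        return tmean(lo)
    frac = k - lo
    return (1 - frac) * tmean(lo) + frac * tmean(hi)
```

For symmetric data such as 1..9 with \(\alpha = 0.25\), every trimmed mean equals the ordinary mean, so the result is 5.0.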

\(W = \){\(w_1, w_2, ...\)}
where \(P(w)\) is probability of outcome w

\(P(E) \geq 0\) for \(E \subseteq W\)

\(P(W) = 1\)

If \(E_1, E_2, ..., E_k\) are pairwise disjoint (\(E_i \cap E_j = \emptyset \) for \(i \neq j\)),
Then \(P(E_1 \cup E_2 \cup ... \cup E_k) = P(E_1) + P(E_2) + ... + P(E_k)\)

Counting

Permutation
Let n = 10. The number of ways to arrange 3 of them, where order matters, is


\(10 * 9 * 8\)


Generalized: \(\frac{10!}{(10-3)!} = \frac{n!}{(n-k)!}\)

Combination
Combination is the same as permutation except that order doesn't matter. For the permutation example above, the combination count is


\(\frac{10*9*8}{3*2*1}\), because among the \(10*9*8\) ordered arrangements, each selection of 3 items appears in \(3*2*1 = 3!\) different orders, so we divide out those duplicates


Generalized: \(\frac{n!}{(n-k)!k!}\)
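Python's standard library has these counts built in (Python 3.8+); a sketch for the n = 10, k = 3 example:

```python
import math

n, k = 10, 3
perms = math.perm(n, k)  # n!/(n-k)! = 10 * 9 * 8 = 720 ordered arrangements
combs = math.comb(n, k)  # n!/((n-k)! k!) = 720 / 3! = 120 unordered selections
```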

Mutually Exclusive
\(A \cap B = \emptyset\)
\(P(A \cup B) = P(A) + P(B)\)

Confidence/Prediction Interval of Normal Population
\(100(1-\alpha)\%\)

Decision Tree

Population with unknown distribution

Normally distributed population
(QQ plot nearly straight line)

Population \(\sigma\) unknown

Not enough samples


* Use this all the time for STMATH 390

Population \(\sigma\) known

Large enough samples
\(n \gt 30\)


Assume normality due to CLT

Not enough samples

?

Population \(\sigma\) known


\( P(-z_{\frac{\alpha}{2}} \lt \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \lt z_{\frac{\alpha}{2}}) \approx 1 - \alpha \)

Population \(\sigma\) unknown


\( P(-z_{\frac{\alpha}{2}} \lt \frac{\bar{X} - \mu}{s / \sqrt{n}} \lt z_{\frac{\alpha}{2}}) \approx 1 - \alpha \)


CI: \(\bar{X} \pm z_\frac{a}{2} \frac{\sigma}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{a}{2} \sigma \sqrt{1+\frac{1}{n}}\)


CI: \(\bar{X} \pm z_\frac{a}{2} \frac{s}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{a}{2} s \sqrt{1+\frac{1}{n}}\)

Poisson R.V: \(X \sim Poisson(\alpha, t)\)
\(\alpha\): \(\frac{events}{time}\), \(\mu = \alpha t \)
probability of the number of events in a fixed time interval

Params: \(\alpha, t\)
Range: \(\{0, 1, 2, ...\}\)
Mass fn: \(p(x) = \frac{\mu^x e^{-\mu}}{x!}\)
CDF: no closed form (sum the mass fn)
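A sketch of the Poisson mass function, assuming a hypothetical rate \(\alpha = 2\) events/hour observed over \(t = 3\) hours, so \(\mu = 6\):

```python
import math

def poisson_pmf(x, mu):
    # p(x) = mu^x e^{-mu} / x!
    return mu**x * math.exp(-mu) / math.factorial(x)

# Hypothetical: alpha = 2 events/hour over t = 3 hours -> mu = 6
mu = 2 * 3
p0 = poisson_pmf(0, mu)                              # e^{-6}
total = sum(poisson_pmf(x, mu) for x in range(100))  # ≈ 1
```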

Use the t-distribution
CI: \(\bar{X} \pm t_{\frac{a}{2}, n-1} \frac{s}{\sqrt{n}}\)
PI: \(\bar{X} \pm t_{\frac{a}{2}, n-1} s \sqrt{1+\frac{1}{n}}\)

Large enough samples
\(n \gt 30\)


The t-distribution with 30 degrees of freedom is almost identical to the standard normal distribution.


CI: \(\bar{X} \pm z_\frac{a}{2} \frac{s}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{a}{2} s \sqrt{1+\frac{1}{n}}\)

Population Proportion


p = proportion of successes
E(X) = np
\(\sigma_x = \sqrt{np(1-p)}\)

If \(np \geq 10\) and \(n(1-p) \geq 10\), \(X\) is approximately normal

estimator: \(\hat{p} = X/n\)
\(P(-z_{\frac{\alpha}{2}} \lt \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \lt z_{\frac{\alpha}{2}} ) \approx 1 - \alpha \)

CI: \(\hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{p(1-p)}{n}}\)
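A sketch of the proportion CI using the stdlib's standard-normal quantile, with hypothetical data of 40 successes in 100 trials at the 95% level:

```python
import math
from statistics import NormalDist

# Hypothetical data: x = 40 successes out of n = 100 trials, 95% CI.
x, n, alpha = 40, 100, 0.05
p_hat = x / n

z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{alpha/2} ≈ 1.96
half = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half, p_hat + half)              # ≈ (0.304, 0.496)
```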

Prediction Interval
What is \(X_{n+1}\) given \(X_1, X_2, ... X_n\) (All normally distributed)?


\(\bar{X} \pm t_{\frac{a}{2}, n-1} s \sqrt{1+\frac{1}{n}}\)

Estimator \(\bar{X}\)

Prediction Error
\(\bar{X} - X_{n+1}\)

Variance
\(= Var(\bar{X} - X_{n+1})\)
\(= Var(\bar{X}) + Var(X_{n+1})\)
\(= \frac{\sigma^2}{n} + \frac{\sigma^2}{1}\)
\(= \sigma^2 (\frac{1}{n} + 1)\)

Variance/Standard Deviation
given \(X_1, X_2, ... X_n\) normally distributed
\(\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2 }{\sigma^2} \sim \chi^2_{n-1}\) (chi-square with \(n - 1\) df)

Expected Value
\(= E(\bar{X} - X_{n+1})\)
\(= E(\bar{X}) - E(X_{n+1})\)
\(= \mu - \mu\)
\(=0\)

Since Xs are all normally distributed,
\(\bar{X} - X_{n+1} \sim N(0, \sigma^2 (\frac{1}{n} + 1))\)


Then,
\(P(-z_{\frac{\alpha}{2}} \lt \frac{(\bar{X} - X_{n+1}) - 0}{\sqrt{\sigma^2 (\frac{1}{n} + 1)}} \lt z_{\frac{\alpha}{2}}) = 1 - \alpha\)
Or, since we do not know the population standard deviation,
\(P(-t_{\frac{a}{2}, n-1} \lt \frac{(\bar{X} - X_{n+1}) - 0}{\sqrt{s^2 (\frac{1}{n} + 1)}} \lt t_{\frac{a}{2}, n-1}) = 1 - \alpha\)
*Use this for exam


CI: \(\bar{X} \pm z_\frac{a}{2} \frac{\sigma}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{a}{2} \sigma \sqrt{1+\frac{1}{n}}\)

\(P(\chi^2_{1-\frac{\alpha}{2}, n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\frac{\alpha}{2}, n-1}) = 1 - \alpha\)
*Note that \(\chi^2\) isn't symmetric

\(P(\frac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2}, n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2}, n-1}}) = 1 - \alpha\)
*Note that critical values are swapped

Common Discrete Distributions

Bernoulli R.V: \(X \sim Bernoulli(p)\)

Binomial R.V: \(X \sim Binomial(n, p)\)

Param: p
Range: \(\{0, 1\}\)
Mass fn: \(P(X=0) = 1 - p\), \(P(X=1) = p\)
CDF: \(F(a) = 0 \) if \( a \lt 0\); \(1 - p\) if \(0 \leq a \lt 1\); \(1\) if \(a \geq 1\)

Geometric R.V: \(X \sim Geometric(p)\)

Negative Binomial R.V: \(X \sim NBinomial(r, p)\)

Pareto Charts

  • Categories are ordered
  • Descending order: Largest number of frequency to the left

Chain Rule
\(P(A_n, A_{n-1},..., A_1) = P(A_n | A_{n-1}, ..., A_1)P(A_{n-1}, ..., A_1)\)
\(P(A_n, A_{n-1},..., A_1) = P(A_n | A_{n-1}, ..., A_1)P(A_{n-1} | A_{n-2}, ..., A_1)P(A_{n-2}, ..., A_1)\)
...

Multiplication Rule (three variables)
\(P(A|B,C)\)
\(= \frac{P(A,B,C)}{P(B,C)} = \frac{\frac{P(A,B,C)}{P(C)}}{\frac{P(B,C)}{P(C)}}\)
\(= \frac{P(A,B|C)}{P(B|C)}\)

Conditional Independence
A and B are conditionally independent if
\(P(A,B|C) = P(A|C)P(B|C)\) where \(P(C) \gt 0\)


It also follows
\(P(A|B,C) = \frac{P(A,B|C)}{P(B|C)} = \frac{P(A|C)P(B|C)}{P(B|C)} = P(A|C)\)

Param: n, p
Range: \(\{0, 1, ..., n\}\)
Mass fn: \(P(X=k) = {n \choose k}p^k(1-p)^{n-k}\)
CDF:
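A sketch of the binomial mass function using `math.comb`; the total over the full range \(\{0, ..., n\}\) must be 1:

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) p^k (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# pmf over the full range {0, ..., n} sums to 1
total = sum(binom_pmf(k, 10, 0.3) for k in range(11))
```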

\(0.8 \leq |r| \leq 1\): Strong correlation
\(0.5 \lt |r| \lt 0.8\): Moderate correlation
\(0 \leq |r| \leq 0.5\): Weak correlation