Statistics
Random Variable
\(X: W \to \mathbb{R}\)
Discrete Random Variable
Probability of an event
\(P(E) = \sum_{w\in E}P(w)\)
Expected value
\(E(X)=\sum_{j=1}^{n}p(x_j)x_j\)
Properties
\(E(X+Y)=E(X)+E(Y)\)
\(E(aX+b)=aE(X)+b\)
Variance
\(\sigma^2=E((X-\mu)^2)\)
Standard Deviation
\(\sigma=\sqrt{\sigma^2}\)
Properties
\(Var(X+Y)=Var(X)+Var(Y) \Leftarrow X \perp Y\)
\(Var(aX+b)=a^2Var(X)\)
\(Var(X)=E(X^2)-E(X)^2\)
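A minimal Python sketch of these formulas (the PMF values are made up for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0])   # values of X (hypothetical)
p = np.array([0.2, 0.5, 0.3])   # p(x_j); must sum to 1
ex = np.sum(p * x)              # E(X) = sum of p(x_j) * x_j
var = np.sum(p * x**2) - ex**2  # Var(X) = E(X^2) - E(X)^2
sd = np.sqrt(var)               # sigma = sqrt(sigma^2)
print(ex, var, sd)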
Discrete Joint
Joint PMF of \(X, Y, ...\)
: \(p(x_1, y_1, \ldots), p(x_2, y_1, \ldots), \ldots\)
\(0 \leq p(x_i, y_j) \leq 1\)
\(\sum_{i=1}^n \sum_{j=1}^m p(x_i, y_j) = 1\)
Use Table (X\Y\Z\...)
Joint CDF
: \(F(x, y)=P(X \leq x, Y \leq y)\)
\(F(x, y)=\sum_{x_i\leq x} \sum_{y_j\leq y} p(x_i, y_j)\)
Probability Mass Function (PMF): \(p(a)\)
\(p(a)=P(X=a)\)
\(0 \leq p(a) \leq 1\)
Cumulative Distribution Function (CDF): \(F(a)\)
\(F(a)=P(X \leq a)\)
sum of all \(p(b)\) for \(-\infty \lt b \leq a\)
Common Discrete Distributions
Bernoulli R.V: \(X \sim Bernoulli(p)\)
Param: p
Range: \(\{0, 1\}\)
Mass fn: \(P(X=0) = 1 - p\), \(P(X=1) = p\)
CDF: \(F(a) = 0\) if \(a \lt 0\), \(1 - p\) if \(0 \leq a \lt 1\), \(1\) if \(a \geq 1\)
Binomial R.V: \(X \sim Binomial(n, p)\)
Param: n, p
Range: \(\{0, 1, ..., n\}\)
Mass fn: \(P(X=k) = {n \choose k}p^k(1-p)^{n-k}\)
CDF: \(F(a) = \sum_{k \leq a}{n \choose k}p^k(1-p)^{n-k}\) (no closed form)
Geometric R.V: \(X \sim Geometric(p)\)
Negative Binomial R.V: \(X \sim NBinomial(r, p)\)
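A quick sketch evaluating these distributions with scipy.stats (parameters made up; note SciPy's nbinom counts failures before the r-th success, a slightly different convention than NBinomial(r, p) above):

from scipy import stats

p, n, k = 0.3, 10, 4                # illustrative parameters
print(stats.bernoulli.pmf(1, p))    # P(X=1) = p
print(stats.binom.pmf(k, n, p))     # C(n,k) p^k (1-p)^(n-k)
print(stats.binom.cdf(k, n, p))     # P(X <= k): running sum of the mass fn
print(stats.geom.pmf(k, p))         # Geometric: (1-p)^(k-1) p, first success on trial k
print(stats.nbinom.pmf(k, 3, p))    # Negative binomial: k failures before the 3rd success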
Event (E: X = a)
Each sample (\(\omega\)) is mapped to a real number (a), and we call the collection of all samples that share the same number (\(X = a\)) an 'event'
'\(X = a\)' is an event where \(X(w)=a\)
\(E \subseteq W\)
Continuous Random Variable
Probability of an event
\(P(c\leq X \leq d)=\int^{d}_{c}f(x)dx\)
Probability Density Function (PDF): \(f(x)\)
\(f(x)\geq 0\)
\(f(x)\) is not a probability, so it doesn't have to be \(\leq 1\)
\(\int^{\infty}_{-\infty}f(x)dx=1\)
Cumulative Distribution Function (CDF): \(F(b)\)
Calculation
\(P(a \leq X \leq b)=\int_a^b f(x) dx=F(b)-F(a)\)
\(F(b) = P(X \leq b) = \int^{b}_{-\infty}f(x)dx\)
\(f(x)\) is PDF of X
properties
\(F(b)\) is non-decreasing
\(\lim_{x\to \infty}F(x)=1\)
\(\lim_{x\to -\infty}F(x)=0\)
\(F'(x)=f(x)\)
Fundamental Theorem of Calculus, Part 1
\(0 \leq F(x) \leq 1\)
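A numeric check of \(P(a \leq X \leq b)=\int_a^b f(x)dx=F(b)-F(a)\), using an Exponential PDF as the example (values made up):

import numpy as np
from scipy import stats
from scipy.integrate import quad

lam, a, b = 2.0, 0.5, 1.5         # illustrative values
X = stats.expon(scale=1/lam)      # f(x) = lam * e^(-lam x)
area, _ = quad(X.pdf, a, b)       # integrate the PDF from a to b
print(area, X.cdf(b) - X.cdf(a))  # the two results agree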
Common Continuous Distributions
Uniform R.V: \(X\sim U(a,b)\)
Params: a, b
Range: [a,b]
Density Fn: \(f(x)=\frac{1}{b-a}\) for \(a \leq x \leq b\)
CDF: \(F(x)=\frac{x - a}{b - a}\) for \(a \leq x \leq b\)
Exponential R.V: \(X\sim exp(\lambda)\)
\(\lambda\): \(\frac{\text{events}}{\text{time}}\)
models the time between successive events
Params: \(\lambda\)
Range: \([0, \infty)\)
Density Fn: \(f(x)=\lambda e^{-\lambda x}\)
CDF: \(F(x)=1-e^{-\lambda x}\) for \(x \geq 0\)
Normal R.V: \(X\sim N(\mu, \sigma^2)\)
Params: \(\mu, \sigma\)
Range: \((-\infty, \infty)\)
Density Fn: \(f(x)=\frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)
CDF: no closed form (graph/tables only)
Poisson R.V: \(X \sim Poisson(\alpha, t)\) (a discrete distribution)
\(\alpha\): \(\frac{\text{events}}{\text{time}}\), \(\mu = \alpha t\)
probability of number of events in fixed time interval
Params: \(\alpha, t\)
Range: \(\{0, 1, 2, \ldots\}\)
Mass fn: \(p(x) = \frac{\mu^x e^{-\mu}}{x!}\)
CDF: \(F(x) = \sum_{k \leq x} \frac{\mu^k e^{-\mu}}{k!}\)
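A sketch evaluating these distributions with scipy.stats (parameters made up; SciPy parameterizes the uniform as loc=a, scale=b-a):

from scipy import stats

U = stats.uniform(loc=2, scale=3)   # U(2, 5)
print(U.cdf(4))                     # (x-a)/(b-a) = 2/3
print(stats.norm(0, 1).cdf(1.96))   # Normal CDF, no closed form: ~0.975
mu = 1.5 * 2.0                      # Poisson: mu = alpha * t
print(stats.poisson.pmf(3, mu))     # mu^x e^(-mu) / x!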
Expected Value
\(E(X)=\int^{b}_{a}xf(x)dx\)
\(E(Y)=E(h(X))=\int^{\infty}_{-\infty}h(x)f_X(x)dx\)
Quantile
Median
Any x for which \(P(X \leq x) = 0.5\)
or \(F(x) = 0.5\)
\(p^{th}\) quantile
Any \(q_p\) for which \(P(X \leq q_p)=p\)
or \(F(q_p)=p\)
median is \(q_{0.5}\)
Percentile
the \(x^{th}\) percentile is \(q_{\frac{x}{100}}\)
Deciles
the \(d^{th}\) decile is \(q_{\frac{d}{10}}\)
Quartiles
the \(q^{th}\) quartile is \(q_{\frac{q}{4}}\)
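In code, quantiles come from the inverse CDF (ppf in SciPy), which returns \(q_p\) with \(F(q_p)=p\); np.quantile gives empirical quantiles (sample data made up):

import numpy as np
from scipy import stats

print(stats.norm(0, 1).ppf(0.5))        # median q_{0.5} of N(0,1) = 0
print(stats.expon(scale=1.0).ppf(0.9))  # 90th percentile q_{0.9}
data = np.array([3.1, 1.4, 2.7, 5.0, 4.2])
print(np.quantile(data, [0.25, 0.5, 0.75]))  # the three quartiles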
Continuous Joint
Joint CDF
: \(F(x, y)\)
\(f(x,y)=\frac{\partial^2 F}{\partial y \partial x}\)
Volume under surface
\(P(X \leq x, Y \leq y) = \int_c^y \int_a^x f(u, v) dudv\)
Joint PDF of \(X, Y\)
: \(f(x, y)\)
\(P(a \lt X \lt b, c \lt Y \lt d)=\int_a^b \int_c^d f(x,y)\, dy\, dx\)
integrating over the entire range gives 1
\(f(x,y) \geq 0\)
Marginal Distribution
PMF
\(p_X(x_i)=\sum_j p(x_i, y_j)\)
\(p_Y(y_j)=\sum_i p(x_i, y_j)\)
PDF
\(f_X(x)=\int_c^d f(x,y) dy\)
\(f_Y(y)=\int_a^bf(x,y)dx\)
CDF
X and Y take values in \([a, b] \times [c, d]\)
\(F_X(x)=F(x, d)\)
\(F_Y(y)=F(b,y)\)
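A sketch computing marginals from a joint PMF table and checking cell-by-cell independence (the table is made up):

import numpy as np

joint = np.array([[0.10, 0.20],   # rows: x_i, columns: y_j
                  [0.30, 0.40]])
p_X = joint.sum(axis=1)           # p_X(x_i) = sum over j of p(x_i, y_j)
p_Y = joint.sum(axis=0)           # p_Y(y_j) = sum over i of p(x_i, y_j)
print(p_X, p_Y)
print(np.allclose(joint, np.outer(p_X, p_Y)))  # True iff each cell = row marginal * column marginal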
Independence
PMF
\(p(x_i,y_j)=p_X(x_i)p_Y(y_j)\)
PDF
\(f(x, y)=f_X(x)f_Y(y)\)
CDF
\(F(x,y)=F_X(x)F_Y(y)\)
Discrete Variables
P of each cell in the table = P of its row marginal \(\times\) P of its column marginal
R.V with functions
If \(h(x)\) is a function and X is a random variable,
then \(h(X)\) is also a random variable
Rules/Theorems
Conditional Probability
\(P(A|B)\) = P(A "given" B)
Multiplication Rule
\(P(A|B)=\frac{P(A \cap B)}{P(B)}\)
\(P(A \cap B)=P(A|B)*P(B)\)
Law of Total Probability
\(P(A)=P(A \cap B_1)+P(A \cap B_2)+P(A \cap B_3)+...\)
\(P(A)=P(A|B_1)P(B_1)+P(A|B_2)P(B_2)+P(A|B_3)P(B_3)+...\)
Chain Rule
\(P(A_n, A_{n-1},..., A_1) = P(A_n | A_{n-1}, ..., A_1)P(A_{n-1}, ..., A_1)\)
\(P(A_n, A_{n-1},..., A_1) = P(A_n | A_{n-1}, ..., A_1)P(A_{n-1} | A_{n-2}, ..., A_1)P(A_{n-2}, ..., A_1)\)
...
Multiplication Rule (three variables)
\(P(A|B,C)\)
\(= \frac{P(A,B,C)}{P(B,C)} = \frac{\frac{P(A,B,C)}{P(C)}}{\frac{P(B,C)}{P(C)}}\)
\(= \frac{P(A,B|C)}{P(B|C)}\)
Independence
\(A \perp B\) iff
\(P(A|B)=P(A)\)
or \(P(A \cap B) = P(A)P(B)\)
or \(P(A \cup B) = P(A) + P(B) - P(A)P(B) \)
Conditional Independence
A and B are conditionally independent if
\(P(A,B|C) = P(A|C)P(B|C)\) where \(P(C) \gt 0\)
It also follows
\(P(A|B,C) = \frac{P(A,B|C)}{P(B|C)} = \frac{P(A|C)P(B|C)}{P(B|C)} = P(A|C)\)
Bayes' Theorem
\(P(B|A)=\frac{P(A|B)P(B)}{P(A)}\)
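A small numeric sketch combining the law of total probability with Bayes' theorem (all numbers made up):

p_B = 0.01                        # prior P(B), e.g. disease prevalence
p_A_given_B, p_A_given_Bc = 0.95, 0.05
# Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)
# Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
print(p_A_given_B * p_B / p_A)    # ~0.161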
Mutually Exclusive
\(A \cap B = \emptyset\)
\(P(A \cup B) = P(A) + P(B)\)
Axiom of Probability
Sample Space (W)
(or universal set)
\(W = \{w_1, w_2, \ldots\}\)
where \(P(w)\) is probability of outcome w
\(P(E) \geq 0\) for \(E \subseteq W\)
\(P(W) = 1\)
If \(E_1, E_2, ..., E_k\) are pairwise disjoint (\(E_i \cap E_j = \emptyset\) for all \(i \neq j\))
Then \(P(E_1 \cup E_2 \cup ... \cup E_k) = P(E_1) + P(E_2) + ... + P(E_k)\)
Confidence/Prediction Interval of Normal Population
\(100(1-\alpha)\%\)
Decision Tree
Population with unknown distribution
Large enough samples
\(n \gt 30\)
Assume normality due to CLT
Population \(\sigma\) known
\( P(-z_{\frac{\alpha}{2}} \lt \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \lt z_{\frac{\alpha}{2}}) \approx 1 - \alpha \)
CI: \(\bar{X} \pm z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{\alpha}{2} \sigma \sqrt{1+\frac{1}{n}}\)
Population \(\sigma\) unknown
\( P(-z_{\frac{\alpha}{2}} \lt \frac{\bar{X} - \mu}{s / \sqrt{n}} \lt z_{\frac{\alpha}{2}}) \approx 1 - \alpha \)
CI: \(\bar{X} \pm z_\frac{\alpha}{2} \frac{s}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{\alpha}{2} s \sqrt{1+\frac{1}{n}}\)
Not enough samples
?
Normally distributed population
(QQ plot nearly straight line)
Population \(\sigma\) unknown
Not enough samples
* Use this all the time for STMATH 390
Use the t-distribution
CI: \(\bar{X} \pm t_{\frac{\alpha}{2}, n-1} \frac{s}{\sqrt{n}}\)
PI: \(\bar{X} \pm t_{\frac{\alpha}{2}, n-1} s \sqrt{1+\frac{1}{n}}\)
Large enough samples
\(n \gt 30\)
The t-distribution with 30 degrees of freedom is almost identical to the standard normal distribution.
CI: \(\bar{X} \pm z_\frac{\alpha}{2} \frac{s}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{\alpha}{2} s \sqrt{1+\frac{1}{n}}\)
Population \(\sigma\) known
CI: \(\bar{X} \pm z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}}\)
PI: \(\bar{X} \pm z_\frac{\alpha}{2} \sigma \sqrt{1+\frac{1}{n}}\)
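A sketch of the t-branch of the decision tree (small normal sample, population \(\sigma\) unknown; the data are made up):

import numpy as np
from scipy import stats

data = np.array([9.8, 10.4, 10.1, 9.7, 10.3, 10.0])
n, xbar, s = len(data), data.mean(), data.std(ddof=1)
alpha = 0.05
t = stats.t.ppf(1 - alpha/2, df=n - 1)  # t_{alpha/2, n-1}
print(xbar - t*s/np.sqrt(n), xbar + t*s/np.sqrt(n))              # CI
print(xbar - t*s*np.sqrt(1 + 1/n), xbar + t*s*np.sqrt(1 + 1/n))  # PI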
Population Proportion
p = proportion of successes
E(X) = np
\(\sigma_x = \sqrt{np(1-p)}\)
If \(np \geq 10\) and \(n(1-p) \geq 10\), \(X\) is approximately normal
estimator: \(\hat{p} = X/n\)
\(P(-z_{\frac{\alpha}{2}} \lt \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \lt z_{\frac{\alpha}{2}} ) \approx 1 - \alpha \)
CI: \(\hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
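A sketch of this proportion CI (counts made up; \(\hat{p}\) is plugged in for the unknown p):

import numpy as np
from scipy import stats

n, successes = 200, 84
p_hat = successes / n              # estimator X/n
# n*p_hat = 84 and n*(1 - p_hat) = 116, both >= 10, so normality is reasonable
z = stats.norm.ppf(1 - 0.05/2)     # z_{alpha/2} for alpha = 0.05
half = z * np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half, p_hat + half)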
Prediction Interval
What is \(X_{n+1}\) given \(X_1, X_2, ... X_n\) (All normally distributed)?
\(\bar{X} \pm t_{\frac{\alpha}{2}, n-1} s \sqrt{1+\frac{1}{n}}\)
Estimator \(\bar{X}\)
Prediction Error
\(\bar{X} - X_{n+1}\)
Variance
\(= Var(\bar{X} - X_{n+1})\)
\(= Var(\bar{X}) + Var(X_{n+1})\) (by independence of \(\bar{X}\) and \(X_{n+1}\))
\(= \frac{\sigma^2}{n} + \frac{\sigma^2}{1}\)
\(= \sigma^2 (\frac{1}{n} + 1)\)
Expected Value
\(= E(\bar{X} - X_{n+1})\)
\(= E(\bar{X}) - E(X_{n+1})\)
\(= \mu - \mu\)
\(=0\)
Since Xs are all normally distributed,
\(\bar{X} - X_{n+1} \sim N(0, \sigma^2 (\frac{1}{n} + 1))\)
Then,
\(P(-z_{\frac{\alpha}{2}} \lt \frac{(\bar{X} - X_{n+1}) - 0}{\sqrt{\sigma^2 (\frac{1}{n} + 1)}} \lt z_{\frac{\alpha}{2}}) = 1 - \alpha\)
Or, since we do not know the population standard deviation,
\(P(-t_{\frac{\alpha}{2}, n-1} \lt \frac{(\bar{X} - X_{n+1}) - 0}{\sqrt{s^2 (\frac{1}{n} + 1)}} \lt t_{\frac{\alpha}{2}, n-1}) = 1 - \alpha\)
*
Use this for exam
Variance/Standard Deviation
given \(X_1, X_2, ... X_n\) normally distributed
\(\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2 }{\sigma^2} \sim \chi^2\) with \(n-1\) degrees of freedom
\(P(\chi^2_{1-\frac{\alpha}{2}, n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\frac{\alpha}{2}, n-1}) = 1 - \alpha\)
*
Note that \(\chi^2\) isn't symmetric
\(P(\frac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2}, n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2}, n-1}}) = 1 - \alpha\)
*
Note that critical values are swapped
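A sketch of the chi-square interval for \(\sigma^2\) (data made up; SciPy's ppf takes a lower-tail probability, so the upper critical value is ppf(1 - alpha/2)):

import numpy as np
from scipy import stats

data = np.array([4.9, 5.3, 5.1, 4.7, 5.4, 5.0, 5.2])
n, s2 = len(data), data.var(ddof=1)
alpha = 0.05
lo = (n - 1) * s2 / stats.chi2.ppf(1 - alpha/2, df=n - 1)  # divide by the upper critical value
hi = (n - 1) * s2 / stats.chi2.ppf(alpha/2, df=n - 1)      # divide by the lower critical value
print(lo, hi)  # the swapped critical values, as noted above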
Let \(\bar{X}_n=\frac{1}{n}\sum^{n}_{i=1}X_i\)
and \(S_n =\sum^{n}_{i=1}X_i\)
where \(X_i\) are i.i.d. random variables
The Law of Large Numbers
\(\lim_{n \to \infty}P(|\bar{X}_n-\mu| \lt a)=1\)
a is an arbitrarily small threshold (say 0.0001)
Central Limit Theorem
As n grows, the distribution of \(\bar{X}_n\) converges to \(N(\mu, \frac{\sigma^2}{n})\)
Standardization
\(Z=\frac{X-\mu}{\sigma}\)
\(Z \sim N(0,1)\)
Properties
\(E(\bar{X}_n)=\mu, Var(\bar{X}_n)=\frac{\sigma^2}{n}\)
\(E(S_n)=n \mu, Var(S_n)=n \sigma^2\)
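A simulation sketch of these properties (Exponential(1) population, so \(\mu = \sigma^2 = 1\); sample sizes made up):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000
xbars = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
print(xbars.mean(), xbars.var())  # ~mu = 1 and ~sigma^2/n = 0.02
# a histogram of xbars looks approximately N(mu, sigma^2/n), as the CLT predicts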
Covariance and Correlation
:
Measure of how two random variables vary together
Covariance
\(Cov(X,Y)=E((X-\mu_X)(Y-\mu_Y))\)
Properties (Bilinear Map)
\(Cov(aX+b, cY+d)=acCov(X, Y)\)
\(Cov(X_1+X_2,Y)=Cov(X_1,Y)+Cov(X_2,Y)\)
\(Cov(X,X)=Var(X)\)
\(Cov(X,Y)=E(XY)-\mu_X\mu_Y\)
\(Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)\)
\(Cov(X,Y)=0 \Leftarrow X \perp Y\)
Discrete
\(\sum_i^n \sum_j^m p(x_i,y_j)(x_i-\mu_X)(y_j-\mu_Y)\)
\((\sum_i^n \sum_j^m p(x_i,y_j)x_i y_j) - \mu_X\mu_Y\)
Continuous
\(\int_a^b \int_c^d (x-\mu_X)(y-\mu_Y)f(x,y) dy dx\)
\((\int_a^b \int_c^d xyf(x,y)\, dy\, dx) - \mu_X \mu_Y\)
Correlation \(\rho\)
\(\rho=Cor(X,Y)=\frac{Cov(X,Y)}{\sigma_X \sigma_Y}\)
\(-1 \leq \rho \leq 1\)
\(\rho \gt 0\): positive corr
\(\rho \lt 0\): negative corr
\(0.8 \leq |r| \leq 1\): strong correlation
\(0.5 \lt |r| \lt 0.8\): moderate correlation
\(0 \leq |r| \leq 0.5\): weak correlation
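A numpy sketch (data made up; corrcoef is the sample version of \(\rho\)):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
print(np.cov(x, y)[0, 1])       # sample Cov(X, Y)
print(np.corrcoef(x, y)[0, 1])  # r = Cov / (s_X * s_Y), near 1 here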
Types of Data
Qualitative
Binary
True/False
Ordinal
Day1/Day2/Day3/...
Low/Medium/High/...
Nominal
Red/Green/Blue/...
Quantitative
Discrete
0, 1, 2, ...
Continuous
0.01203 ~ 0.30402 ...
Counting
Permutation
Let \(n = 10\); the number of ways to arrange 3 of them, where order matters, is
\(10 \cdot 9 \cdot 8\)
Generalized: \(\frac{10!}{(10-3)!} = \frac{n!}{(n-k)!}\)
Combination
Combination is the same as permutation except that order doesn't matter. For the permutation example above, the count is
\(\frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1}\), because each selection of 3 items appears \(3 \cdot 2 \cdot 1 = 3!\) times among the ordered arrangements
Generalized: \(\frac{n!}{(n-k)!k!}\)
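The same counts in Python (math.perm and math.comb need Python 3.8+):

import math

print(10 * 9 * 8, math.perm(10, 3))                    # permutations: both 720
print((10 * 9 * 8) // (3 * 2 * 1), math.comb(10, 3))   # combinations: both 120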
Trimmed Mean
Data must be sorted before trimming
Let \(n = 9\) and \(\alpha = 0.25\); then we need to remove \(9 \times 0.25 = 2.25\) elements from each end of the data.
Since 2.25 has a fractional part, we take floor(2.25) = 2 and ceil(2.25) = 3 and compute two means: one with 2 elements trimmed from each end, one with 3.
The two means are then interpolated by the fractional part 0.25; in other words,
\(0.75 \cdot mean(data, trim=2) + 0.25 \cdot mean(data, trim=3)\)
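A sketch of this interpolated trimmed mean (a helper written for these notes, not a library function):

import numpy as np

def trimmed_mean(data, alpha):
    x = np.sort(np.asarray(data, dtype=float))  # data must be sorted before trimming
    k = alpha * len(x)                # e.g. 9 * 0.25 = 2.25 elements per end
    lo, hi = int(np.floor(k)), int(np.ceil(k))
    frac = k - lo                     # fractional part, e.g. 0.25
    m_lo = x[lo:len(x) - lo].mean()   # trim floor(k) from each end
    m_hi = x[hi:len(x) - hi].mean()   # trim ceil(k) from each end
    return (1 - frac) * m_lo + frac * m_hi

print(trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 100], 0.25))  # 5.0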
Histogram
Frequency Histogram
Height = # of \(x_i\) in the bin
Density Histogram
Area = fraction of all data points that lie in the bin
Sum of the areas must equal 1
Closely related to the PMF of a discrete random variable
Pareto Charts
Categories are ordered
Descending order: largest frequency on the left
Maximum Likelihood Estimation
Find the hypothesis (H) that maximizes the likelihood
Likelihood
\(P(D|H)\)
Log Likelihood
Take the log of the likelihood function
\(\ln(P(D|H))\)
Odds
\(O(E) = \frac{P(E)}{P(E^c)}\)
Bayes Factor
\(\frac{P(D|H)}{P(D|H^c)}\)
\(BF > 1\)
Data provides evidence for the hypothesis
\(BF < 1\)
Data provides evidence against the hypothesis
\(BF = 1\)
Data provides no evidence
Posterior odds
= Bayes factor \(\times\) prior odds
\(O(H|D) = \frac{P(D|H)}{P(D|H^c)} * O(H)\)
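A sketch tying MLE, the Bayes factor, and posterior odds together for coin flips (the data and the point alternative p = 0.7 are made up for illustration):

import numpy as np
from scipy import stats

n, heads = 10, 7                           # hypothetical data D
ps = np.linspace(0.01, 0.99, 99)           # candidate hypotheses H: coin bias p
loglik = stats.binom.logpmf(heads, n, ps)  # log likelihood ln(P(D|H))
print(ps[np.argmax(loglik)])               # MLE: p_hat = 0.7
# Bayes factor for H: p = 0.5, with H^c modeled as the single point p = 0.7
bf = stats.binom.pmf(heads, n, 0.5) / stats.binom.pmf(heads, n, 0.7)
prior_odds = 1.0                           # O(H), assumed for illustration
print(bf, bf * prior_odds)                 # BF < 1: evidence against H; posterior odds = BF * prior odds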