Book 2. Quantitative Analysis
Fundamentals of Probability
12.1 Basics of Probability
Events and Event Spaces
An event is one of the possible outcomes, or a subset of the possible outcomes, of a random experiment
The event space is all the subsets of possible outcomes and the empty set
Independent and Mutually Exclusive Events
independent
Two events are mutually exclusive if the joint probability, P(AB) = 0
When two events are mutually exclusive,
P(A or B) = P(A) + P(B)
mutually exclusive
Conditionally Independent Events
12.2 Conditional, Unconditional, and Joint Probabilities
Discrete Probability Function
Conditional and Unconditional Probabilities
Bayes’ Rule
Random Variables
13.1 Probability Mass Functions, Cumulative Distribution Functions, and Expected Values
Random Variables and Probability Functions
The PMF, f(x), gives us the probability that a discrete random variable will take on the value x.
CDF, F(x), gives us the probability that a random variable X will take on a value less than or equal to x
Expectations
mean of the distribution
E(X) = ΣP(xi)xi = P(x1)x1 + P(x2)x2 + … + P(xn)xn
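A minimal Python sketch of this calculation, using hypothetical outcomes and probabilities:

```python
# Expected value of a discrete random variable: E(X) = sum of P(xi) * xi
# The outcomes and probabilities below are hypothetical, for illustration only.
outcomes = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

expected_value = sum(p * x for p, x in zip(probs, outcomes))
print(expected_value)  # 0.1*1 + 0.2*2 + 0.3*3 + 0.4*4 = 3.0
```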
13.2 Mean, Variance, Skewness and Kurtosis
variance is a measure of dispersion
skewness is a measure of symmetry
kurtosis is a measure of the proportion of the outcomes in the tails of the distribution
13.3 Probability Density Functions, Quantiles, and Linear Transformation
Probability Density Functions
A probability density function (PDF) describes the likelihood of outcomes for a continuous random variable; the probability that the variable falls within a given range is the area under the PDF over that range
Quantile Functions
A quantile is the percentage of outcomes less than a given outcome
Q(x%) provides the value of an outcome which is greater than x% of all possible outcomes
Q(50%) is the median of a distribution
The quantile function, Q(a), is the inverse of the CDF. Recall that a CDF gives us the probability that a random variable will be less than or equal to some value X = x.
Linear Transformations of Random Variables
For a variable Y = a + bX (a linear transformation of X): E(Y) = a + bE(X), Var(Y) = b²Var(X), and σ(Y) = |b|σ(X)
Common Univariate Random Variables
14.1 Uniform, Bernoulli, Binomial, and Poisson Distributions
The Uniform Distribution
the probability of X occurring in a possible range is the length of the range relative to the total of all possible values
The Bernoulli Distribution
A Bernoulli random variable has only two possible outcomes, which can be defined as either a success or a failure.
The Binomial Distribution
A binomial random variable may be defined as the number of successes in a given number of Bernoulli trials, whereby each outcome is either success or failure.
expected value of X = E(X) = np
variance of X = np(1 − p)
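A minimal sketch (hypothetical n and p) checking these moments, and a binomial probability, with scipy:

```python
from scipy.stats import binom

n, p = 10, 0.3  # hypothetical number of trials and success probability

print(binom.mean(n, p))   # E(X) = np = 3.0
print(binom.var(n, p))    # Var(X) = np(1 - p) = 2.1
print(binom.pmf(4, n, p)) # P(X = 4) for a Binomial(10, 0.3)
```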
The Poisson Distribution
The Poisson distribution is a discrete probability distribution with a number of real world applications.
While the Poisson random variable X refers to the number of successes per unit, the parameter lambda (λ) refers to the average or expected number of successes per unit
E(X) = λ
14.2 Normal and Lognormal Distributions
The Normal Distribution
The standard normal distribution
z-distribution
Z measures how many standard deviations an observation is from the mean
Calculating probabilities using z-values
P(Z ≤ z) = N(z) for z ≥ 0
P(Z ≤ −z) = 1 − N(z)
90% confidence: ±1.65
95% confidence: ±1.96
99% confidence: ±2.58
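A minimal sketch reproducing these values with scipy's standard normal functions:

```python
from scipy.stats import norm

# P(Z <= z) = N(z); e.g., the standard normal CDF at z = 1.96
print(norm.cdf(1.96))  # ~0.975

# Two-tailed critical values for 90%, 95%, and 99% confidence
for conf in (0.90, 0.95, 0.99):
    print(conf, norm.ppf(1 - (1 - conf) / 2))  # ~1.645, 1.960, 2.576
```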
The Lognormal Distribution
14.3 Student's t, Chi-Squared, and F-Distributions
Student’s t-Distribution
Student’s t-distribution is similar to a normal distribution, but has fatter tails (i.e., a greater proportion of the outcomes are in the tails of the distribution)
The Chi-Squared Distribution
asymmetrical, bounded below by zero
approaches the normal distribution in shape as the degrees of freedom increase
mean = k and variance = 2k, where k = degrees of freedom
The F-Distribution
related to the chi-squared distribution as the number of observations in the denominator → ∞
approaches the normal distribution as the number of observations increases
The Exponential Distribution
used to model waiting times
how long it takes an employee to serve a customer
time it takes a company to default
the rate parameter λ
scale parameter β
The Beta Distribution
The beta distribution can be used for modeling default probabilities and recovery rates.
Mixture distributions
Mixture distributions combine the concepts of parametric and nonparametric distributions.
The component distributions used as inputs are parametric while the weights of each distribution within the mixture are based on historical data, which is nonparametric.
Multivariate Random Variables
15.1 Marginal and Conditional Distributions for Bivariate Distribution
Probability Matrices
Marginal and Conditional Distributions
15.2 Moments of Bivariate Random Distributions
Expectation of a Bivariate Random Function
Covariance and Correlation Between Random Variables
15.3 Behavior of Moments for Bivariate Random Variables
Linear Transformations
Coskewness and cokurtosis are cross-variable versions of skewness and kurtosis
Variance of Weighted Sum of Bivariate Random Variables
Conditional Expectations
15.4 Independent and Identically Distributed Random Variables
Independent and identically distributed (i.i.d.) random variables
Variables are independent of all other components.
Variables are all from a single univariate distribution
Variables all have the same moments
Expected value of the sum of n i.i.d. random variables is equal to nμ
Variance of the sum of n i.i.d. random variables is equal to nσ²
Variance of the sum of i.i.d. random variables grows linearly
Variance of the average of multiple i.i.d. random variables decreases as n increases
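In symbols, for i.i.d. X1, …, Xn with mean μ and variance σ²:

```latex
E\!\left(\sum_{i=1}^{n} X_i\right) = n\mu, \qquad
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) = n\sigma^2, \qquad
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}
```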
Sample Moments
16.1 Estimating mean, variance, and standard deviation
Population and Sample Moments
Variance and Standard Deviation
Point Estimates and Estimators
Biased Estimators
Unless you are specifically instructed on the exam to compute a biased variance, you should always compute the unbiased variance by dividing by (n − 1).
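A minimal sketch (hypothetical sample values) contrasting the biased and unbiased estimates via NumPy's ddof argument:

```python
import numpy as np

sample = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical sample

biased_var = sample.var(ddof=0)    # divides by n
unbiased_var = sample.var(ddof=1)  # divides by (n - 1)
print(biased_var, unbiased_var)    # 5.0  6.666...
```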
Best Linear Unbiased Estimator
16.2 Estimating Moments of the Distribution
Law of Large Numbers
Central Limit Theorem
Skewness and Kurtosis
skewness measures the asymmetry of a distribution's outcomes around the mean
kurtosis measures the peakedness of a distribution (the weight of its tails) relative to the normal distribution
Median and Quantile Estimates
Estimating quantiles
Mean of Two Random Variables
Covariance and Correlation Between Random
Variables
Coskewness and Cokurtosis
Hypothesis Testing
17.1 Hypothesis Testing Basics
The Null Hypothesis and Alternative Hypothesis
The null hypothesis, designated H0, is the hypothesis the researcher wants to reject.
The Choice of the Null and Alternative Hypotheses
H0: μ = μ0 versus HA: μ ≠ μ0
Reject H0 if:
test statistic > upper critical value
test statistic < lower critical value
One-Tailed and Two-Tailed Tests of Hypotheses
Type I and Type II Errors
Type I error: rejecting the null hypothesis when it is actually true.
A significance level of x% = the probability of a Type I error.
Type II error: failing to reject the null hypothesis when it is actually false.
The power of a test = 1 − P(Type II error)
The Relation Between Confidence Intervals and Hypothesis Tests
Statistical Significance vs. Practical Significance
transaction costs
Taxes
statistically significant results may not be economically significant
standard error
17.2 Hypothesis Testing Results
The p-Value
the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed
Confidence Intervals for Hypothesis Tests
The t-Test
The z-Test
Testing the Equality of Means
Multiple Hypothesis Testing
the probability of a Type I error increases as more hypotheses are tested
A hypothesis is a statement about the value of a population parameter developed for the purpose of testing a theory or belief.
Linear Regression
18.1 Regression Analysis
Regression analysis seeks to measure how changes in one variable, called the dependent (or explained) variable, can be explained by changes in one or more other variables, called the independent (or explanatory) variables
E(Y) = α + β × (X)
Y = α + β × (X) + ε
Linear Regression Conditions
The relationship between Y and X should be linear
The error term must be additive
All X variables should be observable
18.2 Ordinary Least Squares Estimation
Coefficient of Determination of a Regression (R2)
The R2 of a regression model captures the fit of the model
For a regression model with a single independent variable, R2 is the square of the correlation between the independent and dependent variable
R² = (rX,Y)²
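A minimal sketch on simulated data verifying that R² equals the squared correlation when there is a single regressor:

```python
import numpy as np

# Hypothetical data for a single-variable regression
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

r = np.corrcoef(x, y)[0, 1]     # sample correlation between X and Y
b, a = np.polyfit(x, y, 1)      # OLS slope and intercept
resid = y - (a + b * x)
r2 = 1 - resid.var() / y.var()  # R^2 = 1 - RSS/TSS

print(r**2, r2)                 # the two values agree
```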
Assumptions Underlying Linear Regression
The expected value of the error term, conditional on the independent variable, is zero [E(εi|Xi) = 0]
Survivorship, or sample selection, bias
Simultaneity bias
e.g., trading volume and volatility are determined simultaneously
Omitted variables
relevant variables incorrectly excluded from the model
All (X, Y) observations are independent and identically distributed
(i.i.d.)
Variance of X is positive
Variance of the errors is constant
It is unlikely that large outliers will be observed in the data
Properties of OLS Estimators
Interpreting Regression Results
Dummy Variables
Independent variables that fall into this category are called dummy variables and are often used to quantify the impact of qualitative variables.
18.3 Hypothesis Testing
Specify the hypothesis to be tested.
Calculate the test statistic
Reject or fail to reject the null hypothesis after comparing the test statistic to its critical value
Properties of OLS Estimators
Confidence Intervals
The confidence interval for the slope coefficient = β̂ ± (critical t-value × standard error of β̂).
The p-Value
The p-value is the smallest level of significance for which the null hypothesis can be rejected
Regression with Multiple Explanatory Variables
19.1 Multiple Regression
19.2 Measure of fit in Linear Regression
Coefficient of Determination
1 = (ESS/TSS) + (RSS/TSS)
Adjusted R2
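With n observations and k explanatory variables, the adjustment takes the standard form:

```latex
\bar{R}^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\left(1 - R^2\right)
```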
Joint Hypothesis Tests and Confidence Intervals
The F-test
R²F = coefficient of determination of the full model
R²P = coefficient of determination of the partial (restricted) model
RSSF = residual sum of squares of the full model
RSSP = residual sum of squares of the partial model
q = number of restrictions imposed on the full model to arrive at the partial model
n = number of observations
H0: β1 = β2 = β3 = … = βk = 0 versus HA: at least one βj ≠ 0
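Using the notation above (with kF = number of explanatory variables in the full model), the F-statistic for this joint test takes the standard restricted-versus-unrestricted form:

```latex
F = \frac{(RSS_P - RSS_F)/q}{RSS_F/(n - k_F - 1)}
```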
The standard error of the regression (SER) measures the uncertainty about the accuracy of the predicted values of the dependent variable
TSS = ESS + RSS
Regression Diagnostics
20.1 Heteroskedasticity and Multicollinearity
Effect of Heteroskedasticity on Regression Analysis
chi-squared test statistic
Estimate the regression using standard ordinary least squares (OLS) procedures, obtain the residuals, and square them (εi²)
Use the squared residuals from step 1 as the dependent variable in a new regression on the original explanatory variables, their squares, and their cross products
Calculate the R² for the model in step 2 and use it to calculate the chi-squared test statistic: χ² = nR²
The chi-squared statistic is compared to its critical value with [k × (k + 3) / 2] degrees of freedom, where k = number of independent variables.
If the calculated χ² > critical χ², we reject the null hypothesis of no conditional heteroskedasticity.
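A minimal sketch of these steps on simulated, deliberately heteroskedastic data; with a single regressor the auxiliary regression uses x and x², and the statistic has k(k + 3)/2 = 2 degrees of freedom:

```python
import numpy as np

# Hypothetical data: residual variance grows with x (conditional heteroskedasticity)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)

# Step 1: OLS of y on x, then square the residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_sq = (y - X @ beta) ** 2

# Step 2: auxiliary regression of the squared residuals on x and x^2
Z = np.column_stack([np.ones(n), x, x**2])
gamma, *_ = np.linalg.lstsq(Z, resid_sq, rcond=None)
fitted = Z @ gamma
r2 = 1 - ((resid_sq - fitted) ** 2).sum() / ((resid_sq - resid_sq.mean()) ** 2).sum()

# Step 3: chi-squared statistic, compared to the critical value with 2 df
chi2_stat = n * r2
print(chi2_stat)
```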
Correcting for Heteroskedasticity
perfect collinearity
X3 = 2X1 + 3X2
Multicollinearity
the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other.
Effect of Multicollinearity On Regression Analysis
increases the probability of Type II errors (coefficients may appear insignificant when they are not)
Detecting Multicollinearity
t-tests indicate that none of the individual coefficients is significantly different from zero, while the R2 is high
the F-test rejects the null hypothesis
high R² but large p-values for the individual coefficients
variance inflation factor (VIF)
Xj = b0 + b1X1 + … + bj−1Xj−1 + bj+1Xj+1 + … + bkXk
A VIF > 10 (i.e., R2 > 90%) should be considered problematic for that variable
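The standard computation, where R²j is the R² from the auxiliary regression of Xj on the other explanatory variables:

```latex
VIF_j = \frac{1}{1 - R_j^2}
```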
Correcting Multicollinearity
to omit one or more of the correlated independent variables
stepwise regression, which systematically removes variables from the regression until multicollinearity is minimized
homoskedastic
If the variance of the residuals is constant across all observations in the sample, the regression is said to be homoskedastic
Unconditional heteroskedasticity
when the heteroskedasticity is not related to the level of the independent variables, which means that it doesn’t systematically increase or decrease with changes in the value of the independent variable(s).
heteroskedasticity
when the variance of the residuals is not the same across all observations in the sample.
Conditional heteroskedasticity
heteroskedasticity that is related to the level of (i.e., conditional on) the independent variable
20.2 Model Specification
Bias-Variance Tradeoff
variance errors
General-to-specific model
starting with the largest model and then successively dropping independent variables that have the smallest absolute t-statistic
m-fold cross-validation
dividing the sample into m parts and then using (m − 1) parts (known as the training set) to fit the model and the remaining part (known as the validation block) for out-of-sample validation.
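A minimal sketch of m-fold cross-validation on simulated data, fitting a one-variable OLS line on each training set and scoring it on the held-out validation block:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 120, 5
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)

indices = rng.permutation(n)
folds = np.array_split(indices, m)

mse = []
for i in range(m):
    val = folds[i]                                                  # validation block
    train = np.concatenate([folds[j] for j in range(m) if j != i])  # training set (m - 1 parts)
    b, a = np.polyfit(x[train], y[train], 1)                        # fit on the training set
    pred = a + b * x[val]
    mse.append(np.mean((y[val] - pred) ** 2))                       # out-of-sample error

print(np.mean(mse))  # average validation MSE across the m folds
```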
Residual Plots
Ideally, the residuals should be small in magnitude
and not related to any of the explanatory variables
Identifying Outliers
Cook’s measure
ŷ(−j) = predicted value of y after dropping outlier observation j
ŷi = predicted value of y without dropping any observation
k = number of independent variables
s² = the squared standard error (mean squared residual) of the model estimated with all observations
Large values of Cook’s measure (i.e., Dj > 1) indicate that the dropped observation was indeed an outlier.
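Using these definitions, Cook's measure for observation j takes the standard form (a sketch, summing over all observations i):

```latex
D_j = \frac{\sum_{i}\left(\hat{y}_i^{(-j)} - \hat{y}_i\right)^2}{k \cdot s^2}
```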
The Best Linear Unbiased Estimator
assumptions underlying the linear regression need to be satisfied
the relationship between Y and X(s) should be linear
residual distribution should be identical
Model specification
the model’s adjusted R2 declines
Omitted variable bias
the omitted variable is correlated with other independent variables in the model
the omitted variable is a determinant of the dependent variable
selecting the appropriate explanatory variables to include in the model
Stationary Time Series
21.1 Covariance Stationary
Autocovariance and Autocorrelation Functions
The covariance between the current value of a time series and its value τ periods in the past is referred to as its autocovariance at lag τ.
Its autocovariances for all τ make up its autocovariance function (ACF)
If a time series is covariance stationary, its autocovariance function is stable over time
the partial autocorrelation function
makes up the correlations for all lags after controlling for the values between the lags
Partial autocorrelations may be large only for a few lags
those lags become prime candidates for inclusion in an autoregressive (AR) model
yt = d + Φ1yt−1 + Φ2yt−2 + … + Φpyt−p + εt
White Noise
serially uncorrelated
exhibit zero correlation among any of its lagged values
A model’s forecast errors should follow a white noise process
unconditional mean and variance
Wold’s theorem
general linear process
Wold’s theorem proposes a way to model the role of white noise and holds that a covariance stationary process can be modeled as an infinite distributed lag of a white noise process. Such a model takes the general linear form yt = εt + ψ1εt−1 + ψ2εt−2 + …
White noise (WN): a serially uncorrelated series with a mean of zero and a constant variance
independent white noise
time series
Its mean must be stable over time
Its variance must be finite and stable over time
Its covariance structure must be stable over time.
Covariance structure refers to the covariances among the values of a time series at its various lags, which are a given number of periods apart at which we can observe its values.
τ = 1 refers to a one-period lag, comparing each value of a time series to its preceding value
τ = 4 means we are comparing values four periods apart along the time series
21.2 Autoregressive and Moving Average Models
Autoregressive Processes
Yule-Walker equation
Moving Average (MA) Processes
current random shock (εt)
lagged unobservable shock (εt−1)
μ = mean of the time series
θ = coefficient for the lagged random shock
yt = μ + εt + θ1εt−1 + … + θqεt−q
Lag Operators
It shifts the time index back by one period.
To apply the lag operator over multiple periods,
Applying the lag operator m times: L^m yt = yt−m.
When applied to a constant, the lag operator does not change the constant.
Forecasting models often take the form of a distributed lag that
assigns weights to the past values of a time series.
Lag polynomials can be multiplied.
Assuming that the coefficients satisfy some conditions, the polynomial can be inverted.
yt–1 = one-period lagged observation of the variable being estimated
εt = current random white noise shock (mean 0)
yt = the time series variable being estimated
Φ = coefficient for the lagged observation of the variable being estimated
d = intercept term
AR(1) series: yt = d + Φyt−1 + εt
The long-run (or unconditional) mean-reverting level of an AR(p) series = d / (1 − Φ1 − Φ2 − … − Φp)
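A minimal sketch simulating a (hypothetical) AR(1) and checking that its sample mean is close to the long-run level d / (1 − Φ):

```python
import numpy as np

# Simulate an AR(1): y_t = d + phi * y_{t-1} + eps_t  (hypothetical d and phi)
rng = np.random.default_rng(7)
d, phi, n = 2.0, 0.6, 100_000

y = np.empty(n)
y[0] = d / (1 - phi)  # start at the long-run mean
for t in range(1, n):
    y[t] = d + phi * y[t - 1] + rng.normal()

print(y.mean())       # sample mean of the simulated series
print(d / (1 - phi))  # long-run (unconditional) mean-reverting level = 5.0
```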
A key difference between an MA representation and an AR process is that the MA process shows autocorrelation cutoff while an AR process shows a gradual decay in autocorrelations
21.3 Autoregressive Moving Average (ARMA) Models
Application of AR, MA, and ARMA processes
ARMA(3,1) model
Sample and Partial Autocorrelations
Testing Autocorrelations
Box-Pierce (BP) statistic
QBP = chi-squared statistic with h degrees of freedom
We want all residual autocorrelations to be zero
for smaller samples (T ≤ 100), use the Ljung-Box (LB) statistic
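Standard forms of the two statistics, using the first h sample autocorrelations ρ̂i of the residuals from a sample of size T:

```latex
Q_{BP} = T \sum_{i=1}^{h} \hat{\rho}_i^{2}, \qquad
Q_{LB} = T(T+2) \sum_{i=1}^{h} \frac{\hat{\rho}_i^{2}}{T - i}
```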
Modeling Seasonality in an ARMA
ARMA (p, q) × (ps, qs),
where ps and qs denote the seasonal component
ps and qs are restricted to values of 1 or 0 (i.e., true or false)
|θ| < 1
also decay gradually for essentially the same reasons
Non-Stationary Time Series
22.1 Time Trends
A time series that tends to grow by a constant amount each period has a linear trend
A time series that tends to grow at a constant rate each period has a nonlinear trend
polynomial time trend
deterministic trends
stochastic trends
a log-linear model
ln (yt) = δ0 + δ1t + εt
log-quadratic model
22.2 Seasonality
e.g., sales this month (xt) may be related to sales for the same month last year (xt−12)
calendar effects
dummy variables
a value of either one or zero to represent the season being on or off
After adding a time trend, the model also includes a deterministic trend term (see the sketch below)
We can expand the forecasting model even further by allowing for other calendar effects. For example, if we suspect a time series exhibits holiday variations (HDV) and trading-day variations, we can account for them with additional dummy variables
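A sketch of what such a specification could look like; the coefficient names (γ for seasonal dummies, θ for holiday dummies, φ for trading-day dummies) are illustrative, not from the source:

```latex
y_t = \delta_1 t + \sum_{i=1}^{s} \gamma_i D_{i,t}
      + \sum_{j} \theta_j HDV_{j,t} + \sum_{k} \varphi_k TD_{k,t} + \varepsilon_t
```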
h-step-ahead point forecast
22.3 Unit Roots (random walk)
if its value in any given period is its previous value plus-or-minus a random “shock”
its variance increases with time
is not covariance stationary
random walk with drift
three main problems
unit root does not revert to a mean
show spurious relationships with each other
Dickey-Fuller distribution
If we use an ARMA model, its estimated parameters follow an asymmetric distribution that depends on the sample size and the presence of a time trend
can be addressed by modeling the differences of the unit root series
how to test if a time series contains a unit root
augmented Dickey-Fuller test
null hypothesis is γ = 0
alternative hypothesis is γ < 0
γ > 0, significantly greater than zero
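A minimal sketch applying statsmodels' augmented Dickey-Fuller test to a simulated random walk:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.normal(size=500))  # unit-root series by construction

adf_stat, p_value, *_ = adfuller(random_walk)
print(adf_stat, p_value)  # large p-value -> fail to reject the null of a unit root
```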
Measuring Return, Volatility, and Correlation
23.1 Defining Returns and Volatility
Simple and Continuously Compounded Returns
Volatility, Variance, and Implied Volatility
implied volatility
Black-Scholes-Merton (BSM)
inherent assumption that variance is constant over time
23.2 Normal and Non-normal distributions
Jarque-Bera Test
to test whether a distribution is normal
zero skewness
kurtosis =3 (and excess kurtosis = 0)
JB is approximately χ² distributed with 2 degrees of freedom
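The standard form of the statistic, with S = sample skewness and K = sample kurtosis from n observations:

```latex
JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)
```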
The Power Law
power law tails
Student’s t-distribution
23.3 Correlations and Dependence
Spearman’s Rank Correlation
Kendall’s τ
measures concordant and discordant pairs and their relative frequency
concordant: (Xi < Xj) and (Yi < Yj), or (Xi > Xj) and (Yi > Yj)
discordant: (Xi < Xj) and (Yi > Yj), or (Xi > Xj) and (Yi < Yj)
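In standard form, with nc concordant pairs and nd discordant pairs out of n observations:

```latex
\tau = \frac{n_c - n_d}{n(n-1)/2}
```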
Positive Definiteness
Positive definiteness requires that every weighted combination of the components has a positive variance; i.e., for any nonzero weight vector w, the variance w′Σw computed from the covariance matrix Σ must be positive
equicorrelation
sets all correlations equal to the same amount
Simulation and Bootstrapping
24.1 Monte Carlo Simulation and Sampling Error Reduction
Reducing Monte Carlo Sampling Error
the accuracy of the simulation depends on:
the standard deviation of the simulated values
the number of scenarios run (the standard error falls with the square root of the number of scenarios)
Antithetic Variates
random values are constructed to generate negative correlation within the values used in the simulation
using a complement set of the original set of random variables
x̄ = (x̄1 + x̄2) / 2
Without using antithetic variates, the two sets of Monte Carlo replications are independent. Thus, the covariance will be zero and the variance of ¯x is simply reduced to the following
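In symbols (standard result; x̄1 and x̄2 are the means of the two replication sets):

```latex
\operatorname{Var}(\bar{x}) = \tfrac{1}{4}\left[\operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2) + 2\operatorname{Cov}(\bar{x}_1, \bar{x}_2)\right],
\qquad \operatorname{Cov}(\bar{x}_1, \bar{x}_2) = 0 \;\Rightarrow\;
\operatorname{Var}(\bar{x}) = \tfrac{1}{4}\left[\operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)\right]
```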
Control Variates
reduce the variance of the approximation by adding values with a mean of zero that are correlated to the simulation
replacing a variable x (under simulation) that has unknown properties, with a similar variable y that has known properties
Monte Carlo Simulation
data generating process (DGP)
Calculate the statistic or function of interest, gi = g(xi)
Repeat steps 1 and 2 to produce N replications
Estimate the quantity of interest from {g1, g2, …, gN}
Evaluate the accuracy by computing the standard error. N should be increased until the required level of accuracy is achieved.
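A minimal sketch of these steps, with a hypothetical normal DGP for daily returns and the sample mean as the statistic of interest:

```python
import numpy as np

rng = np.random.default_rng(11)
N = 10_000  # number of replications

g = np.empty(N)
for i in range(N):
    x_i = rng.normal(loc=0.05, scale=0.20, size=252)  # hypothetical DGP: one year of daily returns
    g[i] = x_i.mean()                                 # statistic of interest g(x_i)

estimate = g.mean()                     # quantity of interest
std_error = g.std(ddof=1) / np.sqrt(N)  # Monte Carlo standard error
print(estimate, std_error)              # increase N until the standard error is acceptable
```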
C1 = C0(1 + r).
24.2 Bootstrapping and Random Number Generation
The Bootstrapping Method
rely on the key assumption that the present resembles the past
Independent and identically distributed (i.i.d.)
simply drawn one-by-one from the observed data, and replaced
data is dependent across time
{x1, x2 … , x10}
{x2, x7, x9}
{x2, x5, x10}
{x1, x1, x8}
Circular block bootstrap (CBB)
{x1, x2, x3}, {x2, x3, x4}, …,{x8, x9, x10}, {x9, x10, x1}, {x10, x1, x2}.
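A minimal sketch of an i.i.d. bootstrap draw and of forming circular blocks from ten hypothetical observations:

```python
import numpy as np

rng = np.random.default_rng(5)
data = np.arange(1, 11)  # stands in for {x1, x2, ..., x10}

# i.i.d. bootstrap: draw one-by-one with replacement
iid_sample = rng.choice(data, size=len(data), replace=True)

# circular block bootstrap: candidate blocks of length 3 wrap around the end
block_len = 3
blocks = [np.roll(data, -i)[:block_len] for i in range(len(data))]
# e.g., blocks[9] is [x10, x1, x2]; a CBB sample strings randomly chosen blocks together
cbb_sample = np.concatenate([blocks[j] for j in rng.integers(0, len(data), size=4)])[:len(data)]

print(iid_sample)
print(cbb_sample)
```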
Random Number Generation
to produce an irregular sequence of numerical values
pseudo-random number generators (PRNGs)
these computer-generated numbers are not truly random
they are actually generated from a formula
an initial seed value must first be chosen
The choice of seed value will determine the random number sequence that is generated
Benefits
Repeatability
Computing Clusters
Using a common seed value allows us to use the same set of random numbers across multiple simulations
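A minimal sketch showing that a common seed reproduces the same sequence:

```python
import numpy as np

# The same seed produces the same "random" sequence (repeatability across runs or clusters)
print(np.random.default_rng(seed=2024).normal(size=3))
print(np.random.default_rng(seed=2024).normal(size=3))  # identical to the line above
```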
Disadvantages of Bootstrapping
Using the entire data set may not be reliable
Structural changes
Bootstrapping simulations repeatedly draw data from historical data sets, each time replacing the data so it can be redrawn
The bootstrapping technique requires no assumptions with respect to the true distribution of the parameter estimates
Disadvantages of Simulation
Specification of the DGP
assumptions of model inputs or the data generating process are unrealistic
e.g., option prices are typically fat-tailed, but a model could erroneously draw them from a normal distribution
Computational cost