Book 2. Quantitative Analysis
Fundamentals of Probability
12.1 Basics of Probability
Events and Event Spaces
An event is one of the possible outcomes, or a subset of the possible outcomes, of a random experiment
The event space is all the subsets of possible outcomes and the empty set
Independent and Mutually Exclusive Events
independent
Two events are mutually exclusive if the joint probability, P(AB) = 0
When two events are mutually exclusive,
P(A or B) = P(A) + P(B)
mutually exclusive
Conditionally Independent Events
12.2 Conditional, Unconditional, and Joint Probabilities
Discrete Probability Function
Conditional and Unconditional Probabilities
Bayes’ Rule
Random Variables
13.1 Probability Mass Functions, Cumulative Distribution Functions, and Expected Values
Random Variables and Probability Functions
The PMF, f(x), gives us the probability that a discrete random variable will take on the value x.
CDF, F(x), gives us the probability that a random variable X will take on a value less than or equal to x
Expectations
mean of the distribution
E(X) = ΣP(xi)xi = P(x1)x1 + P(x2)x2 + … + P(xn)xn
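A minimal Python sketch of this calculation, using hypothetical outcomes and probabilities:

```python
# Expected value of a discrete random variable: E(X) = sum of P(xi) * xi
# The outcomes and probabilities below are hypothetical, for illustration only.
outcomes = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

expected_value = sum(p * x for p, x in zip(probs, outcomes))
print(expected_value)  # 0.1*1 + 0.2*2 + 0.3*3 + 0.4*4 = 3.0
```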
13.2 Mean, Variance, Skewness and Kurtosis
variance is a measure of dispersion
skewness is a measure of symmetry
kurtosis is a measure of the proportion of the outcomes in the tails of the distribution
13.3 Probability Density Functions, Quantiles, and Linear Transformation
Probability Density Functions
A probability density function (PDF) describes the likelihood of outcomes for a continuous random variable; the probability that the variable falls within a given range is the area under the PDF over that range
Quantile Functions
A quantile is the percentage of outcomes less than a given outcome
Q(x%) provides the value of an outcome which is greater than x% of all possible outcomes
Q(50%) is the median of a distribution
The quantile function, Q(a), is the inverse of the CDF. Recall that a CDF gives us the probability that a random variable will be less than or equal to some value X = x.
Linear Transformations of Random Variables
For a variable Y = a + bX (a linear transformation of X): E(Y) = a + bE(X), Var(Y) = b²Var(X), and σ(Y) = |b|σ(X)
Common Univariate Random Variables
14.1 Uniform, Bernoulli, Binomial, and Poisson Distributions
The Uniform Distribution
the probability of X occurring in a possible range is the length of the range relative to the total of all possible values
The Bernoulli Distribution
A Bernoulli random variable has only two possible outcomes, which can be defined as either a success or a failure.
The Binomial Distribution
A binomial random variable may be defined as the number of successes in a given number of Bernoulli trials, whereby each outcome is either success or failure.
expected value of X = E(X) = np
variance of X = np(1 − p)
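A minimal sketch (hypothetical n and p) checking these moments, and a binomial probability, with scipy:

```python
from scipy.stats import binom

n, p = 10, 0.3  # hypothetical number of trials and success probability

print(binom.mean(n, p))   # E(X) = np = 3.0
print(binom.var(n, p))    # Var(X) = np(1 - p) = 2.1
print(binom.pmf(4, n, p)) # P(X = 4) for a Binomial(10, 0.3)
```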
The Poisson Distribution
The Poisson distribution is a discrete probability distribution with a number of real world applications.
While the Poisson random variable X refers to the number of successes per unit, the parameter lambda (λ) refers to the average or expected number of successes per unit
E(X) = λ
14.2 Normal and Lognormal Distributions
The Normal Distribution
The standard normal distribution
z-distribution
Z measures how many standard deviations an observation is from the mean
Calculating probabilities using z-values
P(Z ≤ z) = N(z) for z ≥ 0
P(Z ≤ −z) = 1 − N(z)
90% confidence: ±1.65
95% confidence: ±1.96
99% confidence: ±2.58
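A minimal sketch reproducing these values with scipy's standard normal functions:

```python
from scipy.stats import norm

# P(Z <= z) = N(z); e.g., the standard normal CDF at z = 1.96
print(norm.cdf(1.96))  # ~0.975

# Two-tailed critical values for 90%, 95%, and 99% confidence
for conf in (0.90, 0.95, 0.99):
    print(conf, norm.ppf(1 - (1 - conf) / 2))  # ~1.645, 1.960, 2.576
```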
The Lognormal Distribution
14.3 Student's t, Chi-Squared, and F-Distributions
Student’s t-Distribution
Student’s t-distribution is similar to a normal distribution, but has fatter tails (i.e., a greater proportion of the outcomes are in the tails of the distribution)
The Chi-Squared Distribution
asymmetrical, bounded below by zero
approaches the normal distribution in shape as the degrees of freedom increase
mean = k and variance = 2k, where k = degrees of freedom
The F-Distribution
related to the chi-squared distribution as the number of observations in the denominator → ∞
approaches the normal distribution as the number of observations increases
The Exponential Distribution
used to model waiting times
how long it takes an employee to serve a customer
time it takes a company to default
the rate parameter λ
scale parameter β
The Beta Distribution
The beta distribution can be used for modeling default probabilities and recovery rates.
Mixture distributions
Mixture distributions combine the concepts of parametric and nonparametric distributions.
The component distributions used as inputs are parametric while the weights of each distribution within the mixture are based on historical data, which is nonparametric.
Multivariate Random Variables
15.1 Marginal and Conditional Distributions for Bivariate Distribution
Probability Matrices
Marginal and Conditional Distributions
15.2 Moments of Bivariate Random Distributions
Expectation of a Bivariate Random Function
Covariance and Correlation Between Random Variables
15.3 Behavior of Moments for Bivariate Random Variables
Linear Transformations
Coskewness and cokurtosis are cross-variable versions of skewness and kurtosis
Variance of Weighted Sum of Bivariate Random Variables
Conditional Expectations
15.4 Independent and Identically Distributed Random Variables
Independent and identically distributed (i.i.d.) random variables
Variables are independent of all other components.
Variables are all from a single univariate distribution
Variables all have the same moments
Expected value of the sum of n i.i.d. random variables is equal to nμ
Variance of the sum of n i.i.d. random variables is equal to nσ²
Variance of the sum of i.i.d. random variables grows linearly
Variance of the average of multiple i.i.d. random variables decreases as n increases
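In symbols, for i.i.d. X1, …, Xn with mean μ and variance σ²:

```latex
E\!\left(\sum_{i=1}^{n} X_i\right) = n\mu, \qquad
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) = n\sigma^2, \qquad
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}
```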
Sample Moments
16.1 Estimating mean, variance, and standard deviation
Population and Sample Moments
Variance and Standard Deviation
Point Estimates and Estimators
Biased Estimators
Unless you are specifically instructed on the exam to compute a biased variance, you should always compute the unbiased variance by dividing by (n − 1).
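A minimal sketch (hypothetical sample values) contrasting the biased and unbiased estimates via NumPy's ddof argument:

```python
import numpy as np

sample = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical sample

biased_var = sample.var(ddof=0)    # divides by n
unbiased_var = sample.var(ddof=1)  # divides by (n - 1)
print(biased_var, unbiased_var)    # 5.0  6.666...
```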
Best Linear Unbiased Estimator
16.2 Estimating Moments of the Distribution
Law of Large Numbers
Central Limit Theorem
Skewness and Kurtosis
skewness measures the asymmetry of a distribution's outcomes around the mean
kurtosis measures the peakedness of a distribution (the weight of its tails) relative to the normal distribution
Median and Quantile Estimates
Estimating quantiles
Mean of Two Random Variables
Covariance and Correlation Between Random
Variables
Coskewness and Cokurtosis
Hypothesis Testing
17.1 Hypothesis Testing Basics
The Null Hypothesis and Alternative Hypothesis
The null hypothesis, designated H0, is the hypothesis the researcher wants to reject.
The Choice of the Null and Alternative Hypotheses
H0: μ = μ0 versus HA: μ ≠ μ0
Reject H0 if:
test statistic > upper critical value
test statistic < lower critical value
One-Tailed and Two-Tailed Tests of Hypotheses
Type I and Type II Errors
Type I error: rejecting the null hypothesis when it is actually true.
A significance level of x% = the probability of a Type I error.
Type II error: failing to reject the null hypothesis when it is actually false.
The power of a test = 1 − P(Type II error)
The Relation Between Confidence Intervals and Hypothesis Tests
Statistical Significance vs. Practical Significance
transaction costs
Taxes
statistically significant results may not be economically significant
standard error
17.2 Hypothesis Testing Results
The p-Value
the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed
Confidence Intervals for Hypothesis Tests
The t-Test
The z-Test
Testing the Equality of Means
Multiple Hypothesis Testing
the probability of a Type I error increases as more hypotheses are tested
A hypothesis is a statement about the value of a population parameter developed for the purpose of testing a theory or belief.
Linear Regression
18.1 Regression Analysis
Regression analysis seeks to measure how changes in one variable, called the dependent (or explained) variable, can be explained by changes in one or more other variables, called the independent (or explanatory) variables
E(Y) = α + β × (X)
Y = α + β × (X) + ε
Linear Regression Conditions
The relationship between Y and X should be linear
The error term must be additive
All X variables should be observable
18.2 Ordinary Least Squares Estimation
Coefficient of Determination of a Regression (R2)
The R2 of a regression model captures the fit of the model
For a regression model with a single independent variable, R2 is the square of the correlation between the independent and dependent variable
R² = (rX,Y)²
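A minimal sketch on simulated data verifying that R² equals the squared correlation when there is a single regressor:

```python
import numpy as np

# Hypothetical data for a single-variable regression
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

r = np.corrcoef(x, y)[0, 1]     # sample correlation between X and Y
b, a = np.polyfit(x, y, 1)      # OLS slope and intercept
resid = y - (a + b * x)
r2 = 1 - resid.var() / y.var()  # R^2 = 1 - RSS/TSS

print(r**2, r2)                 # the two values agree
```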
Assumptions Underlying Linear Regression
The expected value of the error term, conditional on the independent variable, is zero [E(εi|Xi) = 0]
Survivorship, or sample selection, bias
Simultaneity bias
e.g., trading volume and volatility are determined simultaneously
Omitted variables
relevant variables incorrectly excluded from the model
All (X, Y) observations are independent and identically distributed
(i.i.d.)
Variance of X is positive
Variance of the errors is constant
It is unlikely that large outliers will be observed in the data
Properties of OLS Estimators
Interpreting Regression Results
Dummy Variables
Independent variables that fall into this category are called dummy variables and are often used to quantify the impact of qualitative variables.
18.3 Hypothesis Testing
Specify the hypothesis to be tested.
Calculate the test statistic
Reject or fail to reject the null hypothesis after comparing the test statistic to its critical value
Properties of OLS Estimators
Confidence Intervals
The confidence interval for the slope coefficient = β̂ ± (critical t-value × standard error of β̂).
The p-Value
The p-value is the smallest level of significance for which the null hypothesis can be rejected
Regression with Multiple Explanatory Variables
19.1 Multiple Regression
19.2 Measure of fit in Linear Regression
Coefficient of Determination
1 = (ESS/TSS) + (RSS/TSS)
Adjusted R2
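With n observations and k explanatory variables, the adjustment takes the standard form:

```latex
\bar{R}^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\left(1 - R^2\right)
```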
Joint Hypothesis Tests and Confidence Intervals
The F-test
R²F = coefficient of determination of the full model
R²P = coefficient of determination of the partial (restricted) model
RSSF = residual sum of squares of the full model
RSSP = residual sum of squares of the partial model
q = number of restrictions imposed on the full model to arrive at the partial model
n = number of observations
H0: β1 = β2 = β3 = … = βk = 0 versus HA: at least one βj ≠ 0
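Using the notation above (with kF = number of explanatory variables in the full model), the F-statistic for this joint test takes the standard restricted-versus-unrestricted form:

```latex
F = \frac{(RSS_P - RSS_F)/q}{RSS_F/(n - k_F - 1)}
```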
The standard error of the regression (SER) measures the uncertainty about the accuracy of the predicted values of the dependent variable
TSS = ESS + RSS
Regression Diagnostics
20.1 Heteroskedasticity and Multicollinearity
Effect of Heteroskedasticity on Regression Analysis
chi-squared test statistic
Estimate the regression using standard ordinary least squares (OLS) procedures, obtain the residuals, and square them (εi²)
Use the squared residuals from step 1 as the dependent variable in a new regression on the original explanatory variables, their squares, and their cross products
Calculate the R² for the model in step 2 and use it to calculate the chi-squared test statistic: χ² = nR²
The chi-squared statistic is compared to its critical value with [k × (k + 3) / 2] degrees of freedom, where k = number of independent variables.
If the calculated χ² > critical χ², we reject the null hypothesis of no conditional heteroskedasticity.
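A minimal sketch of these steps on simulated, deliberately heteroskedastic data; with a single regressor the auxiliary regression uses x and x², and the statistic has k(k + 3)/2 = 2 degrees of freedom:

```python
import numpy as np

# Hypothetical data: residual variance grows with x (conditional heteroskedasticity)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)

# Step 1: OLS of y on x, then square the residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_sq = (y - X @ beta) ** 2

# Step 2: auxiliary regression of the squared residuals on x and x^2
Z = np.column_stack([np.ones(n), x, x**2])
gamma, *_ = np.linalg.lstsq(Z, resid_sq, rcond=None)
fitted = Z @ gamma
r2 = 1 - ((resid_sq - fitted) ** 2).sum() / ((resid_sq - resid_sq.mean()) ** 2).sum()

# Step 3: chi-squared statistic, compared to the critical value with 2 df
chi2_stat = n * r2
print(chi2_stat)
```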
Correcting for Heteroskedasticity
perfect collinearity
X3 = 2X1 + 3X2
Multicollinearity
the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other.
Effect of Multicollinearity On Regression Analysis
increases the probability of Type II errors (coefficients may appear insignificant when they are not)
Detecting Multicollinearity
t-tests indicate that none of the individual coefficients is significantly different from zero, while the R2 is high
the F-test rejects the null hypothesis
high R² but large p-values for the individual coefficients
variance inflation factor (VIF)
Xj = b0 + b1X1 + … + bj−1Xj−1 + bj+1Xj+1 + … + bkXk
A VIF > 10 (i.e., R2 > 90%) should be considered problematic for that variable
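The standard computation, where R²j is the R² from the auxiliary regression of Xj on the other explanatory variables:

```latex
VIF_j = \frac{1}{1 - R_j^2}
```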
Correcting Multicollinearity
to omit one or more of the correlated independent variables
stepwise regression, which systematically removes variables from the regression until multicollinearity is minimized
homoskedastic
If the variance of the residuals is constant across all observations in the sample, the regression is said to be homoskedastic
Unconditional heteroskedasticity
when the heteroskedasticity is not related to the level of the independent variables, which means that it doesn’t systematically increase or decrease with changes in the value of the independent variable(s).
heteroskedasticity
when the variance of the residuals is not the same across all observations in the sample.
Conditional heteroskedasticity
heteroskedasticity that is related to the level of (i.e., conditional on) the independent variable
20.2 Model Specification
Bias-Variance Tradeoff
variance errors
General-to-specific model
starting with the largest model and then successively dropping independent variables that have the smallest absolute t-statistic
m-fold cross-validation
dividing the sample into m parts and then using (m − 1) parts (known as the training set) to fit the model and the remaining part (known as the validation block) for out-of-sample validation.
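A minimal sketch of m-fold cross-validation on simulated data, fitting a one-variable OLS line on each training set and scoring it on the held-out validation block:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 120, 5
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)

indices = rng.permutation(n)
folds = np.array_split(indices, m)

mse = []
for i in range(m):
    val = folds[i]                                                  # validation block
    train = np.concatenate([folds[j] for j in range(m) if j != i])  # training set (m - 1 parts)
    b, a = np.polyfit(x[train], y[train], 1)                        # fit on the training set
    pred = a + b * x[val]
    mse.append(np.mean((y[val] - pred) ** 2))                       # out-of-sample error

print(np.mean(mse))  # average validation MSE across the m folds
```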
Residual Plots
Ideally, the residuals should be small in magnitude
and not related to any of the explanatory variables
Identifying Outliers
Cook’s measure
ŷ(−j) = predicted value of y after dropping outlier observation j
ŷi = predicted value of y without dropping any observation
k = number of independent variables
s² = the squared standard error (mean squared residual) of the model estimated with all observations
Large values of Cook’s measure (i.e., Dj > 1) indicate that the dropped observation was indeed an outlier.
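Using these definitions, Cook's measure for observation j takes the standard form (a sketch, summing over all observations i):

```latex
D_j = \frac{\sum_{i}\left(\hat{y}_i^{(-j)} - \hat{y}_i\right)^2}{k \cdot s^2}
```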
The Best Linear Unbiased Estimator
assumptions underlying the linear regression need to be satisfied
the relationship between Y and X(s) should be linear
residual distribution should be identical
Model specification
the model’s adjusted R2 declines
Omitted variable bias
the omitted variable is correlated with other independent variables in the model
the omitted variable is a determinant of the dependent variable
selecting the appropriate explanatory variables to include in the model
Stationary Time Series
21.1 Covariance Stationary
Autocovariance and Autocorrelation Functions
The covariance between the current value of a time series and its value τ periods in the past is referred to as its autocovariance at lag τ.
Its autocovariances for all τ make up its autocovariance function (ACF)
If a time series is covariance stationary, its autocovariance function is stable over time
the partial autocorrelation function
makes up the correlations for all lags after controlling for the values between the lags
Partial autocorrelations may be large only for a few lags
those lags become prime candidates for inclusion in an autoregressive (AR) model
yt = d + Φ1yt−1 + Φ2yt−2 + … + Φpyt−p + εt
White Noise
serially uncorrelated
exhibit zero correlation among any of its lagged values
A model’s forecast errors should follow a white noise process
unconditional mean and variance
Wold’s theorem
general linear process
Wold’s theorem proposes a way to model the role of white noise and holds that a covariance stationary process can be modeled as an infinite distributed lag of a white noise process. Such a model takes the general linear form yt = εt + ψ1εt−1 + ψ2εt−2 + …
White noise (WN): a serially uncorrelated series with a mean of zero and a constant variance
independent white noise
time series
Its mean must be stable over time
Its variance must be finite and stable over time
Its covariance structure must be stable over time.
Covariance structure refers to the covariances among the values of a time series at its various lags, which are a given number of periods apart at which we can observe its values.
τ = 1 refers to a one-period lag, comparing each value of a time series to its preceding value
τ = 4 means we are comparing values four periods apart along the time series
21.2 Autoregressive and Moving Average Models
Autoregressive Processes
Yule-Walker equation
Moving Average (MA) Processes
current random shock (εt)
lagged unobservable shock (εt−1)
μ = mean of the time series
θ = coefficient for the lagged random shock
yt = μ + εt + θ1εt−1 + … + θqεt−q
Lag Operators
It shifts the time index back by one period.
To apply the lag operator over multiple periods,
Applying the lag operator m times: L^m yt = yt−m.
When applied to a constant, the lag operator does not change the constant.
Forecasting models often take the form of a distributed lag that
assigns weights to the past values of a time series.
Lag polynomials can be multiplied.
Assuming that the coefficients satisfy some conditions, the polynomial can be inverted.
yt–1 = one-period lagged observation of the variable being estimated
εt = current random white noise shock (mean 0)
yt = the time series variable being estimated
Φ = coefficient for the lagged observation of the variable being estimated
d = intercept term
AR(1) series: yt = d + Φyt−1 + εt
The long-run (or unconditional) mean-reverting level of an AR(p) series = d / (1 − Φ1 − Φ2 − … − Φp)
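A minimal sketch simulating a (hypothetical) AR(1) and checking that its sample mean is close to the long-run level d / (1 − Φ):

```python
import numpy as np

# Simulate an AR(1): y_t = d + phi * y_{t-1} + eps_t  (hypothetical d and phi)
rng = np.random.default_rng(7)
d, phi, n = 2.0, 0.6, 100_000

y = np.empty(n)
y[0] = d / (1 - phi)  # start at the long-run mean
for t in range(1, n):
    y[t] = d + phi * y[t - 1] + rng.normal()

print(y.mean())       # sample mean of the simulated series
print(d / (1 - phi))  # long-run (unconditional) mean-reverting level = 5.0
```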
A key difference between an MA representation and an AR process is that the MA process shows autocorrelation cutoff while an AR process shows a gradual decay in autocorrelations
21.3 Autoregressive Moving Average (ARMA) Models
Application of AR, MA, and ARMA processes
ARMA(3,1) model
Sample and Partial Autocorrelations
Testing Autocorrelations
Box-Pierce (BP) statistic
QBP = chi-squared statistic with h degrees of freedom
We want all residual autocorrelations to be zero
for smaller samples (T ≤ 100), use the Ljung-Box (LB) statistic
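Standard forms of the two statistics, using the first h sample autocorrelations ρ̂i of the residuals from a sample of size T:

```latex
Q_{BP} = T \sum_{i=1}^{h} \hat{\rho}_i^{2}, \qquad
Q_{LB} = T(T+2) \sum_{i=1}^{h} \frac{\hat{\rho}_i^{2}}{T - i}
```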
Modeling Seasonality in an ARMA
ARMA (p, q) × (ps, qs),
where ps and qs denote the seasonal component
ps and qs are restricted to values of 1 or 0 (i.e., true or false)
|θ| < 1
also decay gradually for essentially the same reasons
Non-Stationary Time Series
22.1 Time Trends
A time series that tends to grow by a constant amount each period has a linear trend
A time series that tends to grow at a constant rate each period has a nonlinear trend
polynomial time trend
deterministic trends
stochastic trends
a log-linear model
ln (yt) = δ0 + δ1t + εt
log-quadratic model
22.2 Seasonality
e.g., sales this month (xt) may be related to sales for the same month last year (xt−12)
calendar effects
dummy variables
a value of either one or zero to represent the season being on or off
After adding a time trend, the model also includes a deterministic trend term (see the sketch below)
We can expand the forecasting model even further by allowing for other calendar effects. For example, if we suspect a time series exhibits holiday variations (HDV) and trading-day variations, we can account for them with additional dummy variables
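A sketch of what such a specification could look like; the coefficient names (γ for seasonal dummies, θ for holiday dummies, φ for trading-day dummies) are illustrative, not from the source:

```latex
y_t = \delta_1 t + \sum_{i=1}^{s} \gamma_i D_{i,t}
      + \sum_{j} \theta_j HDV_{j,t} + \sum_{k} \varphi_k TD_{k,t} + \varepsilon_t
```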
h-step-ahead point forecast
22.3 Unit Roots (random walk)
if its value in any given period is its previous value plus-or-minus a random “shock”
its variance increases with time
is not covariance stationary
random walk with drift
three main problems
unit root does not revert to a mean
show spurious relationships with each other
Dickey-Fuller distribution
If we use an ARMA model, its estimated parameters follow an asymmetric distribution that depends on the sample size and the presence of a time trend
can be addressed by modeling the differences of the unit root series
how to test if a time series contains a unit root
augmented Dickey-Fuller test
null hypothesis is γ = 0
alternative hypothesis is γ < 0
γ > 0, significantly greater than zero
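A minimal sketch applying statsmodels' augmented Dickey-Fuller test to a simulated random walk:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.normal(size=500))  # unit-root series by construction

adf_stat, p_value, *_ = adfuller(random_walk)
print(adf_stat, p_value)  # large p-value -> fail to reject the null of a unit root
```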
Measuring Return, Volatility, and Correlation
23.1 Defining Returns and Volatility
Simple and Continuously Compounded Returns
Volatility, Variance, and Implied Volatility
implied volatility
Black-Scholes-Merton (BSM)
inherent assumption that variance is constant over time
23.2 Normal and Non-normal distributions
Jarque-Bera Test
to test whether a distribution is normal
zero skewness
kurtosis =3 (and excess kurtosis = 0)
JB is approximately χ² distributed with 2 degrees of freedom
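The standard form of the statistic, with S = sample skewness and K = sample kurtosis from n observations:

```latex
JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)
```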
The Power Law
power law tails
Student’s t-distribution
23.3 Correlations and Dependence
Spearman’s Rank Correlation
Kendall’s τ
measures concordant and discordant pairs and their relative frequency
concordant: (Xi < Xj) and (Yi < Yj), or (Xi > Xj) and (Yi > Yj)
discordant: (Xi < Xj) and (Yi > Yj), or (Xi > Xj) and (Yi < Yj)
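In standard form, with nc concordant pairs and nd discordant pairs out of n observations:

```latex
\tau = \frac{n_c - n_d}{n(n-1)/2}
```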
Positive Definiteness
Positive definiteness requires that every weighted combination of the components has a positive variance; i.e., for any nonzero weight vector w, the variance w′Σw computed from the covariance matrix Σ must be positive
equicorrelation
sets all correlations equal to the same amount
Simulation and Bootstrapping
24.1 Monte Carlo Simulation and Sampling Error Reduction
Reducing Monte Carlo Sampling Error
the accuracy of the simulation depends on:
the standard deviation of the simulated values
the number of scenarios run (the standard error falls with the square root of the number of scenarios)
Antithetic Variates
random values are constructed to generate negative correlation within the values used in the simulation
using a complement set of the original set of random variables
x̄ = (x̄1 + x̄2) / 2
Without using antithetic variates, the two sets of Monte Carlo replications are independent. Thus, the covariance will be zero and the variance of ¯x is simply reduced to the following
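In symbols (standard result; x̄1 and x̄2 are the means of the two replication sets):

```latex
\operatorname{Var}(\bar{x}) = \tfrac{1}{4}\left[\operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2) + 2\operatorname{Cov}(\bar{x}_1, \bar{x}_2)\right],
\qquad \operatorname{Cov}(\bar{x}_1, \bar{x}_2) = 0 \;\Rightarrow\;
\operatorname{Var}(\bar{x}) = \tfrac{1}{4}\left[\operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)\right]
```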
Control Variates
reduce the variance of the approximation by adding values with a mean of zero that are correlated to the simulation
replacing a variable x (under simulation) that has unknown properties, with a similar variable y that has known properties
Monte Carlo Simulation
data generating process (DGP)
Calculate the statistic or function of interest, gi = g(xi)
Repeat steps 1 and 2 to produce N replications
Estimate the quantity of interest from {g1, g2, …, gN}
Evaluate the accuracy by computing the standard error. N should be increased until the required level of accuracy is achieved.
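A minimal sketch of these steps, with a hypothetical normal DGP for daily returns and the sample mean as the statistic of interest:

```python
import numpy as np

rng = np.random.default_rng(11)
N = 10_000  # number of replications

g = np.empty(N)
for i in range(N):
    x_i = rng.normal(loc=0.05, scale=0.20, size=252)  # hypothetical DGP: one year of daily returns
    g[i] = x_i.mean()                                 # statistic of interest g(x_i)

estimate = g.mean()                     # quantity of interest
std_error = g.std(ddof=1) / np.sqrt(N)  # Monte Carlo standard error
print(estimate, std_error)              # increase N until the standard error is acceptable
```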
C1 = C0(1 + r).
24.2 Bootstrapping and Random Number Generation
The Bootstrapping Method
rely on the key assumption that the present resembles the past
Independent and identically distributed (i.i.d.)
simply drawn one-by-one from the observed data, and replaced
data is dependent across time
{x1, x2 … , x10}
{x2, x7, x9}
{x2, x5, x10}
{x1, x1, x8}
Circular block bootstrap (CBB)
{x1, x2, x3}, {x2, x3, x4}, …,{x8, x9, x10}, {x9, x10, x1}, {x10, x1, x2}.
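A minimal sketch of an i.i.d. bootstrap draw and of forming circular blocks from ten hypothetical observations:

```python
import numpy as np

rng = np.random.default_rng(5)
data = np.arange(1, 11)  # stands in for {x1, x2, ..., x10}

# i.i.d. bootstrap: draw one-by-one with replacement
iid_sample = rng.choice(data, size=len(data), replace=True)

# circular block bootstrap: candidate blocks of length 3 wrap around the end
block_len = 3
blocks = [np.roll(data, -i)[:block_len] for i in range(len(data))]
# e.g., blocks[9] is [x10, x1, x2]; a CBB sample strings randomly chosen blocks together
cbb_sample = np.concatenate([blocks[j] for j in rng.integers(0, len(data), size=4)])[:len(data)]

print(iid_sample)
print(cbb_sample)
```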
Random Number Generation
to produce an irregular sequence of numerical values
pseudo-random number generators (PRNGs)
these computer-generated numbers are not truly random
they are actually generated from a formula
an initial seed value must first be chosen
The choice of seed value will determine the random number sequence that is generated
Benefits
Repeatability
Computing Clusters
Using a common seed value allows us to use the same set of random numbers across multiple simulations
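A minimal sketch showing that a common seed reproduces the same sequence:

```python
import numpy as np

# The same seed produces the same "random" sequence (repeatability across runs or clusters)
print(np.random.default_rng(seed=2024).normal(size=3))
print(np.random.default_rng(seed=2024).normal(size=3))  # identical to the line above
```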
Disadvantages of Bootstrapping
Using the entire data set may not be reliable
Structural changes
Bootstrapping simulations repeatedly draw data from historical data sets, each time replacing the data so it can be redrawn
The bootstrapping technique requires no assumptions with respect to the true distribution of the parameter estimates
Disadvantages of Simulation
Specification of the DGP
assumptions of model inputs or the data generating process are unrealistic
e.g., option prices are typically fat-tailed, but a model could erroneously draw them from a normal distribution
Computational cost