Reading 10: Sampling and Estimation
Sampling
Simple random sampling
: each member of the population has the same probability of being included in the sample.
Sampling distribution
: the distribution of a sample statistic over repeated samples of size n
Sampling Error
: the difference between a sample statistic and the true population parameter.
Stratified random sampling
: dividing the population into subgroups (strata) based on its characteristics, then randomly sampling from each subgroup in proportion to its size, so that the sample preserves the characteristics of the overall population.
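The two sampling schemes above can be sketched in a few lines of Python. This is only an illustration: the population, the "large"/"small" labels, and the helper name `stratified_sample` are hypothetical and not part of the reading.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, n, rng):
    """Draw about n items, allocating draws to each stratum in proportion to its size."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)
# Hypothetical population: 1,000 stocks, 70% "large" cap and 30% "small" cap.
population = [("large", i) for i in range(700)] + [("small", i) for i in range(300)]

# Simple random sample: every member has the same chance of being picked.
simple = rng.sample(population, 100)

# Stratified random sample: the 70/30 split is preserved by construction.
stratified = stratified_sample(population, key=lambda s: s[0], n=100, rng=rng)
n_small_stratified = sum(1 for s in stratified if s[0] == "small")
```

In the simple random sample the share of "small" stocks varies from draw to draw, while the stratified sample always contains the population proportion.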
Time-series & Cross-section data
Time-series data
: consists of observations at specific and equally spaced points in time.
Cross-sectional data
: consists of observations taken at a single point in time.
Standard Error of Sample Mean
Definition
: is the standard deviation of the distribution of sample means
Calculation
:
When Population Standard Deviation, \(\sigma \), is
known
\(\sigma_{\bar{X}}=\frac{\sigma}{\sqrt{n}}\)
When Population Standard Deviation, \(\sigma \), is
unknown
\(S_{\bar{X}}=\frac{S}{\sqrt{n}}\)
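As a quick numerical check of the two formulas above, here is a small Python sketch. The figures (σ = 20, n = 100, and the five observations) are made up for illustration.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Case 1: population standard deviation known (assumed sigma = 20, n = 100).
se_known = standard_error(20.0, 100)   # 20 / 10 = 2.0

# Case 2: sigma unknown, so estimate it with the sample standard deviation S.
sample = [4.0, 7.0, 5.0, 9.0, 5.0]     # made-up observations
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)
se_estimated = standard_error(s, len(sample))
```

Here the sample standard deviation works out to 2.0, so the estimated standard error is 2/√5, roughly 0.89.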
Central limit theorem
For a population with mean \(\mu\) and finite variance \(\sigma^{2}\), the sampling distribution of the sample mean over all possible samples of size n \(\geq\) 30 is approximately normal, with mean = \(\mu\) and variance = \(\frac{\sigma^{2}}{n}\), whatever the shape of the population distribution.
As the size of the random sample increases, the distribution of sample means approaches a normal distribution.
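The theorem can be illustrated with a small simulation. The sketch below assumes an exponential population (strongly skewed, with mean 1 and variance 1); the choice of population, the seed, and the number of trials are arbitrary assumptions for the example.

```python
import random
import statistics

rng = random.Random(42)
n = 30            # sample size
trials = 5000     # number of repeated samples

# Exponential population with rate 1: mu = 1, sigma^2 = 1, strongly right-skewed.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)     # should be close to mu = 1
var_of_means = statistics.pvariance(sample_means)  # should be close to sigma^2 / n = 1/30
```

Even though the population is far from normal, the sample means cluster around \(\mu\) with variance near \(\frac{\sigma^{2}}{n}\), and a histogram of `sample_means` would look roughly bell shaped.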
Estimator
Desired properties
Unbiasedness
: the expected value of the estimator equals the parameter being estimated
Efficiency
: sampling distribution has smallest variance of all unbiased estimators
Consistency
: the accuracy of the parameter estimate increases as the sample size increases
Point Estimate vs Confidence Interval
Point Estimator
: a single value used to estimate a population parameter
Confidence interval
: a range of values expected to contain the true value of the parameter with a specified probability \(1-\alpha\)
Confidence interval = Point Estimate \(\pm\) (reliability factor \(\times\) standard error)
The greater the variability of the random variable, the wider the confidence interval.
The larger the sample, the narrower the confidence interval
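The effect of variability and sample size on the interval width can be checked numerically. This is a minimal sketch with made-up inputs (point estimate 50, standard deviations 10 and 20, sample sizes 25 and 100); the helper names `confidence_interval` and `width` are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(point_estimate, reliability_factor, standard_error):
    """Return (lower, upper) = point estimate +/- reliability factor x standard error."""
    margin = reliability_factor * standard_error
    return point_estimate - margin, point_estimate + margin

def width(interval):
    """Distance between the upper and lower bound."""
    return interval[1] - interval[0]

z = NormalDist().inv_cdf(0.975)   # two-sided 95% reliability factor, about 1.96

base = confidence_interval(50.0, z, 10.0 / math.sqrt(25))            # sd = 10, n = 25
more_variable = confidence_interval(50.0, z, 20.0 / math.sqrt(25))   # sd doubled
larger_sample = confidence_interval(50.0, z, 10.0 / math.sqrt(100))  # n quadrupled
```

Doubling the standard deviation doubles the width, while quadrupling the sample size halves it, because the standard error scales with \(\frac{1}{\sqrt{n}}\).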
Reliability Factor
: a number that depends on the sampling distribution of the point estimate and on the chosen confidence level \(1-\alpha\), the probability that the interval contains the parameter.
T-distribution
Properties
Symmetrical (bell shaped)
Fatter tails than a normal distribution
Defined by a single parameter, the degrees of freedom (df = n - 1).
As df increases, the t-distribution approaches the normal distribution.
Lower df means
Greater probability of extreme outcomes
Wider confidence intervals
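The fatter tails at low df can be seen in a rough simulation. The sketch below builds t variates from standard normal draws (a standard construction, Z divided by the square root of a chi-square over its df) using only the Python standard library; the cutoff of 2, the seed, and the number of draws are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

def simulate_t(df, rng):
    """Draw one Student's t variate as Z / sqrt(chi-square(df) / df)."""
    z = rng.gauss(0.0, 1.0)
    chi_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi_sq / df)

def tail_probability(df, cutoff, draws, rng):
    """Estimate P(|T| > cutoff) by simulation."""
    hits = sum(1 for _ in range(draws) if abs(simulate_t(df, rng)) > cutoff)
    return hits / draws

rng = random.Random(7)
tail_df3 = tail_probability(3, 2.0, 40_000, rng)    # low df: fat tails
tail_df30 = tail_probability(30, 2.0, 40_000, rng)  # higher df: closer to normal
tail_normal = 2 * (1 - NormalDist().cdf(2.0))       # exact for the standard normal
```

The estimated probability of an outcome beyond ±2 is largest for df = 3, smaller for df = 30, and smallest for the normal distribution, which is why low-df intervals need a larger reliability factor.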
Calculating Confidence interval
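Confidence Interval for Mean of Normal Distribution
(Produces a narrower interval than the interval for a single observation of the random variable, because the standard error is smaller than the standard deviation)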
When Variance ( \(\sigma ^{2}\) ) is unknown
use t-statistic
If n \(\geq\) 30, the Z-statistic can also be used, but the t-statistic is preferred.
\(\bar{X} \pm t_{\alpha/2,\,df} \times \frac{S}{\sqrt{n}}\)
When Variance ( \(\sigma ^{2}\) ) is known
use Z-statistic
\(\bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\)
Of a random variable
Confidence interval = \(\text{Point Estimate} \pm \text{Reliability Factor} \times \text{Standard Deviation}\)
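A worked example of the three intervals above, as a minimal Python sketch. All inputs (sample mean 80, n = 25, standard deviation 15) are made up, and the t-value 2.064 for df = 24 is copied from a standard t-table rather than computed, since the standard library has no t-distribution.

```python
import math
from statistics import NormalDist

x_bar, n = 80.0, 25          # made-up sample mean and sample size
alpha = 0.05                 # 95% confidence level

# Variance known: z-statistic with the population standard deviation.
sigma = 15.0
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
z_interval = (x_bar - z * sigma / math.sqrt(n), x_bar + z * sigma / math.sqrt(n))

# Variance unknown: t-statistic with df = n - 1 = 24 and the sample standard deviation.
s = 15.0
t = 2.064   # t_{0.025, 24} from a standard t-table (assumed, not computed here)
t_interval = (x_bar - t * s / math.sqrt(n), x_bar + t * s / math.sqrt(n))

# Single observation: uses the standard deviation itself, not the standard error.
single_obs_interval = (x_bar - z * sigma, x_bar + z * sigma)
```

With these inputs the z-interval is about 74.12 to 85.88, the t-interval is slightly wider at about 73.81 to 86.19, and the interval for a single observation is much wider at about 50.6 to 109.4.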
Issues when selecting sample
Increasing sample size
Pros
Improve parameter estimates
Cons
Higher cost
Risk of mistakenly including data from a different distribution
Bias results
Data-mining
(searching the same data repeatedly until a statistically significant relationship appears by chance)
Look-ahead
(using information that was not yet available at the point in time being tested)
Sample-selection
(selection is non-random)
Survivorship
(e.g., using only surviving mutual funds)
Time-period
(the relationship may not hold over other time periods)
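Survivorship bias is easy to reproduce in a toy simulation. The sketch below is purely hypothetical: the return distribution (mean 5%, standard deviation 10%) and the rule that funds returning less than -5% are closed are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical universe: 1,000 funds with annual returns drawn from N(5%, 10%).
returns = [rng.gauss(0.05, 0.10) for _ in range(1000)]

# Assumed rule: funds returning less than -5% are closed and leave the database.
survivors = [r for r in returns if r >= -0.05]

mean_all = statistics.fmean(returns)          # average over the full universe
mean_survivors = statistics.fmean(survivors)  # average a survivors-only database reports
```

Because only the weakest funds are removed, the survivors-only average overstates the average return of the full universe.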
Sampling error:
Arises because a sample does not contain every element of the population, so a sample statistic differs from the true population parameter