Chapter 6: Foundation of Statistical Inference
Sampling
Our hypotheses are predictions about the effect of an independent variable (IV) on a dependent variable (DV) in a population of cases
But we can usually examine relationships only in some sample of these cases
So, back to cases and variables: example populations
All countries from 1970-1999
A group of registered voters
Cases are the units being observed
Country years
Registered Voters
Concepts
Population: The entire universe of subjects that researcher wants to describe
Population Parameter: Actual Value
Number of people who voted for John McCain
Sampling: Gathering a number of observations from a population
Sample: Number of cases or observations drawn from a population
Sample statistic: Estimate of population parameter based on sample drawn from population
Number of people who took an exit poll in Idaho and said they voted for John McCain
Why do we sample?
Often not possible to identify all population members
Collecting info on each pop member is
Time Consuming
Expensive
Logistically difficult
Sample data can be better
Quick to capture mobile populations
Can collect data without making subjects too aware of being studied
Inference: Drawing conclusions about a population from a sample
Inferential Statistics
Set of procedures for deciding how closely a relationship we observe in the sample corresponds to the unobserved relationship in the population from which the sample was drawn
Real world stuff
Movies
Moneyball
Maximize number of runs, not whether someone looks like a complete package
Criminal Justice
Figure out what predicts violent crimes and repeat offenders
Use those predictors when sentencing
Literary Digest
1936 Presidential Race: FDR v Landon
2.4 million ballots returned; the poll had Landon winning with 57% of the vote
FDR won with roughly 60% of the vote!
Due to sampling errors
Wrong sampling frame
Systematically picking people who were gonna vote for Landon
Response Bias
People who fill out cards may hold different opinions from those who choose not to
Sampling
Key terms
Random Sample: Every member of the pop has equal chance of being chosen for sample
Without a random sample, additional error enters the sample (it isn't reflective of the population)
There is always random sampling error; the goal is to minimize other errors
How to ensure that everyone has an equal chance of ending up in the sample and of participating
Error decreases as size increases
Random sampling error = Standard Deviation / √n
n = Size of sample
The bigger the sample the smaller the error
Variance in population characteristic being measured
The bigger the variation the bigger the error
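A rough simulation can make both points concrete. This is only a sketch under made-up assumptions: the populations are simulated normal values (mean 50, spread 10 or 20), and the helper name `spread_of_sample_means` is invented for illustration. It draws many random samples and measures how much the sample means bounce around.

```python
import random
import statistics

def spread_of_sample_means(population, n, draws=2000, seed=0):
    """Draw `draws` random samples of size n and return how much
    the sample means vary (their standard deviation)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.pstdev(means)

# Hypothetical populations: same mean, different amounts of variation.
rng = random.Random(1)
narrow_population = [rng.gauss(50, 10) for _ in range(10000)]
wide_population = [rng.gauss(50, 20) for _ in range(10000)]

small = spread_of_sample_means(narrow_population, n=25)
large = spread_of_sample_means(narrow_population, n=400)
wide = spread_of_sample_means(wide_population, n=25)

print(f"n=25,  low variation:  {small:.2f}")
print(f"n=400, low variation:  {large:.2f}")
print(f"n=25,  high variation: {wide:.2f}")
```

The n=400 samples should show much less error than the n=25 samples, and the high-variation population should show more error at the same sample size.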
Variance: Dispersion of cases across the values of the variable
Standard Deviation: The average amount that each point deviates from the mean
1) Calculate each value's deviation from the mean
2) Square each deviation
3) Sum the squared deviations
4) Calculate the average of the squared deviations (the sum divided by the number of cases; divide by n - 1 instead when working from a sample) = variance
5) Take the square root of the variance = standard deviation
Standard Error = Standard Deviation / √n
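The five steps and the standard error formula can be sketched in code. This is a minimal illustration, not a reference implementation: the function name `variance_steps`, the `sample` flag, and the scores are all made up for the example. By default it divides by the number of cases, as in step 4; passing `sample=True` divides by n - 1, the usual adjustment for sample data.

```python
import math

def variance_steps(values, sample=False):
    """Follow the five steps, then compute the standard error."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]   # 1) deviation from the mean
    squared = [d ** 2 for d in deviations]    # 2) square each deviation
    total = sum(squared)                      # 3) sum the squared deviations
    divisor = n - 1 if sample else n          # assumption: n - 1 for sample data
    variance = total / divisor                # 4) average = variance
    std_dev = math.sqrt(variance)             # 5) square root = standard deviation
    std_err = std_dev / math.sqrt(n)          # standard error = SD / sqrt(n)
    return variance, std_dev, std_err

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
variance, std_dev, std_err = variance_steps(scores)
print(variance)            # 4.0
print(std_dev)             # 2.0
print(round(std_err, 3))   # 0.707
```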
Other Approaches
Quasi-random sampling/Cluster Sampling
Picking specific localities, then randomly selecting individuals within them
Purposive Sampling
Over representing groups to make comparisons
College student vs Adults
Only for comparing groups
Sampling frame: Method for defining the population the researcher wants to study
Error: some individuals are more likely to be measured than others (wrong sampling frame)
Selection bias: Some members of the population are more likely to be included in the sample than others
Response bias: Some members of the population are more likely to respond than others
Some individuals are more likely to be measured than others
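To see why a bad sampling frame matters more than sample size, here is a small simulation. Everything in it is hypothetical: a population where 60% support candidate A, and a frame in which supporters are only a third as likely to be listed. It is a sketch of selection bias in general, not a reconstruction of any real poll.

```python
import random

def support_share(sample):
    """Share of the sample that supports candidate A (coded as 1)."""
    return sum(sample) / len(sample)

rng = random.Random(7)

# Hypothetical population: 1 = supports A, 0 = does not. True support is 60%.
population = [1] * 60000 + [0] * 40000

# Random sample: every member has an equal chance of being chosen.
random_sample = rng.sample(population, 1000)

# Biased frame (assumption): supporters are only a third as likely to be listed,
# so the frame systematically over-represents non-supporters.
frame = [p for p in population if p == 0 or rng.random() < 1 / 3]
biased_sample = rng.sample(frame, 1000)

print(f"True support:        {support_share(population):.2f}")
print(f"Random sample says:  {support_share(random_sample):.2f}")
print(f"Biased sample says:  {support_share(biased_sample):.2f}")
```

The random sample should land near the true value, while the biased sample stays far off no matter how many people are drawn from the bad frame.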
Analyzing
Central Limit Theorem
If you take enough samples, the sample means will follow an approximately normal distribution
For example, roll a die 10 times and find the average of the rolls (y); this is one stage (x)
Repeat for about 50 stages
Plot the stage averages and they roughly follow a normal distribution, which is pretty wild
Allows us to calculate the likelihood that a given sample deviates from the true population mean
Basis of inferential statistics
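The dice experiment is easy to try in code. The sketch below assumes a fair six-sided die, 10 rolls per stage, and 50 stages, matching the description above; the function name `dice_stage_means` and the text histogram are just for illustration.

```python
import random
import statistics
from collections import Counter

def dice_stage_means(stages=50, rolls_per_stage=10, seed=42):
    """Each stage: roll a fair die several times and record the average."""
    rng = random.Random(seed)
    means = []
    for _ in range(stages):
        rolls = [rng.randint(1, 6) for _ in range(rolls_per_stage)]
        means.append(statistics.fmean(rolls))
    return means

means = dice_stage_means()

# Crude text histogram: round each stage average to the nearest 0.5.
buckets = Counter(round(m * 2) / 2 for m in means)
for value in sorted(buckets):
    print(f"{value:>4}: {'*' * buckets[value]}")
```

Most stage averages should pile up near 3.5 (the true mean of a fair die), with fewer out toward the tails. Raising `stages` makes the bell shape clearer.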
Tools
Z-score [the stuff on the x axis] = ( A value's deviation from the mean ) / (Standard Deviation)
Z-Table
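As a quick illustration, the Z-score formula is one line of code, and Python's standard library `statistics.NormalDist` can stand in for a printed Z-table lookup. The numbers (a value of 130, mean 100, standard deviation 15) are made up for the example.

```python
from statistics import NormalDist

def z_score(value, mean, std_dev):
    """How many standard deviations a value lies from the mean."""
    return (value - mean) / std_dev

z = z_score(130, mean=100, std_dev=15)
print(z)  # 2.0

# Equivalent of a Z-table lookup: share of a normal distribution below z.
share_below = NormalDist().cdf(z)
print(f"{share_below:.4f}")
```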
95% Confidence Interval: The interval within which 95% of all possible sample estimates will fall by chance
Conclude: 95% of all possible random samples yield sample means within these bounds
Lower bound = (Sample Mean) - 1.96 * (Standard Error)
Upper bound = (Sample Mean) + 1.96 * (Standard Error)
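The two bounds can be computed directly from a sample. This is a sketch under assumptions: the data are 100 made-up scores, the function name `confidence_interval_95` is invented, and the standard error uses the sample standard deviation (n - 1 divisor). The 1.96 multiplier suits reasonably large samples; small samples call for the t-distribution instead.

```python
import math
import statistics

def confidence_interval_95(sample):
    """Return (lower, upper) bounds: sample mean +/- 1.96 standard errors."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # SD / sqrt(n)
    margin = 1.96 * std_err
    return mean - margin, mean + margin

# Hypothetical sample of 100 scores with mean 55.
sample = [40, 50, 60, 70] * 25
lower, upper = confidence_interval_95(sample)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```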
T-Distribution
Like the Z-distribution, but for small samples