Statistics
The goal of statistics is to make an inference about a population based on the data collected from a sample
But before an inference can be made, I must first describe the sample. This is the goal of descriptive statistics: descriptive statistics quantitatively describe the main properties of a dataset.
If I want to describe my sample using only one number, what number should I choose to do this?
How about finding the "center" of the data: the middle value, or the most common value in my data set (what is occurring most often)?
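A minimal Python sketch of the three usual answers (mean, median, mode), using only the standard `statistics` module. The function name `describe_center` and the sample values are hypothetical, made up for illustration:

```python
from statistics import mean, median, multimode


def describe_center(data):
    """Return three common one-number summaries of where a sample is centered."""
    return {
        "mean": mean(data),        # arithmetic average
        "median": median(data),    # middle value once the data are sorted
        "mode": multimode(data),   # most frequent value(s); a list because ties are possible
    }


sample = [2, 3, 3, 5, 7, 10]  # hypothetical sample, for illustration only
print(describe_center(sample))
```

For this sample the mean is 5, the median is 4, and the mode is 3; the single large value 10 pulls the mean above the median.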
Box plots
The more variability in the data, the larger the interquartile range (IQR = Q3 − Q1, the spread of the middle 50% of the data)
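A small Python sketch comparing the IQR of two hypothetical samples that share the same median (50) but differ in spread. It uses `statistics.quantiles`, whose default ("exclusive") method is only one of several quartile conventions, so a box plot drawn by hand may show slightly different Q1 and Q3 values:

```python
from statistics import quantiles


def iqr(data):
    """Interquartile range Q3 - Q1, with quartiles from statistics.quantiles."""
    q1, _median, q3 = quantiles(data, n=4)  # default method="exclusive"
    return q3 - q1


tight = [48, 49, 50, 50, 51, 52]   # hypothetical: values bunched near 50
spread = [20, 35, 50, 50, 65, 80]  # hypothetical: same median, far more variable
print("IQR tight: ", iqr(tight))
print("IQR spread:", iqr(spread))
```

Here the tightly bunched sample gives an IQR of 2.5 and the spread-out sample gives 37.5.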
Sampling Distributions
The sampling distribution of a statistic (such as the mean, variance, or proportion) is defined as follows:
Take all possible samples of size n from a population of size N, compute the desired statistic for each sample, and organize the results in a table or graph. That table or graph is the Sampling Distribution
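To make the definition concrete, here is a Python sketch that literally lists every possible sample from a tiny hypothetical population and tabulates the sample means. It assumes samples are drawn with replacement and that order matters (so there are N^n samples); counting samples without replacement is another common convention and gives a different table. The function name is hypothetical:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution_of_mean(population, n):
    """Tabulate the sample mean over every possible sample of size n.

    Assumption: sampling WITH replacement, order matters, so there are
    len(population) ** n equally likely samples.
    """
    means = (Fraction(sum(sample), n) for sample in product(population, repeat=n))
    return Counter(means)  # maps each sample mean to how many samples produce it


population = [2, 4, 6]  # hypothetical population, N = 3
table = sampling_distribution_of_mean(population, n=2)
total = sum(table.values())
for value, count in sorted(table.items()):
    print(f"sample mean {float(value):.1f}: {count} of {total} samples")
```

With population {2, 4, 6} and n = 2 there are 9 samples, and the sample means 2, 3, 4, 5, 6 occur 1, 2, 3, 2, 1 times. The table is centered at 4, which is also the population mean.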
Central Limit Theorem
Requirements
1) n > 30: no matter what shape the population distribution has (uniform, skewed left, U-shaped, skewed right, etc.), the sampling distribution of the sample mean will be approximately normal
Example: take samples of size greater than 30, calculate the mean of each sample, and put that collection of sample means into a table; the distribution of those means will be approximately normal
When you take the average of the sample means from all possible samples of a population of size N, it equals the population mean: mean of the sample means = μ
Taking the standard deviation of all those sample means does NOT give the population σ, though! Instead, st. dev. of the sample means = σ/√n. It is smaller than σ, and it keeps getting smaller as the sample size n gets bigger, because averaging pulls extreme values toward the center, so sample means vary less than individual values do. Prof. Leonard explains this in this section. This quantity, σ/√n, is called the Standard Error (of the mean)
Note: even if the sample size is 30 or less, the formulas written here are still true; however, assuming that the sampling distribution is normal requires n > 30 (or a normal population, as in 2 below). If the sample size is 30 or less and you don't know what the underlying population distribution looks like, you can't assume that the sampling distribution is normal
2) n ≤ 30 AND the population is normally distributed, the sampling distribution of sample means will also be normally distributed
3) n ≤ 30 and you know nothing about the shape of the population distribution: you can't do anything yet
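A quick simulation sketch of rule 1 and the formulas above, in Python. The population is an assumption chosen for illustration: an exponential distribution with μ = 1 and σ = 1, which is strongly skewed right (skewness 2). The sketch draws many samples of size n = 40 and looks at the sample means; the helper names are hypothetical:

```python
import math
import random
from statistics import fmean, pstdev


def simulate_sample_means(n, reps, seed=0):
    """Return the means of `reps` random samples of size n drawn from an
    Exponential(rate=1) population (assumed for illustration: mu = 1, sigma = 1,
    strongly skewed right)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Moment skewness: 0 for a symmetric distribution, 2 for the exponential."""
    m, s = fmean(xs), pstdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)


n = 40
means = simulate_sample_means(n, reps=20_000)
print("mean of sample means:", round(fmean(means), 3))   # should be close to mu = 1
print("sd of sample means:  ", round(pstdev(means), 3))  # should be close to sigma/sqrt(n)
print("sigma / sqrt(n):     ", round(1 / math.sqrt(n), 3))
print("skewness of means:   ", round(skewness(means), 2))  # far below the population's 2
```

The mean of the sample means lands near μ = 1, their standard deviation lands near σ/√40 ≈ 0.158, and their skewness drops from 2 to roughly 0.3. That is close to symmetric but not perfectly so, which is why the sampling distribution is described as approximately normal.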
Finding a z-score for a sample mean (now we are working not with just one individual data point, as before, but with a group of individuals from the population: a sample)
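A minimal Python sketch of the z-score for a sample mean, z = (x̄ − μ)/(σ/√n): it is the usual z-score formula with σ replaced by the standard error σ/√n. It assumes σ is known and that the sampling distribution of the mean is normal (see the CLT requirements above); the function name and the numbers are hypothetical:

```python
import math


def z_score_of_sample_mean(x_bar, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)): how many standard errors x_bar lies from mu."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error


# Hypothetical numbers: population mu = 100, sigma = 15; one sample of n = 36 has mean 105.
print(z_score_of_sample_mean(105, mu=100, sigma=15, n=36))  # 2.0
```

For comparison, a single individual with a value of 105 would have z = (105 − 100)/15 ≈ 0.33; a sample mean of 105 is far more unusual because sample means vary much less than individual values.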
Confidence Intervals
Any given sample will not perfectly reflect the population, because samples vary. So instead of relying on a single number such as a sample mean or proportion, we can construct an interval and state, with some level of confidence, that the true population value (the parameter, e.g. μ or p) lies somewhere in that interval.
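One common way to build such an interval for a mean, sketched in Python: x̄ ± z*·σ/√n. This assumes σ is known and the sampling distribution of the mean is normal; when σ must be estimated from the sample, a t critical value is normally used instead. The function name and inputs are hypothetical:

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) for x_bar +/- z* * sigma / sqrt(n).

    Assumes sigma is known and the sampling distribution of the mean is normal.
    """
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_confidence_interval(105, sigma=15, n=36)
print(f"95% CI for mu: ({low:.2f}, {high:.2f})")
```

With x̄ = 105, σ = 15 and n = 36 this prints a 95% interval of about (100.10, 109.90); asking for 99% confidence widens it.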