Basic Statistics
University of Amsterdam
First week
Levels of Measurement
Categorical
Nominal
Ordinal
Quantitative
Interval
Ratio
Measures of central tendency
Mode
Mean
Median
Measures of variability
Range (Xmax - Xmin)
Interquartile range
Box plot
Bottom of the box - Q1
Top of the box - Q3
Middle - Q2
Lower whisker limit = Q1 - 1.5 IQR
Q2 = Median
Q1 and Q3 -- medians of the lower and upper halves of the data
IQR = Q3 - Q1
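A minimal sketch of the box-plot quantities above, assuming the "median of each half" convention for quartiles (other conventions give slightly different values). The `scores` data and the helper name `box_plot_summary` are made up for illustration, and the upper limit Q3 + 1.5 IQR is added by analogy with the lower one.

```python
from statistics import median

def box_plot_summary(values):
    """Return Q1, Q2, Q3, the IQR and the two whisker limits."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values left of the median position
    upper = data[half + (n % 2):]    # values right of the median position
    q1, q2, q3 = median(lower), median(data), median(upper)
    iqr = q3 - q1
    return {
        "q1": q1, "q2": q2, "q3": q3, "iqr": iqr,
        "low_limit": q1 - 1.5 * iqr,   # lower whisker limit
        "high_limit": q3 + 1.5 * iqr,  # upper whisker limit (assumed by symmetry)
    }

scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
summary = box_plot_summary(scores)
# Points outside the whisker limits are usually drawn as outliers.
outliers = [x for x in scores
            if x < summary["low_limit"] or x > summary["high_limit"]]
print(summary, outliers)
```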
Standard deviation (sigma)
Z-score -- number of SDs from the mean
Variance
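A short sketch of variance, standard deviation and z-scores, assuming the sample convention (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` use. The `data` values are made up.

```python
from statistics import mean, stdev, variance

data = [4, 8, 6, 5, 7]   # hypothetical observations

m = mean(data)           # measure of central tendency
var = variance(data)     # sample variance: sum of squared deviations / (n - 1)
sd = stdev(data)         # standard deviation = square root of the variance

# z-score: how many standard deviations each value lies from the mean
z_scores = [(x - m) / sd for x in data]
print(m, var, sd, z_scores)
```

For a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.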
Second week
Correlation and regression
Visual ways
Scatterplots
Crosstabs
Contingency-table
Pearson's r
Direction and strength of linear correlation with one number
r = sum(Zx * Zy)/(n-1)
Only for linear correlation
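Pearson's r can be sketched as the sum of the products of paired z-scores divided by n - 1. The function name `pearson_r` and the `hours`/`grades` data are illustrative assumptions.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by (n - 1)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

hours = [1, 2, 3, 4, 5]    # hypothetical x values
grades = [2, 4, 5, 4, 5]   # hypothetical y values
r = pearson_r(hours, grades)
print(round(r, 3))
```

The sign of `r` gives the direction and its absolute value the strength of the linear relationship; a curved relationship can yield an `r` near 0 even when the variables are strongly related.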
Regression
Regression line
Line with the smallest sum of squared residuals
y = a + bx
b = r * (SDy/SDx)
a = (mean y) - b * (mean x)
Residuals (vertical distance from the data point to the regression line)
Positive
Negative
r^2 = how much smaller the prediction error is when you use the regression line than when you use the mean
r^2 = the proportion of variance in your dependent variable (y) that is explained by your independent variable (x)
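The regression formulas above (b = r * SDy/SDx, a = mean y - b * mean x) can be sketched as follows. The data and the helper name `fit_line` are made up; r^2 is computed here as the reduction in squared prediction error relative to predicting the mean for every case.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return intercept a, slope b and Pearson's r for y = a + b*x."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx      # slope
    a = my - b * mx      # intercept
    return a, b, r

xs = [1, 2, 3, 4, 5]     # hypothetical independent variable
ys = [2, 4, 5, 4, 5]     # hypothetical dependent variable
a, b, r = fit_line(xs, ys)

# Residual = observed y minus predicted y (positive above the line, negative below).
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
ss_residual = sum(e ** 2 for e in residuals)         # error using the line
ss_total = sum((y - mean(ys)) ** 2 for y in ys)      # error using the mean
r_squared = 1 - ss_residual / ss_total
print(a, b, r_squared)
```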
Third week
Probability
Sample space (all of possible outcomes)
Event (Subset of the sample space)
Disjoint/mutually exclusive
2 or more events in a sample space that do not share any outcomes
Collectively exhaustive
Multiple events which together fill-up the complete sample space
Experiment
Trial
Random Variable
Joint Probabilities
Probabilities in the "body" of crosstabs
Probabilities for the intersection of certain outcomes of the variables
Marginal Probabilities
"Total" values of crosstabs
Probabilities for an outcome of each individual variable
Conditional Probability
The probability of an event, given that another event occurs P(A|B) = P(A and B) / P(B)
Equation joint probability
P(A and B) = P(A|B) * P(B)
Bayes' law
P(A|B) = P(A) * P(B|A) / P(B)
For unions, subtract the intersection: P(A or B) = P(A) + P(B) - P(A and B)
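A small sketch of joint, marginal and conditional probabilities read from a crosstab, followed by Bayes' law and the union rule. The counts and category names are invented for illustration.

```python
# Hypothetical crosstab of counts: (smoking status, health status)
counts = {
    ("smoker", "ill"): 30, ("smoker", "healthy"): 70,
    ("nonsmoker", "ill"): 20, ("nonsmoker", "healthy"): 180,
}
total = sum(counts.values())

# Joint probabilities: the "body" of the crosstab
joint = {cell: c / total for cell, c in counts.items()}

# Marginal probabilities: the row and column totals
p_smoker = sum(p for (s, _), p in joint.items() if s == "smoker")
p_ill = sum(p for (_, h), p in joint.items() if h == "ill")

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_ill_given_smoker = joint[("smoker", "ill")] / p_smoker

# Bayes' law: P(A|B) = P(A) * P(B|A) / P(B)
p_smoker_given_ill = p_smoker * p_ill_given_smoker / p_ill

# Union: P(A or B) = P(A) + P(B) - P(A and B)
p_smoker_or_ill = p_smoker + p_ill - joint[("smoker", "ill")]
print(p_ill_given_smoker, p_smoker_given_ill, p_smoker_or_ill)
```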
Fourth week
Probability Distributions
Random Variable
Discrete
Countable number of distinct values (1,2,3)
Probability mass function
Continuous
Infinite number of possible values (height)
Probability density function
Cumulative Probability
Mean of a random variable
Sum of each value times its probability (e.g. Probability * Waiting Time in the example in 4.03)
Linear transformation a + bX: new mean = a + b*mu
Variance of a random variable
Continuous
Formula
Discrete
Formula
var (X+Y) = Var(X)+Var(Y)+2cov(X,Y)
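For the discrete case, the mean and variance of a random variable can be sketched as probability-weighted sums. The probability mass function `pmf` and the constants `a` and `b` are made up, and the rule that the variance of a + bX equals b^2 times the variance of X is a standard result added here as context rather than taken from the notes.

```python
# Hypothetical probability mass function: value -> probability
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def rv_mean(pmf):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(x * p for x, p in pmf.items())

def rv_variance(pmf):
    """Variance: probability-weighted squared deviation from the mean."""
    mu = rv_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = rv_mean(pmf)
var = rv_variance(pmf)

# Linear transformation a + bX shifts and scales the mean.
a, b = 3, 2
transformed = {a + b * x: p for x, p in pmf.items()}
new_mu = rv_mean(transformed)       # equals a + b * mu
new_var = rv_variance(transformed)  # equals b**2 * var
print(mu, var, new_mu, new_var)
```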
Normal Distribution
Parameters: Sigma and Mu
Formula
Area under the curve is always 1
3 Sigma rule
Within 1 sigma on each side of the mean = 0.68
Within 2 sigma on each side of the mean = 0.95
Within 3 sigma on each side of the mean = 0.997
Z distribution
Use the z-table to look up the probability
If x = Mu +z*Sigma,
z = (x-Mu)/Sigma
Note that in some cases the answer has to be built from up to 3 parts of the area under the graph
z = (x - mean x)/s
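A sketch of the z-score and the 1, 2 and 3 sigma areas using `statistics.NormalDist` from the Python standard library in place of a printed z-table. The values `mu = 100`, `sigma = 15` and `x = 130` are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # hypothetical parameters
x = 130               # hypothetical observation

z = (x - mu) / sigma  # number of standard deviations from the mean
standard_normal = NormalDist()  # the Z distribution: mean 0, SD 1

# Area to the left of z (what a z-table typically lists) and to the right
p_below = standard_normal.cdf(z)
p_above = 1 - p_below

def area_within(k):
    """Area under the standard normal curve within k sigma of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(z, p_above, [round(area_within(k), 3) for k in (1, 2, 3)])
```

An area between two values is the difference of two `cdf` calls, which is one way an answer ends up being assembled from several parts of the curve.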
Binomial Distribution
Only successes or failures (boolean)
Requirements
Probability of success does not change
Independence between trials
Formula
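As an illustration, the standard binomial probability P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) can be sketched with `math.comb`. The function name `binomial_pmf` and the example values (10 trials, success probability 0.5) are assumptions, not taken from the notes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials
    when each trial succeeds with the same probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5                      # hypothetical: 10 fair coin flips
p_three = binomial_pmf(3, n, p)     # exactly 3 successes
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]
expected_successes = sum(k * q for k, q in enumerate(distribution))
print(p_three, expected_successes)
```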
Fifth week
Sample and population
Descriptive Statistics
Univariate analysis
Modes
Means
Standard deviations
Bivariate analysis
Regression
Pearson's r
Types of samples
Simple random samples
Undercoverage
Sampling bias (not every person equally likely to get into sample - convenience sample)
Nonresponse bias
Response bias
Random multi-stage cluster sample (e.g. educational programmes of students)
If simple random sample is too expensive
If you don't have a good sampling frame
Stratified random sample (divide the population into strata, e.g. universities in London, and draw a random sample from every stratum)
Sampling distribution
Sample statistics are used to estimate population parameters
If the number of samples goes to infinity, the distribution of the sample means will be bell-shaped (for a sufficiently large n)
Mean of the sampling distribution = population mean
Sampling sigma (standard error) = population sigma / square root of n
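A small simulation sketch of the sampling distribution of the mean. The uniform population, the sample size `n = 25`, the number of samples and the random seed are all arbitrary choices made for illustration.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population
population = [random.uniform(0, 100) for _ in range(10_000)]
pop_mu = mean(population)
pop_sigma = pstdev(population)

n = 25                 # size of each sample
number_of_samples = 2_000
sample_means = [mean(random.sample(population, n))
                for _ in range(number_of_samples)]

# The mean of the sample means should sit close to the population mean,
# and their spread close to population sigma / square root of n.
observed_mean = mean(sample_means)
observed_sigma = stdev(sample_means)
predicted_sigma = pop_sigma / n ** 0.5
print(pop_mu, observed_mean, observed_sigma, predicted_sigma)
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is flat.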
Sixth week
Confidence Intervals
How close is the sample mean to the population mean?
Statistical Inference
Interval estimate (range of values within which the population mean is expected to lie)
Confidence level (probability that interval contains population value)
In Population distribution
In Sample distribution
In Sampling distribution
Sampling Mu = Population Mu
Margin of error (tells how accurately the sample mean is likely to estimate the population mean)
Sampling Sigma = Population Sigma / square root of n
Point estimate (the sample mean used as a single best guess of the population mean)
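A sketch of a 95% confidence interval for a mean, reusing the numbers from the worked example in the seventh week (n = 500, mean = 35.5, SD = 8). It assumes the sample SD can stand in for the population sigma and that the normal (z) critical value is appropriate because n is large; with a small sample a t critical value would be used instead.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sample_sd = 500, 35.5, 8
confidence = 0.95

# Critical z value that leaves (1 - confidence) / 2 in each tail
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = sample_sd / sqrt(n)        # sigma / square root of n
margin_of_error = z_star * standard_error
interval = (sample_mean - margin_of_error,  # point estimate +/- margin
            sample_mean + margin_of_error)
print(round(margin_of_error, 2), [round(v, 2) for v in interval])
```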
Seventh week
Significance tests
Hypotheses and significance tests
Hypotheses -- expectations about the parameters researchers are interested in
Expectation about population
Significance test
Null hypothesis
H0
Less than 3% of all Americans have scuba diving experience
Ha: p<0.03
H0: p = 0.03
The parameter takes a specific value
Will be rejected if the data in the sample suggest it is unlikely
Assumed to be true until the data suggest otherwise
Alternative hypothesis
Ha
The parameter falls in an alternative range of values
Significance test
We assume that the population parameter has a certain value and compare it with the sample we collected from this population
Sampling distribution. We can determine what the sampling distribution of the sample proportion looks like
How many standard errors between sample value and population value?
Test statistic (z or t)
Formula
To decide whether to reject H0, we look at the p-value
The p-value is looked up in the z-table at the value of the test statistic
Then we should choose our significance level
If p-value is less than significance level, we reject H0
Rejection region: the part of the normal distribution graph that lies beyond the critical value for the chosen significance level
One-tailed t-test
One tail of the graph
Two-tailed t-test
Both tails of the graph
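A sketch of how the number of tails changes the p-value, assuming the test statistic is compared against the standard normal distribution (the z-table). The function name `p_value` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist

def p_value(statistic, tail="two"):
    """p-value of a test statistic under the standard normal distribution.

    tail: "right" for Ha: parameter > value,
          "left"  for Ha: parameter < value,
          "two"   for Ha: parameter != value.
    """
    z = NormalDist()
    if tail == "right":
        return 1 - z.cdf(statistic)
    if tail == "left":
        return z.cdf(statistic)
    # Two-tailed: count the area in both tails beyond |statistic|
    return 2 * (1 - z.cdf(abs(statistic)))

alpha = 0.05  # significance level
for stat, tail in [(1.80, "right"), (1.80, "two")]:
    p = p_value(stat, tail)
    print(stat, tail, round(p, 4), "reject H0" if p < alpha else "do not reject H0")
```

The same statistic can lead to rejection in a one-tailed test and not in a two-tailed test, because the two-tailed p-value is twice as large.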
Step-by-step
How many hours of experience?
n = 500
Proportion or mean?
Formulate hypotheses
Check Assumptions
Determine significance level (alpha)
Compute test statistic
p (>35 hours) = 0.57
Proportion
H0: p = p0
Ha: p != p0
p>p0
H0: p = 0.5
Ha: p>0.5
alpha = 0.05
Formula
3.13
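The proportion example can be checked with a short sketch. It assumes the usual one-sample z statistic for a proportion, with the standard error computed from the null value p0, and a right-tailed p-value from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

n = 500
p_hat = 0.57   # sample proportion with more than 35 hours
p0 = 0.5       # value under H0: p = 0.5
alpha = 0.05

# Standard error under H0 uses p0, not the sample proportion
standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error          # about 3.13

# Ha: p > 0.5, so only the right tail counts
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value < alpha
print(round(z, 2), p_value, reject_h0)
```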
mean = 35.5
SD = 8
Mean
H0: mean = mean0
Ha: mean != mean0
mean>mean0
H0: mean = 35
Ha: mean>35
alpha = 0.05
Formula
1.40
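The mean example can be checked the same way. The sketch assumes the one-sample statistic (sample mean - mean0) / (SD / square root of n) and, because n = 500 is large, approximates the t distribution with the standard normal when computing the right-tailed p-value.

```python
from math import sqrt
from statistics import NormalDist

n = 500
sample_mean = 35.5
sample_sd = 8
mean0 = 35     # value under H0: mean = 35
alpha = 0.05

standard_error = sample_sd / sqrt(n)
t = (sample_mean - mean0) / standard_error   # about 1.40

# Ha: mean > 35, right tail; normal approximation to t with 499 df (assumption)
p_value = 1 - NormalDist().cdf(t)
reject_h0 = p_value < alpha
print(round(t, 2), round(p_value, 3), reject_h0)
```

Under these assumptions the p-value is larger than alpha = 0.05, so H0 is not rejected in this example.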