Week 8: Statistics and hypothesis testing
Statistics
Conclusive research design needs data collection.
We use statistics to answer RQs (finding associations between components and outcomes, and differences across groups)
Managers want to know if data patterns exist
Sampling and inference
All RQs are about population characteristics; however, all the data collected are sample characteristics
Inference: Making statements about a population on the basis of a sample
Sample vs Population
Sample characteristics: what you calculate from the sample
Population characteristics: the true values. They are what your RQs are really asking about. Unfortunately, these values are unknown.
Therefore, if a relationship between component X and Y exists on average in our sample, how confident can we be that the same relationship holds in the target population?
The challenge in using sample characteristics to make inferences about population characteristics
Sample characteristics will generally not equal population characteristics exactly
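To make the point concrete, the sketch below simulates a hypothetical population of satisfaction scores (normally distributed with mean 7 and standard deviation 1.5, all assumed values) and draws several samples of 50 customers from it. Each sample mean lands near the population mean but not exactly on it, and the samples disagree with each other.

```python
import numpy as np

# Hypothetical population: satisfaction scores for 100,000 customers (simulated, assumed)
rng = np.random.default_rng(seed=42)
population = rng.normal(loc=7.0, scale=1.5, size=100_000)
population_mean = population.mean()  # the "true" value, unknown in practice

# Draw five independent samples of 50 customers each and record their means
sample_means = [
    rng.choice(population, size=50, replace=False).mean() for _ in range(5)
]

print(f"population mean: {population_mean:.3f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:  {m:.3f}")
```

In a real study you only ever see one of these samples, which is why the variability has to be accounted for.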
Variability in sample characteristics
We need to account for this variability in statistics calculated from the sample
- The first step is to quantify the variability in your data (through measures such as variance and standard deviation)
- Because sample statistics are random variables, the outcome of a test is expressed in terms of probabilities
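A minimal sketch of the first step, using Python's standard `statistics` module on a small made-up sample. The standard error (standard deviation divided by the square root of the sample size) is not named in the notes; it is added here as one common way to express how much the sample mean itself would vary from sample to sample.

```python
import statistics

# Hypothetical satisfaction scores (1-10 scale) from a small sample
scores = [4, 5, 6, 7, 8]

mean = statistics.mean(scores)          # 6
variance = statistics.variance(scores)  # sample variance, n - 1 denominator: 2.5
stdev = statistics.stdev(scores)        # square root of the sample variance
std_error = stdev / len(scores) ** 0.5  # spread of the sample mean across samples

print(f"mean={mean}, variance={variance}, stdev={stdev:.4f}, SE={std_error:.4f}")
```

`statistics.variance` and `statistics.stdev` use the n - 1 denominator, which is the usual choice when estimating population variability from a sample.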
Example: with what degree of certainty can we say that average customer satisfaction differs between male and female customers among HP consumers in the actual population?
Managerial implication
Hypothesis testing
The idea behind hypothesis testing is to test RQs
RQs need to be answered not only through the analysis of relationships within the sample; hypothesis testing must also be conducted to conclude whether the results would hold in the population
Process of hypothesis testing
Definition: the formal process of using the statistical properties of the data to evaluate your hypotheses
- Problem definition
- State the null and alternative hypotheses
- Choose the relevant statistical test
- Calculate the p-value
- Compare it to α = 0.05 (p-value above α: cannot reject the null; p-value below α: reject the null)
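The last step is a simple decision rule. A minimal sketch, where the function name `decide` is hypothetical and a p-value exactly equal to α is treated as "cannot reject" (an assumption, since the notes do not cover that boundary case):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Decision rule: reject H0 when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "cannot reject H0"


print(decide(0.012))  # reject H0
print(decide(0.34))   # cannot reject H0
```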
- Problem definition
Is it a relational or comparative RQ?
- State the null and alternative hypotheses
Null hypothesis (H0): there is no difference or association between variables. If the null is not rejected, no changes will be made
Alternative hypothesis (H1): there is a difference or association; it is accepted if the null is rejected. Accepting H1 means there is a relationship that should be reported to the manager
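For the HP example, writing μ for the average satisfaction of each group in the population (notation assumed here, not taken from the notes), a two-sided version of the hypotheses could be written as:

```latex
H_0 : \mu_{\text{male}} = \mu_{\text{female}}
\qquad
H_1 : \mu_{\text{male}} \neq \mu_{\text{female}}
```

Rejecting H0 here would support reporting a gender difference in satisfaction to the manager.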
- Choose the relevant statistical test
What type of research question is it?
Test of Association (relational RQ)
Test of differences (comparative RQ)
Types of data (nominal, ordinal, interval, ratio-scaled)
Chi-squared goodness-of-fit test (proportions) (nominal / ordinal data)
Chi-squared test of association (cross-tabulation) (nominal / ordinal data)
T-tests (one-sample, independent samples, paired samples) (interval / ratio-scaled data)
Z-tests (interval / ratio-scaled data)
Correlation, ANOVA, regression (interval / ratio-scaled data)
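One way to encode this choice as a lookup, as a simplified sketch: the function `choose_test` and its string labels are hypothetical, Z-tests are left out, and ordinal data are grouped with nominal data as in the list above.

```python
def choose_test(rq_type: str, data_type: str, n_groups: int = 2) -> str:
    """Map RQ type and level of measurement to a family of tests (simplified)."""
    categorical = {"nominal", "ordinal"}
    metric = {"interval", "ratio"}
    if rq_type not in {"relational", "comparative"}:
        raise ValueError("rq_type must be 'relational' or 'comparative'")
    if data_type in categorical:
        # Associations use a cross-tabulation; differences use proportions
        if rq_type == "relational":
            return "chi-squared test of association"
        return "chi-squared goodness-of-fit test"
    if data_type in metric:
        if rq_type == "relational":
            return "correlation / regression"
        # Up to two groups: t-test; more than two groups: ANOVA
        return "t-test" if n_groups <= 2 else "ANOVA"
    raise ValueError(f"unknown data type: {data_type!r}")


print(choose_test("comparative", "interval", n_groups=3))  # ANOVA
```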
T-tests (metric data)
One sample
Paired samples
Two independent samples
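A sketch of the three t-tests using SciPy's `scipy.stats` module. All scores are made up, the benchmark of 7.0 for the one-sample test is an assumption, and the independent-samples test uses Welch's version (`equal_var=False`), which the notes do not specify; `equal_var=True` gives the classic pooled-variance test.

```python
from scipy import stats

# Hypothetical satisfaction scores (1-10); all data are made up for illustration
male = [6.5, 7.0, 5.5, 8.0, 6.0, 7.5, 6.5, 7.0]
female = [7.5, 8.0, 7.0, 8.5, 6.5, 8.0, 7.5, 9.0]
before = [6.0, 5.5, 7.0, 6.5, 5.0, 6.0]  # same customers, before a change
after = [6.5, 6.0, 7.5, 7.0, 6.0, 6.5]   # same customers, after a change

# One sample: does one group's mean differ from a benchmark (assumed 7.0)?
one = stats.ttest_1samp(male, popmean=7.0)

# Paired samples: the same respondents measured twice
paired = stats.ttest_rel(before, after)

# Two independent samples: male vs female customers
# equal_var=False runs Welch's t-test, which does not assume equal variances
indep = stats.ttest_ind(male, female, equal_var=False)

for name, res in [("one-sample", one), ("paired", paired), ("independent", indep)]:
    print(f"{name:12s} t = {res.statistic:6.3f}, p = {res.pvalue:.4f}")
```

Each p-value would then be compared with α = 0.05 as in the process above.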
ANOVA (metric data)
More than two samples
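A minimal one-way ANOVA sketch with `scipy.stats.f_oneway` on three made-up customer segments. A significant result says only that at least one segment mean differs, not which one.

```python
from scipy import stats

# Hypothetical satisfaction scores for three customer segments (made-up data)
segment_a = [6.0, 6.5, 7.0, 6.0, 6.5]
segment_b = [7.0, 7.5, 8.0, 7.5, 7.0]
segment_c = [5.5, 6.0, 5.0, 6.0, 5.5]

# One-way ANOVA: H0 says all three population means are equal
result = stats.f_oneway(segment_a, segment_b, segment_c)
print(f"F = {result.statistic:.3f}, p = {result.pvalue:.4g}")
```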
Chi-squared test of proportions (categorical data)
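A sketch of both chi-squared tests with `scipy.stats`, on made-up counts. `chisquare` compares observed counts with the counts expected under H0 (goodness of fit, i.e. proportions); `chi2_contingency` tests association in a cross-tabulation. For a 2×2 table, SciPy applies Yates' continuity correction by default.

```python
from scipy import stats

# Goodness of fit: do 120 hypothetical customers split evenly across
# three printer models? Observed counts are made up.
observed = [50, 40, 30]
expected = [40, 40, 40]  # equal proportions under H0; totals must match
gof = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"goodness of fit: chi2 = {gof.statistic:.2f}, p = {gof.pvalue:.3f}")  # chi2 = 5.00, p = 0.082

# Test of association (cross-tabulation): is gender associated with
# being satisfied? Counts are made up.
table = [[30, 20],  # male: satisfied, not satisfied
         [40, 10]]  # female: satisfied, not satisfied
chi2, p, dof, expected_counts = stats.chi2_contingency(table)
print(f"association: chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```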