Week 8: Statistics and hypothesis testing

Statistics

Conclusive research design needs data collection.

We use statistics to answer RQs (find associations between components and outcomes, and differences across groups)

Managers want to know if data patterns exist

Sampling and inference

All RQs are about population characteristics; however, all the data collected are sample characteristics

Inference: Making statements about a population on the basis of a sample

Sample vs Population

Sample characteristics: what you calculate from the sample

Population characteristics: The true values. They are what your RQs are really asking. Unfortunately these values are unknown.

Therefore, if a relationship exists between components X and Y on average in our sample, what is the probability that the same relationship holds in the target population?

The challenge in using sample characteristics to make inferences about population characteristics

Sample characteristics will rarely equal population characteristics exactly

Variability in sample characteristics

We need to account for this variability in statistics calculated from the sample

  1. First step is to quantify the variability in sample characteristics in your data (through metrics such as variance and standard deviation)
  1. Because these are random variables, the outcome of the test involves probabilities
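A minimal sketch of quantifying this variability, using only the Python standard library. The satisfaction scores and the population mean and standard deviation used in the simulation are made-up values for illustration; in practice the population values are unknown.

```python
import math
import random
import statistics

# Hypothetical satisfaction scores (1-10 scale) from one sample of 8 customers.
scores = [7, 8, 6, 9, 5, 8, 7, 6]

sample_mean = statistics.mean(scores)       # a sample characteristic
sample_var = statistics.variance(scores)    # sample variance (n - 1 denominator)
sample_sd = statistics.stdev(scores)        # sample standard deviation
# Standard error: how much the sample mean is expected to vary between samples.
standard_error = sample_sd / math.sqrt(len(scores))

# Simulation under an ASSUMED population (values chosen only for illustration):
# draw 1000 samples of size 8 and see how much the sample mean moves around.
random.seed(0)
population_mean, population_sd = 7.0, 1.5
sample_means = [
    statistics.mean([random.gauss(population_mean, population_sd) for _ in range(8)])
    for _ in range(1000)
]
spread_of_means = statistics.stdev(sample_means)

print(sample_mean, round(sample_var, 3), round(standard_error, 3))
print(round(spread_of_means, 3))
```

The spread of the simulated sample means should sit close to population_sd / sqrt(8), which is why the sample mean is treated as a random variable rather than a fixed number.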

To what degree of certainty can we say there is a difference in average customer satisfaction between male and female customers for HP consumers in the actual population?

Managerial implication

Hypothesis testing

The idea behind hypothesis testing is to test RQs

RQs need to be answered not only through analysis of the relationships within the sample. Hypothesis testing needs to be conducted in order to conclude whether the results would hold in the population

Process of hypothesis testing

Definition: the formal process to use the statistical properties of the data to evaluate your hypothesis

  1. Problem definition
  1. Null and alternate hypothesis
  1. Choose relevant statistical test
  1. Calculate P Value
  1. Compare to α = 0.05 (p value above α = cannot reject null, p value below α = reject null)
  1. Problem definition

Is it a relational or a comparative RQ?

  1. State null and alternative hypo

Null hypo (H0): No difference or association between variables. If the null is not rejected, no changes will be made

Alternative (H1): there is a difference or association; supported if the null is rejected. Rejecting the null means that there is a relationship that should be reported to the manager
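A rough sketch of the null/alternative decision, applied to the male vs female satisfaction question. H0: no difference in mean satisfaction; H1: there is a difference. The summary numbers are hypothetical, and the test is a large-sample z approximation that plugs in sample standard deviations, which is an assumption made here for simplicity. The helper names two_sided_p_value and decide are illustrative only.

```python
import math
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p is below alpha."""
    return "reject H0" if p_value < alpha else "cannot reject H0"

# Hypothetical sample summaries (mean, standard deviation, sample size).
mean_m, sd_m, n_m = 7.4, 1.2, 100   # male customers
mean_f, sd_f, n_f = 7.0, 1.0, 100   # female customers

# Standard error of the difference between the two sample means.
se_diff = math.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)
z = (mean_m - mean_f) / se_diff
p = two_sided_p_value(z)
decision = decide(p)

print(round(z, 3), round(p, 4), decision)
```

With these made-up numbers the p value falls below 0.05, so H0 is rejected and the difference would be reported to the manager.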

  1. Choose relevant statistical test

What is your type of research question

Test of Association (relational RQ)

Test of differences (comparative RQ)

Types of data (nominal, ordinal, interval, ratio-scaled)

Chi-squared goodness of fit test (proportions) (nominal / ordinal data)

Chi squared test of association (cross tabulation) (nominal / ordinal data)

T-tests (one-sample, independent samples, paired samples) (interval / ratio-scaled data)

Z tests (interval /ratio scaled data)

Correlation, ANOVA, regression (interval / ratio-scaled data)
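As one illustration from the list above, here is a sketch of the chi-squared test of association on a 2x2 cross tabulation. The counts are hypothetical (rows: male / female, columns: satisfied / not satisfied). The p value formula used is specific to 1 degree of freedom (a 2x2 table) and no continuity correction is applied; both are simplifying assumptions of this sketch.

```python
import math

def chi2_association_2x2(table):
    """Chi-squared statistic and p value for a 2x2 cross tabulation."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail probability can be written with erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = male, female; columns = satisfied, not satisfied.
crosstab = [[30, 20],
            [20, 30]]
stat, p_value = chi2_association_2x2(crosstab)
print(round(stat, 3), round(p_value, 4))
```

Larger tables need the chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom, which a statistics package would normally supply.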

T Tests (metric data)

one sample

paired samples

Two independent samples
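The three t-test variants differ mainly in how the standard error is built. The sketch below computes only the t statistic and degrees of freedom with the standard library; the p value would then come from a t distribution table or a statistics package. All data are hypothetical, and the independent-samples version shown is Welch's variant (unequal variances allowed), which is a choice made here rather than something the notes specify.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and df for H0: population mean equals mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

def paired_t(before, after):
    """Paired samples: a one-sample test on the within-pair differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

def welch_t(a, b):
    """Two independent samples (Welch's version, unequal variances allowed)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical satisfaction scores.
group_a = [4, 5, 6, 5, 5]
group_b = [3, 4, 5, 4, 4]
after_b = [4, 5, 6, 6, 5]   # same respondents as group_b, measured again

t_one, df_one = one_sample_t(group_a, mu0=4)
t_pair, df_pair = paired_t(group_b, after_b)
t_ind, df_ind = welch_t(group_a, group_b)
print(round(t_one, 3), df_one)
print(round(t_pair, 3), df_pair)
print(round(t_ind, 3), round(df_ind, 1))
```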

ANOVA (metric data)

More than two samples
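A sketch of the one-way ANOVA F statistic for three hypothetical groups, using only the standard library. It compares the variation between group means with the variation within groups; the p value would come from an F distribution with the reported degrees of freedom, which is not computed here. The function name and the data are illustrative.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - statistics.mean(g)) ** 2 for g in groups for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical satisfaction scores for three customer segments.
segments = [[4, 5, 6], [5, 6, 7], [7, 8, 9]]
f_stat, df_between, df_within = one_way_anova_f(segments)
print(round(f_stat, 3), df_between, df_within)
```

A large F means the group means differ by more than the within-group noise would suggest.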

Chi squared test of proportions (categorical data)
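A sketch of the chi-squared test of proportions (goodness of fit) for three categories. The observed counts and the equal-share null hypothesis are hypothetical. The closed-form p value used here is valid only for 2 degrees of freedom (three categories); that restriction is an assumption of this sketch, and other cases need the general chi-squared distribution.

```python
import math

def chi2_goodness_of_fit(observed, expected_proportions):
    """Chi-squared statistic and df for observed counts vs. H0 proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical counts of customers preferring brand A, B, C.
observed = [50, 30, 20]
# H0: each brand is preferred by one third of the population.
stat, df = chi2_goodness_of_fit(observed, [1 / 3, 1 / 3, 1 / 3])

# For exactly 2 degrees of freedom the upper-tail probability is exp(-stat / 2).
p_value = math.exp(-stat / 2)
print(round(stat, 3), df, round(p_value, 5))
```

With these made-up counts the p value is far below 0.05, so the null of equal proportions would be rejected.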