Inference 6: Construction of a hypothesis test (Error probabilities and…

- - - - Usually in form of parametric tests
        
        Statements about population value of parameters
  - - - Generally there is asymmetry: H0 is simple and H1 composite: We don't necessarily accept H0 when not rejected - 'absence of evidence is not evidence of absence'
        
        When H0 is not rejected, it is not the only hypothesis compatible with the sample values
- - - - i.e. 1-prob of wrongly accepting H0 when H1 is true
      - Prob(Type II error) = Beta, so power = 1-Beta
  - - - Alpha
  - - - Test A: Reject H0 if 5 successes, i.e. observe X=5.
        
        Power: Prob(X=5|H1 true) = (2/3)^5=0.132
        
        Size: Prob(X=5|H0 true) = (1/2)^5=0.031
      - Test B: Reject H0 if observed value of X is 3,4 or 5.
        
        Power: Prob(X=3, 4 or 5|H1 is true) = sum over i=3..5 of (5Ci)(2/3)^i(1/3)^(5-i) ≈ 0.79
        
        Size: Prob(X=3, 4 or 5|H0 is true)= 0.5
      - Comparing tests: more likely to wrongly reject H0 using test B, but also more likely to correctly reject H0 in test B
        
        Generally: as we increase power to detect H1, also increase size
        
        In practice, the Type I error probability is fixed, generally at alpha = 0.05, and the test is then chosen to make power as high as possible given that fixed size.
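The size and power figures for Tests A and B can be checked numerically. This is a minimal sketch that assumes, as the figures above imply, X ~ Binomial(5, p) with H0: p = 1/2 and H1: p = 2/3. The function name `rejection_prob` is illustrative only.

```python
from math import comb


def rejection_prob(reject_values, p, n=5):
    """Probability that X ~ Binomial(n, p) falls in the rejection region."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in reject_values)


P0, P1 = 1 / 2, 2 / 3  # assumed success probabilities under H0 and H1

size_a = rejection_prob([5], P0)         # Test A: reject if X = 5
power_a = rejection_prob([5], P1)
size_b = rejection_prob([3, 4, 5], P0)   # Test B: reject if X = 3, 4 or 5
power_b = rejection_prob([3, 4, 5], P1)

print(f"Test A: size={size_a:.5f}, power={power_a:.5f}")
print(f"Test B: size={size_b:.5f}, power={power_b:.5f}")
```

Widening the rejection region from {5} to {3, 4, 5} raises both numbers together, which is the trade-off between size and power described above.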
- - - - likelihood ratio value only depends on particular statistic: best test statistic
        
        If likelihood ratio is small then so is log-likelihood ratio (loglikeH0-loglikeH1)
        
        N.B. Not the same as previously mentioned likelihood ratio - this is just ratio of 2 likelihoods - not likelihood at maximum
  - - - Let H0: µ=5 and
        H1: µ=10
        
        From 4.2: see e.g. 6.3.1. on p6.4 notes
        
        We only need the quantities that vary with the data, so ignore fixed constants (i.e. remove the sigma^2 part)
        
        Large values of Σxi, or any constant multiple of this, will make the likelihood ratio smaller.
        
        It is convenient to use the constant multiple (1/n)Σxi, the sample mean.
        
        Thus, best test rejects H0 for large values of sample mean.
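To see why large sample means give small likelihood ratios, the sketch below evaluates the log-likelihood ratio (loglikeH0 - loglikeH1) for H0: µ=5 against H1: µ=10. It assumes independent Normal observations with known variance. The value sigma=2 and the sample values are made up for illustration, and the function names are hypothetical.

```python
from math import log, pi


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of independent Normal(mu, sigma^2) observations."""
    n = len(xs)
    return (-0.5 * n * log(2 * pi * sigma**2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma**2))


def log_lr(xs, mu0=5.0, mu1=10.0, sigma=2.0):
    """loglikeH0 - loglikeH1; the constant term cancels in the difference."""
    return normal_loglik(xs, mu0, sigma) - normal_loglik(xs, mu1, sigma)


def log_lr_from_mean(xbar, n, mu0=5.0, mu1=10.0, sigma=2.0):
    """The same quantity written as a function of the sample mean only."""
    return -n * (mu1 - mu0) * (xbar - (mu0 + mu1) / 2) / sigma**2


sample = [4.1, 6.3, 7.8, 5.5]  # hypothetical data
xbar = sum(sample) / len(sample)
print(log_lr(sample), log_lr_from_mean(xbar, len(sample)))
```

The two printed values agree up to rounding, and the second form shows the ratio falling as the sample mean rises whenever µ1 > µ0.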
        
- - - - Not just reject/not reject - we can use a continuous measure to quantify the evidence.
        
        larger values of t are more extreme with reference to H0
        
        We can use sampling data to calculate probability of observing data at least as extreme as that observed:
        p=Prob(T≥t|H0)
        
        'one-sided p-value'
        
        The smaller the p-value the more evidence provided by the data against the null hypothesis
        
        We can use the p-value to reject or not reject H0:
        p<alpha is the same as t>c, where c is the critical value of the test with size alpha
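The link between the p-value and the critical value can be illustrated with a short sketch. It assumes, purely for illustration, that T follows a standard Normal distribution under H0 and that alpha = 0.05. The observed values of t are hypothetical.

```python
from statistics import NormalDist

H0_DIST = NormalDist(0.0, 1.0)  # assumed sampling distribution of T under H0


def one_sided_p(t):
    """p = Prob(T >= t | H0) for an upper-tailed test."""
    return 1.0 - H0_DIST.cdf(t)


def critical_value(alpha=0.05):
    """c such that Prob(T > c | H0) = alpha."""
    return H0_DIST.inv_cdf(1.0 - alpha)


alpha = 0.05
c = critical_value(alpha)
for t in (1.0, 1.5, 2.0, 2.5):  # hypothetical observed values
    p = one_sided_p(t)
    print(f"t={t:.1f}  p={p:.4f}  p<alpha: {p < alpha}  t>c: {t > c}")
```

For every t the two comparisons give the same answer, so reporting p carries the reject/not reject decision plus a measure of how strong the evidence is.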
        
- - - - e.g. Normal distribution
        
        Previously, with simple hypotheses, large values of the sample mean gave small values of the likelihood ratio, i.e. the most powerful test rejects H0 for large values of the sample mean. This holds for any alternative value of µ greater than 5.
        
        Any values of sample mean>5 would have resulted in less support for H0, i.e. more support for H1.
        
        Therefore, test that rejects H0 for large values of sample mean is uniformly most powerful test
        
        If no uniformly most powerful test then use scientific knowledge of problem to identify particular theta>theta0 and choose test most powerful for that particular value
    - - No uniformly most powerful test can exist
        
        Testing H1: μ>5 means that most powerful test rejects H0 for large values of sample mean but testing H1: μ<5 most powerful test rejects H0 for SMALL values of sample mean - i.e. most powerful test is different for each context.
- - - - Formally 2x one-sided tests:
        i) H1: theta>theta0
        ii)H1: theta<theta0
        
        first approach: double observed one-sided p-value
        
        Observe 'tobs'. A result as unfavourable to H0 can occur either by:
        i) T≥tobs, where Prob(T≥tobs|H0)=p~
        or
        ii) T≤t', where we choose t' such that Prob(T≤t'|H0)=p~
        
        Therefore, two-sided p-value is total probability of observing a result at least as unfavourable to H0 as tobs, p=2p~. In effect we reject for large values of |T| (absolute T value)
        
        N.B. t' is not chosen to be an equal distance from the centre of the distribution (that would not work if the distribution is not symmetrical) but to be equal in probability terms, i.e. the same size of unfavourable tail as obtained with tobs.
        
        N.B. in discrete observations: there may not be an opposite tail so defined. Therefore, observed tail is only such unfavourable region and p=p~. Also, there may be problems getting 'exact' p-values (see AT3), nominated significance level may not be possible exactly and need to choose as large as possible without exceeding nominated level.
        
        second approach: (used in AT3), construct one-sided p-value, obtaining p~, then identify in the other tail a probability density equal to that of the observed result and add both tails together.
        
        i.e. find t'' in the other tail such that Probfunction(T=t''|H0)=Probfunction(T=tobs|H0) and add to p~ the tail Prob(T≤t''|H0)
        
        The two approaches give similar results unless the distribution is asymmetric. They generally give very similar p-values, but the p-value from the first will not be less than that obtained from the second
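The two approaches can be compared on a small discrete example. This is a sketch under stated assumptions: X ~ Binomial(10, 0.3) under H0 and an observed value of 6, both hypothetical. Exact equality of probabilities rarely occurs for a discrete statistic, so the second approach is implemented here by adding every lower-tail outcome whose probability does not exceed that of the observed value. The doubled value is capped at 1.

```python
from math import comb


def binom_pmf(x, n, p):
    """Prob(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def two_sided_p_values(x_obs, n, p0):
    """Return (p_tilde, doubled, equal_prob) for an observation in the upper tail."""
    pmf = [binom_pmf(x, n, p0) for x in range(n + 1)]
    p_tilde = sum(pmf[x_obs:])            # one-sided: Prob(X >= x_obs | H0)
    doubled = min(1.0, 2 * p_tilde)       # first approach
    # Second approach: add outcomes below x_obs that are no more probable
    # than x_obs itself (tolerance guards against floating-point ties).
    cutoff = pmf[x_obs] * (1 + 1e-7)
    other_tail = sum(q for x, q in enumerate(pmf) if x < x_obs and q <= cutoff)
    return p_tilde, doubled, p_tilde + other_tail


p_tilde, doubled, equal_prob = two_sided_p_values(6, 10, 0.3)
print(f"one-sided={p_tilde:.4f}  doubled={doubled:.4f}  equal-probability={equal_prob:.4f}")
```

In this skewed example the doubled p-value is the larger of the two, in line with the note above.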
- - - - See fig 6.2 on p6.9 for expected distributions of test statistic under H0 and H1 (2 different distributions with test statistic in same place)
        
        Fixing Type I error at alpha, power is then affected by
        i) size of the difference to be detected or
        ii) SE of test statistic, which affects dispersion around locations (SE is affected by population SD and size of sample)
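The effect of these two factors on power can be sketched numerically. The example assumes an upper-tailed test based on a Normally distributed sample mean with SE = SD/sqrt(n). The function name and the numerical values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided(diff, sd, n, alpha=0.05):
    """Power of an upper-tailed z-test to detect a mean shift of `diff`."""
    se = sd / sqrt(n)                          # SE of the sample mean
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)     # fixes Type I error at alpha
    return 1 - STD_NORMAL.cdf(z_crit - diff / se)


for diff, sd, n in [(1.0, 4.0, 16), (2.0, 4.0, 16), (1.0, 4.0, 64), (1.0, 2.0, 16)]:
    print(f"diff={diff}, SD={sd}, n={n}: power={power_one_sided(diff, sd, n):.3f}")
```

A larger difference, a smaller population SD or a larger sample each increases power while the size stays at alpha.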
  - - - 1) When we know a priori that a result in one direction must be due to chance - no possible scientific explanation for a result in that direction
        
        If justification is incorrect then this would increase chance of Type I error (false positive)
      - 2) when consequences of Type I error not as bad as Type II error, e.g. safety data - increasing sensitivity to detect unsafe drugs (at expense of false alarms).
- - - - Obtain sampling distribution for that test statistic under null hypothesis. Sometimes this is easier not using the most powerful test-statistic (i.e. going against Neyman-Pearson lemma)
        
        Define rejection region which gives pre-specified type I error probability (e.g. alpha=0.05)
        
        Calculate value of test statistic for observed sample of data
        
        If observed value of test statistic is in rejection region then conclude data reject H0 in favour of H1. If not in rejection region conclude that data fails to reject H0.
        
        Report p-value quantifying evidence against H0, assessing weight of evidence rather than making a decision to reject or not reject H0.
        
        N.B. if the test cannot be defined in terms of a parameter, the sampling distribution may be obtained by considering the probability distribution of the statistic across the sample space of observed results (typical in non-parametric tests)
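The steps above can be sketched for a small discrete case. The example assumes an upper-tailed test of H0: p = 1/2 with X ~ Binomial(5, p), the setting implied by Tests A and B, and alpha = 0.05. Because X is discrete, the rejection region is taken as large as possible without exceeding alpha. The function names are illustrative.

```python
from math import comb


def binom_pmf(x, n, p):
    """Prob(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail_test(x_obs, n=5, p0=0.5, alpha=0.05):
    """Return (c, size, reject, p_value) for the region {X >= c}."""
    # Sampling distribution of the test statistic under H0.
    pmf = [binom_pmf(x, n, p0) for x in range(n + 1)]
    # Largest region {X >= c} whose size does not exceed alpha.
    c = n + 1  # empty region if even {X = n} is too probable
    for k in range(n, -1, -1):
        if sum(pmf[k:]) <= alpha:
            c = k
        else:
            break
    size = sum(pmf[c:])
    # Compare the observed value with the rejection region.
    reject = x_obs >= c
    # One-sided p-value quantifying the evidence against H0.
    p_value = sum(pmf[x_obs:])
    return c, size, reject, p_value


for x in (4, 5):
    print(x, upper_tail_test(x))
```

With these assumptions the region reduces to {X = 5}, so the achieved size is below the nominated 0.05, as noted earlier for discrete observations.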