Statistics - Intro to Data Mining - Coggle Diagram

- - - - given
        
        discrete distribution with i = 1,...,M
        
        Ai: number of events observed in bin i
        
        ai: expected number (continuous, coming from model/hypothesis)
      - Case 1: Comparison of a dataset with a given distribution
        
        tools
        
        variance: as for Bernoulli trials, n p (1 - p)
        
        bin i is realized with probability p = ai / sum(ai)
        
        analyzing results
        
        a large X² indicates large deviations between the distributions
        
        Pearson: for large M (or many events per bin), X² approximately follows a chi-squared distribution
        
        errors ai - Ai are approximately normally distributed
        
        probability that an observed value X̂² exceeds a critical value X²c (X̂² > X²c) purely by chance: Q(X²c, ν), defined via the gamma function Γ(x)
      - Case 2: Comparison of two datasets
        
        tools
        
        X² = sum over i = 1,...,M of (Ai - Bi)² / (Ai + Bi)
        
        Variance: σ²(Ai - Bi) = σ²(Ai) + σ²(Bi) ≈ Ai + Bi
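A minimal Python sketch of both cases above, under stated assumptions. The notes do not spell out the Case 1 statistic, so the standard Pearson form X² = sum (Ai - ai)² / ai is assumed. The number of degrees of freedom ν is assumed to be the number of bins minus a `constraints` count, and Q(X²c, ν) is evaluated as the regularized upper incomplete gamma function Q(ν/2, X²c/2). The function names and the sample counts are illustrative and not taken from the notes.

```python
import math

def gammq(a, x, eps=1e-12, max_iter=1000):
    """Regularized upper incomplete gamma function Q(a, x) = Γ(a, x) / Γ(a)."""
    if a <= 0 or x < 0:
        raise ValueError("need a > 0 and x >= 0")
    if x == 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series expansion of P(a, x); then Q = 1 - P.
        denom = a
        term = 1.0 / a
        total = term
        for _ in range(max_iter):
            denom += 1
            term *= x / denom
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return math.exp(log_prefactor) * h

def chi2_vs_model(observed, expected, constraints=1):
    """Case 1: observed counts Ai against model expectations ai (all ai > 0)."""
    if any(a <= 0 for a in expected):
        raise ValueError("every expected count ai must be positive")
    chi2 = sum((A - a) ** 2 / a for A, a in zip(observed, expected))
    dof = len(observed) - constraints  # assumption: nu = M - constraints
    return chi2, dof, gammq(dof / 2, chi2 / 2)

def chi2_two_samples(counts_a, counts_b, constraints=1):
    """Case 2: two binned datasets Ai and Bi; bins empty in both are skipped."""
    pairs = [(A, B) for A, B in zip(counts_a, counts_b) if A + B > 0]
    chi2 = sum((A - B) ** 2 / (A + B) for A, B in pairs)
    dof = len(pairs) - constraints  # assumption: nu = used bins - constraints
    if dof <= 0:
        raise ValueError("not enough non-empty bins")
    return chi2, dof, gammq(dof / 2, chi2 / 2)

# Illustrative counts only.
print(chi2_vs_model([18, 22, 20], [20, 20, 20]))
print(chi2_two_samples([10, 20], [20, 10]))
```

A small returned probability means the observed X² would rarely arise by chance, so the two distributions probably differ. The Case 2 formula as written assumes both datasets contain about the same total number of events.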
    - - given:
        
        two samples {xiA}, i = 1,..., NA and {xiB}, i = 1,..., NB
        
        Optional: comparison of {xiA} with a hypothetical distribution P(x)
      - goal: calculation of the empirical cumulative distribution FA(x) = (1/NA) sum over i of Θ(x - xiA), where Θ(y) = 1 if y >= 0 and 0 otherwise (FB(x) analogously)
      - alternative statistics: P(D>=Dcrit)
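The branch above can be sketched in Python as follows. This assumes the notes describe the Kolmogorov-Smirnov comparison, with D taken as the largest absolute difference between the two empirical cumulative distributions. The significance uses the classical large-sample series with λ = sqrt(NA·NB/(NA+NB))·D, which is only an approximation for small samples. All function names and the sample values are illustrative and not from the notes.

```python
import bisect
import math

def ecdf(sample):
    """Return F(x) = (1/N) * number of sample points xi with x - xi >= 0."""
    data = sorted(sample)
    n = len(data)
    def F(x):
        return bisect.bisect_right(data, x) / n
    return F

def ks_distance(sample_a, sample_b):
    """D = max |FA(x) - FB(x)|; for step functions it suffices to check the data points."""
    F_a, F_b = ecdf(sample_a), ecdf(sample_b)
    return max(abs(F_a(x) - F_b(x)) for x in set(sample_a) | set(sample_b))

def ks_significance(d, n_a, n_b, terms=100):
    """Asymptotic estimate of P(D >= d) if both samples share one distribution."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = math.sqrt(n_eff) * d
    total = 0.0
    for j in range(1, terms + 1):
        term = 2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            return min(max(total, 0.0), 1.0)
    return 1.0  # series did not converge: lam is tiny, the samples are compatible

# Illustrative samples only.
sample_a = [1, 2, 3, 4]
sample_b = [3, 4, 5, 6]
d = ks_distance(sample_a, sample_b)
print(d, ks_significance(d, len(sample_a), len(sample_b)))
```

Unlike the binned X² comparison, this works directly on the unbinned samples, so no bin width has to be chosen.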
- - - - used when: we assume that μA > μB (so t is positive)
      - the area under the Student's t density function from tc to infinity is the probability that a value t > tc occurs by chance
    - - used when: |μA - μB| ≠ 0
      - twice the area under the Student's t density function from tc to infinity is the probability that a value |t| > tc occurs by chance
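The tail areas described above can be approximated numerically. The sketch below assumes the number of degrees of freedom (called `dof` here) is already known and that t has been computed elsewhere, since the notes give no formula for either. It integrates the Student's t density with Simpson's rule and uses the symmetry of the density, so the area from tc to infinity is 0.5 minus the area from 0 to tc. Function names and the step count are illustrative.

```python
import math

def t_density(t, dof):
    """Student's t probability density with dof degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))

def upper_tail(tc, dof, steps=2000):
    """Area under the density from tc to infinity (steps must be even)."""
    if tc < 0:
        return 1.0 - upper_tail(-tc, dof, steps)  # symmetry of the density
    if tc == 0:
        return 0.5
    h = tc / steps
    total = t_density(0.0, dof) + t_density(tc, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, dof)
    return 0.5 - total * h / 3  # Simpson's rule on [0, tc]

def one_sided_p(tc, dof):
    """Probability that t > tc occurs by chance (hypothesis: mean A > mean B)."""
    return upper_tail(tc, dof)

def two_sided_p(tc, dof):
    """Probability that |t| > |tc| occurs by chance (hypothesis: means differ)."""
    return 2 * upper_tail(abs(tc), dof)

# Illustrative values only.
print(one_sided_p(2.0, 10), two_sided_p(2.0, 10))
```

For the same tc, the two-sided probability is twice the one-sided one because both tails of the symmetric density count as deviations.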
- - - - needed: X², N, J, I
      - results
        
        V = 1 <-- perfect association (i.e. one variable determines the other)
        
        V = 0 <-- no association at all
        
        If I = J = 2, V is called the 'φ statistic'
    - - results
        
        value C = 1 will never be reached
      - is only useful for comparing the strength of association between tables with equal (I, J)
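A minimal Python sketch of the two association measures, assuming the usual definitions, which the notes do not write out: Cramér's V = sqrt(X² / (N · min(I-1, J-1))) and the contingency coefficient C = sqrt(X² / (X² + N)). X² is computed from an I × J contingency table, with expected counts taken as row total times column total divided by N. All row and column totals are assumed to be non-zero. Function names and the example tables are illustrative.

```python
import math

def chi2_from_table(table):
    """Return (X², N) for a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V, between 0 (no association) and 1 (perfect association)."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

def contingency_c(chi2, n):
    """Contingency coefficient C; stays below 1 for any finite X²."""
    return math.sqrt(chi2 / (chi2 + n))

# Illustrative 2 x 2 tables only.
perfect = [[10, 0], [0, 10]]      # one variable determines the other
independent = [[5, 5], [5, 5]]    # no association
for table in (perfect, independent):
    chi2, n = chi2_from_table(table)
    print(cramers_v(chi2, n, len(table), len(table[0])), contingency_c(chi2, n))
```

Even for the perfectly associated table, C stays below 1 while V reaches 1, which is why C is only comparable between tables of equal shape.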