Information Theory
Motivation
Typical messages:
- a long IID message is almost surely typical
- typical messages are approximately uniformly distributed, each with probability \(\approx 2^{-nH}\)
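As an illustration (not part of the map): a minimal Python sketch of the typicality claim for a Bernoulli(0.3) source; the parameters are arbitrary.

```python
import math, random

def H(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, n = 0.3, 10_000
random.seed(0)
x = [1 if random.random() < p else 0 for _ in range(n)]

# Normalised log-probability of the observed string.
logp = sum(math.log2(p) if xi else math.log2(1 - p) for xi in x)
print(f"-(1/n) log2 P(x) = {-logp / n:.4f}  vs  H = {H(p):.4f}")
# The two agree closely: a long IID string is almost surely typical,
# i.e. its probability is close to 2^{-nH}.
```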
Mutual Information
Definitions
\(I(X;Y)=H(X)-H(X|Y)\)
\(=H(X)+H(Y)-H(X,Y)\)
\(=H(Y)-H(Y|X)\)
\(=D(P_{X,Y}\|P_X P_Y)\)
- conditional version: \(I(X;Y|Z)=H(X|Z)-H(X|Y,Z)\)
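A quick numerical check of these identities on a small joint pmf (the pmf values here are illustrative):

```python
import math

def H(dist):
    """Shannon entropy in bits of {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# An illustrative joint pmf P_{X,Y} on {0,1}^2 (values chosen arbitrarily).
pxy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(p for (a, b), p in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (a, b), p in pxy.items() if b == y) for y in (0, 1)}

I_sum = H(px) + H(py) - H(pxy)                  # H(X)+H(Y)-H(X,Y)
I_div = sum(p * math.log2(p / (px[a] * py[b]))  # D(P_XY || P_X P_Y)
            for (a, b), p in pxy.items() if p > 0)
print(I_sum, I_div)  # identical (up to float rounding)
```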
Properties
\(0\leq I(X;Y) \leq H(X)\)
- \(I(X;Y)=0\) iff \(X\) and \(Y\) are independent
Data processing inequalities
If \(X \to Y \to Z\) is a Markov chain (\(X\perp Z \mid Y\)):
- \(I(X;Z)\leq I(X;Y)\)
- \(I(X;Y|Z)\leq I(X;Y) \)
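A sketch verifying the first inequality on an assumed chain of two binary symmetric channels (the channels and their parameters are illustrative, not from the map):

```python
import math

def mi(pxy):
    """I(X;Y) in bits from a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

def bsc(eps):
    """Binary symmetric channel as a transition pmf {(in, out): prob}."""
    return {(0, 0): 1 - eps, (0, 1): eps, (1, 0): eps, (1, 1): 1 - eps}

# Markov chain X -> Y -> Z: uniform X pushed through two BSCs.
px = {0: 0.5, 1: 0.5}
W1, W2 = bsc(0.1), bsc(0.2)
pxy = {(x, y): px[x] * W1[(x, y)] for x in (0, 1) for y in (0, 1)}
pxz = {(x, z): sum(px[x] * W1[(x, y)] * W2[(y, z)] for y in (0, 1))
       for x in (0, 1) for z in (0, 1)}
print(mi(pxz), "<=", mi(pxy))  # I(X;Z) <= I(X;Y), as the DPI demands
```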
Synergy, Redundancy
\(S(X;Y_1,Y_2)\)
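The map leaves \(S\) undefined; one common convention (an assumption here) is \(S(X;Y_1,Y_2)=I(X;Y_1,Y_2)-I(X;Y_1)-I(X;Y_2)\), positive for synergy and negative for redundancy. Under that convention, the XOR example exhibits pure synergy:

```python
import math
from collections import defaultdict

def mi(pxy):
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

def marginal(joint, f):
    """Push a pmf through f and aggregate probabilities."""
    out = defaultdict(float)
    for k, p in joint.items():
        out[f(k)] += p
    return dict(out)

# Y1, Y2 iid fair bits, X = Y1 XOR Y2: each Yi alone says nothing about X,
# together they determine it -- pure synergy.
joint = {(y1 ^ y2, (y1, y2)): 0.25 for y1 in (0, 1) for y2 in (0, 1)}
S = (mi(joint)                                          # I(X; Y1,Y2) = 1
     - mi(marginal(joint, lambda k: (k[0], k[1][0])))   # I(X; Y1) = 0
     - mi(marginal(joint, lambda k: (k[0], k[1][1]))))  # I(X; Y2) = 0
print(S)  # 1.0 bit of synergy
```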
Types, Large Deviations
Types
Definitions
n-types
\(\mathcal{P}_n = \{ P\in\mathcal{P} : nP(a)\in\mathbb{Z},\ \forall a \in A\}\)
Probability simplex
\( \mathcal{P} = \{ P\in [0,1]^m : \sum_{a \in A} P(a) = 1\} \)
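A small sketch enumerating \(\mathcal{P}_n\) and computing the type of a string (function names are made up for illustration):

```python
from fractions import Fraction
from collections import Counter

def n_types(n, m):
    """All n-types on an alphabet of size m: pmfs with entries in (1/n)Z."""
    def counts(m_left, n_left):
        if m_left == 1:
            yield (n_left,)
            return
        for k in range(n_left + 1):
            for rest in counts(m_left - 1, n_left - k):
                yield (k,) + rest
    return [tuple(Fraction(k, n) for k in c) for c in counts(m, n)]

print(len(n_types(4, 3)))  # 15, and always <= (n+1)^m: polynomially many types

# The type (empirical pmf) of a particular string:
x = (0, 1, 1, 2)
c = Counter(x)
print({a: Fraction(c[a], len(x)) for a in range(3)})  # {0: 1/4, 1: 1/2, 2: 1/4}
```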
Results
Type class:
- under IID sampling from \(Q\), all strings of the same type \(P\) have the same probability, namely \(Q^n(x_1^n)=2^{-n(H(P)+D(P\|Q))}\)
Proposition 9.5: Probability of Type Class
Type \(P\in\mathcal{P}_n\), pmf \(Q\) on a finite alphabet \(A\) of size \(m\):
\[(n+1)^{-m}\,2^{-nD(P\|Q)} \leq Q^n(T(P)) \leq 2^{-nD(P\|Q)}\]
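A numerical check of the sandwich on a toy alphabet (the specific \(P\), \(Q\), \(n\) are illustrative):

```python
import math

def D(P, Q):
    """Relative entropy D(P||Q) in bits for probability vectors."""
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

n, m = 12, 3
P = (0.5, 0.25, 0.25)   # an n-type: n*P(a) is an integer for every a
Q = (0.6, 0.3, 0.1)

counts = [round(n * p) for p in P]
size = math.factorial(n)
for k in counts:
    size //= math.factorial(k)            # |T(P)|, the multinomial coefficient
prob = size * math.prod(q ** k for q, k in zip(Q, counts))  # Q^n(T(P)) exactly

lower = (n + 1) ** (-m) * 2 ** (-n * D(P, Q))
upper = 2 ** (-n * D(P, Q))
print(lower <= prob <= upper)  # True: the Proposition 9.5 sandwich
```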
Sanov's Theorem
Motivation + intro
1000 dice throws with a sample average of 5 (well above the mean 3.5, hence a rare event): what is the proportion of 6s, \(P(\#(6) \mid \bar{X}=5)\)?
\(\hat{P}_n\) is the empirical distribution (type) of an IID sequence \(X_1^n\) with pmf \(Q\) on \(A\).
For any \(E\subset \mathcal{P}\):
\[Q^n(\hat{P}_n \in E) \leq (n+1)^m 2^{-n \inf_{P\in E} D(P||Q) }\]
If \(E\) is equal to the closure of its interior:
\[ \lim -\frac{1}{n} \log Q^n (\hat{P}_n \in E) = \inf_{P\in E} D(P||Q) \]
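For the dice question above, Sanov (together with the conditional limit theorem) says the conditional type concentrates on the I-projection \(\arg\min_{P: E_P[X]=5} D(P\|Q)\), an exponentially tilted pmf \(P^*(a)\propto Q(a)e^{\theta a}\). A sketch solving for the tilt numerically, assuming this standard tilting form:

```python
import math

faces = range(1, 7)

def tilted(theta):
    """Exponential tilting of the fair-die pmf: P(a) proportional to exp(theta*a)."""
    w = [math.exp(theta * a) for a in faces]
    s = sum(w)
    return [wi / s for wi in w]

def mean(P):
    return sum(a * p for a, p in zip(faces, P))

# Bisection for the tilt theta that achieves mean 5; the tilted pmf is the
# I-projection argmin_{P: E_P[X]=5} D(P||Q) that Sanov singles out.
lo, hi = 0.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean(tilted(mid)) < 5:
        lo = mid
    else:
        hi = mid
P_star = tilted(lo)
print(f"limiting proportion of 6s: P*(6) = {P_star[5]:.4f}")  # about 0.478
```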
Relative entropy
\(D(P\|Q)=\sum_{a\in A} P(a)\log\frac{P(a)}{Q(a)}\)
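A worked instance (illustrative): comparing a fair coin to a biased one,
\[D(\mathrm{Bern}(\tfrac12)\,\|\,\mathrm{Bern}(\tfrac14)) = \tfrac12\log_2\tfrac{1/2}{1/4} + \tfrac12\log_2\tfrac{1/2}{3/4} = 1 - \tfrac12\log_2 3 \approx 0.21\ \text{bits}.\]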
Definitions/Properties
Entropy Bounds
\(0\leq H(X) \leq \log|A|\)
- \(H(X)=0\) iff \(X\) is constant
- \(H(X)=\log|A|\) iff \(X\) is uniformly distributed on \(A\)
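A quick numeric check of both endpoints (alphabet size chosen arbitrarily):

```python
import math

def H(dist):
    """Entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

A = 4  # alphabet size
print(H([1, 0, 0, 0]))                 # 0.0: constant X
print(H([0.25] * A), math.log2(A))     # 2.0 = log|A|: uniform X
print(H([0.5, 0.3, 0.1, 0.1]))         # strictly between the bounds
```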
Joint Entropy
\(H(X,Y)=H(P_{X,Y})=E[-\log P_{X,Y}(X,Y)]\)
Independent \(X,Y\):
\(H(X,Y)=H(X)+H(Y)\)
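Worked instance (illustrative): two independent fair bits give
\[H(X,Y)=\log_2 4 = 2 = 1 + 1 = H(X)+H(Y).\]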
Poisson Approximation
Bernoulli-Poisson relation
V2:
\(S_n=\sum_{i=1}^n X_i\), a sum of \(n\) Bernoulli variables \(X_i\sim\mathrm{Bern}(p_i)\), possibly dependent
\(D_e(P_{S_n}\,\|\,\mathrm{Po}(\sum_i p_i)) \leq \sum_i p_i^2 + \sum_i H_e(X_i)-H_e(X_1^n)\) (the subscript \(e\) means nats)
- for independent \(X_i\) the entropy gap \(\sum_i H_e(X_i)-H_e(X_1^n)\) vanishes, leaving \(D_e\leq\sum_i p_i^2\)
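A sketch checking the independent case of this bound, where the entropy gap is zero (the \(p_i\) are illustrative; the exact pmf of \(S_n\) is built by convolution):

```python
import math

ps = [0.1, 0.05, 0.2, 0.15]  # illustrative Bernoulli parameters

# Exact pmf of S_n = sum of independent Bern(p_i), by iterated convolution.
pmf = [1.0]
for p in ps:
    pmf = [(pmf[k] if k < len(pmf) else 0.0) * (1 - p)
           + (pmf[k - 1] if k >= 1 else 0.0) * p
           for k in range(len(pmf) + 1)]

lam = sum(ps)
po = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(len(pmf))]

# D_e is relative entropy in nats; the sum runs over supp(P_{S_n}) = {0..n}.
De = sum(p * math.log(p / q) for p, q in zip(pmf, po) if p > 0)
print(De, "<=", sum(p * p for p in ps))  # independence kills the entropy gap
```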