5 [IR] Statistical Language Model
Efficiency in Vector Space Model
Probability basics
P(A,B) = P(A|B)P(B) = P(B|A)P(A)
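Rearranging this identity gives Bayes' rule, which the likelihood-ratio model further down relies on:

    P(A|B) = P(B|A)P(A) / P(B)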
Statistical Language Models
Unigram Language Model
Prediction: P(s|M)
using the language model to assign a probability to a text
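Under the unigram independence assumption, the probability of a text s = w1 w2 ... wn is the product of per-word probabilities:

    P(s|M) = P(w1|M) P(w2|M) ... P(wn|M)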
Estimation
observing relevant text and estimating the probability of each word
Steps
Obtain relevant text for a language model to be estimated
Perform tokenization on the collected text
Do not remove stop words
Counting
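A minimal Python sketch of these steps, assuming tokenization is just lowercasing and whitespace splitting (stop words are kept, as noted above); the function names are illustrative:

    from collections import Counter

    def estimate_unigram_lm(text):
        # Tokenize the collected relevant text; stop words are kept.
        tokens = text.lower().split()
        # Counting: maximum likelihood estimate P(w|M) = count(w) / total tokens.
        counts = Counter(tokens)
        total = len(tokens)
        return {w: c / total for w, c in counts.items()}

    def text_probability(s, model):
        # Prediction: P(s|M) as a product of unigram probabilities.
        p = 1.0
        for w in s.lower().split():
            p *= model.get(w, 0.0)  # unseen words get probability 0 without smoothing
        return p

    model = estimate_unigram_lm("the cat sat on the mat")
    print(text_probability("the cat", model))  # (2/6) * (1/6) ≈ 0.056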
Summary of various Language Models
Unigram Language Model
Bigram Language Models (sketched after this list)
Other Language Models
Grammar-based models
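For contrast with the unigram sketch above, a bigram model conditions each word on its predecessor; a minimal sketch under the same tokenization assumption (the "<s>" start marker is illustrative):

    from collections import Counter

    def estimate_bigram_lm(text):
        tokens = ["<s>"] + text.lower().split()
        # MLE: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1})
        pair_counts = Counter(zip(tokens, tokens[1:]))
        context_counts = Counter(tokens[:-1])
        return {pair: c / context_counts[pair[0]] for pair, c in pair_counts.items()}

    model = estimate_bigram_lm("the cat sat on the mat")
    print(model[("the", "cat")])  # 0.5: "the" is followed by "cat" once out of two occurrences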
Multinomial / Multiple-Bernoulli
Multinomial
Can account for multiple word occurrences in the query
Widely used in many NLP areas
Possibility for integration with ASR, MT, NLP
Multiple-Bernoulli
May suit IR (directly checks the presence of query terms)
Provisions for explicit negation of query terms ("A AND NOT B")
Increasingly less popular than the multinomial method (contrast sketched below)
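A sketch contrasting the two event models; note the multiple-Bernoulli model here takes per-word presence probabilities (P(w occurs) rather than multinomial P(w|M)), an assumption for illustration:

    import math

    def multinomial_score(query_tokens, model):
        # Every occurrence of a query word contributes a factor, so
        # repeated query terms change the score (multiple occurrences count).
        return math.prod(model.get(w, 0.0) for w in query_tokens)

    def bernoulli_score(query_tokens, presence_model, vocab):
        # Each vocabulary word contributes exactly once: present in the
        # query or absent; absence uses (1 - p), which is also how
        # explicit negation ("A AND NOT B") is expressed.
        present = set(query_tokens)
        score = 1.0
        for w in vocab:
            p = presence_model.get(w, 0.0)
            score *= p if w in present else (1.0 - p)
        return score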
Document likelihood: model 2
Estimate a language model MQ for query Q
Rank documents by the likelihood of being a random sample from MQ
Issue: estimation of the query model
Treat the query as generated by a mixture of a topic model and a background model (sketched below)
Estimate a relevance model from related documents (query expansion)
Relevance feedback is easily incorporated
But: with different document lengths, likelihoods are not comparable
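A sketch of the mixture estimate and document-likelihood ranking, assuming a linear interpolation of topic and background models with a hypothetical weight lam:

    def query_model(w, topic_model, background_model, lam=0.5):
        # Mixture of topic and background; lam = 0.5 is an assumed weight.
        return lam * topic_model.get(w, 0.0) + (1 - lam) * background_model.get(w, 0.0)

    def document_likelihood(doc_tokens, topic_model, background_model):
        # Model 2: likelihood of the document being a random sample from MQ.
        # Longer documents multiply more factors, so raw likelihoods are
        # not comparable across lengths (the issue noted above).
        p = 1.0
        for w in doc_tokens:
            p *= query_model(w, topic_model, background_model)
        return p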
Likelihood ratio: model 2'
Using Bayes' rule: the likelihood that MQ is the source, given that we observed document D
But does not model document length
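In the document's notation, Bayes' rule gives the posterior that MQ generated D; one common form of the ranking ratio, assuming a background collection model as the alternative hypothesis, is:

    P(MQ|D) = P(D|MQ) P(MQ) / P(D)

    score(D) = P(D|MQ) / P(D|background)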
Language Modeling: pros and cons
pros
Formal mathematical model
Simple, well-understood framework
Integrates both indexing and retrieval models
Natural use of collection statistics, not heuristics
Avoids tricky issues of "relevance" and "aboutness"
cons
Difficult to incorporate notions of "relevance", user preferences
Relevance feedback / query expansion not straightforward
Can't accommodate phrases, passages, or Boolean operators
But there is recent LM work that addresses these issues
Language Modeling vs Vector Space
Similarities
Term weights based on frequency
Terms often used as if they were independent
Inverse document/collection frequency used
Some form of length normalization useful
Differences
Based on probability rather than similarity
Intuitions are probabilistic rather than geometric
Details of use of document length and term, document, and collection frequency differ
Summary
Both LMs and BIR provide theoretically sound retrieval models based on probabilities
Both explicitly model the uncertainty in understanding the information need and in representing documents and queries
But there are differences between LMs and the BIR model
Both LMs and extensions of the probabilistic retrieval model are among the state-of-the-art retrieval models