Information Retrieval 2
Indexing
Process
Tokenization
Issue
- Apostrophes can be part of a word
- Capitalized words can have different meanings
Inverted file
- Store the inverted index with counts/positions
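A minimal sketch of the tokenization and inverted-file ideas above, in Python. The names `tokenize` and `build_inverted_index` and the sample documents are illustrative assumptions, not part of the original notes. The tokenizer keeps word-internal apostrophes and lowercases everything, which is one possible answer to the two tokenization issues; lowercasing discards whatever meaning the capitalization carried.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase, then keep word-internal apostrophes so "don't" stays one token.
    # Lowercasing is a simplification: "Apple" and "apple" become the same term.
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())

def build_inverted_index(docs):
    # Maps term -> {doc_id: [positions]}; the count is len(positions).
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

# Hypothetical sample documents.
docs = {1: "Don't panic", 2: "Apple pie and apple juice"}
index = build_inverted_index(docs)
```

Here `index["apple"]` holds the positions of "apple" in document 2, and the length of that list is the term's count in the document.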
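Retrieval Function
Vector Space Model
Term Frequency
\( tf_{t,d} = \frac{f_{t,d}}{\sum^{N}_{j=1}f_{j,d}} \), where
\( tf_{t,d} \) is the normalized frequency of term t in document d
\( f_{t,d} \) is the number of occurrences of term t in document d
\( f_{j,d} \) is the number of occurrences of term j in document d, summed over all N terms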
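The term-frequency formula can be sketched in Python as follows. The function name `term_frequencies` and the sample tokens are assumptions for illustration; the input is taken to be an already tokenized document.

```python
from collections import Counter

def term_frequencies(tokens):
    # f_{t,d}: raw count of each term in the document.
    counts = Counter(tokens)
    # Denominator: sum of the counts of all terms in the document.
    total = sum(counts.values())
    # tf_{t,d} = f_{t,d} / total; an empty document yields an empty dict.
    return {term: count / total for term, count in counts.items()}

# Hypothetical tokenized document.
tf = term_frequencies(["apple", "pie", "apple", "juice"])
```

With this normalization the tf values of a document sum to 1, so long and short documents become comparable.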
Boolean Retrieval
Advantages
- Results are predictable and easy to explain
- Efficient processing, many documents can be eliminated
Disadvantages
- Simple queries don't work well
- Complex queries are difficult to formulate
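As a rough illustration of why Boolean retrieval is predictable and efficient, the sketch below evaluates AND, OR and NOT as set operations over postings. The function names and the tiny index are hypothetical; a real system would typically merge sorted postings lists rather than build Python sets.

```python
def boolean_and(index, *terms):
    # A document must contain every term; any missing term empties the result.
    postings = [set(index.get(t, ())) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)

def boolean_or(index, *terms):
    # A document needs to contain at least one of the terms.
    return set().union(*(index.get(t, ()) for t in terms))

def boolean_not(index, term, all_docs):
    # Every document that does not contain the term.
    return set(all_docs) - set(index.get(term, ()))

# Hypothetical index: term -> set of document ids.
index = {"apple": {1, 2}, "pie": {2, 3}, "juice": {3}}
both = boolean_and(index, "apple", "pie")
either = boolean_or(index, "apple", "juice")
```

A document either matches or it does not, so there is no ranking among the results; that is one reason simple queries tend to return too many or too few documents.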
Similarity Measure
\( \cos(D_d, Q) = \frac{\sum^{T}_{t=1}D_{d,t}\cdot Q_t}{\sqrt{\sum^{T}_{t=1}(D_{d,t})^2 \sum^{T}_{t=1}(Q_{t})^2}} \)
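A small Python sketch of the cosine measure, assuming the document and the query are sparse vectors stored as term-to-weight dictionaries. The name `cosine_similarity` and the example weights are illustrative only.

```python
import math

def cosine_similarity(doc_vec, query_vec):
    # Numerator: dot product over terms; terms absent from the query contribute 0.
    dot = sum(w * query_vec.get(t, 0.0) for t, w in doc_vec.items())
    # Denominator: product of the two Euclidean norms.
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_d * norm_q)

# Hypothetical weights.
doc = {"apple": 1.0, "pie": 1.0}
query = {"apple": 1.0}
score = cosine_similarity(doc, query)
```

Because the score is normalized by both vector lengths, a long document does not outrank a short one merely by containing more terms.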
Evaluation
- Evaluate the retrieval function and preprocessing steps
Performance Metric
Precision
- \( P = \frac{TP}{TP + FP}\)
Recall
- \(R = \frac{TP}{TP + FN} \)
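Precision and recall can be computed from the set of retrieved documents and the set of relevant documents, as in the sketch below. It uses the standard definitions, with precision as TP / (TP + FP) and recall as TP / (TP + FN). The function name and the document ids are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant documents that were retrieved
    fp = len(retrieved - relevant)   # retrieved documents that are not relevant
    fn = len(relevant - retrieved)   # relevant documents that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical run: 4 documents retrieved, 3 documents actually relevant.
p, r = precision_recall({1, 2, 3, 4}, {2, 4, 5})
```

True negatives do not appear in either metric, which suits retrieval, where most documents in a collection are irrelevant to any given query.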