Information Retrieval 2
Boolean Model
a model for IR in which any query can be posed as a Boolean expression of terms combined with the operators AND, OR, and NOT
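A minimal sketch of Boolean retrieval, assuming a tiny made-up collection: each term maps to the set of documents containing it, and AND, OR, NOT become set intersection, union, and difference. The documents and the helper `docs_with` are hypothetical and only for illustration.

```python
# Hypothetical toy collection: document ID -> text.
docs = {
    1: "brutus killed caesar",
    2: "caesar married calpurnia",
    3: "brutus and caesar and calpurnia",
    4: "antony praised brutus",
}

def docs_with(term):
    """Return the set of document IDs whose text contains the term."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

all_docs = set(docs)

# brutus AND caesar AND NOT calpurnia
result = docs_with("brutus") & docs_with("caesar") & (all_docs - docs_with("calpurnia"))

# brutus OR calpurnia
either = docs_with("brutus") | docs_with("calpurnia")

print(result)  # {1}
```

A document either matches the expression or it does not; the model produces no ranking among the matches.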
Web Crawling
Required Features
Robustness - crawlers must not get caught in spider traps, which mislead them into fetching an infinite number of pages from a particular domain
Politeness - web servers have policies regulating the rate at which a crawler can visit them; these policies must be respected
Desired Features
Scalable - architecture should permit scaling up the crawl rate by adding extra machines and bandwidth
Performance & Efficiency - should make efficient use of various system resources including processor, storage, and network bandwidth
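The two required features can be sketched as a small gatekeeper that a crawler consults before each fetch. This is an illustration under assumptions, not a full crawler: the class `PoliteFrontier`, the fixed per-host delay, and the per-host page cap (a crude guard against spider traps) are all hypothetical choices. A real crawler would also honor each site's robots.txt.

```python
from urllib.parse import urlsplit

class PoliteFrontier:
    """Decides whether a URL may be fetched right now (hypothetical helper)."""

    def __init__(self, min_delay=2.0, max_pages_per_host=1000):
        self.min_delay = min_delay                    # seconds between hits on one host
        self.max_pages_per_host = max_pages_per_host  # cap to escape spider traps
        self.last_fetch = {}                          # host -> time of last fetch
        self.fetched = {}                             # host -> pages fetched so far

    def may_fetch(self, url, now):
        host = urlsplit(url).netloc
        # Robustness: stop once a host has served too many pages.
        if self.fetched.get(host, 0) >= self.max_pages_per_host:
            return False
        # Politeness: wait at least min_delay between requests to one host.
        last = self.last_fetch.get(host)
        if last is not None and now - last < self.min_delay:
            return False
        return True

    def record_fetch(self, url, now):
        host = urlsplit(url).netloc
        self.last_fetch[host] = now
        self.fetched[host] = self.fetched.get(host, 0) + 1

frontier = PoliteFrontier(min_delay=2.0, max_pages_per_host=2)
frontier.record_fetch("http://example.com/a", now=0.0)
too_soon = frontier.may_fetch("http://example.com/b", now=1.0)     # within the delay
after_wait = frontier.may_fetch("http://example.com/b", now=2.0)   # delay has elapsed
```

Passing the clock value in as `now` keeps the sketch deterministic; a running crawler would typically supply `time.monotonic()`.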
Inverted Index
for each term, we maintain a list that records which documents the term occurs in (each entry is a posting, the list is a postings list, and all the postings lists together are the postings)
Steps
tokenize the text, remove stopwords
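A minimal sketch of building an inverted index with the tokenization and stopword-removal step above. The stopword list, the regular-expression tokenizer, and the sample documents are assumptions for illustration; real systems use larger lists and language-aware tokenizers.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to its postings list: a sorted list of document IDs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):  # visiting IDs in order keeps postings sorted
        terms = {t for t in tokenize(docs[doc_id]) if t not in STOPWORDS}
        for term in terms:
            index[term].append(doc_id)  # one posting per (term, document)
    return dict(index)

# Hypothetical sample documents.
docs = {
    1: "The crawler fetches the page",
    2: "An index of the page",
}
index = build_index(docs)
print(index["page"])  # [1, 2]
```

Collecting each document's terms in a set first means a term that repeats within a document still contributes a single posting.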