POS and Formal Grammar
Word Classes
Open
- New members are added all the time
- Nouns, verbs, adjectives, adverbs
Closed
- Relatively fixed membership
- Prepositions (new prepositions are rarely coined)
- Particles (up, down)
- Determiners (the, a, an)
- Pronouns (he, she)
- Conjunctions (and, but, or)
- Auxiliary verbs (can, should, would)
- Numerals (one, two, three)
POS tagging
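The process of assigning a lexical class (POS tag) to each word in a text
Penn Treebank Tagset
- A 45-tag tagset for English
POS tagging is a disambiguation task
- Words are ambiguous: the same word form can take different tags in different contexts (e.g. book/VB in "book a flight" vs. book/NN in "the book")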
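To make the ambiguity concrete, here is a minimal sketch that scans a small tagged corpus and lists the word forms observed with more than one tag. The four sentences and the helper names `tags_per_word` and `ambiguous_words` are hypothetical; only the tag labels follow Penn Treebank conventions (NN noun, VBP present-tense verb, DT determiner, and so on).

```python
from collections import defaultdict

# Hypothetical hand-tagged corpus (Penn Treebank tags); not taken from a real treebank.
tagged_corpus = [
    [("I", "PRP"), ("book", "VBP"), ("a", "DT"), ("flight", "NN")],
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("short", "JJ")],
    [("They", "PRP"), ("run", "VBP"), ("daily", "RB")],
    [("The", "DT"), ("run", "NN"), ("was", "VBD"), ("long", "JJ")],
]


def tags_per_word(corpus):
    """Collect the set of tags observed for each lower-cased word form."""
    seen = defaultdict(set)
    for sentence in corpus:
        for word, tag in sentence:
            seen[word.lower()].add(tag)
    return seen


def ambiguous_words(corpus):
    """Return the word forms that appear with more than one tag, sorted."""
    return sorted(w for w, tags in tags_per_word(corpus).items() if len(tags) > 1)


print(ambiguous_words(tagged_corpus))  # ['book', 'run']
```

A tagger has to resolve exactly these cases from context, which is why tagging is framed as disambiguation.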
POS Tagging Methods
- Rule-based
- Stochastic/Probabilistic
Rule-Based Tagging
Assign potential POS tags
- Assign each word a list of potential POS tags from a dictionary
Apply rules to eliminate tags
- Apply hand-written constraints until each word has only one possible POS
- E.g. a determiner (DT) cannot immediately precede a verb, so The/DT run/VBP is ruled out and "run" is tagged as a noun (NN)
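The two steps of rule-based tagging can be sketched in Python as follows. This is a toy illustration, assuming a hypothetical four-word dictionary (`LEXICON`) and a single constraint (`dt_not_before_verb`); real rule-based taggers rely on much larger dictionaries and many hand-written constraints.

```python
# Hypothetical toy dictionary: each word maps to every tag it could take.
LEXICON = {
    "the": {"DT"},
    "a": {"DT"},
    "run": {"NN", "VBP"},
    "book": {"NN", "VB"},
}

# Penn Treebank verb tags.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def assign_potential_tags(words):
    """Step 1: look up every possible tag for each word in the dictionary."""
    return [set(LEXICON[w.lower()]) for w in words]


def dt_not_before_verb(candidates):
    """Constraint: a word tagged only DT cannot be followed by a verb.

    Removes verb tags from the next word but never empties its tag set.
    Returns True if any tag was removed.
    """
    changed = False
    for i in range(len(candidates) - 1):
        if candidates[i] == {"DT"}:
            remaining = candidates[i + 1] - VERB_TAGS
            if remaining and remaining != candidates[i + 1]:
                candidates[i + 1] = remaining
                changed = True
    return changed


RULES = [dt_not_before_verb]


def rule_based_tag(words):
    candidates = assign_potential_tags(words)
    # Step 2: keep applying the constraints until none of them removes a tag.
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            changed = rule(candidates) or changed
    return list(zip(words, candidates))


print(rule_based_tag(["The", "run"]))  # [('The', {'DT'}), ('run', {'NN'})]
```

The loop reapplies every rule until a full pass removes nothing, because eliminating one tag can make another constraint applicable.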
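Stochastic Tagging (Bayes' Rule)
Choose the most probable tag sequence \( \hat{t}_1^n \) for the word sequence \( w_1^n \):
\( \hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n) = \arg\max_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)} \)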
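Simplified
- \( P(w_1^n) \) is the same for every candidate tag sequence, so it can be dropped:
\( \hat{t}_1^n = \arg\max_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n) \)
- Assuming each word depends only on its own tag and each tag only on the previous tag (bigram assumption):
\( \hat{t}_1^n \approx \arg\max_{t_1^n} \prod_{i=1}^n P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \)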
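The bigram product \( \prod_{i=1}^n P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \) is usually maximized with the Viterbi algorithm (dynamic programming over word positions) rather than by enumerating every tag sequence. A minimal sketch follows; the tag set and all probabilities in `TRANS` and `EMIT` are invented for illustration, `<s>` is an assumed start symbol standing in for \( t_0 \), and scores are summed in log space to avoid numeric underflow.

```python
import math

# Invented toy probabilities, not estimated from any corpus.
TAGS = ["DT", "NN", "VBP"]
# Transition P(t_i | t_{i-1}); "<s>" is the assumed sentence-start tag t_0.
TRANS = {
    "<s>": {"DT": 0.6, "NN": 0.2, "VBP": 0.2},
    "DT": {"DT": 0.01, "NN": 0.9, "VBP": 0.09},
    "NN": {"DT": 0.2, "NN": 0.3, "VBP": 0.5},
    "VBP": {"DT": 0.5, "NN": 0.3, "VBP": 0.2},
}
# Emission P(w_i | t_i); word/tag pairs not listed have probability 0.
EMIT = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"run": 0.1, "dogs": 0.2, "book": 0.1},
    "VBP": {"run": 0.3, "book": 0.2},
}


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    """Return the tag sequence maximizing prod_i P(w_i | t_i) P(t_i | t_{i-1})."""
    if not words:
        return []
    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{t: (log(TRANS["<s>"][t]) + log(EMIT[t].get(words[0], 0.0)), None) for t in TAGS}]
    for i in range(1, len(words)):
        best.append({})
        for t in TAGS:
            prev, score = max(
                ((p, best[i - 1][p][0] + log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            best[i][t] = (score + log(EMIT[t].get(words[i], 0.0)), prev)
    # Trace the backpointers from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))


print(viterbi(["the", "run"]))   # ['DT', 'NN']
print(viterbi(["dogs", "run"]))  # ['NN', 'VBP']
```

With these made-up numbers, "run" after "the" comes out as NN because P(NN | DT) = 0.9 outweighs the larger emission P(run | VBP) = 0.3; after "dogs" the NN → VBP transition wins instead.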
Constituency
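A group of words that behaves as a single unit (a constituent)
Evidence
Constituents appear in similar syntactic environments
- e.g. noun phrases can all occur before a verb
Preposed or postposed constructions
- A constituent moves as a whole: "on September seventeenth" can appear at the start or the end of a sentence, but its individual words cannot be moved separately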
Context Free Grammar
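A formal system for modeling constituent structure
Rules/Productions
- Each rule rewrites a non-terminal as a sequence of symbols, e.g. NP → Det Noun
Terminal
- Symbols that correspond to words in the language
Non-Terminal
- Symbols that express abstractions over terminals (e.g. NP, VP)
G = (N, Σ, R, S)
- N = Set of non-terminal symbols
- Σ = Set of terminal symbols
- R = Set of rules or productions, each of the form A → β, with A ∈ N and β a string of symbols from (Σ ∪ N)*
- S = Start symbol (S ∈ N)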
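To illustrate the 4-tuple, the sketch below writes a hypothetical toy grammar as Python sets and a rule list, then checks whether a sentence is derivable using the CKY algorithm. CKY is a standard chart-parsing algorithm swapped in here for illustration (these notes do not describe it); it requires the grammar in Chomsky normal form, where every rule is A → B C or A → w. The vocabulary, rules, and helper names are assumptions.

```python
# Hypothetical toy grammar G = (N, Sigma, R, S) in Chomsky normal form:
# every rule is either A -> B C (two non-terminals) or A -> w (one terminal).
N = {"S", "NP", "VP", "Det", "Noun", "Verb"}
SIGMA = {"the", "a", "flight", "passenger", "boards", "leaves"}
R = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb", "NP")),
    ("VP", ("leaves",)),  # intransitive verb rewritten directly as a VP
    ("Det", ("the",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("Noun", ("passenger",)),
    ("Verb", ("boards",)),
]
START = "S"


def is_well_formed(nonterminals, terminals, rules, start):
    """Check that S is in N, every left-hand side is in N, and every symbol is in N or Sigma."""
    symbols = nonterminals | terminals
    return start in nonterminals and all(
        lhs in nonterminals and all(sym in symbols for sym in rhs)
        for lhs, rhs in rules
    )


def cky_recognize(words, rules=R, start=START):
    """Return True if the CNF grammar derives the word sequence from the start symbol."""
    n = len(words)
    # table[i][j] holds the non-terminals that can derive words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for lhs, rhs in rules:
            if rhs == (w,):
                table[i][i + 1].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k][j]:
                        table[i][j].add(lhs)
    return n > 0 and start in table[0][n]


print(is_well_formed(N, SIGMA, R, START))                       # True
print(cky_recognize("the passenger boards a flight".split()))  # True
print(cky_recognize("flight the boards".split()))              # False
```

Keeping N, Σ, R and S as separate objects mirrors the formal definition, so the well-formedness check can verify the constraints on R directly.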
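Sentence-Level Constructions
Imperative
- Begins with a verb phrase and has no subject (S → VP)
Declarative
- Subject noun phrase followed by a verb phrase (S → NP VP)
Wh-Subject Question
- The wh-phrase is the subject and is followed directly by a verb phrase (S → Wh-NP VP), e.g. "Which flights serve breakfast?"
Yes-No Question
- Begins with an auxiliary verb, followed by a subject noun phrase and a verb phrase (S → Aux NP VP)
Wh-Non-Subject Question
- A wh-phrase that is not the subject, followed by the same structure as a yes-no question (S → Wh-NP Aux NP VP), e.g. "What flights do you have?"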
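Each construction corresponds to a standard sentence-level CFG rule: S → VP (imperative), S → NP VP (declarative), S → Wh-NP VP (wh-subject question), S → Aux NP VP (yes-no question) and S → Wh-NP Aux NP VP (wh-non-subject question). The sketch below encodes those rules as data and identifies a sentence's construction, assuming (hypothetically) that the sentence has already been chunked into phrase-level categories; the category labels, example sentences, and the `classify` helper are illustrative.

```python
# Sentence-level S rules, keyed by their right-hand side (illustrative labels).
CONSTRUCTIONS = {
    ("VP",): "Imperative",                                    # Show me the flights
    ("NP", "VP"): "Declarative",                              # The flight leaves at noon
    ("Wh-NP", "VP"): "Wh-Subject Question",                   # Which flights serve breakfast?
    ("Aux", "NP", "VP"): "Yes-No Question",                   # Does the flight stop in Denver?
    ("Wh-NP", "Aux", "NP", "VP"): "Wh-Non-Subject Question",  # What flights do you have?
}


def classify(phrases):
    """Return the construction whose S rule matches the phrase sequence, or None."""
    return CONSTRUCTIONS.get(tuple(phrases))


# "What flights | do | you | have" chunked as Wh-NP Aux NP VP:
print(classify(["Wh-NP", "Aux", "NP", "VP"]))  # Wh-Non-Subject Question
print(classify(["NP"]))                        # None
```

Comparing the Wh-NP VP and Wh-NP Aux NP VP keys shows the difference between the two wh-question types: only the non-subject version contains its own subject NP after the auxiliary.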