POS Formal
Grammar

Classes

Open

  • New ones can be created all the time
  • Nouns, verbs, adjectives, adverbs

Closed

  • Relatively fixed membership
  • Prepositions (new prepositions are rarely coined)
  • Particles (up, down)
  • Determiners (the, a, an)
  • Pronouns (he, she)
  • Conjunctions
  • Auxiliary verbs (can, should, would)
  • Numerals

POS tagging

Process of assigning a lexical class (POS tag) to each word

Penn Treebank Tagset

  • 45 tags for English

POS tagging = Disambiguation task
Words = Ambiguous (a word can have more than one possible tag)

POS Tagging Methods

Rule-Based

Stochastic/Probabilistic

Assign Potential POS Tags
Assign a list of potential POS tags
to each word based on a dictionary

Apply rules to eliminate tags
Apply hand-written constraints until
each word has only one possible POS tag

E.g. a determiner (DT) cannot immediately precede a verb,
so the verb reading in The/DT run/VBP is eliminated
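The two steps above can be sketched in Python. This is a minimal illustration, not a full tagger: the lexicon, the `VERB_TAGS` set, and the function names are all hypothetical, and only the single determiner constraint from the example is implemented.

```python
# Hypothetical toy lexicon: each word maps to its possible Penn Treebank tags.
LEXICON = {
    "the": {"DT"},
    "run": {"NN", "VB", "VBP"},
}

# Verb tags of the Penn Treebank tagset.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def assign_candidates(words):
    """Step 1: look up every candidate tag for each word in the dictionary."""
    return [set(LEXICON[w.lower()]) for w in words]


def apply_dt_constraint(candidates):
    """Step 2: apply the constraint 'DT cannot immediately precede a verb'.

    If the previous word can only be a determiner, verb tags are removed
    from the current word's candidate list.
    """
    result = [set(c) for c in candidates]
    for i in range(1, len(result)):
        if result[i - 1] == {"DT"}:
            remaining = result[i] - VERB_TAGS
            if remaining:  # never eliminate every candidate tag
                result[i] = remaining
    return result


words = ["The", "run"]
candidates = assign_candidates(words)
tags = apply_dt_constraint(candidates)
print(list(zip(words, tags)))
```

With this toy lexicon, "run" starts with three candidate tags and keeps only NN after the constraint is applied. A real rule-based tagger would need many such constraints before every word is left with a single tag.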

Bayes' Rule

\( \widehat{t}_1^n = argmax_{t_1^n} P(t_1^n | w_1^n) \)
\( = argmax_{t_1^n} \frac{P(w_1^n | t_1^n) P(t_1^n)}{P(w_1^n)} \)

Simplified


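  • \( P(w_1^n) \) is the same for every tag sequence, so it can be dropped
  • Each word depends only on its own tag; each tag depends only on the previous tag

\( = argmax_{t_1^n} P(w_1^n | t_1^n) P(t_1^n) \)
\( \approx argmax_{t_1^n} \prod_{i=1}^n P(w_i | t_i) P(t_i | t_{i-1}) \)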
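The product of emission and transition probabilities can be maximized with the Viterbi algorithm. The sketch below is one possible way to do this. The probability tables are invented for illustration only, the `<s>` start marker and the function name `viterbi` are assumptions, and the input is assumed to be non-empty.

```python
# Hypothetical toy probabilities, invented for illustration only.
# TRANSITION[prev][tag] = P(tag | prev); "<s>" marks the sentence start.
TRANSITION = {
    "<s>": {"DT": 0.8, "NN": 0.1, "VB": 0.1},
    "DT": {"DT": 0.0, "NN": 0.9, "VB": 0.1},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
# EMISSION[tag][word] = P(word | tag); unseen words get probability 0.
EMISSION = {
    "DT": {"the": 0.7},
    "NN": {"dog": 0.4, "run": 0.1},
    "VB": {"run": 0.3, "dog": 0.01},
}
TAGS = ["DT", "NN", "VB"]


def viterbi(words):
    """Return the tag sequence maximizing prod_i P(w_i | t_i) P(t_i | t_{i-1})."""
    # best[i][tag] = (probability of the best path ending in tag at i, previous tag)
    best = [{}]
    for tag in TAGS:
        prob = TRANSITION["<s>"][tag] * EMISSION[tag].get(words[0], 0.0)
        best[0][tag] = (prob, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in TAGS:
            emit = EMISSION[tag].get(words[i], 0.0)
            # Pick the previous tag that gives the most probable path into `tag`.
            best[i][tag] = max(
                (best[i - 1][prev][0] * TRANSITION[prev][tag] * emit, prev)
                for prev in TAGS
            )
    # Follow the back-pointers from the most probable final tag.
    last = max(TAGS, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))


print(viterbi(["the", "dog", "run"]))  # ['DT', 'NN', 'VB']
```

In practice the products are usually computed as sums of log probabilities to avoid numerical underflow on long sentences; plain products are kept here to match the formula.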

Constituency

A group of words that behaves as a single unit (a constituent)

Evidence

Appear in similar syntactic environments

  • Noun Phrases can occur before verbs

Preposed or postposed constructions

  • "on September seventeenth" can be moved to the start or end of a sentence, but only as a whole phrase

Context Free Grammar

Formal system for modeling constituent structure

Rules/Productions

Terminal

  • Symbols that correspond to words in the language

Non-Terminal

  • Symbols that express abstractions over terminals (e.g. NP, VP)

G = (N, Σ, R, S)

  • N = Set of non-terminal symbols
  • Σ = Set of terminal symbols
  • R = Set of rules (productions)
  • S = Start symbol
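The four components can be written down directly as a data structure. The following Python sketch is illustrative only: the `CFG` class, the toy grammar, and the `recognizes` helper are hypothetical, and the simple top-down recognizer assumes the grammar has no left-recursive rules.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    nonterminals: set  # N
    terminals: set     # Σ
    rules: dict        # R: maps a non-terminal to a list of right-hand sides
    start: str         # S


# Hypothetical toy grammar, chosen only to illustrate the four components.
grammar = CFG(
    nonterminals={"S", "NP", "VP", "Det", "Noun", "Verb"},
    terminals={"the", "a", "flight", "pilot", "left", "booked"},
    rules={
        "S": [["NP", "VP"]],
        "NP": [["Det", "Noun"]],
        "VP": [["Verb", "NP"], ["Verb"]],
        "Det": [["the"], ["a"]],
        "Noun": [["flight"], ["pilot"]],
        "Verb": [["left"], ["booked"]],
    },
    start="S",
)


def parse(symbol, words, pos, g):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if symbol in g.terminals:
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in g.rules[symbol]:
        # Expand the right-hand side left to right, tracking reachable positions.
        positions = [pos]
        for part in rhs:
            positions = [end for p in positions for end in parse(part, words, p, g)]
        yield from positions


def recognizes(g, sentence):
    """Return True if the start symbol derives the whole sentence."""
    words = sentence.split()
    return any(end == len(words) for end in parse(g.start, words, 0, g))


print(recognizes(grammar, "the pilot booked a flight"))  # True
print(recognizes(grammar, "pilot the left"))             # False
```

Every symbol on a right-hand side is either in N or in Σ, and every left-hand side is a single non-terminal, which is what makes the grammar context-free.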

Sentence-Level Constructions

Imperative

  • Begins with a verb phrase and has no subject (S → VP)

Declarative

  • Subject noun phrase followed by a verb phrase (S → NP VP)

Wh-Subject Question

Yes-No Question

  • Begins with an auxiliary verb, followed by a subject noun phrase and a verb phrase (S → Aux NP VP)

Wh-Non-Subject Question

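  • Same as Yes-No question, but preceded by a wh-phrase that is not the subject: the auxiliary still comes before the subject noun phrase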
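The sentence types above can be told apart by the sequence of top-level constituents under S. The Python sketch below is an assumption-laden illustration: the rule table follows the usual textbook treatment of these constructions (the notes do not spell out a rule for every type), and the labels `Aux` and `Wh-NP` as well as the function `classify` are hypothetical names.

```python
# Hypothetical table of S-level expansions and the sentence type each one signals.
SENTENCE_RULES = {
    ("NP", "VP"): "declarative",
    ("VP",): "imperative",
    ("Aux", "NP", "VP"): "yes-no question",
    ("Wh-NP", "VP"): "wh-subject question",
    ("Wh-NP", "Aux", "NP", "VP"): "wh-non-subject question",
}


def classify(constituents):
    """Map the top-level constituents of a sentence to its construction type."""
    return SENTENCE_RULES.get(tuple(constituents), "unknown")


# Constituent sequences are supplied by hand; a parser would normally produce them.
print(classify(["VP"]))                        # imperative
print(classify(["NP", "VP"]))                  # declarative
print(classify(["Aux", "NP", "VP"]))           # yes-no question
print(classify(["Wh-NP", "Aux", "NP", "VP"]))  # wh-non-subject question
```

The last two entries show why the wh-non-subject question resembles the yes-no question: dropping the leading wh-phrase leaves exactly the Aux NP VP pattern.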