Natural Language Processing 2
Discourse Representation Theory (DRT)
DRS to Logic - Box(A B) with dog(A), man(B) translates to ∃x∃y[dog(x) ^ man(y)]
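The box-to-logic translation above can be sketched as a tiny function; this is a toy for flat DRSs only (the function name and data layout are my own, not from any library), ignoring nesting and presupposition boxes:

```python
# Sketch: translate a flat DRS (discourse referents + conditions) into a
# predicate-logic string, mirroring Box(A B) with dog(A), man(B)
# -> ∃x∃y[dog(x) ^ man(y)]. Hypothetical helper, not a standard API.
def drs_to_logic(referents, conditions):
    quantifiers = "".join(f"∃{r}" for r in referents)          # one ∃ per referent
    body = " ^ ".join(f"{pred}({arg})" for pred, arg in conditions)
    return f"{quantifiers}[{body}]"

print(drs_to_logic(["x", "y"], [("dog", "x"), ("man", "y")]))
# → ∃x∃y[dog(x) ^ man(y)]
```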
Presupposition - Eg. "The dog chased a cat" presupposes that a dog exists. A presupposition box is drawn with dotted lines and contains the presupposed condition, dog(x).
DR Structures - A way to represent DRT graphically. Each box represents a sentence; boxes are merged as the discourse is updated, and DRS boxes can also be nested.
Each new sentence updates the discourse, ie the representation of the events/objects introduced so far.
In conditional contexts (eg donkey sentences such as "Every farmer who owns a donkey beats it"), an indefinite noun phrase ends up translated with a ∀ rather than an ∃.
CYK - Write out the sentence with one word per box. Row 1 looks at single words, so N -> Adam. Row 2 looks at spans of two words, so NP -> Det N etc; in general row n looks at spans of n words.
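The row-by-row filling described above can be sketched as a small recogniser. The grammar, lexicon and sentence here are hypothetical toy examples (rules must be in Chomsky Normal Form):

```python
# Minimal CYK recogniser sketch. table[span-1][start] holds the set of
# non-terminals covering the words from `start` spanning `span` words.
def cyk(words, lexicon, rules):
    n = len(words)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                    # row 1: single words
        table[0][i] = {nt for nt, word in lexicon if word == w}
    for span in range(2, n + 1):                     # row n: spans of n words
        for start in range(n - span + 1):
            for split in range(1, span):             # try every split point
                left = table[split - 1][start]
                right = table[span - split - 1][start + split]
                for nt, (b, c) in rules:
                    if b in left and c in right:
                        table[span - 1][start].add(nt)
    return "S" in table[n - 1][0]                    # S covering the whole sentence?

lexicon = [("N", "Adam"), ("V", "sleeps")]
rules = [("S", ("N", "V"))]
print(cyk(["Adam", "sleeps"], lexicon, rules))  # True
```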
Bottom Up - The parser starts from the words of the sentence and applies rules to work up to S.
Chart Parser - Loops and backtracks through a sentence, storing partial results in a chart of edges. The first pass records the lexical category of each word. Eg. Fish -> N | V. Inactive Edges - edges whose rule is fully complete. Active Edges - edges that still need more words. Eg. S -> NP . VP is an active edge where the parser has found an NP and is looking for a VP afterwards. DIAGRAM TO DO. Can run top down or bottom up.
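The active/inactive distinction above is just the position of the dot in a dotted rule; a minimal sketch (the Edge class and fields are my own illustration, not a library API):

```python
# Sketch of chart-parser edges as dotted rules. An edge records a rule,
# how far the dot has advanced through its right-hand side, and the
# span of words it covers.
class Edge:
    def __init__(self, lhs, rhs, dot, start, end):
        self.lhs, self.rhs, self.dot = lhs, rhs, dot
        self.start, self.end = start, end

    def is_active(self):
        # Active: symbols remain after the dot, so more input is needed.
        return self.dot < len(self.rhs)

edge = Edge("S", ["NP", "VP"], 1, 0, 1)      # S -> NP . VP over words 0-1
print(edge.is_active())                      # True: still looking for a VP
done = Edge("NP", ["Det", "N"], 2, 0, 2)     # NP -> Det N . (dot at the end)
print(done.is_active())                      # False: an inactive (complete) edge
```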
Shift Reduce Parser - Uses a stack to parse a sentence. 3 operations: Shift - pushes the next word onto the stack. Reduce - replaces symbols on top of the stack that match the right-hand side of a rule with that rule's left-hand side. Backtrack - if the parser reaches S while words remain (or gets stuck), it backtracks. DIAGRAM TO DO. Doesn't scale well to long real-world sentences.
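The shift and reduce operations can be sketched as below; this toy (my own hypothetical grammar and lexicon) reduces greedily and simply fails where a real parser would backtrack:

```python
# Toy shift-reduce recogniser sketch: shift words onto a stack, reduce
# whenever the top of the stack matches a rule's right-hand side.
def shift_reduce(words, lexicon, rules):
    stack, buffer = [], list(words)
    while buffer or stack != ["S"]:
        for lhs, rhs in rules:
            if stack and stack[-len(rhs):] == list(rhs):
                del stack[-len(rhs):]          # reduce: pop the RHS symbols
                stack.append(lhs)              # ...and push the LHS symbol
                break
        else:
            if not buffer:
                return False                   # stuck: real parser would backtrack
            word = buffer.pop(0)               # shift: next word onto the stack
            stack.append(lexicon.get(word, word))
    return True

lexicon = {"Adam": "NP", "runs": "VP"}
rules = [("S", ("NP", "VP"))]
print(shift_reduce(["Adam", "runs"], lexicon, rules))  # True
```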
Top Down Depth First - Start at S and expand each node as deep as possible, then discard that branch and move on to the next.
Top Down Breadth First - Start at S and build the parse tree breadth-first until all words are accounted for. Both strategies are too wasteful/inefficient.
Homonyms - Same word different meanings. Eg bank
Homophones - same sound, different meaning (eg "flour"/"flower"); Homographs - same written form, different meaning.
Synonyms - Different word same meaning
Polysemy - Related but distinct senses. Eg. baked a cake vs baked a potato
Most common words have the most senses
Word sense disambiguation helps in Speech Generation and Speech Understanding, but less in Information Retrieval.
Preference Semantics - Adds tags to words describing what kinds of arguments they prefer. Eg. Drink - prefers an object tagged liquid, edible.
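Checking a preference then amounts to a tag lookup; a minimal sketch, assuming hand-written tag sets (the dictionaries and function name are my own illustration):

```python
# Sketch of preference semantics as selectional-restriction tags.
# A verb states the tag its object should carry; nouns carry tag sets.
preferences = {"drink": {"object": "liquid"}}
tags = {"water": {"liquid"}, "bread": {"edible", "solid"}}

def satisfies(verb, obj):
    wanted = preferences[verb]["object"]
    return wanted in tags[obj]                # is the preferred tag present?

print(satisfies("drink", "water"))   # True: 'water' carries the 'liquid' tag
print(satisfies("drink", "bread"))   # False: preference violated
```

In a real system these checks run after syntactic analysis, once the parser has identified which noun is the verb's object.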
Requires syntactic analysis beforehand
Manual set up
Lesk's Algorithm - For an ambiguous word, compare the words in each of its sense definitions with the definitions of the surrounding words, and pick the sense with the most overlap.
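A simplified variant compares each sense's definition directly with the surrounding context words; the definitions below are toy hand-written glosses, not from a real lexicon:

```python
# Simplified Lesk sketch: choose the sense whose definition shares the
# most words with the context.
def lesk(word, context, definitions):
    best_sense, best_overlap = None, -1
    context_words = set(context.lower().split())
    for sense, definition in definitions[word].items():
        overlap = len(context_words & set(definition.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

definitions = {
    "bank": {
        "finance": "an institution that accepts money deposits",
        "river": "the sloping land beside a body of water",
    }
}
print(lesk("bank", "he sat on the land beside the river", definitions))
# → river
```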
Simulated Annealing - Improves Lesk by starting with a guessed assignment of senses, then generating permutations and evaluating them to move towards a better answer.
Statistical Approach - Build statistical models from Roget's Thesaurus, then apply them to a large encyclopedia corpus.
A type of semantic analysis: relating words together and working out which words refer to each other. Eg. "Sally has a broom. It was red" - "It" refers to the broom.
Wordnet - Manually constructed, so incomplete; many new words have not been added yet.
Discourse Problem - "A dog arrived", "The dog barked". ∃x[dog(x) ^ barked(x)]. But how do you refer back to the same dog across sentences? Extending the formula with more conjuncts (^) helps but still doesn't fix the problem.
Indefinite noun phrase - Can be treated as an existential quantifier. Eg. "Every child owns a dog" - ∀x[child(x) -> ∃y[dog(y) ^ owns(x, y)]]
Better solution - Having an event e and then many separate classifiers. Eg. eat(sandwich, e) ^ eatenby(me, e) ^ place(work, e). For this, events would need to be distinguishable.
Events - Eg. "I ate a sandwich" - ate2(me, sandwich). "I ate a sandwich at work" - ate3(me, sandwich, work). Very messy. Slightly better solution is to have one predicate with all possible arguments.
Categories - Can't just have a predicate for every property, eg. best_name(Adam) - the set of predicates would become infinitely huge. Solution is to use a few common relations such as "is a".
Sentences can be converted into predicate logic as a way to represent the semantics of the sentence. Eg. John ran -> Run(j) ^ Tense(past)