Natural Language Processing 4
Deep NNs - Trained on huge corpora such as Twitter.
Parry (1972) - Simulates a paranoid patient; tries to draw you into a conversation about the Mafia and the police
Eliza (1964) - Simulates a psychotherapist. Just asks questions based on the previous response, using basic pattern matching
Goal-Oriented Dialogue - Speech with a goal in mind. Eg. "When is the next train to Washington?" probably means the speaker wants to travel to Washington and is uninterested in fully booked trains
Explicit Performatives - Things that aren't true until you say them. Eg. "I name this boat Stalin."
Scalar - Uses a scale. Eg. "The rose was light red." implies it's not quite pink
Non-conventional - Breaking context
Non-detachable - The implicature attaches to the meaning, not the exact wording. Eg. "I really loved this meal." (sarcasm)
Defeasible - Can be cancelled. Eg. "I have 5 pounds." - could turn out to be more
Flouting - Breaking a maxim
Maxim of Manner - Avoid ambiguity/obscurity. Be brief. etc.
Maxim of Relation - Be relevant
Maxim of Quantity - Make your contribution as informative as required (and no more)
Maxim of Quality - Do not say what you believe to be false
Pragmatics - The study of language in use (e.g. speech)
Extraction of the sentiment (positive/negative orientation) of text.
General Inquirer - Basic lexicons with a scale of valence, i.e. whether a word is positive or negative. 1966
MPQA Subjectivity Lexicon - Combination of sources. Also labelled for reliability. 2005
Polarity Lexicon - Created using WordNet. 2004
Semi-supervised - Start with seed words as a base, then use a resource to find similar words and grow the lexicon. Conjunctions like "and" (same polarity) and "but" (opposite polarity) link words together; prefixes like "un-" and "il-" negate polarity. The result is a polarity graph, which is used to classify words.
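The graph-propagation idea above can be sketched as follows. This is a minimal toy version: the seed words, conjunction pairs, and propagation rule are illustrative assumptions, not the exact published algorithm.

```python
# Semi-supervised lexicon expansion sketch (toy data assumed).
# Seed words have a known polarity; "and" between adjectives suggests
# the same polarity, "but" suggests the opposite.
from collections import defaultdict

seeds = {"good": +1, "excellent": +1, "bad": -1, "awful": -1}

# Hypothetical conjunction pairs mined from a corpus.
and_pairs = [("good", "tasty"), ("tasty", "fresh"), ("awful", "stale")]
but_pairs = [("fresh", "pricey")]

# Undirected graph with edge sign +1 (same polarity) or -1 (opposite).
graph = defaultdict(list)
for a, b in and_pairs:
    graph[a].append((b, +1))
    graph[b].append((a, +1))
for a, b in but_pairs:
    graph[a].append((b, -1))
    graph[b].append((a, -1))

# Propagate seed labels through the graph.
polarity = dict(seeds)
frontier = list(seeds)
while frontier:
    word = frontier.pop()
    for neighbour, sign in graph[word]:
        if neighbour not in polarity:
            polarity[neighbour] = polarity[word] * sign
            frontier.append(neighbour)

print(polarity)
# "tasty" and "fresh" come out positive; "stale" and "pricey" negative
```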
Supervised Lexicons - Use website reviews to track positive and negative words: 5-star reviews are more likely to contain positive words. The counts can be graphed and used as a scale of how positive/negative each word is.
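One common way to turn such counts into a scale is a log-odds score per word. The reviews and the 4-star threshold below are made up for illustration:

```python
# Sketch: deriving word polarity from rated reviews (toy data assumed).
# Words over-represented in high-star reviews score positive, and vice versa.
import math
from collections import Counter

reviews = [
    (5, "great food great service"),
    (4, "good food friendly staff"),
    (1, "terrible food rude staff"),
    (2, "bad service bad food"),
]

pos_counts, neg_counts = Counter(), Counter()
for stars, text in reviews:
    counts = pos_counts if stars >= 4 else neg_counts
    counts.update(text.split())

def polarity(word, smoothing=1.0):
    """Log-odds of the word appearing in positive vs negative reviews."""
    p = (pos_counts[word] + smoothing) / (sum(pos_counts.values()) + smoothing)
    n = (neg_counts[word] + smoothing) / (sum(neg_counts.values()) + smoothing)
    return math.log(p / n)

print(polarity("great"))     # > 0: positive word
print(polarity("terrible"))  # < 0: negative word
print(polarity("food"))      # near 0: appears equally in both
```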
Emphatics/Attenuators - Adverbs which emphasise/attenuate the adjective after them. Eg. "absolutely good/bad", "pretty good/bad".
Lexicons for SA - Can simply use the ratio of positive to negative words. More complex versions weight each word, and possibly the adverbs before it.
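A minimal weighted scorer in the spirit of the two notes above. The word lists and multiplier values are illustrative assumptions, not a standard lexicon:

```python
# Lexicon-based sentiment scorer with emphatic/attenuator weighting
# (toy lexicon and modifier weights assumed).
lexicon = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
modifiers = {"absolutely": 1.5, "really": 1.3,   # emphatics
             "pretty": 0.7, "slightly": 0.5}     # attenuators

def sentiment(text):
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            # Weight the word by the adverb immediately before it, if any.
            weight = modifiers.get(tokens[i - 1], 1.0) if i > 0 else 1.0
            score += weight * lexicon[tok]
    return score

print(sentiment("absolutely great food"))  # 1.5 * 2.0 = 3.0
print(sentiment("pretty bad service"))     # 0.7 * -1.0 = -0.7
```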
Applications - Spam detection, library organisation, detecting author similarities, etc.
Classification was originally done via hand-crafted rules. Rules are fragile, as they need to adapt over time. Solution: supervised machine learning
Generative Classifier - Builds a model of each class from documents, so given an observation it predicts the most likely class. Eg. Naive Bayes
Discriminative Classifier - Learns what features discriminate between classes
Naive Bayes - Uses Bayes' rule: P(c|d) = P(d|c)P(c) / P(d). With the naive independence assumption, P(d|c) = P(f1..fn|c) ≈ P(f1|c) × P(f2|c) × ... × P(fn|c)
Answer Type Prediction - Determines whether it is a who, what, when, etc. question. Built using QA data
Question Expansion - Pulls keywords out of the question, in stages that get more specific each time
Passages - Finds paragraphs of text using the keywords and ranks them based on their candidate answers. Selects the highest-frequency answer
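A crude version of the passage-ranking step is keyword overlap. The question keywords and passages below are toy examples, and a real system would use weighted ranking rather than a raw count:

```python
# Toy passage ranking by keyword overlap (illustrative data).
question_keywords = {"capital", "france"}

passages = [
    "Paris is the capital of France and its largest city.",
    "France borders Spain and Italy.",
    "The Eiffel Tower is in Paris.",
]

def overlap(passage):
    """Number of question keywords appearing in the passage."""
    words = set(passage.lower().replace(".", "").replace(",", "").split())
    return len(question_keywords & words)

ranked = sorted(passages, key=overlap, reverse=True)
print(ranked[0])  # the passage mentioning both "capital" and "France"
```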
Recognising Textual Entailment (RTE) - Comparing two sentences to see whether one semantically entails the other, or generating a semantically equivalent version. Eg. "India buys missiles.", "India acquires arms."
Answers generated can be reasoned to see how similar they are.
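One crude way to reason about such similarity is lexical overlap after mapping synonyms, as sketched below. The synonym table is a toy assumption; real RTE systems use much richer semantic resources:

```python
# Crude lexical-overlap similarity for sentence pairs (toy synonym table).
synonyms = {"buys": "acquires", "missiles": "arms"}  # assumed mapping

def normalise(sentence):
    words = sentence.lower().strip(".").split()
    # Map each word to a canonical synonym so paraphrases line up.
    return {synonyms.get(w, w) for w in words}

def similarity(s1, s2):
    a, b = normalise(s1), normalise(s2)
    return len(a & b) / len(a | b)  # Jaccard overlap

print(similarity("India buys missiles.", "India acquires arms."))  # 1.0
```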
Email Spam Filtering - Information Retrieval
Web Search - Information Retrieval. Searches for keywords
Machine Translation - Translating text between languages. One of the oldest problems in NLP