Natural Language Processing 4
Sentiment Analysis
Extraction of the sentiment (positive/negative orientation) of text.
General Inquirer - Basic lexicon that labels words with a valence, i.e. whether a word is positive or negative. 1966
MPQA Subjectivity Lexicon - Combination of sources. Also labelled for reliability. 2005
Polarity Lexicon - Created using WordNet. 2004.
Semi-supervised - Start with a small set of seed words whose polarity is known, then grow the lexicon by using a resource to find similar words. Conjunctions give clues: words joined by "and" tend to share a polarity, while words joined by "but" tend to have opposite polarities. Prefixes such as "un" and "il" flip the polarity. The links are used to build a polarity graph, which classifies the words.
Supervised Lexicons - Using reviews on websites to track positive and negative words. 5 star rating reviews are more likely to have positive words. Can be graphed and used as a scale of how positive/negative the word is.
Emphatics/Attenuators - Adverbs which emphasise/attenuate the adjectives after them. Eg. Absolutely good/bad. Pretty good/bad.
Lexicons for SA - The simplest approach uses the ratio of positive to negative words. A more complex version weights each word, and possibly the adverbs before it.
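The two scoring approaches above can be sketched roughly as follows. This is a minimal illustration, not a real lexicon: the words, weights and adverb multipliers in LEXICON and MODIFIERS are made-up assumptions, and the function names are hypothetical.

```python
# Hypothetical toy lexicon: word -> polarity weight (positive > 0, negative < 0).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}

# Hypothetical adverb multipliers: emphatics > 1, attenuators < 1.
MODIFIERS = {"absolutely": 2.0, "very": 1.5, "pretty": 0.75}


def ratio_score(tokens):
    """Simple version: (positive - negative) / number of sentiment words."""
    pos = sum(1 for t in tokens if LEXICON.get(t, 0) > 0)
    neg = sum(1 for t in tokens if LEXICON.get(t, 0) < 0)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def weighted_score(tokens):
    """Weighted version: sum word weights, scaled by a preceding adverb."""
    score = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok)
        if weight is None:
            continue  # not a sentiment word
        if i > 0:
            weight *= MODIFIERS.get(tokens[i - 1], 1.0)
        score += weight
    return score


tokens = "the food was absolutely good but the service was pretty bad".split()
print(ratio_score(tokens))     # 0.0
print(weighted_score(tokens))  # 1.25
```

The ratio treats the sentence as neutral (one positive word, one negative), while the weighted version leans positive because "absolutely" strengthens "good" and "pretty" weakens "bad".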
SA Classification
Applications - Spam detection, library organisation, detecting author similarities, etc.
Classification can be done via hand-crafted rules, but rules are fragile as they need to be adapted over time. Solution: Supervised machine learning
Generative Classifier - Builds a model of each class using documents. Given an observation, it predicts the class most likely to have generated it. Eg. Naive Bayes
Discriminative Classifier - Learns what features discriminate between classes
Naive Bayes - Uses Bayes' rule: P(c|d) = P(d|c)P(c) / P(d). Assuming the features are conditionally independent given the class, P(d|c) = P(f1, ..., fn|c) = P(f1|c) × ... × P(fn|c)
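A minimal sketch of a Naive Bayes text classifier along these lines, using word counts as the features f1..fn. The training sentences are invented for illustration, the function names are hypothetical, and the add-one smoothing and log-probabilities are assumptions added to keep the arithmetic well behaved; they are not part of the notes above.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (tokens, label). Returns priors, per-class word counts, vocab."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict(tokens, priors, word_counts, vocab):
    """Return argmax over c of log P(c) + sum_i log P(f_i|c).

    P(d) is the same for every class, so it is dropped.
    """
    best_class, best_logp = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        logp = math.log(prior)
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption, not in the notes) avoids log(0).
            logp += math.log((word_counts[c][tok] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class


train_docs = [
    ("great film loved it".split(), "pos"),
    ("good fun great cast".split(), "pos"),
    ("awful plot hated it".split(), "neg"),
    ("bad boring awful acting".split(), "neg"),
]
model = train(train_docs)
print(predict("great fun".split(), *model))     # pos
print(predict("boring awful".split(), *model))  # neg
```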
Beyond QA
Question Processing
Answer Type Prediction - Determines whether it's a who, what, when etc. question. Built using QA
Question Expansion - Pulls out keywords from the question. Does this in stages that get more specific each stage
Passages - Retrieves paragraphs of text using the keywords and ranks them based on the candidate answers they contain. Picks the highest-frequency answer
Recognising Textual Entailment (RTE) - Deciding whether the meaning of one sentence follows from another, or generating a semantically equivalent version. Eg. "India buys missiles." entails "India acquires arms."
Entailment reasoning can be applied to the generated answers to see how similar they are.
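The question processing and passage steps above could look roughly like this. It is a simplified sketch under stated assumptions: the ANSWER_TYPES mapping and STOPWORDS list are hypothetical, answer type prediction is reduced to a lookup on the question word, and passages are ranked only by keyword overlap rather than by the answers they contain.

```python
import re

# Hypothetical mapping from question word to expected answer type.
ANSWER_TYPES = {"who": "PERSON", "when": "DATE", "where": "LOCATION", "what": "THING"}
# Tiny illustrative stopword list.
STOPWORDS = {"who", "what", "when", "where", "is", "was", "the", "a", "of", "did", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_type(question):
    """Pick the answer type from the first question word found."""
    for tok in tokenize(question):
        if tok in ANSWER_TYPES:
            return ANSWER_TYPES[tok]
    return "UNKNOWN"


def keywords(question):
    """Keep the question tokens that are not stopwords."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def rank_passages(question, passages):
    """Sort passages by how many distinct question keywords they contain."""
    keys = set(keywords(question))
    scored = [(len(keys & set(tokenize(p))), p) for p in passages]
    ranked = sorted(scored, key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked]


question = "Who invented the telephone?"
passages = [
    "Bell was born in Edinburgh.",
    "The telegraph predates the telephone.",
    "The telephone was invented by Alexander Graham Bell.",
]
print(answer_type(question))                 # PERSON
print(keywords(question))                    # ['invented', 'telephone']
print(rank_passages(question, passages)[0])  # The telephone was invented by Alexander Graham Bell.
```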
Pragmatics
Study of language in use, i.e. how meaning depends on context (e.g. in conversation)
Cooperative Principle
Maxim of Quality - Do not say what you believe to be false
Maxim of Quantity - Make your contribution as informative as is required, and no more informative than that
Maxim of Relation - Be relevant
Maxim of Manner - Avoid ambiguity/obscurity. Be brief. etc.
Flouting - Deliberately and openly breaking a maxim
Implicatures
Defeasible - Can be cancelled. Eg. "I have 5 pounds." suggests exactly 5, but it could be more
Non-detachable - The implicature survives rewording, as it depends on what is said rather than how. Eg. "I really loved this meal." said sarcastically
Non-conventional - Not part of the literal meaning of the words; depends on the context
Scalar - Uses a scale. Eg. "The rose was light red." Therefore it's not quite pink
Explicit Performatives - Utterances that perform the act they describe, so the thing is not true until you say it. Eg. "I name this boat Stalin."
Goal Orientated Dialogue - Speech with a goal in mind. Eg. "When is the next train to Washington?" probably means the person wants to go to Washington and is not interested in fully booked trains
Chatbots
ELIZA (1964) - Simulated psychotherapist. Asks questions based on the user's previous response, using basic pattern matching
PARRY (1972) - Simulates a paranoid patient. Tries to steer the conversation towards the mafia and police
Deep NN - Trained on huge corpora such as Twitter.
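The pattern-based approach used by ELIZA can be illustrated with a few regular expressions. The rules below are hypothetical examples written for this sketch, not the original ELIZA script, and a real system would also swap pronouns ("my" to "your") inside the captured text.

```python
import re

# Hypothetical rules: (pattern, response template). Not the original ELIZA script.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."


def respond(utterance):
    """Return the response for the first matching rule, else a default prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reflect the captured part of the user's input back as a question.
            return template.format(match.group(1))
    return DEFAULT


print(respond("I feel tired all the time."))  # Why do you feel tired all the time?
print(respond("It rained today."))            # Please go on.
```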
Applications
Machine Translation - Translating text between languages. One of the oldest problems in NLP
Web Search - Information Retrieval. Searches for keywords
Predictive Text
Email Spam Filtering - Text Classification