Detecting Sarcasm in Text: An Obvious Solution to a Trivial Problem
Recent advances in natural language sentence generation research have seen increasing interest in measuring negativity and positivity in the sentiment of words and phrases.
However, the accuracy and robustness of results are often degraded by untruthful sentiments of a sarcastic nature, and this problem is often left untreated.
Sarcasm detection is therefore an important step for filtering noisy data (in this case, sarcastic sentences) out of the training inputs used for natural language sentence generation.
Our goal is to design a machine learning algorithm for sarcasm detection in text, leveraging and improving upon the work done by Mathieu Cliche of www.thesarcasmdetector.com.
Analysis of social media has attracted much interest in NLP research over the past decade (Ptáček et al., 2014).
Dataset and Features
Baseline Model Description
The baseline uses a support vector machine (SVM) as implemented by the LinearSVC class from scikit-learn, a popular open-source machine learning library in Python. Aside from a value of 0.1 for the penalty parameter C, all other configuration options are left at their defaults.
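The baseline setup described above can be sketched as follows. The feature dictionaries and labels here are illustrative placeholders, not the project's actual data; only the `LinearSVC(C=0.1)` configuration comes from the text.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy feature dicts standing in for the real tweet features
train_features = [
    {"uni_love": 1, "sent_pos": 0.9},    # a sarcastic example
    {"uni_report": 1, "sent_pos": 0.1},  # a non-sarcastic example
]
labels = [1, 0]  # 1 = sarcastic, 0 = not sarcastic

# Feature dicts are vectorized into a sparse matrix for the SVM
vec = DictVectorizer()
X = vec.fit_transform(train_features)

# Penalty parameter C = 0.1 as in the baseline; everything else default
clf = LinearSVC(C=0.1)
clf.fit(X, labels)
preds = clf.predict(X)
```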
Features are extracted from the raw Twitter data to create training examples that are fed into the SVM to create a hypothesis model.
The tweets were collected over a span of several months in 2014. The sanitization process included removing all hashtags, non-ASCII characters, and HTTP links.
In addition, each tweet is tokenized, stemmed, and lowercased through the use of the Python NLTK library.
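The sanitization pipeline above can be sketched with NLTK; the exact regexes and tokenizer choice here are assumptions, since the report does not specify them.

```python
import re
from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer

stemmer = PorterStemmer()
tokenizer = TreebankWordTokenizer()

def sanitize(tweet):
    # Strip hashtags and HTTP links, then drop non-ASCII characters
    tweet = re.sub(r"#\w+|https?://\S+", "", tweet)
    tweet = tweet.encode("ascii", "ignore").decode()
    # Tokenize, lowercase, and stem each token
    return [stemmer.stem(tok.lower()) for tok in tokenizer.tokenize(tweet)]
```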
For each tweet, features that are hypothesized to be crucial to sarcasm detection are extracted. The features fall broadly into five categories: n-grams, sentiments, parts of speech, capitalization, and topics.
Individual tokens (i.e., unigrams) and bigrams are placed into a binary feature dictionary.
Bigrams are extracted using the same library and are defined as pairs of words that typically occur together; examples include artificial intelligence and peanut butter.
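A binary n-gram feature dictionary like the one described can be sketched as below. For simplicity this uses adjacent token pairs as bigrams rather than NLTK's collocation finder, so it is an approximation of the described pipeline.

```python
def ngram_features(tokens):
    # Binary presence features for unigrams and adjacent-pair bigrams
    feats = {f"uni_{t}": 1 for t in tokens}
    feats.update({f"bi_{a}_{b}": 1 for a, b in zip(tokens, tokens[1:])})
    return feats
```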
For the sentiment features, each tweet is broken up into two and then three parts.
Sentiment scores are calculated using two libraries (SentiWordNet and TextBlob).
Positive and negative sentiment scores are collected for the overall tweet as well as for each individual part. Furthermore, the contrast between the parts is added to the features, since sarcasm often juxtaposes positive and negative sentiment.
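The part-wise sentiment-contrast idea can be sketched as follows. A toy lexicon stands in for SentiWordNet/TextBlob, and the splitting and contrast formulas are illustrative assumptions, not the report's exact definitions.

```python
# Toy sentiment lexicon (illustrative scores, not from SentiWordNet)
LEXICON = {"love": 1.0, "great": 0.8, "wait": -0.4, "traffic": -0.6}

def sentiment(tokens):
    # Sum of per-word sentiment scores; unknown words count as neutral
    return sum(LEXICON.get(t, 0.0) for t in tokens)

def split_parts(tokens, n):
    # Split a token list into (up to) n contiguous parts
    k = max(1, len(tokens) // n)
    return [tokens[i:i + k] for i in range(0, len(tokens), k)][:n]

def contrast_features(tokens):
    feats = {"overall": sentiment(tokens)}
    for n in (2, 3):  # halves and thirds, as in the text
        scores = [sentiment(p) for p in split_parts(tokens, n)]
        for i, s in enumerate(scores):
            feats[f"part{n}_{i}"] = s
        # Contrast between the most positive and most negative part
        feats[f"contrast{n}"] = max(scores) - min(scores)
    return feats
```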
Parts of Speech
The parts of speech in each tweet are counted and inserted into the features.
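Counting part-of-speech tags can be sketched as below, assuming the tweet has already been tagged (e.g. with `nltk.pos_tag`); the feature-name prefix is a made-up convention.

```python
from collections import Counter

def pos_count_features(tagged):
    # tagged: list of (token, tag) pairs, e.g. from nltk.pos_tag
    counts = Counter(tag for _, tag in tagged)
    return {f"pos_{tag}": n for tag, n in counts.items()}
```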
A binary flag indicating whether the tweet contains at least four tokens that begin with a capital letter is added to the features.
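This capitalization flag is a one-liner; note it must be computed on the raw tokens, before the lowercasing step described earlier.

```python
def capitalization_flag(tokens, threshold=4):
    # 1 if at least `threshold` tokens start with an uppercase letter, else 0
    return int(sum(t[:1].isupper() for t in tokens) >= threshold)
```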
The Python library gensim, which implements topic modeling using latent Dirichlet allocation (LDA), is used to learn the topics.
The topics learned for each tweet are then added to the features.
Methods and Analysis
Analysis of Baseline Model
Initial analysis of the baseline model quickly reveals that the testing error far exceeds the training error. The large gap between training and testing error suggests that the model is suffering from high variance.
Model Improvement Methods
One-Class SVM
Targeted Areas for Improvement
The high testing error of the model at hand implies that we are fitting noise. One possible cause is the high-dimensional feature space; another is that some features are not relevant for detecting sarcasm.
In both cases, we think it is important to reduce the dimension of the feature space and keep only relevant features. For instance, the benefit of adding features such as bigrams, sentiments, and topics is not clear; bigrams might have much the same effect as unigrams.
We should also test whether adding sentiments improves our classification by a significant margin. While it is true that some sarcastic sentences contain both words with negative sentiment and words with positive sentiment, many other sentences do not have this property. Moreover, non-sarcastic sentences can still mix positive and negative sentiments. Thus, adding this feature might not be useful.
We also think that computing sentiments is slow: for each training example, we have to look up the sentiment of every word in a dictionary.
We also want to investigate the topic features. Topic modeling using LDA might return words similar to the unigrams of the training example, in which case we would be adding redundant information.
However, we think that categorizing the training examples into a set of topics can be useful in a different way than it is currently used. Instead of adding topics as a separate feature, we might split our classifier into n classifiers, where n is the number of topics in the training set; in other words, we build one classifier per topic.
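The per-topic classifier idea above can be sketched as follows. The `train_per_topic` helper and its input shape are hypothetical; topic ids are assumed to be precomputed (e.g. by assigning each tweet its most probable LDA topic).

```python
from collections import defaultdict
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def train_per_topic(examples):
    # examples: list of (topic_id, feature_dict, label) triples
    by_topic = defaultdict(list)
    for topic, feats, label in examples:
        by_topic[topic].append((feats, label))
    models = {}
    for topic, rows in by_topic.items():
        # One vectorizer + SVM per topic, mirroring the baseline's C = 0.1
        vec = DictVectorizer()
        X = vec.fit_transform([f for f, _ in rows])
        y = [l for _, l in rows]
        models[topic] = (vec, LinearSVC(C=0.1).fit(X, y))
    return models

# Hypothetical usage: two topics, two labeled examples each
examples = [
    (0, {"uni_love": 1}, 1), (0, {"uni_report": 1}, 0),
    (1, {"uni_great": 1}, 1), (1, {"uni_memo": 1}, 0),
]
models = train_per_topic(examples)
```

At prediction time, a tweet would first be assigned a topic and then routed to that topic's classifier.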
Jan Wei Pan
We experimented with Naive Bayes and one-class SVM; both misclassify most of the sarcastic data.
We also see that accuracy depends greatly on a mixture of feature types. Unigrams and bigrams alone are insufficient for designing an accurate classifier; when combined with other feature types such as topic modeling, accuracy is greatly increased.
We found more questions than answers, but that in and of itself is a small step in the right direction.