An Impact Analysis of Features in a Classification Approach to Irony Detection in Product Reviews
Finally, we use a decision tree (Breiman et al., 1984; Hastie et al., 2009) and a random forest classifier (Breiman, 2001).
The naive Bayes classifier is employed with a multinomial prior (Zhang, 2004; Manning et al., 2008). Because this classifier might suffer from over-counting correlated features, we also compare it to a logistic regression classifier (Yu et al., 2011).
We use a support vector machine (SVM; Cortes and Vapnik, 1995) with a linear kernel in the implementation provided by libSVM (Fan et al., 2005; Chang and Lin, 2011).
The feature Emoticon indicates the occurrence of an emoticon. In order to capture a range of emotions, it combines a variety of emoticons such as happy, laughing, winking, surprised, dissatisfied, sad, crying, and sticking tongue out.
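A minimal sketch of how such a binary Emoticon feature could be computed with regular expressions. The source names the emoticon categories but not the exact strings, so the patterns in `EMOTICON_PATTERNS` and the function name `has_emoticon` are illustrative assumptions, not the authors' inventory.

```python
import re

# Hypothetical emoticon inventory (assumption): one pattern per category
# named in the text. The concrete strings are not given in the source.
EMOTICON_PATTERNS = {
    "happy": r":-?\)",
    "laughing": r":-?D",
    "winking": r";-?\)",
    "surprised": r":-?[oO]",
    "dissatisfied": r":-?/",
    "sad": r":-?\(",
    "crying": r":'-?\(",
    "tongue": r":-?[pP]",
}

# Require whitespace (or start of text) before the emoticon and no word
# character after it, so that "http://" is not read as ":/".
EMOTICON_RE = re.compile(
    r"(?<!\S)(?:" + "|".join(EMOTICON_PATTERNS.values()) + r")(?!\w)"
)


def has_emoticon(text):
    """Return True if any emoticon from the assumed inventory occurs."""
    return EMOTICON_RE.search(text) is not None
```

Combining all categories into a single binary indicator follows the description above; keeping one pattern per category would also allow per-emotion features if needed.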
The Interjection feature indicates the occurrence of terms like “wow” and “huh”, and Laughter measures onomatopoeia (“haha”) as well as acronyms for grin or laughter (“
The Punctuation feature conveys the presence of an ellipsis as well as multiple question or exclamation marks or a combination of the latter two.
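The Punctuation feature can be sketched as a pair of regular expressions. The exact patterns are assumptions: the source does not say whether spaced dots (". . .") or the single-character ellipsis count, and the name `punctuation_feature` is hypothetical.

```python
import re

# Assumption: "...", ". . ." and the Unicode ellipsis all count as an ellipsis.
ELLIPSIS_RE = re.compile(r"\.\s?\.\s?\.|…")
# Two or more marks drawn from "!" and "?": "!!", "??", "?!", "!?", ...
MULTI_MARK_RE = re.compile(r"[!?]{2,}")


def punctuation_feature(text):
    """Return True if the text contains an ellipsis or repeated/mixed ?! marks."""
    return bool(ELLIPSIS_RE.search(text) or MULTI_MARK_RE.search(text))
```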
The feature Pos/Neg&Ellipsis indicates that such a positive or negative span ends with an ellipsis (“...”). Ellipsis and Punctuation indicates that an ellipsis is followed by multiple exclamation marks or a combination of an exclamation and a question mark.
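As an illustration, the Ellipsis and Punctuation feature could be approximated with a single pattern. This is a sketch under assumptions: optional whitespace between the ellipsis and the marks is allowed, both orders of the mixed marks ("!?" and "?!") are accepted, and the function name is hypothetical.

```python
import re

# An ellipsis, optional whitespace, then "!!" (or more), "!?" or "?!".
ELLIPSIS_PUNCT_RE = re.compile(r"(?:\.\s?\.\s?\.|…)\s*(?:!{2,}|!\?|\?!)")


def ellipsis_and_punctuation(text):
    """Return True if an ellipsis is directly followed by the marks above."""
    return ELLIPSIS_PUNCT_RE.search(text) is not None
```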
The feature Pos/Neg&Punctuation indicates that a span of up to four words contains at least one positive (negative) but no negative (positive) word and ends with at least two exclamation marks or a sequence of a question mark and an exclamation mark (Carvalho et al., 2009).
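The span condition can be made concrete with a short sketch. Several details are assumptions rather than the authors' method: the polarity lexicons `POSITIVE` and `NEGATIVE` are toy word lists, tokenization is a simple letter-run regex that ignores sentence boundaries, and every span of one to four words ending at the punctuation is tried, shortest first.

```python
import re

# Toy polarity lexicons (assumption): purely illustrative word lists.
POSITIVE = {"great", "love", "perfect", "amazing"}
NEGATIVE = {"bad", "awful", "broken", "terrible"}

END_RE = re.compile(r"!{2,}|\?!")    # "!!" (or more) or "?!"
WORD_RE = re.compile(r"[A-Za-z']+")  # crude tokenizer (assumption)


def pos_neg_punctuation(text, max_words=4):
    """Return 'pos', 'neg', or None for the first qualifying span."""
    for end in END_RE.finditer(text):
        # Words preceding the punctuation mark sequence.
        words = [w.lower() for w in WORD_RE.findall(text[:end.start()])]
        for k in range(1, min(max_words, len(words)) + 1):
            span = words[-k:]
            has_pos = any(w in POSITIVE for w in span)
            has_neg = any(w in NEGATIVE for w in span)
            if has_pos and not has_neg:
                return "pos"
            if has_neg and not has_pos:
                return "neg"
    return None
```

Replacing `END_RE` with an ellipsis pattern would give the analogous check for spans that end with an ellipsis.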
The feature Hyperbole (Gibbs, 2007) indicates the occurrence of a sequence of three positive or negative words in a row.
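A minimal sketch of the Hyperbole check, reading "three positive or negative words in a row" as three consecutive words of the same polarity. That reading, the toy lexicons, the tokenizer (which ignores punctuation between words), and the function name are all assumptions.

```python
import re

# Toy polarity lexicons (assumption): purely illustrative word lists.
POSITIVE = {"great", "love", "perfect", "amazing"}
NEGATIVE = {"bad", "awful", "broken", "terrible"}


def hyperbole(text, run_length=3):
    """Return True if run_length consecutive tokens share one polarity."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    for lexicon in (POSITIVE, NEGATIVE):
        run = 0
        for tok in tokens:
            # Extend the current run, or reset it on a non-matching token.
            run = run + 1 if tok in lexicon else 0
            if run >= run_length:
                return True
    return False
```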
Similarly, the feature Quotes indicates that up to two consecutive adjectives or nouns in quotation marks have a positive or negative polarity.
We formulate the problem as a supervised classification task and evaluate different classifiers, reaching an F1-measure of up to 74% using logistic regression.
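For reference, the F1-measure is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the function name and the counts in the usage note are hypothetical and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with 8 true positives, 2 false positives, and 4 false negatives, precision is 0.8, recall is 2/3, and F1 is 16/22.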
However, no baseline for the classification of ironic or sarcastic reviews has been provided.
Specific cases of irony and sarcasm have been studied in different contexts but, to the best of our knowledge, only recently has the first publicly available corpus annotated for whether a text is ironic been published, by Filatova (2012).