Lexicon-based Sentiment Analysis for Reviews of Products in Brazilian Portuguese
Lucas V. Avanço
Maria G. V. Nunes
This paper presents some results on lexicon-based classification of sentiment polarity in web reviews of products written in Brazilian Portuguese.
The evaluation shows the performance of three different sentiment lexicons combined with simple strategies. The paper also discusses the risk of relying on the ratings provided by the writers to evaluate the algorithms.
The results show that the best combination is the version of the algorithm that also deals with negation and intensification and uses the sentiment lexicon SentiLex.
Sentiment Analysis can be seen as a natural language processing (NLP) task that aims to analyze opinions, sentiments, and emotions expressed in unstructured data.
A common task in this research area is polarity classification, which consists of classifying the overall sentiment present in a document or sentence. Usually this task is simplified by classifying a text or a sentence into three classes: positive, negative, or neutral.
The classification of a text or a sentence according to its semantic orientation or polarity (positive, negative, or neutral) can be performed by machine learning methods, lexicon-based methods, or even hybrid methods.
Most machine learning approaches use algorithms such as Support Vector Machine, Naive Bayes, and Maximum Entropy, which are trained on a particular dataset for one specific domain.
Despite the high accuracy reached by these approaches, when the classifier is used in another domain, its performance decreases significantly.
Lexicon-based methods rely only on linguistic knowledge, and they are more robust across domains and texts.
Nevertheless, high accuracy is harder to achieve.
Basically, these methods use a sentiment lexicon consisting of a set of pairs, each made up of a word and its polarity. Words belonging to a sentiment lexicon are called sentiment words.
It is important to note that not every word has a polarity value (and hence belongs to the lexicon); usually adjectives, adverbs, and some nouns and verbs have polarity values.
Hybrid methods combine lexicon-based and supervised learning approaches, and even manually written linguistic rules.
Different unsupervised learning methods can also be used in a cascade, such that if one classifier fails, the next one tries to classify, and so on, until the text is classified or no classifiers remain.
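The cascade idea described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function `classify_in_cascade` and the two toy classifiers are hypothetical names introduced only for illustration, and each classifier is assumed to return `None` when it cannot decide.

```python
from typing import Callable, Optional, Sequence

# Assumption: a classifier returns "positive" or "negative",
# or None when it fails to reach a decision.
Classifier = Callable[[str], Optional[str]]


def classify_in_cascade(text: str, classifiers: Sequence[Classifier]) -> Optional[str]:
    """Try each classifier in order and return the first decision made."""
    for classify in classifiers:
        label = classify(text)
        if label is not None:
            return label
    # No classifier could decide: the text stays unclassified.
    return None


# Hypothetical toy classifiers, for illustration only.
def keyword_rule(text: str) -> Optional[str]:
    lowered = text.lower()
    if "excelente" in lowered:
        return "positive"
    if "péssimo" in lowered:
        return "negative"
    return None


def fallback_rule(text: str) -> Optional[str]:
    return "negative" if "não" in text.lower().split() else None
```

With these toy rules, `classify_in_cascade("não gostei", [keyword_rule, fallback_rule])` falls through the first classifier and is decided by the second.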
The prior polarities are defined by the sentiment lexicon. We used three sentiment lexicons for Brazilian Portuguese (BP) separately: OpinionLexicon, SentiLex, and a subset of LIWC.
In this work we consider the classes positive and negative; the class neutral is not being considered yet.
The classifier considers the prior polarity of words according to a sentiment lexicon and uses some linguistic knowledge about contextual valence shifting (negation and intensification) to compute the polarity value of each sentence and text.
The method for building the lexicon-based classifier (LBC) proposed in this paper is basically a variation of the method developed by 
The polarity value of a text is the sum of the prior polarities of its sentiment words, possibly modified by the contextual valence. If the sum is positive (strictly greater than zero), the opinion is classified as positive; otherwise it is classified as negative. To deal with contextual shifting, a set of negation words and a set of booster and reducer words are used.
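The summation rule can be sketched in a few lines of Python. This is a minimal illustration that ignores contextual shifting; the lexicon `TOY_LEXICON` and the function `classify_by_sum` are hypothetical, and the word polarities shown are illustrative values rather than entries from any of the lexicons used in the paper.

```python
# Hypothetical toy lexicon: word -> prior polarity (illustrative values only).
TOY_LEXICON = {"bom": 1, "ótimo": 1, "ruim": -1, "péssimo": -1}


def classify_by_sum(tokens, lexicon):
    """Sum the prior polarities of the sentiment words found in the tokens."""
    # Words that are not in the lexicon contribute nothing to the sum.
    total = sum(lexicon.get(token, 0) for token in tokens)
    # Strictly greater than zero is positive; anything else is negative.
    return "positive" if total > 0 else "negative"
```

Note that a sum of exactly zero, including a text with no sentiment words at all, falls into the negative class under this rule.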
Three possible scenarios demand a change of polarity value: only negation, only intensification, and both of them together.
To evaluate our classifier, we used a dataset composed of product reviews crawled from the database of one of the most traditional online services in Brazil, called Buscapé.
The comments are written in a free format within a template with three sections: Pros, Cons, and Opinion. The selected reviews are specifically about mobile phones and smartphones.
The context is defined by a window to the left of the sentiment word, whose size was empirically set to 4 words (choosing 3 or 5 words produced worse results). In the first case (only negation), if a sentiment word and a negation word occur in the same context, the polarity value is flipped (line 17 of the algorithm in Frame 1).
If a sentiment word occurs in an intensification context, its polarity value is tripled if a booster word was found (line 8 of the algorithm) or divided by three if a reducer word was found (line 14). When negation and intensification words are in the same context, the amplifier becomes a downtoner, and vice versa (lines 6 and 12).
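The contextual shifting rules can be sketched as follows. This is a hedged illustration, not the algorithm from Frame 1: the word sets `NEGATIONS`, `BOOSTERS`, and `REDUCERS`, the toy `LEXICON`, and the function names are all hypothetical. Two assumptions are made where the text is silent: when negation and intensification co-occur, the polarity is only rescaled (not also flipped), and a booster takes precedence if both a booster and a reducer appear in the window.

```python
WINDOW = 4  # number of words inspected to the left of the sentiment word

# Hypothetical, illustrative word lists (not the paper's resources).
NEGATIONS = {"não", "nunca"}
BOOSTERS = {"muito"}
REDUCERS = {"pouco"}
LEXICON = {"bom": 1.0, "ruim": -1.0}


def shifted_polarity(tokens, i, lexicon=LEXICON):
    """Return the polarity of tokens[i] after contextual valence shifting."""
    value = lexicon[tokens[i]]
    context = tokens[max(0, i - WINDOW):i]
    negated = any(t in NEGATIONS for t in context)
    boosted = any(t in BOOSTERS for t in context)
    reduced = any(t in REDUCERS for t in context)
    if negated and boosted:
        return value / 3  # negated amplifier acts as a downtoner
    if negated and reduced:
        return value * 3  # negated downtoner acts as an amplifier
    if boosted:
        return value * 3
    if reduced:
        return value / 3
    if negated:
        return -value  # only negation: flip the polarity
    return value


def classify(tokens, lexicon=LEXICON):
    """Sum the shifted polarities; strictly positive sums are positive."""
    total = sum(
        shifted_polarity(tokens, i, lexicon)
        for i, token in enumerate(tokens)
        if token in lexicon
    )
    return "positive" if total > 0 else "negative"
```

For example, `classify(["não", "é", "bom"])` flips the polarity of "bom", while a negation more than four words to the left of the sentiment word has no effect.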
In the third scenario, negation and intensification occur together in the same context.
This classification is based on the reviewer's final recommendation (or non-recommendation) of the product.
However, the analysis of a small sample revealed how inconsistent the classification (final recommendation) given by the writer can be with the corresponding text.