Machine Learning and Lexicon based Methods for Sentiment Classification: A…

- - - - Trains a text classifier on a human labeled training dataset
      - Feature Selection For Sentiment Analysis
      - They tested several feature types to find the optimal feature set (unigrams, bigrams, adjectives, and word positions) and found that the best performance was achieved when unigrams were used with an SVM classifier.
      - In later work, Pang and Lee[20] reported an improvement from adding a preprocessing filter that removes objective sentences. This lets the classifier focus only on subjective sentences and raised accuracy from 82.9% to 86.4% on a movie review dataset.
      - When the set of training data is small, a naive Bayes classifier might be more appropriate since SVM must be exposed to a large set of data in order to build a high-quality classifier.
      - Supervised & Unsupervised
        
        Supervised
        
        Requires a large corpus of training data, and its performance depends on a good match between the training and testing data with respect to domain, topic, and time period.
        
        You have input variables (x) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.
        
        It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
        
        Supervised learning problems can be further grouped into regression and classification problems.
        
        Classification:
        
        A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”.
        
        Regression
        
        A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
        
        Unsupervised
        
        Example
        
        k-means for clustering problems
        
        Group
        
        Clustering
        
        A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
        
        Association
        
        An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
      - Feature Selection For Sentiment Analysis
        
        Feature selection is a critical task in sentiment analysis: effectively selected, representative features from subjective text can improve sentiment-based classification.
        
        It was shown that using unigrams as features performed well with either a naive Bayes or an SVM classifier.
      - Sentiment classification based on machine learning can be formulated as a supervised learning problem
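
        As a minimal sketch of this formulation, the following trains a multinomial naive Bayes classifier on unigram features. The tiny labeled dataset, the whitespace tokenizer, the add-one smoothing, and the names `train_nb` and `predict_nb` are assumptions made for illustration; they do not come from the surveyed work.

        ```python
        import math
        from collections import Counter, defaultdict

        def tokenize(text):
            # Assumption: a lowercase whitespace split stands in for real tokenization.
            return text.lower().split()

        def train_nb(labeled_docs):
            """Collect class counts and per-class unigram counts."""
            class_counts = Counter()
            word_counts = defaultdict(Counter)
            vocab = set()
            for text, label in labeled_docs:
                class_counts[label] += 1
                for tok in tokenize(text):
                    word_counts[label][tok] += 1
                    vocab.add(tok)
            return class_counts, word_counts, vocab

        def predict_nb(model, text):
            """Return the class with the highest log posterior (add-one smoothing)."""
            class_counts, word_counts, vocab = model
            total_docs = sum(class_counts.values())
            best_label, best_score = None, float("-inf")
            for label in class_counts:
                score = math.log(class_counts[label] / total_docs)
                total_words = sum(word_counts[label].values())
                for tok in tokenize(text):
                    if tok in vocab:  # ignore words never seen in training
                        count = word_counts[label][tok] + 1
                        score += math.log(count / (total_words + len(vocab)))
                if score > best_score:
                    best_label, best_score = label, score
            return best_label

        # Hypothetical human-labeled training data.
        train = [
            ("a wonderful moving film", "pos"),
            ("great acting and a great plot", "pos"),
            ("a dull and boring film", "neg"),
            ("terrible plot and awful acting", "neg"),
        ]
        model = train_nb(train)
        print(predict_nb(model, "a great film"))
        print(predict_nb(model, "boring and awful"))
        ```

        With so few training examples the model only illustrates the mechanics; a real experiment would need a much larger labeled corpus and a held-out test set.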
    - - Opinion lexicon methods adopt a lexicon to perform sentiment analysis, by counting and weighting sentiment-related words that have been evaluated and tagged.
      - Collecting the opinion word list
        
        Manual approach
        
        Time-consuming, so it is not usually used alone; instead it is combined with automated approaches as a final check, because automated methods make mistakes.
        
        Corpus-based approach
        
        Uses a seed set of sentiment words with known polarity and exploits syntactic or co-occurrence patterns to identify new sentiment words and their polarity in a large corpus.
        
        This method scans a review for phrases that match certain part-of-speech patterns (adjectives and adverbs), then adds up their sentiment orientations to compute the orientation of the document.
        
        For example, Turney determined whether words are positive or negative, and how strong the evaluation is, by computing the pointwise mutual information (PMI) of their co-occurrence with words of known sentiment orientation.
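
        A PMI-based orientation score of this kind can be sketched as follows. This is a simplified illustration rather than the exact published procedure: it counts co-occurrence within the same document of a small hypothetical corpus, and the reference words "excellent" and "poor", the smoothing constant `eps`, and the function names are assumptions made for the example.

        ```python
        import math

        def doc_hits(corpus, *words):
            # Number of documents that contain every given word.
            return sum(1 for doc in corpus if all(w in doc for w in words))

        def semantic_orientation(corpus, word, pos_ref="excellent", neg_ref="poor", eps=0.01):
            """SO(word) = PMI(word, pos_ref) - PMI(word, neg_ref), from document co-occurrence."""
            near_pos = doc_hits(corpus, word, pos_ref) + eps
            near_neg = doc_hits(corpus, word, neg_ref) + eps
            pos_total = doc_hits(corpus, pos_ref) + eps
            neg_total = doc_hits(corpus, neg_ref) + eps
            # The P(word) terms cancel in the difference of the two PMI values.
            return math.log2((near_pos * neg_total) / (near_neg * pos_total))

        # Hypothetical corpus: each document is a set of tokens.
        corpus = [
            set("excellent service and friendly staff".split()),
            set("friendly staff and excellent food".split()),
            set("poor service and rude staff".split()),
            set("rude waiter and poor food".split()),
        ]
        for w in ("friendly", "rude"):
            print(w, round(semantic_orientation(corpus, w), 2))
        ```

        A positive score suggests a positive orientation and a negative score a negative one, while the magnitude gives a rough indication of strength.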
        
        Dictionary-based approach
        
        Exploit available lexicographical resources like WordNet or HowNet.
        
        The main strategy in these methods is to manually collect an initial seed set of sentiment words and their orientations, and then search a dictionary for their synonyms and antonyms to expand this set.
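
        The expansion strategy can be sketched as follows. The toy `SYNONYMS` and `ANTONYMS` tables are hypothetical stand-ins for a real resource such as WordNet, and `expand_lexicon` is an illustrative name, not an existing API.

        ```python
        from collections import deque

        # Hypothetical mini-thesaurus standing in for a resource such as WordNet.
        SYNONYMS = {
            "good": ["fine", "nice"],
            "nice": ["pleasant"],
            "bad": ["poor"],
            "poor": ["inferior"],
        }
        ANTONYMS = {
            "good": ["bad"],
            "pleasant": ["unpleasant"],
        }

        def expand_lexicon(seeds):
            """Propagate polarity: synonyms keep the orientation, antonyms flip it."""
            lexicon = dict(seeds)
            queue = deque(seeds)
            while queue:
                word = queue.popleft()
                polarity = lexicon[word]
                neighbours = [(s, polarity) for s in SYNONYMS.get(word, [])]
                neighbours += [(a, -polarity) for a in ANTONYMS.get(word, [])]
                for other, other_polarity in neighbours:
                    if other not in lexicon:  # the first label found wins
                        lexicon[other] = other_polarity
                        queue.append(other)
            return lexicon

        lexicon = expand_lexicon({"good": 1})
        print(lexicon)
        ```

        Starting from the single seed "good", the sketch labels its synonyms as positive and its antonyms as negative, then repeats the process from each newly added word until no new words are found.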
        
        The manual approach is very time-consuming, so it is not usually used alone but is combined with automated approaches.
        
        Using the corpus-based approach alone to identify all opinion words is not as effective as the dictionary-based approach, because it is hard to prepare a corpus large enough to cover all words.
        
        However, the corpus-based approach has a major advantage: it can help find domain-specific opinion words and their orientations if a corpus from only the specific domain is used in the discovery process.
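
The counting-and-weighting idea behind opinion lexicon methods can be sketched as follows. The lexicon entries and their weights, the whitespace tokenizer, and the sign-based decision rule are hypothetical choices made for illustration.

```python
# Hypothetical opinion lexicon: word -> sentiment weight.
LEXICON = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "terrible": -2.0}

def lexicon_score(text, lexicon=LEXICON):
    """Sum the weights of all lexicon words found in the text."""
    tokens = text.lower().split()
    return sum(lexicon.get(tok, 0.0) for tok in tokens)

def classify(text):
    """Map the summed score to a label by its sign."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("The plot was good and the acting excellent"))
print(classify("A terrible script with bad pacing"))
```

A lexicon built from a domain-specific corpus could be swapped in for `LEXICON` without changing the scoring logic.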