SENTIMENT ANALYSIS OF INDONESIAN LOW COST GREEN CARS WITH TWITTER DATA

Meta

Information

Goals

Result

Year

2014

Author

Avin Mohanza Kasim

Methodology

Objective

Examine sentiment toward Low Cost Green Cars (LCGC) by using tweets to measure people's satisfaction

Achieve better accuracy in sentiment analysis, in order to predict the opinions of Twitter users correctly.

Provide car manufacturers with the latest feedback on their products, based on customer sentiment

Research Problem

Data extraction for the Indonesian language

Limited data collection time

Finding good rules for information extraction

Sentiment expressed on Twitter can be sarcastic

The accuracy of classification for Indonesian-language text

Data Collection

Data Preprocessing

Creating Sentiment Lexicon Dictionary

Sentiment Analysis Process

Evaluation Analysis

Experiment

Design and Instrumentation

RapidMiner

A tool for machine learning, data mining, text mining and predictive analytics

Crawling

This research uses tweet data from anonymous users in Indonesia, collected with Twitter hashtags (for example #ayla, #brio), to identify negative, positive, and neutral tweets

Data Analysis

Subjectivity validation

Experimental results generated by the proposed method should be evaluated and checked by a human
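
A minimal sketch of how machine-assigned labels could be checked against human judgment. The label names, the `evaluate` helper, and the sample data are illustrative assumptions, not the evaluation procedure used in the research.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")

def evaluate(predicted, human):
    """Compare predicted labels with labels assigned by a human reviewer."""
    if len(predicted) != len(human):
        raise ValueError("label lists must have the same length")
    # confusion[(human_label, predicted_label)] -> number of tweets
    confusion = Counter(zip(human, predicted))
    correct = sum(confusion[(label, label)] for label in LABELS)
    accuracy = correct / len(human) if human else 0.0
    return accuracy, confusion

# Hypothetical labels, for illustration only
human = ["positive", "negative", "neutral", "positive"]
predicted = ["positive", "negative", "positive", "positive"]
accuracy, confusion = evaluate(predicted, human)
```

The confusion counts show which classes are mistaken for each other, which is more informative than accuracy alone when one class dominates.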

A Twitter crawling technique will be used to collect the data from Twitter

Twitter crawling will use Google Drive spreadsheets extended with Google's script language. The result will be stored in a database and processed with Microsoft Excel.

Literature Review

Sentiment Analysis

Sentiment analysis, sometimes referred to as opinion mining, is the field of study that analyzes people's opinions, behavior, and attitudes toward an object and its attributes.

Theresa et al. (Theresa Wilson, 2009) describe sentiment analysis as the task of identifying positive or negative opinions, emotions and evaluations.

Emoticon Sentiment Analysis

Methodology Classification

According to Rohit Kumar Jha (Rohit Kumar Jha, 2013), there are 3 methodologies to classify the sentiment of each tweet

Bag-of-Words Model

A simplifying representation used in natural language processing (NLP) and information retrieval (IR)

A text (such as a sentence or document) is represented as an unordered collection of words, ignoring grammar and even word order

A word list then gives the score of each word. Positivity, negativity, or sentiment strength is concluded from the cumulative score of all the words in the text

Naïve Bayesian Classifiers

A simple probabilistic classifier that applies Bayes' theorem with a strong (naive) independence assumption.

Maximum entropy classifiers are commonly applied as an alternative to naïve Bayesian classifiers.
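
A minimal sketch of a naïve Bayes classifier over bag-of-words features, written from scratch for illustration. The training tweets, function names, and the use of add-one smoothing are assumptions; the research itself relies on RapidMiner rather than this code.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Bag of words: lowercase tokens, word order is not used
    return text.lower().split()

def train_naive_bayes(examples):
    """examples: list of (text, label). Returns the counts needed for prediction."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def predict(model, text):
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log prior plus a sum of per-word log likelihoods
        # (the sum is where the naive independence assumption enters)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothing avoids zero probabilities
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical Indonesian training tweets, for illustration only
training = [
    ("mobil ini irit dan nyaman", "positive"),
    ("harga murah dan bagus", "positive"),
    ("mesin boros dan berisik", "negative"),
    ("kabin sempit dan jelek", "negative"),
]
model = train_naive_bayes(training)
label = predict(model, "mobil irit dan bagus")
```

Working in log space keeps the product of many small probabilities from underflowing on longer texts.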

Support Vector Machine

Machine Learning Methods

Naïve Bayes

Maximum Entropy

Support Vector Machine

Web Crawling

Data Selection

Only tweets containing information about the selected Indonesian Low Cost Green Cars are collected
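
A small sketch of hashtag-based selection. The hashtags #ayla and #brio come from the crawling notes; the function name and sample tweets are hypothetical, and a real run would list every selected LCGC model.

```python
import re

# Hashtags named in the notes; extend this set for other selected models.
LCGC_HASHTAGS = {"#ayla", "#brio"}

HASHTAG_PATTERN = re.compile(r"#\w+")

def select_lcgc_tweets(tweets):
    """Keep only tweets that carry at least one LCGC hashtag (case-insensitive)."""
    selected = []
    for tweet in tweets:
        hashtags = {tag.lower() for tag in HASHTAG_PATTERN.findall(tweet)}
        if hashtags & LCGC_HASHTAGS:
            selected.append(tweet)
    return selected

# Hypothetical tweets, for illustration only
tweets = [
    "Test drive #Ayla hari ini, irit banget",
    "Macet lagi di Jakarta",
    "#brio warna baru keren",
]
selected = select_lcgc_tweets(tweets)
```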

Data Storing

Data are stored in a Google spreadsheet and exported to Microsoft Excel format. Below is the illustration of the Zapier communication pattern in data collection.

Simplify the collected data so that it becomes smaller, more informative, and easier to obtain results from

Done manually because of the complexity of the Indonesian language and of the data.

Substeps

Removing promotion tweets

Removing foreign-language tweets

Removing links

Removing instatweet

Removing YouTube links

Removing duplicate tweets
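
The notes state that this simplification was done manually. As a hedged sketch only, some of the substeps could be approximated in code as below. The promotion keywords, the rule of dropping tweets that link to YouTube, and the `preprocess` helper are all assumptions; foreign-language and "instatweet" removal are not covered.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
# Hypothetical promotion keywords; the source does not list its criteria.
PROMO_KEYWORDS = ("promo", "diskon", "kredit", "dp ringan")

def preprocess(tweets):
    """Drop promotion and YouTube tweets, strip links, and remove duplicates."""
    cleaned, seen = [], set()
    for tweet in tweets:
        lowered = tweet.lower()
        if any(keyword in lowered for keyword in PROMO_KEYWORDS):
            continue  # promotion tweet
        if "youtube.com" in lowered or "youtu.be" in lowered:
            continue  # assumption: tweets that share a video are dropped
        text = URL_PATTERN.sub("", tweet)   # strip any remaining links
        text = " ".join(text.split())       # collapse leftover whitespace
        key = text.lower()
        if not text or key in seen:
            continue  # empty after stripping, or a duplicate
        seen.add(key)
        cleaned.append(text)
    return cleaned

# Hypothetical raw tweets, for illustration only
raw = [
    "Promo akhir tahun Ayla DP ringan! http://example.com/promo",
    "Brio irit banget http://example.com/a",
    "brio irit banget",
    "Review Brio https://youtu.be/xyz",
    "Ayla nyaman dipakai harian",
]
cleaned = preprocess(raw)
```

Duplicates are compared after link removal and lowercasing, so retweets that differ only in a shortened URL or in capitalization collapse into one entry.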

The sentiment dictionary contains positive and negative adjectives taken from the simplified tweets for each LCGC

The dictionary is created manually because of the complexity of the Indonesian language.
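
A minimal sketch of how such a dictionary can drive the cumulative word scoring described under the bag-of-words model. The lexicon entries, the +1/-1 weights, and the function names are hypothetical; the manually built dictionary from the research is not reproduced here.

```python
# Hypothetical lexicon entries: +1 for positive adjectives, -1 for negative ones.
SENTIMENT_LEXICON = {
    "irit": 1, "nyaman": 1, "bagus": 1,
    "boros": -1, "sempit": -1, "jelek": -1,
}

def score_tweet(text, lexicon=SENTIMENT_LEXICON):
    """Sum the lexicon scores of all words; unknown words count as 0."""
    return sum(lexicon.get(word, 0) for word in text.lower().split())

def classify(text, lexicon=SENTIMENT_LEXICON):
    score = score_tweet(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify("Ayla irit dan nyaman")` sums two positive entries, while a tweet with no lexicon words falls back to neutral. Plain word counting like this cannot detect sarcasm, which is one of the research problems listed in these notes.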

The sentiment lexicon dictionary serves as the training set for detecting sentiment in the low cost green car dataset. The detection is not done manually but with a machine learning method.

In this research, RapidMiner is the application that acts as the learning machine. A computer cannot learn the Indonesian language and find its meaning on its own. RapidMiner provides training models that are designed by the user and lets the machine learn with the model the user creates.

Users define the sentiment of each collected and simplified tweet through sentiment lexicon inspection in RapidMiner.

RapidMiner provides several algorithms, such as naïve Bayes and support vector machine.

Support Vector Machine and Text Classification

A basic SVM takes a set of input data and predicts, for each given input, which of the possible classes forms the output, making it a non-probabilistic linear classifier.

SVMs are universal learners. One outstanding property of SVMs is that their ability to learn can be independent of the dimensionality of the feature space.

The goal of text classification is to classify documents into a fixed number of predefined categories. Using machine learning, the objective is to learn classifiers from the available documents that perform the categorization without human help.

Given a set of training examples, each marked as belonging to one of three categories, an SVM training algorithm builds a model that assigns a new example to one of the categories based on its feature vector.
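
A from-scratch sketch of a linear SVM for three categories, using one-vs-rest binary classifiers trained by subgradient descent on the regularised hinge loss. This is not the RapidMiner process used in the research; the training texts, hyperparameters, and function names are illustrative assumptions.

```python
def vectorize(text, vocabulary):
    """Binary bag-of-words feature vector over a fixed vocabulary list."""
    words = set(text.lower().split())
    return [1.0 if word in words else 0.0 for word in vocabulary]

def train_binary_svm(vectors, targets, epochs=50, learning_rate=0.1, reg=0.01):
    """Linear SVM: subgradient descent on the L2-regularised hinge loss."""
    weights, bias = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, targets):  # y is +1 or -1
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # shrink the weights (regularisation step)
            weights = [w * (1 - learning_rate * reg) for w in weights]
            if margin < 1:  # inside the margin or misclassified
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias

def train_one_vs_rest(vectors, labels):
    """Train one binary SVM per category (that category vs. all others)."""
    models = {}
    for label in sorted(set(labels)):
        targets = [1 if current == label else -1 for current in labels]
        models[label] = train_binary_svm(vectors, targets)
    return models

def predict(models, x):
    """Pick the category whose classifier gives the largest decision value."""
    def decision(model):
        weights, bias = model
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return max(models, key=lambda label: decision(models[label]))

# Hypothetical training data, for illustration only
texts = ["irit nyaman", "bagus irit", "boros sempit",
         "jelek boros", "beli kemarin", "warna merah"]
labels = ["positive", "positive", "negative",
          "negative", "neutral", "neutral"]
vocabulary = sorted({word for text in texts for word in text.split()})
vectors = [vectorize(text, vocabulary) for text in texts]
models = train_one_vs_rest(vectors, labels)
```

One-vs-rest is only one way to extend a binary SVM to three categories; it is chosen here because it keeps the sketch short.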

Few Irrelevant Features

Document vectors are sparse

High Dimensional Input Space

When learning text classifiers, one has to deal with many features (more than 10000)

Since SVMs use overfitting protection, which does not necessarily depend on the number of features, they have the potential to handle these large feature spaces
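
A small sketch of why document vectors are sparse: each text uses only a few words of the full vocabulary, so storing just the non-zero counts is enough. The sample documents and helper names are hypothetical, and the vocabulary here is tiny compared with the 10000+ features mentioned in the notes.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map every distinct word to a feature index."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}

def to_sparse_vector(document, vocabulary):
    """Store only non-zero features as {feature_index: count}."""
    counts = Counter(document.lower().split())
    return {vocabulary[w]: c for w, c in counts.items() if w in vocabulary}

def sparsity(vector, dimension):
    """Fraction of features that are zero for this document."""
    return 1 - len(vector) / dimension

# Hypothetical documents, for illustration only
documents = ["ayla irit dan nyaman", "brio bagus dan irit", "kabin ayla sempit"]
vocabulary = build_vocabulary(documents)
vectors = [to_sparse_vector(doc, vocabulary) for doc in documents]
```

With a realistic vocabulary the zero fraction per tweet approaches 1, since a tweet contains at most a few dozen distinct words.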
