SENTIMENT ANALYSIS OF INDONESIAN LOW COST GREEN CARS WITH TWITTER DATA
Meta
Year
2014
Author
Avin Mohanza Kasim
Research Problem
Data extraction for Indonesian language
Limited data collection time
Finding good rules for information extraction
Sentiment expressed on Twitter may be sarcastic
Classification accuracy for the Indonesian language
Information
Methodology
Data Collection
Web Crawling
Data Selection
Only tweets about the selected Indonesian Low Cost Green Cars are kept
Data Storing
Data are stored in a Google spreadsheet and exported to Microsoft Excel format. [Illustration: Zapier communication pattern in data collection]
Data Preprocessing
Simplifies the collected data so that it is smaller, more informative, and easier to derive results from
Done manually because of the complexity of the Indonesian language and of the data.
Substeps
Removing promotion tweets
Removing foreign-language tweets
Removing links
Removing instatweet
Removing YouTube links
Removing duplicate tweets
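The source states that these substeps were done manually. Purely as an illustration, three of them (promotion tweets, links, duplicates) could be sketched in Python as below. The function name, the promotion keywords, and the URL pattern are assumptions, not part of the original research, and foreign-language and instatweet removal are not covered.

```python
import re

# Assumed patterns: the source does not say how each kind of tweet was recognised.
URL_PATTERN = re.compile(r"https?://\S+")
PROMO_WORDS = {"promo", "diskon", "dijual"}  # hypothetical promotion markers


def clean_tweets(tweets):
    """Drop promotion tweets, strip links, and remove duplicate tweets."""
    seen = set()
    cleaned = []
    for text in tweets:
        words = set(re.findall(r"\w+", text.lower()))
        if words & PROMO_WORDS:
            continue  # treated as a promotion tweet
        text = URL_PATTERN.sub("", text)   # remove links (YouTube links included)
        text = " ".join(text.split())      # collapse leftover whitespace
        key = text.lower()
        if not key or key in seen:
            continue  # empty after cleaning, or a duplicate
        seen.add(key)
        cleaned.append(text)
    return cleaned
```

Duplicates are compared case-insensitively after link removal, so the same tweet reposted with a different link counts as one.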
Creating Sentiment Lexicon Dictionary
The sentiment dictionary contains the positive and negative adjectives found in the simplified tweets for each LCGC
The dictionary is created manually because of the complexity of the Indonesian language.
The sentiment lexicon dictionary serves as the training set for classifying the low cost green car dataset. Classification is not done manually but with a machine learning method.
In this research, RapidMiner is the machine learning application. A computer cannot learn the Indonesian language and work out its meaning on its own. RapidMiner lets the user design a training model, and the machine then learns from that model.
Sentiment Analysis Process
Users define the sentiment of each collected and simplified tweet by checking it against the sentiment lexicon in RapidMiner.
RapidMiner provides several algorithms, such as naïve Bayes and support vector machine.
Evaluation Analysis
Experiment
Design and Instrumentation
RapidMiner
A tool for machine learning, data mining, text mining, and predictive analytics
Crawling
This research uses tweet data from anonymous users in Indonesia, using Twitter hashtags (for example #ayla, #brio) to identify negative, positive, and neutral tweets
A Twitter crawling technique is used to collect the data from Twitter
Twitter crawling uses Google Drive spreadsheets extended with Google script language. The results are stored in a database and processed with Microsoft Excel.
Data Analysis
Subjectivity validation
Experimental results generated by the proposed method should be evaluated and checked by a human
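One simple way to make this check concrete is to compare the predicted labels with labels assigned by a human reviewer. The sketch below is an assumption about how that comparison could be done. The source does not say which measure was used, and the function names are hypothetical.

```python
def accuracy(predicted, human):
    """Fraction of tweets where the predicted label matches the human label."""
    if len(predicted) != len(human):
        raise ValueError("both label lists must have the same length")
    if not predicted:
        raise ValueError("label lists must not be empty")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


def disagreements(predicted, human):
    """Indices of tweets that a human should look at again."""
    return [i for i, (p, h) in enumerate(zip(predicted, human)) if p != h]
```

Listing the disagreeing indices alongside the accuracy lets the reviewer inspect the individual tweets where the method and the human differ.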
Literature Review
Sentiment Analysis
Sentiment analysis, sometimes referred to as opinion mining, is the field of study that analyzes people's opinions, behavior, and attitudes toward an object and its attributes.
Theresa Wilson et al. (Theresa Wilson, 2009) define sentiment analysis as the task of identifying positive or negative opinions, emotions, and evaluations.
Emoticon Sentiment Analysis
Methodology Classification
According to Rohit Kumar Jha (Rohit Kumar Jha, 2013), there are 3 methodologies for classifying the sentiment of each tweet
Bag of Words Model
A simplifying representation used in natural language processing (NLP) and information retrieval (IR)
A text (such as a sentence or document) is represented as an unordered collection of words, ignoring grammar and even word order
A word list then assigns a score to each word. Positivity, negativity, or sentiment strength is determined from the cumulative score of all the words in the text
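The cumulative scoring described above can be sketched in a few lines of Python. The word list and its scores are made-up examples, not the dictionary built in the research, and the zero threshold for "neutral" is an assumption.

```python
import re

# Hypothetical word list: +1 for positive adjectives, -1 for negative ones.
LEXICON = {
    "bagus": 1, "irit": 1, "nyaman": 1,
    "jelek": -1, "mahal": -1, "boros": -1,
}


def score_text(text, lexicon=LEXICON):
    """Sum the scores of all words in the text; word order is ignored."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def label(score):
    """Map a cumulative score to a sentiment label (assumed thresholds)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `label(score_text("Ayla irit dan nyaman"))` sums two positive words and yields "positive", while a text with no lexicon words scores 0 and is labelled "neutral".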
Naïve Bayesian Classifiers
A simple probabilistic classifier that applies Bayes' theorem with a strong (naive) independence assumption.
Maximum entropy classifiers are commonly used as an alternative to naïve Bayesian classifiers.
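As a minimal sketch of the naïve Bayes idea, the code below trains a multinomial model on tokenised tweets and picks the class with the highest log-probability. It is not RapidMiner's implementation. The add-one (Laplace) smoothing, the choice to skip unseen words, and all names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (tokens, label) pairs. Returns the counts the model needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label maximising log P(label) + sum of log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # words never seen in training are skipped
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the product.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The independence assumption shows up in the inner loop: each word contributes its own term, regardless of the other words in the tweet.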
Support Vector Machine
Support Vector Machine and Text Classification
Few Irrelevant Features
Document vectors are sparse
High Dimensional Input Space
When learning text classifiers, one has to deal with many features (more than 10000)
Since SVMs use overfitting protection, which does not necessarily depend on the number of features, they have the potential to handle these large feature spaces
A basic SVM takes a set of input data and predicts, for each given input, which of the possible classes forms the output, making it a non-probabilistic linear classifier.
The SVM is a very universal learner. One remarkable property of SVMs is that their ability to learn can be independent of the dimensionality of the feature space.
The goal of text classification is to classify documents into a fixed number of predefined categories. Using machine learning, the objective is to learn classifiers from the available documents that perform the category assignment without human help.
Given a set of training examples, each marked as belonging to one of three categories, an SVM training algorithm builds a model that assigns a new example to one of those categories based on its feature vector.
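To illustrate how a linear SVM learns from feature vectors, here is a simplified sketch that handles only two classes (+1 positive, -1 negative) and is trained by subgradient descent on the regularised hinge loss. This is a stand-in training method chosen for brevity, not the solver used by RapidMiner, and the two-number feature vectors (counts of positive and negative lexicon words) are an assumption.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, labels, epochs=100, lr=0.1, reg=0.01):
    """Subgradient descent on hinge loss with L2 regularisation; labels are +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # Regularisation step: shrink the weights slightly on every example.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Example is inside the margin or misclassified: move toward it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1


# Assumed features: [number of positive words, number of negative words] per tweet.
samples = [[2, 0], [1, 0], [0, 2], [0, 1]]
labels = [1, 1, -1, -1]
model = train_linear_svm(samples, labels)
```

A three-class setting (positive, negative, neutral) would need an extension such as one-vs-rest, which this sketch does not attempt.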
Machine Learning Methods
Naïve Bayes
Maximum Entropy
Support Vector Machine
Goals
Objective
Examine sentiment toward Low Cost Green Cars using tweets, to measure people's satisfaction
Achieve better accuracy in sentiment analysis, so that the opinions of Twitter users are predicted properly.
Provide car manufacturers with the latest feedback on their products, based on customer sentiment
Result