SENTIMENT ANALYSIS OF INDONESIAN MOBILE OPERATOR WITH TWITTER DATA…

- - - - Analysis and Design
        
        Analyze and design the RapidMiner model, including the selection of operators (e.g. “Validation”) that may assist in predictive analysis
      - Data Gathering
        
        Gather data from Twitter over a limited period of approximately one month, in specific categories, e.g. network performance, promotions, and product offering prices
      - Preprocessing the Data
        
        Preprocess the data with case folding
      - Information analysis and sentiment analysis
        
        Calculate the accuracy of the RapidMiner model from the counts of true/false positive and true/false negative results among the sentiments found.
      - Diagram and statistics creation
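
The accuracy calculation named in the sentiment analysis step above can be sketched as follows. This is a minimal illustration in plain Python, not the RapidMiner operator itself; the function names and the sample labels are hypothetical and are not taken from the research data.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Count true/false positives and negatives for a two-class task."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, really positive
            else:
                fp += 1  # predicted positive, really negative
        else:
            if a == positive:
                fn += 1  # predicted negative, really positive
            else:
                tn += 1  # predicted negative, really negative
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels, for illustration only.
actual = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)
print(accuracy(tp, fp, tn, fn))
```

The four counts are the cells of the confusion matrix, so the same numbers can be read directly from a confusion matrix produced by another tool.
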
  - - - The data used is gathered from the social media platform Twitter.
      - The Twitter data is used to build a corpus following Pak & Paroubek (2010), “Twitter as a Corpus for Sentiment Analysis and Opinion Mining”, modified to use Indonesian-language tweets only.
      - Web Crawling
        
        Twitter provides an Application Programming Interface (API) that developers and researchers can use
        
        The API is used to retrieve data from Twitter, a process known as crawling
      - Manual Selection
        
        Manual Selection is the step of manually selecting tweets and checking that each tweet produced by the Web Crawling process has the correct sentiment
        
        Each tweet is then also checked to confirm that it matches the selected category
    - - Reduce the noise, normalize the words, and reduce the word volume by removing duplicate tweets, retweets, emoticons, URLs, usernames, and special characters
    - - The results gathered in the corpus are then separated into a training set and a testing (validation) set
      - Feature Selection
        
        Only the unigram feature is selected, because there is no WordNet for Indonesian to classify words easily; with unigrams, every word in the document is treated as a feature
      - Weighting
        
        Based on the preceding Feature Selection step, the documents are represented in a vector space. From the document vectors, the keywords are ranked by their importance in a document relative to all documents.
        
        Weighting feature
        
        Term Frequency
        
        This method counts the frequency of a word in a document.
        
        Term Presence
        
        Records whether a word is present in or absent from a document. How often the word occurs does not affect the value, because only presence or absence is counted.
        
        Term Frequency-Inverse Document Frequency (TF-IDF)
        
        TF-IDF statistically reflects the importance of a keyword to a document within a collection of documents.
      - SVM Classification
        
        One of the learning methods in this research is SVM. The SVM separates the training set data into two classes, negative and positive.
      - Validation and Evaluation (Testing)
      - Decision Tree
    - - X-Validation
        
        Used to validate the training set, because not all data in the training set can be represented in the training process. X-Validation is a technique for assessing how the results of a statistical analysis will generalize.
        
        The validation process uses k-fold cross-validation, which separates the data into k parts of equal size. The number of validation rounds is therefore determined by k.
        
        X-Validation, or cross-validation, is a validation process that divides the data into several sections called folds.
        
        After the data is divided into folds (four in this example), the machine holds out the first fold as the test set and takes the training data from folds 2, 3, and 4. In this case the algorithm has never seen fold 1 before, so fold 1 acts as a simulation of data from the real world.
        
        After the error rate is obtained from the first round of X-Validation, the folds are swapped. Here fold 1 is swapped with fold 2, although any fold whose error rate has not yet been measured could be used. The same algorithm then measures fold 2 with the same process as in the first round.
        
        When every fold has been swapped in and its error rate measured, the error rate of the model is taken as the average over all folds. This average is called the X-Validation error.
      - Evaluation
        
        The performance of the model, which is built in RapidMiner, is evaluated with a confusion matrix computed from the SVM output. The training classification uses 1,200 positive and 1,200 negative samples.
    - - Statistical Analysis
        
        The statistical analysis is created with Microsoft Excel and yields facts broken down by time, mobile operator, and more. For example, it can show when Indonesian users tweet most frequently.
      - Tag Word Creation
        
        Tag word analysis gives more specific insight into the data; for example, it shows which operator is associated with the most keywords.
      - Consists of statistics creation, tag word creation, and other interesting facts that can be gathered from the corpus
      - Based on this next-level analysis, the business can adopt data-driven decision making, which is becoming a trend in the market.
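
The X-Validation procedure described in this branch (hold out one fold, train on the rest, measure the error rate, swap folds, then average) can be sketched in plain Python. This is an illustration under stated assumptions, not the RapidMiner X-Validation operator: the round-robin fold assignment, the function names, and the majority-class placeholder classifier (standing in for the SVM) are all hypothetical.

```python
from collections import Counter


def k_fold_indices(n_items, k):
    """Assign item indices round-robin to k folds of near-equal size."""
    if k < 2 or k > n_items:
        raise ValueError("k must be between 2 and the number of items")
    folds = [[] for _ in range(k)]
    for index in range(n_items):
        folds[index % k].append(index)
    return folds


def cross_validation_error(samples, labels, k, train_fn, predict_fn):
    """Return (average error rate, per-fold error rates) over k rounds."""
    fold_errors = []
    for held_out in k_fold_indices(len(samples), k):
        held_out_set = set(held_out)
        # Train only on the folds that are not held out in this round.
        train_idx = [i for i in range(len(samples)) if i not in held_out_set]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Measure the error rate on the fold the model has never seen.
        wrong = sum(1 for i in held_out
                    if predict_fn(model, samples[i]) != labels[i])
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / len(fold_errors), fold_errors


def train_majority(train_samples, train_labels):
    """Placeholder learner: remember the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    """Placeholder predictor: always answer with the remembered label."""
    return model


# Hypothetical data: eight tweets, six positive and two negative.
tweets = [f"tweet {i}" for i in range(8)]
labels = ["positive"] * 6 + ["negative"] * 2

average_error, fold_errors = cross_validation_error(
    tweets, labels, 4, train_majority, predict_majority)
print(fold_errors)
print(average_error)
```

Passing the learner as `train_fn` and `predict_fn` keeps the fold logic independent of the classifier, so an SVM could be substituted without changing the validation loop.
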
  - - - Classification of documents as opinion or fact, known as subjectivity classification
      - Classification of documents as positive or negative, known as sentiment analysis
    - - The term object denotes the entity that has been commented on; an object has components and a set of attributes (Liu, Sentiment Analysis and Subjectivity, 2010).
      - In “The speed of Ferrari is so fast”, the object is “Ferrari” and the attribute commented on is “speed”.
      - Sentiment sentences
        
        Sentences that express negative or positive emotion, explicitly or implicitly
        
        Sentiment sentences can be either subjective or objective sentences (Liu, Sentiment Analysis and Subjectivity, 2010)
    - - In sentiment analysis, the document must be an opinion that carries a sentiment. To identify sentiment documents, every word in the document is analyzed to determine whether the document contains sentiment words.
      - The collection of these words is called an opinion lexicon (Liu, Sentiment Analysis and Subjectivity, 2010).
      - Liu (Sentiment analysis and subjectivity, 2010) states that words carrying sentiment can be gathered using a dictionary. The strategy is to list words with known sentiment and query the dictionary for their synonyms and antonyms. The results of each query are then used as the input for the next query.
    - - A sentiment on a feature f of an object is a positive or negative view, emotion, or response to f from an opinion holder (Liu, Sentiment analysis and subjectivity, 2010).
      - Opinion holder
        
        An individual or organization that expresses the sentiment
      - The orientation of an opinion is related to polarity (Yi et al., Extracting sentiments about a given topic using natural language processing techniques, 2003).
      - Emotion is defined as the subjective feelings and thoughts of an individual or organization (Liu, Sentiment analysis and subjectivity, 2010).
    - - Topic-based classification classifies documents by the topic they describe, for example sports, science, or economics
      - Divided
        
        Coarse-grained sentiment analysis
        
        This determines whether the document has positive or negative sentiment, known as document-level sentiment classification.
        
        Fine-grained sentiment analysis
        
        This analysis classifies the subjectivity of a sentiment, known as sentence-level subjectivity classification. It determines whether a sentence is subjective or objective, and which opinion is associated with it.
    - - According to (Roberto et al., Identifying Sarcasm in Twitter: A closer look, 2011), sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. Sarcasm is considered a difficult problem in text mining (Nigam et al., towards a robust metric in polarity, 2006)
      - According to (Roberto et al., Identifying Sarcasm in Twitter: A closer look, 2011), the proper way to minimize mistakes caused by sarcasm is to separate sarcastic documents from non-sarcastic documents that directly convey positive or negative sentiment.
    - - n-gram Feature
        
        Features for the analysis can be syntactic, semantic, POS-based, or link-based. The features are extracted from the whole document, and the relationships between words in the document are analyzed first (O'Keefe, T. & Koprinska, I., 2009. Feature Selection and Weighting Methods in Sentiment Analysis).
      - Unigram Feature
        
        Symbols and unigrams are represented as a vector, and every word counts as one feature (O'Keefe, T. & Koprinska, I., 2009. Feature Selection and Weighting Methods in Sentiment Analysis).
      - A document can be represented in the vector space model: the document has a vector of extracted keywords, and from that vector the document is weighted to determine the importance of each keyword in the document (O'Keefe, T. & Koprinska, I., 2009. Feature Selection and Weighting Methods in Sentiment Analysis)
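
The unigram vector space model and the three weighting schemes named in this map (term frequency, term presence, TF-IDF) can be sketched in plain Python. The source does not give a TF-IDF formula, so the variant used here, raw count multiplied by log(N / document frequency), is an assumption; the function names and the three sample tweets are hypothetical.

```python
import math
from collections import Counter


def unigrams(text):
    """Case-fold the text and split it into single-word features."""
    return text.lower().split()


def term_frequency(tokens):
    """How many times each word occurs in one document."""
    return dict(Counter(tokens))


def term_presence(tokens):
    """1 for every word that occurs at all, regardless of how often."""
    return {token: 1 for token in set(tokens)}


def inverse_document_frequency(tokenized_docs):
    """log(N / df) per word, where df is the number of documents containing it."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """Weight each word by its count in the document times its IDF."""
    return {term: count * idf[term]
            for term, count in term_frequency(tokens).items()}


# Hypothetical tweets, for illustration only.
docs = ["sinyal lemot lemot", "sinyal bagus", "promo bagus"]
tokenized = [unigrams(doc) for doc in docs]
idf = inverse_document_frequency(tokenized)
weights = tf_idf(tokenized[0], idf)

print(term_frequency(tokenized[0]))
print(term_presence(tokenized[0]))
print(weights)
```

In the first tweet, "lemot" outweighs "sinyal" under TF-IDF because it occurs twice and appears in only one of the three documents, while term presence treats both words the same.
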
  - - - A decision tree can be built directly from the data without any assumptions
      - Can classify both numerical and class (categorical) data.
    - - The classifier output is a class
      - The classification process can take a long time, depending on the data
      - The algorithm is unstable
      - The output has only one attribute
      - Numerical data can produce a complicated tree
  - - - Phonetics and Phonology
        
        Phonetics and phonology relate to how sounds form known words.
      - Morphology
        
        Morphology distinguishes words by their form. In this phase, each word is separated into the word itself and its elements
      - Syntax
        
        Relates to the position of a word in a sentence and its relationship to the other words in forming a well-structured sentence.
      - Semantics
        
        Semantics maps the syntax by relating every word to its root word, independently of the sentence. Semantics studies the meaning of words and how the words relate to the meaning of the sentence
      - Pragmatics
        
        The knowledge in the pragmatic phase relates to each context, depending on the situation and the purpose for which the system was created.
      - Discourse Knowledge
        
        Discourse knowledge detects whether sentences that have already been read and recognized can affect the sentences that follow.
      - Word Knowledge
        
        Word knowledge relates to distinguishing the meaning of each word in a sentence or in another context.
  - - - Binary classification
        
        Binary classification has only two outputs, for example positive or negative
      - Multiclass classification
        
        The output of multiclass classification can be more than two classes, for example in mood analysis: happy, sad, angry, and bad.
      - Large scale multiclass classification
        
        The output is like that of multiclass classification, but there can be thousands of classes.