SENTIMENT ANALYSIS USING AUTOMATIC CLASSIFICATION ON ONLINE MEDIA ARTICLE
Information
Sentiment Analysis
Algorithm
Turney (2010) presents a simple unsupervised learning algorithm for classifying reviews as recommended or not recommended. The classification is predicted by the average semantic orientation of the phrases in the review. A review is classified as recommended if its phrases have good associations (e.g., “subtle nuances”) and as not recommended if they have bad associations (e.g., “very cavalier”).
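A minimal Python sketch of the averaging step, assuming every phrase already has a semantic-orientation (SO) score. The values in `so_scores` are made up for illustration and `classify_review` is a hypothetical helper name; in Turney's method the scores are estimated automatically, not written by hand.

```python
def classify_review(phrases, so_scores):
    """Label a review by the average semantic orientation (SO) of its phrases."""
    # Ignore phrases that have no SO score
    scores = [so_scores[p] for p in phrases if p in so_scores]
    if not scores:
        return "unknown"
    average = sum(scores) / len(scores)
    # Positive average -> recommended; zero or negative -> not recommended
    return "recommended" if average > 0 else "not recommended"


# Hypothetical SO scores, for illustration only
so_scores = {"subtle nuances": 1.2, "very cavalier": -0.9}
print(classify_review(["subtle nuances"], so_scores))  # recommended
print(classify_review(["very cavalier"], so_scores))  # not recommended
```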
Pang (2008): sentiment classification using machine learning techniques, classifying documents not by topic but by overall sentiment
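As one illustration of a machine learning technique for this task, the sketch below trains a multinomial Naive Bayes classifier on bag-of-words features with add-one smoothing. The choice of Naive Bayes, the function names, and the tiny training set are assumptions made for illustration, not details taken from Pang (2008).

```python
import math
from collections import Counter


def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return class_counts, word_counts, vocab


def predict_nb(tokens, class_counts, word_counts, vocab):
    """Pick the label maximising log P(label) + sum of log P(word | label)."""
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)  # add-one smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training documents
train = [
    (["great", "acting"], "pos"),
    (["wonderful", "plot"], "pos"),
    (["boring", "plot"], "neg"),
    (["awful", "acting"], "neg"),
]
model = train_nb(train)
print(predict_nb(["great", "plot"], *model))  # pos
```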
Sentiment analysis is the process of classifying articles as positive or negative
WordNet
Lexical
A lexical database of a language, consisting of sets of synonyms (synsets), definitions, and semantic relations between the synsets
Purpose
Supports automatic text analysis and artificial intelligence (AI) applications
Combines a thesaurus and a dictionary to produce more intuitively usable information
Includes the following semantic relations
Synonymy
Antonymy
Hyponymy
Troponymy
is for verbs what hyponymy is for nouns, although the resulting hierarchies are much shallower.
Entailment
Relations between verbs are also coded in WordNet
K-nearest Neighbors Algorithm for Data Classification
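k-NN labels a new item by majority vote among the k labelled examples closest to it. The sketch below is a minimal version under stated assumptions: Euclidean distance, k = 3, and hypothetical two-number feature vectors (counts of positive and negative words); the outline does not specify the features or distance used.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=3):
    """examples: list of (feature_vector, label). Majority vote among the k nearest examples."""
    # Sort the labelled examples by Euclidean distance to the query vector
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (number of positive words, number of negative words)
examples = [
    ((3, 0), "positive"), ((2, 1), "positive"), ((4, 1), "positive"),
    ((0, 3), "negative"), ((1, 2), "negative"), ((1, 4), "negative"),
]
print(knn_classify((3, 1), examples))  # positive
print(knn_classify((1, 3), examples))  # negative
```

An odd k avoids tied votes between two classes.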
Sentiment analysis is divided into two types of tasks
Basic task
Classifying the opinion expressed at the document, sentence, or feature/aspect level (Haaff, 2010)
Advanced task
Advanced tasks look for specific emotional states such as “angry”, “sad”, and “happy”.
Method
Pang and Lee (2005) extended the basic task of classifying movie reviews. Their method determines whether a review of some topic is positive or negative, and the result is used to predict the star rating (on a five-star scale)
Feature/aspect-based sentiment analysis (the most common model in sentiment analysis), such as the research by Hu and Liu (2004), determines the sentiment expressed about attributes or components of entities (e.g., a digital camera)
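A simplified sketch of aspect-level scoring for a digital camera: each sentence's opinion-word polarities are summed and credited to the aspects mentioned in that sentence. The lexicons `ASPECTS` and `OPINION_WORDS` are hypothetical and hand-written here; Hu and Liu (2004) extract aspects and opinion words from data, so this is only an illustration of the idea, not their algorithm.

```python
import re
from collections import defaultdict

# Hypothetical lexicons for a digital camera
ASPECTS = {"battery", "lens", "screen"}
OPINION_WORDS = {"great": 1, "sharp": 1, "poor": -1, "dim": -1}


def aspect_sentiment(review):
    """Sum opinion-word polarities per sentence and credit them to the aspects in that sentence."""
    totals = defaultdict(int)
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z]+", sentence)
        score = sum(OPINION_WORDS.get(w, 0) for w in words)
        for aspect in ASPECTS.intersection(words):
            totals[aspect] += score
    return dict(totals)


print(aspect_sentiment("The lens is great and sharp. The battery is poor."))
# {'lens': 2, 'battery': -1}
```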
Wu & Palmer
The measure calculates relatedness by considering the depths of the two synsets in the WordNet taxonomy, along with the depth of their Least Common Subsumer (LCS) (Wu & Palmer, 1994)
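Concretely, the Wu & Palmer score is 2 · depth(LCS) / (depth(s1) + depth(s2)), which equals 1 for identical synsets. The sketch below computes it over a small hypothetical taxonomy (`PARENT`) with single inheritance and the root at depth 1. Real WordNet allows several hypernyms per synset; NLTK's WordNet interface offers this measure as `Synset.wup_similarity`.

```python
# Hypothetical toy taxonomy: each concept maps to its single hypernym (parent)
PARENT = {
    "object": "entity",
    "vehicle": "object", "animal": "object",
    "car": "vehicle", "bicycle": "vehicle",
    "dog": "animal",
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    return len(path_to_root(node))  # the root ("entity") has depth 1


def wu_palmer(a, b):
    ancestors_b = set(path_to_root(b))
    # With single inheritance, the first shared node on a's path is the deepest common ancestor (LCS)
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("car", "bicycle"))  # 0.75 (LCS "vehicle", depth 3)
print(wu_palmer("car", "dog"))  # 0.5 (LCS "object", depth 2)
```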
Similarity Matrix
A similarity matrix contains the similarity scores between the elements of two sentences or articles. It can be used to calculate the overall similarity between two articles
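A minimal sketch of building a similarity matrix from a pairwise similarity function and reducing it to one overall score. Averaging each row's best match is only one simple aggregation and is an assumption here; `PAIR_SCORES` and `toy_sim` are hypothetical stand-ins for a real word-similarity measure such as Wu & Palmer.

```python
def similarity_matrix(items_a, items_b, sim):
    """Entry [i][j] holds sim(items_a[i], items_b[j])."""
    return [[sim(a, b) for b in items_b] for a in items_a]


def overall_similarity(matrix):
    """Average, over the rows, of each row's best match (one simple aggregation choice)."""
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)


# Hypothetical word-pair scores, e.g. as produced by a Wu & Palmer style measure
PAIR_SCORES = {("good", "great"): 0.9, ("movie", "film"): 1.0}


def toy_sim(a, b):
    if a == b:
        return 1.0
    return PAIR_SCORES.get((a, b), PAIR_SCORES.get((b, a), 0.0))


m = similarity_matrix(["good", "movie"], ["great", "film"], toy_sim)
print(m)  # [[0.9, 0.0], [0.0, 1.0]]
print(overall_similarity(m))  # approximately 0.95
```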
NLP
natural language processing
Computerized approach to analyzing text (Liddy, 2001)
In general, it can be defined as computational techniques for analyzing and representing text at many levels of linguistic analysis in order to achieve human-like language processing
NLP systems would be able to (Liddy, 2001):
Paraphrase an input text
Translate the text into another language
Answer questions about the contents of the text
NLP originally drew on various disciplines (e.g., linguistics, computer science, and cognitive psychology)
Categories
Symbolic
Performs deep analysis of linguistic phenomena and is based on explicit representation of facts about language through well-understood knowledge representation schemes and associated algorithms (Basili, 1996)
Statistical
Uses various mathematical techniques and is often applied to large text corpora. The primary source of evidence in this approach is observable data.
Hidden Markov Model (HMM)
Statistical approaches use large text corpora to develop generalized models without adding significant linguistic or world knowledge.
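A bigram language model is one of the simplest examples of a model learned purely from corpus counts, with no hand-coded linguistic knowledge. The sketch below estimates maximum-likelihood bigram probabilities from a hypothetical three-sentence corpus; it is an illustration of the statistical approach, not a component named in the outline.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count context words and word pairs, with <s> and </s> marking sentence boundaries."""
    context, pairs = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        context.update(padded[:-1])  # every token that is followed by another token
        pairs.update(zip(padded, padded[1:]))
    return context, pairs


def bigram_prob(prev, word, context, pairs):
    """Maximum-likelihood estimate P(word | prev) = count(prev word) / count(prev)."""
    if context[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / context[prev]


# Hypothetical corpus
corpus = [
    ["the", "film", "is", "good"],
    ["the", "film", "is", "bad"],
    ["the", "plot", "is", "good"],
]
context, pairs = train_bigrams(corpus)
print(bigram_prob("the", "film", context, pairs))  # 0.6666666666666666 (= 2/3)
```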
Connectionist
Uses statistical learning combined with theories of representation. The statistical learning used in this approach is similar to that of the statistical approaches
This approach uses a connectionist model: a network of interconnected simple processing units, with knowledge stored in the weights of the connections between units (Rumelhart, 1998).
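The smallest possible connectionist model is a single processing unit, a perceptron, whose knowledge lives entirely in its connection weights. The sketch below trains one on hypothetical word-presence inputs using the classic perceptron update rule; a realistic connectionist system would use many interconnected units, so treat this only as an illustration of "knowledge stored in weights".

```python
def train_perceptron(examples, epochs=10):
    """examples: list of (tokens, label) with label +1 (positive) or -1 (negative)."""
    weights = {}  # one weight per word: the model's "knowledge"
    bias = 0.0
    for _ in range(epochs):
        for tokens, label in examples:
            features = set(tokens)  # binary word-presence inputs
            activation = bias + sum(weights.get(w, 0.0) for w in features)
            predicted = 1 if activation > 0 else -1
            if predicted != label:  # adjust weights only after a mistake
                for w in features:
                    weights[w] = weights.get(w, 0.0) + label
                bias += label
    return weights, bias


def predict(tokens, weights, bias):
    activation = bias + sum(weights.get(w, 0.0) for w in set(tokens))
    return 1 if activation > 0 else -1


# Hypothetical training data
train = [
    (["good", "plot"], 1),
    (["bad", "plot"], -1),
    (["good", "acting"], 1),
    (["bad", "acting"], -1),
]
weights, bias = train_perceptron(train)
print(predict(["good", "acting"], weights, bias))  # 1
```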
Hybrid
Frequent applications of NLP
Information Retrieval (IR)
Information Extraction (IE)
Question-Answering (QA)
Summarization
reduces a larger text into a shorter one that still contains the most important information in the text
Dialogue Systems
NLP Levels
Phonology
This level deals with the interpretation of speech sounds within and across words
Morphology
deals with words, which are composed of morphemes. The purpose is to recover and represent the meaning of a word from its morphemes. For example, the word “punched” consists of the stem “punch” and the suffix “-ed”, so the system can infer that the action of the verb took place in the past.
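A toy sketch of the “punched” example: recognise a known stem plus the “-ed” suffix and tag the word as past tense. The stem lexicon `STEMS` and the helper `analyze_past` are hypothetical; a real morphological analyzer (for Bahasa Indonesia, one handling affixes rather than “-ed”) would need far more rules.

```python
# Hypothetical lexicon of known verb stems
STEMS = {"punch", "walk", "like"}


def analyze_past(word):
    """Return (stem, "PAST") if word is a known stem plus the suffix -ed, else (word, None)."""
    if word.endswith("ed"):
        # "punched" -> "punch" (drop "ed"); "liked" -> "like" (drop only "d")
        for stem in (word[:-2], word[:-1]):
            if stem in STEMS:
                return stem, "PAST"
    return word, None


print(analyze_past("punched"))  # ('punch', 'PAST')
print(analyze_past("red"))  # ('red', None)
```

Checking candidates against a lexicon keeps words such as “red” from being misread as a past-tense form.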
Lexical
An NLP system at this level interprets the meaning of individual words. Several types of processing contribute to word-level understanding. The first is assigning each word a part-of-speech tag; if a word can take several parts of speech, it is assigned the most probable tag given its context
The lexicon may contain simple information, such as words and their parts of speech, or more complex information (e.g., the semantic class, the arguments, the semantic restrictions on the arguments, and the definitions of the senses).
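Choosing the most probable tag given the context is commonly done with a Hidden Markov Model and the Viterbi algorithm. The sketch below uses a two-tag HMM whose probabilities (`START`, `TRANS`, `EMIT`) are invented for illustration; it shows how the ambiguous word “fish” is tagged NOUN on its own but VERB after “people”. Long sentences would normally use log probabilities to avoid underflow.

```python
STATES = ["NOUN", "VERB"]
# Hypothetical model parameters, for illustration only
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
EMIT = {
    "NOUN": {"people": 0.5, "fish": 0.4, "swim": 0.1},
    "VERB": {"fish": 0.5, "swim": 0.5},
}


def viterbi(words, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for words under the HMM."""
    if not words:
        return []
    # best[t][s]: probability of the best tag sequence for words[:t+1] that ends in tag s
    best = [{s: start[s] * emit[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] * trans[p][s])
            best[t][s] = best[t - 1][prev] * trans[prev][s] * emit[s].get(words[t], 0.0)
            back[t][s] = prev
    # Follow back-pointers from the most probable final tag
    tags = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


print(viterbi(["fish"]))  # ['NOUN']
print(viterbi(["people", "fish"]))  # ['NOUN', 'VERB']
```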
Syntactic
focuses on analyzing the grammatical structure of sentences. The output of this level is a representation of the structural dependency relationships between words
Semantic
Focuses on determining the possible meanings of a sentence
Achieved by analyzing the interactions among word-level meanings in the sentence.
Discourse
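Discourse processing works with units of text longer than a sentence. Rather than interpreting a multi-sentence text as a sequence of independently interpreted sentences, it focuses on the properties of the text as a whole that convey meaning by connecting its sentences
Types of discourse processing
anaphora resolution
replacing words such as pronouns with the appropriate entity they refer to
discourse structure recognition
determines the functions of the sentences in the text, which adds to the meaningful representation of the text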
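Anaphora resolution can be illustrated with the naive recency heuristic: replace each pronoun with the most recently mentioned entity. This is a deliberately simple assumption (no gender, number, or syntactic checks), and the entity set and `PRONOUNS` list are hypothetical inputs, not part of the outline.

```python
PRONOUNS = {"it", "he", "she", "they"}


def resolve_pronouns(tokens, entities):
    """Replace each pronoun with the most recently mentioned known entity (recency heuristic)."""
    last_entity = None
    resolved = []
    for token in tokens:
        if token in entities:
            last_entity = token
            resolved.append(token)
        elif token.lower() in PRONOUNS and last_entity is not None:
            resolved.append(last_entity)
        else:
            resolved.append(token)
    return resolved


tokens = "Budi bought a camera . It works well .".split()
print(" ".join(resolve_pronouns(tokens, {"Budi", "camera"})))
# Budi bought a camera . camera works well .
```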
What is
Lexical
Methodology
Text Crawler Framework
Selenium Framework
Evaluation
Subjectivity Measurement
Measuring the performance and accuracy of the sentiment analysis system
Each news article is classified as positive or negative by the sentiment analysis system. In this live experiment, participants judge the accuracy of the system's classification, i.e., whether the positive or negative sentiment generated by the system is correct
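If each participant judgment is recorded as correct or incorrect, accuracy is simply the share of judgments marked correct. The sketch below assumes that recording format; the judgments shown are hypothetical, not experimental results.

```python
def accuracy(judgments):
    """Share of participant judgments that marked the system's label as correct."""
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)


# Hypothetical judgments: True = participant agreed with the system's positive/negative label
print(accuracy([True, True, False, True]))  # 0.75
```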
Goal
Objective
Design and implement a system that can analyze the sentiment of text in Bahasa Indonesia
Meta
Author
Feizal Badri Asmoro
Year
2013
Research Question
How do we analyze the sentiment of an online media streaming article in Bahasa Indonesia?
Is the data suitable for analysis by the system?
What is the most appropriate method for this problem?
Hypothesis
The sentiment of an online media streaming article can be obtained by using pairwise matching, vector matching, and a similarity matrix
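One common way to match two texts as vectors is the cosine similarity of their bag-of-words count vectors. The outline does not say which vector representation or similarity the system uses, so the sketch below is an assumption for illustration, with `cosine_similarity` as a hypothetical helper name.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity(["price", "rises", "sharply"], ["price", "rises"]))  # about 0.816
```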
An online media streaming article may contain scripts, images, and a large amount of noisy text. Therefore, pre-processing is necessary so that the data is compatible with the system and can be analyzed by it.
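A minimal pre-processing sketch using Python's standard `html.parser`: it drops the contents of script and style elements, ignores tags such as images (which carry no text), lowercases the remaining text, and keeps only letter tokens. Restricting tokens to ASCII letters is an assumption that roughly fits Bahasa Indonesia; the sample page is hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def preprocess(page_html):
    parser = TextExtractor()
    parser.feed(page_html)
    parser.close()
    text = " ".join(parser.parts).lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits and punctuation (ASCII letters only: an assumption)
    return text.split()


sample_html = (
    "<html><head><script>var x = 1;</script><style>p{color:red}</style></head>"
    '<body><img src="a.jpg"><p>Harga BBM naik, 10%!</p></body></html>'
)
print(preprocess(sample_html))  # ['harga', 'bbm', 'naik']
```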
k-NN is an appropriate solution for classifying the sentiment of an online media streaming article.
Result