A Sentiment Analysis on Miley Cyrus’ Instagram Accounts (Information…

- - - - As a result of the Instagram API’s security measure, it was only possible to obtain the first 150 comments of each picture or video
      - The data were collected from the beginning of May 2016 until the beginning of June 2016 via the official Instagram API
      - All comments were saved into a database with the ID of the picture or video, as well as the ID of the account and the timestamp. The database consisted of approximately one million records after the data extraction
  - - - The polarity of a document is the sum of the polarities of its words and phrases (Kaushik and Mishra, 2014)
      - Kaushik and Mishra (2014) describe a lexicon-based approach to sentiment analysis that runs fast
    - - Preprocessing
        
        Before analysis, the collected data had to be preprocessed by a Python script. Spam such as chain mails, advertisements, or comments with limited content like “first” was removed
        
        Usernames and links in the comments were reduced to a more general term, namely “USERNAME” and “LINK”, without having an impact on the sentiment
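
The replacement of usernames and links can be sketched with two regular expressions. This is a minimal illustration, not the authors' script: the patterns `LINK_RE` and `USERNAME_RE` and the function name `normalize_comment` are assumptions, since the source does not say how usernames and links were detected.

```python
import re

# Hypothetical patterns; the source does not specify the detection rules.
LINK_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# An @-mention: letters, digits, underscores, and inner dots (no trailing dot).
USERNAME_RE = re.compile(r"@[A-Za-z0-9_](?:[A-Za-z0-9_.]*[A-Za-z0-9_])?")


def normalize_comment(text: str) -> str:
    """Replace links and @-mentions with the generic tokens LINK and USERNAME."""
    # Links are replaced first so an '@' inside a URL is not read as a mention.
    text = LINK_RE.sub("LINK", text)
    return USERNAME_RE.sub("USERNAME", text)


if __name__ == "__main__":
    print(normalize_comment("@miley.fan love this! see http://example.com/pic"))
```

Because both placeholders are absent from the emotion lexicon, they contribute nothing to the score, which matches the stated goal of not affecting the sentiment.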
        
        Also, the comments were checked for their language and automatically translated into English. Replacing abbreviations with their actual term was not required in this investigation, because repeated characters carry emotionality themselves
      - Sentiment Analysis
        
        As a solution, the Natural Language Toolkit (http://www.nltk.org/) for Python is used. With a POS tagger, the correct part of speech is identified, which yields more accurate sentiment values for words.
        
        AFINN-111 is used as the emotion lexicon in this sentiment analysis. Two words, “like” and “lie”, need special treatment, because the lexicon itself cannot deal with ambiguity problems.
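
One way the part-of-speech information could resolve the ambiguity is sketched below. The source does not describe the actual special treatment, so the rule used here (score “like” and “lie” only when tagged as a verb) is an assumption, as are the toy lexicon values and the names `TOY_LEXICON`, `AMBIGUOUS`, and `score_tagged`. The input is a list of (word, tag) pairs with Penn Treebank tags, the shape returned by a POS tagger such as `nltk.pos_tag`.

```python
# Toy values for illustration only; real values would come from AFINN-111.
TOY_LEXICON = {"like": 2, "love": 3, "lie": -2}

# Words scored only when used as a verb (assumed rule, not from the source).
AMBIGUOUS = {"like", "lie"}


def score_tagged(tagged_tokens):
    """Sum lexicon values over (word, POS tag) pairs.

    Ambiguous words are skipped unless their tag starts with "VB".
    """
    total = 0
    for word, tag in tagged_tokens:
        w = word.lower()
        if w in AMBIGUOUS and not tag.startswith("VB"):
            continue  # e.g. "like" as a preposition (tag IN) carries no sentiment
        total += TOY_LEXICON.get(w, 0)
    return total


if __name__ == "__main__":
    print(score_tagged([("I", "PRP"), ("like", "VBP"), ("it", "PRP")]))
    print(score_tagged([("looks", "VBZ"), ("like", "IN"), ("rain", "NN")]))
```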
        
        AFINN is a list of English words rated for valence with an integer between -5 and +5. Finn Årup Nielsen (2011) labeled the words manually from 2009 to 2011 for sentiment analysis on microblogs
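
Loading such a word list into a lookup table can be sketched as follows, assuming the tab-separated “term, tab, score” layout in which AFINN-111 is distributed. The function name `parse_afinn` and the sample entries are illustrative and are not taken from the lexicon file.

```python
def parse_afinn(text: str) -> dict:
    """Parse 'term<TAB>score' lines into a dict of integer valences."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last tab so multi-word terms stay intact.
        term, score = line.rsplit("\t", 1)
        value = int(score)
        if not -5 <= value <= 5:
            raise ValueError(f"valence out of range for {term!r}: {value}")
        lexicon[term] = value
    return lexicon


if __name__ == "__main__":
    sample = "good\t3\nbad\t-3\nnot good\t-2\n"  # made-up sample entries
    print(parse_afinn(sample))
```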
        
        Using SentiStrength (Thelwall et al., 2010) as a model, the Python-based sentiment analysis program consists of an emoticon list, an emotion lexicon, a negation lexicon, a lexicon for booster words like “very” or “totally”, and a lexicon for phrases
        
        The following approach detects the sentiment strength (positive, neutral, or negative) within an interval of -5 to +5 (from negative to positive)
        
        A sentiment strength of 0 is considered neutral
        
        The sentiment analysis program runs through several steps and then assigns the final sentiment. Each comment gets one sentiment for the written text and one for the emoticons; these are combined into the final sentiment of the comment
        
        Steps
        
        First, the comment is tokenized into sentences, and the sentences are then tokenized into words. To calculate the text sentiment, each word gets a sentiment value from the emotion lexicon
        
        Words in quotation marks are considered as quotes and assessed as neutral because they often do not reflect the users’ emotionality.
        
        Phrases that are present in the phrase lexicon get the sentiment value of that particular phrase. If the words of the phrase appear in the emotion lexicon as well, only the phrase value counts toward the final comment sentiment.
        
        Also, the other lexicons are checked for negations (which can change the sentiment of a word from positive to negative, e.g. “not very happy”) and booster words like “very”
        
        All those sentiment values add up to the final text sentiment of a comment.
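
The steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the authors' program: all lexicon entries are toy values, tokenization uses plain regular expressions instead of NLTK, a booster is assumed to raise the magnitude of the following sentiment word by its value, and a negation is assumed to flip the sign.

```python
import re

# Toy lexicons for illustration; the real contents are not given in the source.
EMOTION = {"happy": 3, "good": 3, "sad": -2, "bad": -3}
PHRASES = {("not", "bad"): 2}        # a phrase value overrides its member words
NEGATIONS = {"not", "never", "no"}
BOOSTERS = {"very": 1, "totally": 1}

QUOTED = re.compile(r'"[^"]*"|“[^”]*”')  # straight or curly double quotes


def score_sentence(tokens):
    """Score one tokenized sentence with phrase, booster, and negation handling."""
    total, i = 0, 0
    while i < len(tokens):
        # Phrase lookup first: a matching phrase consumes its words.
        for phrase, value in PHRASES.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                total += value
                i += len(phrase)
                break
        else:
            value = EMOTION.get(tokens[i], 0)
            if value:
                # Walk back over directly preceding boosters and negations.
                j, boost, negated = i - 1, 0, False
                while j >= 0 and (tokens[j] in BOOSTERS or tokens[j] in NEGATIONS):
                    if tokens[j] in BOOSTERS:
                        boost += BOOSTERS[tokens[j]]
                    else:
                        negated = not negated
                    j -= 1
                value = (abs(value) + boost) * (1 if value > 0 else -1)
                total += -value if negated else value
            i += 1
    return total


def text_sentiment(comment):
    """Sum sentence scores after dropping quoted spans (treated as neutral)."""
    comment = QUOTED.sub(" ", comment)
    sentences = re.split(r"(?<=[.!?])\s+", comment.strip())
    return sum(score_sentence(re.findall(r"[a-z']+", s.lower())) for s in sentences)


if __name__ == "__main__":
    print(text_sentiment("I am not very happy."))
```

With these toy values, “not very happy” scores -4: “happy” (3) is boosted to 4 by “very” and then flipped by “not”.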
        
        After the sentiment analysis, each final sentiment of a comment is normalized to the interval from -5 to +5
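
The source does not state how the text and emoticon sentiments are combined or how the result is normalized. A minimal sketch, assuming the two values are added and the sum is clamped to the interval (both assumptions, as are the function names):

```python
def clamp(score, low=-5, high=5):
    """Limit a score to the closed interval [low, high]."""
    return max(low, min(high, score))


def final_sentiment(text_score, emoticon_score):
    """Combine text and emoticon sentiment by addition, then clamp (assumed rule)."""
    return clamp(text_score + emoticon_score)


if __name__ == "__main__":
    print(final_sentiment(4, 3))
```

Clamping keeps the ordering of scores inside the interval but loses distinctions among extreme values; a rescaling scheme would be an alternative.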
        
        Additional Sentiment Rules
        
        An idiom list
        
        The following rules are incorporated into SentiStrength (Thelwall et al., 2012a).
        
        The word “miss” is a special case with a positive strength of 2 and a negative strength of 2. It is frequently used to express sadness and love simultaneously, as in the common phrase, “I miss you”.
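
The dual scoring of “miss” can be illustrated with a lookup that stores a positive and a negative strength per term. The sketch assumes SentiStrength-style scales of 1 to 5 (positive) and -1 to -5 (negative), where 1 and -1 mean no sentiment, and assumes that a sentence takes the strongest value on each scale. Apart from “miss”, the entries in `DUAL` are made up.

```python
# (positive, negative) strengths per term; only "miss" comes from the source.
DUAL = {"miss": (2, -2), "love": (3, -1), "hate": (1, -4)}


def dual_strength(tokens):
    """Return the strongest positive and strongest negative strength in tokens."""
    pos, neg = 1, -1  # neutral baseline on both scales
    for token in tokens:
        p, n = DUAL.get(token.lower(), (1, -1))
        pos, neg = max(pos, p), min(neg, n)
    return pos, neg


if __name__ == "__main__":
    print(dual_strength(["I", "miss", "you"]))
```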
        
        A spelling correction algorithm
        
        At least two repeated letters
        
        A booster word list
        
        A negating word list
        
        An emoticon list with polarities
        
        Sentences with exclamation marks have a minimum positive strength of 2, unless negative (e.g., “hello Pardeep!!!”).
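
The exclamation mark rule can be sketched as a post-processing step on a sentence's (positive, negative) strengths. Reading “unless negative” as “unless the sentence carries any negative strength below -1” is an assumption, as is the function name.

```python
def apply_exclamation_rule(sentence, pos, neg):
    """Raise positive strength to at least 2 for non-negative sentences with '!'."""
    if "!" in sentence and neg == -1:  # -1 means no negative sentiment
        pos = max(pos, 2)
    return pos, neg


if __name__ == "__main__":
    print(apply_exclamation_rule("hello Pardeep!!!", 1, -1))
```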
        
        Repeated punctuation
        
        Two consecutive moderate or strong negative terms
        
        Disabled rules
        
        Sentiment terms in CAPITAL letters receive a strength increase of 1.
        
        Two consecutive moderate or strong positive terms with strength at least 3 increase the strength of the second word by 1.
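
The capitalization rule and the consecutive positive terms rule can be sketched together. The lexicon `TOY_POS` is made up, and two details are assumptions: the consecutive-term check uses the unmodified lexicon strengths, and results are capped at 5.

```python
# Hypothetical positive strengths; 0 means the token is not a sentiment term.
TOY_POS = {"love": 3, "amazing": 4, "nice": 2}


def positive_strengths(tokens, max_strength=5):
    """Return per-token positive strengths after both adjustment rules."""
    strengths = []
    prev_base = 0
    for token in tokens:
        base = TOY_POS.get(token.lower(), 0)
        strength = base
        if base and len(token) > 1 and token.isupper():
            strength += 1  # sentiment term written in CAPITAL letters
        if base >= 3 and prev_base >= 3:
            strength += 1  # second of two consecutive strong positive terms
        strengths.append(min(strength, max_strength))
        prev_base = base
    return strengths


if __name__ == "__main__":
    print(positive_strengths(["love", "amazing"]))
```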
        
        Sentences containing irony
        
        Irony is operationalized by having positive sentiment in conjunction with the presence of a term or phrase from a user-defined list
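
This operationalization can be sketched as a simple check. The sketch assumes a positive strength above 1 counts as “positive sentiment” and uses case-insensitive substring matching against the user-defined list; what happens to a sentence once it is flagged is not described in the source and is left out.

```python
def looks_ironic(sentence, positive_strength, irony_terms):
    """Flag a sentence that is positive and contains a user-defined irony term."""
    text = sentence.lower()
    has_term = any(term.lower() in text for term in irony_terms)
    return positive_strength > 1 and has_term


if __name__ == "__main__":
    print(looks_ironic("Great job, yeah right", 3, ["yeah right"]))
```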
        
        Some of the rules also need to be modified for non-English versions of SentiStrength and there are some options for this
        
        Supervised and Unsupervised Modes