A Sentiment Analysis on Miley Cyrus’ Instagram Accounts
Do comments belong to haters or admirers?
Includes a time series on the sentiment towards
A dictionary-based sentiment analysis of more than 660,000 filtered comments on media concerning Miley Cyrus has been performed.
First, the data were collected through the Instagram API from the official as well as fan-based Miley Cyrus accounts. Afterwards, the comments were preprocessed by a Python script.
The comments were translated; words without any impact on the sentiment (e.g. usernames or links) were replaced with a general term; and meaningless comments (e.g. “first”), advertisements, and chain mails were deleted.
Finally, the sentiment of each comment has been
As a result of the Instagram API’s security measure, it was only possible to obtain the first 150 comments of each picture or video
The data were collected from the beginning of May 2016 until the beginning of June 2016 via the official Instagram API
All comments were saved into a database with the ID of the picture or video, as well as the ID of the account and the timestamp. The database consisted of approximately one million records after the data extraction
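The storage step described above can be sketched as follows. This is a minimal illustration only: the source does not name the database engine or the schema, so SQLite and the table and column names (`comments`, `media_id`, `account_id`, `created_at`, `text`) are assumptions.

```python
import sqlite3

def create_comment_store(path=":memory:"):
    """Open a SQLite database and create a hypothetical comments table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS comments (
               id INTEGER PRIMARY KEY,
               media_id TEXT NOT NULL,    -- ID of the picture or video
               account_id TEXT NOT NULL,  -- ID of the Instagram account
               created_at TEXT NOT NULL,  -- timestamp of the comment
               text TEXT NOT NULL
           )"""
    )
    return conn

def save_comment(conn, media_id, account_id, created_at, text):
    """Insert one comment together with its media ID, account ID and timestamp."""
    conn.execute(
        "INSERT INTO comments (media_id, account_id, created_at, text) "
        "VALUES (?, ?, ?, ?)",
        (media_id, account_id, created_at, text),
    )
    conn.commit()
```

For example, `save_comment(conn, "m1", "a1", "2016-05-03T12:00:00", "love you")` stores one record in the store returned by `create_comment_store()`.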
Methods of sentiment analysis can be divided into two main strategies: lexicon-based and machine-learning-based techniques
Before an unknown dataset can be analyzed, a machine-learning-based algorithm has to be trained on a training dataset
The sum of the polarities of each word or phrase is the polarity of the document (Kaushik and Mishra, 2014)
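The lexicon-based idea of summing word polarities can be illustrated with a short sketch. The lexicon below is a toy example with made-up values, and the whitespace tokenization is a simplifying assumption.

```python
# Toy lexicon; the words and values are illustrative assumptions only.
TOY_LEXICON = {"love": 3, "great": 3, "hate": -3, "ugly": -3}

def document_polarity(text, lexicon):
    """Sum the polarities of all known words; unknown words count as 0."""
    words = text.lower().split()
    return sum(lexicon.get(word.strip(".,!?"), 0) for word in words)
```

With this toy lexicon, "I love this great song!" contains two positive words and no negative ones, so its polarity is the sum of the two positive values.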
Kaushik and Mishra (2014) presented a lexicon-based approach for sentiment analysis that works fast
Machine learning based
Khan, Atique and Thakare (2015) combine lexicon-based and machine-learning-based methods to improve precision and achieve a high recall.
Sentiment analysis in social media (Pozzi et al., 2017) is different from “classical” sentiment analysis of newspaper articles, for instance. Here, we have emojis in addition to text
We were able to identify only very few approaches to sentiment analysis of Instagram hashtags (Nam, Lee and Shin, 2015) and Instagram texts (Ranaweera and Rajapakse, 2016)
We conducted, for the first time, a lexicon-based sentiment analysis of comments on Instagram posts with a very large dataset
Before the collected data could be analyzed, they had to be preprocessed by a Python script. Spam such as chain mails, advertisements or comments with limited content like “first”
Usernames and links in the comments were replaced by the more general terms “USERNAME” and “LINK”, which have no impact on the sentiment
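The replacement and filtering steps can be sketched as below. The regular expressions and the `LOW_CONTENT` set are assumptions made for illustration; the original script is not shown in the source, and the detection of advertisements and chain mails is not covered here.

```python
import re

# Assumed patterns: "@name" style mentions and http(s)/www links.
USERNAME_RE = re.compile(r"@\w+(?:\.\w+)*")
LINK_RE = re.compile(r"(?:https?://|www\.)\S+")

# Hypothetical list of comments with limited content.
LOW_CONTENT = {"first"}

def preprocess(comment):
    """Return the cleaned comment, or None if it should be deleted."""
    text = LINK_RE.sub("LINK", comment)       # replace links first
    text = USERNAME_RE.sub("USERNAME", text)  # then replace mentions
    text = text.strip()
    if text.lower().strip("!. ") in LOW_CONTENT:
        return None
    return text
```

Links are replaced before usernames so that an "@" inside a URL is not mistaken for a mention.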
Also, the language of each comment was checked, and the comments were automatically translated into English. Replacing abbreviations with their full terms was not required in this investigation, because repeated characters carry emotionality themselves
As a solution, the Natural Language Toolkit for Python is used. With a POS tagger, the correct part of speech is recognized, which leads to more accurate sentiment word values.
AFINN-111 is used as the emotion lexicon in this sentiment analysis. Two words, “like” and “lie”, need special treatment, because the lexicon itself cannot deal with ambiguity.
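How a part-of-speech tag can resolve the ambiguity of “like” is sketched below. The function takes an already tagged word, so no tagger is called here; the Penn Treebank tag names, the rule itself and the lexicon values are assumptions for illustration, not the authors' implementation.

```python
# Illustrative lexicon excerpt; the values are assumptions.
LEXICON = {"like": 2, "happy": 3}

# Assumption: Penn Treebank verb tags, as produced by common English taggers.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def score_tagged_word(word, tag, lexicon=LEXICON):
    """Score a (word, POS tag) pair, treating non-verb 'like' as neutral."""
    word = word.lower()
    if word == "like" and tag not in VERB_TAGS:
        return 0  # comparison use, e.g. "looks like a star"
    return lexicon.get(word, 0)
```

In "I like her", a tagger would typically mark “like” as a verb, so it keeps its lexicon value; in "she looks like a star" it would be tagged as a preposition and scored as neutral.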
AFINN is a list of English words rated for valence with an integer between -5 and +5. Finn Årup Nielsen (2011) labeled the words manually between 2009 and 2011 for sentiment analysis on microblogs
Using SentiStrength (Thelwall et al., 2010) as a model, the Python based sentiment analysis program consists of an emoticon list, an emotion lexicon, a negation lexicon, a lexicon for booster words like “very” or “totally” as well as a lexicon for phrases
The following approach detects the sentiment strength (positive, neutral or negative) within an interval of -5 to +5 (from negative to positive)
A sentiment strength of 0 is considered a neutral sentiment
The sentiment analysis program runs through several steps before assigning the final sentiment. Each comment gets one sentiment for the written text and one for the emoticons; these are combined into the final sentiment of the comment
First, the comment is tokenized into sentences, and then the sentences are tokenized into words. To calculate the text sentiment, each word gets a sentiment value from the emotion lexicon
Words in quotation marks are considered as quotes and assessed as neutral because they often do not reflect the users’ emotionality.
Phrases that are present in the phrase lexicon get the sentiment value of that particular phrase. If the words of the phrase appear in the emotion lexicon as well, only the phrase value counts toward the final comment sentiment.
Also, the other lexicons are checked for negations (which can change the sentiment of a word from positive to negative, e.g. “not very happy”) and booster words like “very”
All those sentiment values add up to the final text sentiment of a comment.
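The text-sentiment steps described above (tokenization, neutral quotes, phrase precedence, negations and boosters) can be sketched as follows. All lexicon entries and values are illustrative. How exactly boosters and negations modify a value is not specified in the text, so the rules used here are assumptions: a booster adds 1 to the magnitude of the next sentiment word, and a negation flips the sign of the next sentiment word. Phrases are limited to two words for brevity.

```python
import re

# Illustrative lexicons; entries and values are assumptions.
EMOTION = {"happy": 3, "good": 3, "luck": 3, "bad": -3}
PHRASES = {("good", "luck"): 3}
NEGATIONS = {"not", "never"}
BOOSTERS = {"very": 1, "totally": 1}

# Text inside straight or curly quotation marks is treated as a neutral quote.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')

def sentence_score(words):
    """Sum word and phrase values, applying pending boosters and negations."""
    total, i = 0, 0
    negate, boost = False, 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:                 # phrase value replaces word values
            value, step = PHRASES[pair], 2
        elif words[i] in NEGATIONS:
            negate, value, step = True, 0, 1
        elif words[i] in BOOSTERS:
            boost, value, step = boost + BOOSTERS[words[i]], 0, 1
        else:
            value, step = EMOTION.get(words[i], 0), 1
        if value:
            value += boost if value > 0 else -boost
            total += -value if negate else value
            negate, boost = False, 0
        i += step
    return total

def text_sentiment(comment):
    """Split into sentences and words, then add up all sentence scores."""
    unquoted = QUOTED.sub(" ", comment)
    sentences = re.split(r"[.!?]+", unquoted)
    return sum(sentence_score(re.findall(r"[a-z']+", s.lower())) for s in sentences)
```

Under these assumptions, "not very happy" is scored by boosting “happy” from 3 to 4 and then negating it, and "good luck" contributes only its phrase value rather than the sum of its two word values.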
After the sentiment analysis, each final sentiment of a comment is normalized to an interval of -5 to 5
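The text does not state how the normalization is carried out. One simple possibility, shown here purely as an assumption, is to clamp the summed value to the interval; other schemes such as rescaling would also fit the description.

```python
def clamp_sentiment(value, low=-5, high=5):
    """Limit a summed sentiment value to the interval [low, high] (assumed method)."""
    return max(low, min(high, value))
```

Values already inside the interval are returned unchanged; only sums beyond -5 or +5 are cut off.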
Additional Sentiment Rules
The following rules are incorporated into SentiStrength (Thelwall et al. 2012a).
The word “miss” is a special case with a positive strength of 2 and a negative strength of 2. It is frequently used to express sadness and love simultaneously, as in the common phrase, “I miss you”.
A spelling correction algorithm
At least two repeated letters
A booster word list
A negating word list
An emoticon list with polarities
Sentences with exclamation marks have a minimum positive strength of 2, unless negative (e.g., “hello Pardeep!!!”).
Two consecutive moderate or strong negative terms
Sentiment terms in CAPITAL letters receive a strength increase of 1.
Two consecutive moderate or strong positive terms with strength at least 3 increase the strength of the second word by 1.
Sentences containing irony
Irony is operationalized by having positive sentiment in conjunction with the presence of a term or phrase from a user-defined list
Some of the rules also need to be modified for non-English versions of SentiStrength and there are some options for this
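Two of the rules listed above, the strength increase for sentiment terms in capital letters and the increase for two consecutive positive terms with strength at least 3, can be sketched as follows. This is an illustration and not SentiStrength's actual code: the lexicon values are made up, only positive strengths are handled, and no upper bound on the strength is enforced.

```python
# Illustrative positive strengths; the values are assumptions.
POSITIVE_STRENGTH = {"love": 3, "amazing": 4, "nice": 2}

def positive_strengths(tokens):
    """Return the positive strength per token after applying the two rules."""
    strengths = []
    prev_base = 0  # lexicon strength of the preceding token
    for token in tokens:
        base = POSITIVE_STRENGTH.get(token.lower(), 0)
        strength = base
        if base and len(token) > 1 and token.isupper():
            strength += 1  # sentiment term written in CAPITAL letters
        if base >= 3 and prev_base >= 3:
            strength += 1  # second of two consecutive strong positive terms
        strengths.append(strength)
        prev_base = base
    return strengths
```

For the tokens `["love", "amazing"]`, the second term follows another term of strength at least 3, so its strength rises by 1 while the first term is unchanged.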
Supervised and Unsupervised Modes
The number of analyzed comments in the sentiment analysis is N = 662,883
46% (306,648) of them are neutral, 39% (258,320) are positive and 15% (97,914) are negative
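The reported percentages can be reproduced from the counts with a few lines of arithmetic. The variable names are illustrative; the numbers are those given above.

```python
# Counts per sentiment class and the reported total N, as stated in the text.
counts = {"neutral": 306_648, "positive": 258_320, "negative": 97_914}
total = 662_883

# Share of each class in percent, rounded to whole numbers.
shares = {label: round(100 * n / total) for label, n in counts.items()}
```

Rounding to whole percentages yields the three shares stated in the text.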
Does a polarizing celebrity like Miley Cyrus get a positive or negative response on social media for her behavior?