Worth its Weight in Likes: Towards Detecting Fake Likes on Instagram
Meta
Year
2018
Author
Sen, I., Aggarwal, A., Mian, S., Singh, S., Kumaraguru, P., Datta, A.
Information
We enumerate the potential factors which contribute towards a genuine like on Instagram
Based on our analysis of liking behaviour, we build an automated mechanism to detect fake likes on Instagram which achieves a high precision of 83.5%
In this study, we instead focus on inorganic engagement received by a user.
Previous studies aiming to detect fake liking behaviour assume that if a user has given one or two fake likes, all her likes are fake
We propose that the true reach / social-worth of the user should be determined by canceling out the effect of fake engagement which she receives, and should largely depend only on the organic engagement
Our goal is to identify the genuineness of likes by determining a user’s intention in liking a post
Given a liker L who likes a specific post p of a poster S, find the features of L, p and S that determine the probability of L genuinely liking p
Related Works
User Behavior
While not explored as widely as detection of fake entities, fake engagement on OSNs has been previously studied on Facebook, Twitter, and YouTube
Data
Fake Like Instances
Sources
paid web services or apps, trading platforms where a user gives likes in exchange for likes, and bots which are triggered based on hashtags
We assume that if a video has received likes, but has zero views, then the like instances are fake, because they were generated without properly seeing the content.
We capture 16,448 such like instances (information about the liker, post, and source user), and add them to FakeLike_data
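The zero-view heuristic above can be sketched as a simple filter. This is only an illustration under assumptions: the `Post` record, its field names, and the `fake_like_instances` helper are hypothetical and do not come from the paper or from any Instagram API.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    # Hypothetical record; field names are assumptions for illustration.
    post_id: str
    is_video: bool
    view_count: int
    liker_ids: list = field(default_factory=list)


def fake_like_instances(posts, poster_id):
    """Return (liker, post, poster) tuples for videos with likes but zero views."""
    instances = []
    for post in posts:
        if post.is_video and post.view_count == 0 and post.liker_ids:
            instances.extend(
                (liker, post.post_id, poster_id) for liker in post.liker_ids
            )
    return instances
```

Photos and videos with at least one view are skipped, since the heuristic says nothing about them.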
Random Like Instances
This gives us a sample of 1 million Instagram users, from which we take a smaller subset of users and extract their posts, and likes on each of those posts. In this manner, we obtain a dataset of 134,669 like instances
It is hard to obtain a ground-truth dataset of genuine likes. Instead, we collect a much larger random set of like instances to compare against fake likes, and to use as the negative class when building a machine learning model to identify fake likes.
Since Instagram does not provide a direct way to sample random users/posts, we obtain a seed set of Instagram users, and extract their follower and followee connections in a breadth-first-search manner
This sample is much larger (more than 8 times) than the fake like instance dataset. Therefore, despite the noise, we assume that predominantly, the like instances in RandLike_data would be genuine
The noisy dataset is one of our current limitations; with a clean negative dataset, our results showing differences between fake and other likes, and the supervised-learning-based identification of fake likes, would only improve.
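The breadth-first expansion from a seed set can be sketched as follows. The `neighbours` callback stands in for whatever returns a user's follower and followee connections; it, the function name, and the `max_users` cap are assumptions for illustration, not details given in the paper.

```python
from collections import deque


def bfs_sample(seeds, neighbours, max_users):
    """Visit users breadth-first from the seeds until max_users are collected."""
    seen = set(seeds)
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    sampled = []
    while queue and len(sampled) < max_users:
        user = queue.popleft()
        sampled.append(user)
        for other in neighbours(user):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return sampled
```

A sample obtained this way is biased towards well-connected accounts, which is one reason the resulting like instances are treated as a noisy rather than a clean negative class.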
Analysis
Given a poster S whose post p has been liked by liker L, we define a like instance as the tuple (L, p, S)
While it is virtually impossible to know why a user might like a post, it is possible to understand how the user could have come across the post, which is a non-trivial prerequisite for liking
The like instance is designed to contain post properties so that a liker is evaluated on the basis of the individual posts she likes
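A like instance (L, p, S) can be represented as a small immutable record. The class and field names below are hypothetical; the point of the sketch is only that each like is a separate unit of evaluation, so one liker can have both fake and genuine instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikeInstance:
    liker: str   # L, the account that gave the like
    post: str    # p, the post that was liked
    poster: str  # S, the account that published the post


def group_by_liker(instances):
    """Group like instances per liker without merging them into one verdict."""
    grouped = {}
    for inst in instances:
        grouped.setdefault(inst.liker, []).append(inst)
    return grouped
```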
Network Effects
H1: A liker L is more likely to genuinely like S’s post if L is a follower of S
H2: A liker L is more likely to genuinely like S’s post if L is a follower of S’s followers
H1: L will receive the content posted by S in her home feed, and hence there is a higher chance of L genuinely liking that post. In addition, if L is following S, we can assume that L is interested in S’s content
H2: Instagram also lets the users follow the activities of the users being followed. Therefore, a liker L can also come across the liked post p if it is liked by one of the users which L follows. We also consider such an instance to indicate a higher level of confidence in the genuineness of the like instance
For fake like engagements, a significantly smaller proportion of likers are followers of the poster. In the case of fake engagements, only 16.8% of likers of a post are followers of the poster, compared to a much higher fraction of 39.1% of likers being followers in the case of random like engagements
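The two network hypotheses can be turned into boolean features over a follow graph. In this sketch, `following` is an assumed mapping from each user to the set of accounts that user follows; the function and key names are hypothetical.

```python
def network_features(liker, poster, following):
    """Compute H1/H2 indicators for one (liker, poster) pair."""
    followed_by_liker = following.get(liker, set())
    # H1: L follows S directly.
    h1 = poster in followed_by_liker
    # H2: L follows at least one account that itself follows S.
    h2 = any(poster in following.get(mid, set()) for mid in followed_by_liker)
    return {"h1_follows_poster": h1, "h2_follows_follower_of_poster": h2}
```

A like for which both indicators are false is not necessarily fake, but it lacks the two most obvious ways the liker could have come across the post.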
Liking Frequency
We observe that legitimate likers keep coming back to the same poster
90% of posters with fake likes get 7% repeated likers on their posts, whereas the same fraction of posters with other likes get 42% repeated likes
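One way to measure repeated likers for a single poster is sketched below. It assumes a "repeated liker" is an account that liked more than one of the poster's posts; the notes do not spell out the exact definition, so treat this as an interpretation.

```python
from collections import Counter


def repeated_liker_fraction(likes_per_post):
    """likes_per_post: one set of liker ids per post of the same poster."""
    counts = Counter(liker for likers in likes_per_post for liker in likers)
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n > 1)
    return repeated / len(counts)
```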
Link Farming Hashtags to get Fake Likes
Hashtags have been shown to play an important role in Instagram in spreading the reach of posts and attracting more likes
A user S is more likely to attract fake likes if she uses link farming hashtags in her posts
We curate a list of 112 such hashtags and find that 20.8% of posts with fake likes have at least one link farming hashtag, compared to 1.8% of posts with random likes
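Checking a caption against a curated hashtag list is a simple set lookup. The list below is a tiny placeholder: only #like4like appears in these notes, the other two entries are assumed examples, and the actual 112-hashtag list is not reproduced here.

```python
import re

# Placeholder list; only "like4like" is taken from the notes.
LINK_FARMING_TAGS = {"like4like", "likeforlike", "follow4follow"}
HASHTAG_RE = re.compile(r"#(\w+)")


def has_link_farming_hashtag(caption, tags=LINK_FARMING_TAGS):
    """Return True if the caption contains at least one listed hashtag."""
    found = {tag.lower() for tag in HASHTAG_RE.findall(caption)}
    return not found.isdisjoint(tags)
```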
Influential Poster
User L will have a higher chance of genuinely liking S’s photo if S is an ‘influential’ user or a celebrity
We use the Instagram verification badge as a proxy for celebrity users
Topical Hashtags
User S with genuine likes will have topical hashtags in their posts
Two-step process to detect topical hashtags
First filter out all link farming hashtags as well as popular non-topical hashtags
Next, we segment these hashtags and use Wikifier to see what proportion of hashtags pertain to a topic
Topical hashtags used in a post, instead of occasional (#throwbackthursday, #ootd), trending (#mayweather) or link farming hashtags (#like4like)
Interest Overlap
A user L will have a higher chance of genuinely liking S’s post if L and S share interests
Topic Matching
To match topics we utilize word2vec similarities [18] between two tuples of interests
Affinity
We consider topical affinity as one of the distinguishing features to identify fake liking engagement
We extract a user’s interest profile from her bio, and by converting the post image into relevant text using Densecap
Topic Extraction
Topics are inferred from user's bio and posts
We infer topics from textual sources such as bio and post captions using Wikification
We use Densecap captioning [14] to obtain meaningful captions for post images. Wikification is applied on these captions to extract fine-grained topics
This metric is not commutative and therefore penalizes likers with a very wide variety of interests, which is an indicator of suspicious behavior
We found that 60% of fake likers have an affinity value of 0.475, compared to an affinity of 0.58 for the same proportion of the random set of likers
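The notes do not give the affinity formula, so the following is only one possible asymmetric measure with the stated properties, not the authors' metric. It averages, over the liker's topics, the best similarity to any of the poster's topics. The toy `vectors` mapping stands in for word2vec embeddings and is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity(liker_topics, poster_topics, vectors):
    """Mean over liker topics of the best match among poster topics."""
    if not liker_topics or not poster_topics:
        return 0.0
    best = [
        max(cosine(vectors[lt], vectors[pt]) for pt in poster_topics)
        for lt in liker_topics
    ]
    return sum(best) / len(best)
```

Because the average runs over the liker's topics, a liker with many unrelated interests scores lower, while a liker with only one or two interests can score very high on a single match, which mirrors the small-tuple behaviour noted in the error analysis.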
Detecting Fake Likes
Building a Classification Model
While the actual ratio of fake to genuine likes on Instagram is unknown, based on previous literature on spam detection [4], we maintain a ratio of roughly 1:8
This proportion ensures that any machine learning model trained on such a dataset can perform well ‘in-the-wild’, where the ratio of likes would be highly imbalanced
Therefore, we obtain the aforementioned features from FakeLike_data and RandLike_data, and train a supervised model on these features with fake likes as the positive class
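Assembling a labelled training set at a fixed class ratio might look like the sketch below. The function name, the `ratio` and `seed` parameters, and the subsampling step are assumptions; the notes only state that the two datasets end up at roughly 1:8, with fake likes as the positive class.

```python
import random


def build_training_set(fake, rand, ratio=8, seed=0):
    """Label fake likes 1 and random likes 0, keeping about 1:ratio."""
    rng = random.Random(seed)
    n_negative = min(len(rand), ratio * len(fake))
    negatives = rng.sample(rand, n_negative)  # rand must be a sequence
    data = [(x, 1) for x in fake] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data
```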
Classification algorithms
Logistic Regression,
Random Forest,
SVM (RBF kernel),
AdaBoost (with Random Forest as base estimator), and
XGBoost
Baseline
We use Badri et al.’s method to detect fake likers on Facebook. As the source code was unavailable, we implemented this method on our own based on the features detailed in the paper
Supervised method for the detection of fake likes based on
Profile
length of biography
lifespan of account
number of bidirectional connections
Posting activity
Average number of total posts
Maximum posts per day
Skewness of posting
Page liking
Category entropy of pages liked
Proportion of verified pages
Social attention of the liker
Average number of likes
Comments received
Discarded Features
Proportion of shared photos
Average number of shares received, since there is no concept of sharing posts on Instagram
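Two of the baseline features listed above, category entropy and skewness of posting, have standard statistical definitions. The sketch below assumes Shannon entropy in bits and the population (Fisher-Pearson) skewness coefficient; the notes do not say which variants the baseline uses.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (bits) of the category labels of liked pages."""
    total = len(categories)
    if total == 0:
        return 0.0
    counts = Counter(categories)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def skewness(values):
    """Population skewness of, for example, posts per day."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 else 0.0
```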
Experimental Results
We use Precision, Recall and Area under the ROC curve (AUC) to measure the performance of all models in detecting fake likes
We achieve highest performance using the MLP with an average precision of 83% and recall of 81% (AUC of 89%) in detecting fake likes
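For reference, the three reported metrics can be computed as in this minimal sketch, with fake likes as the positive class (label 1). The helper names are hypothetical, and AUC is computed here through its pairwise-ranking interpretation rather than by tracing the ROC curve.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With a class ratio near 1:8, precision and recall on the fake class are more informative than plain accuracy, since a model that labels everything as genuine would already be right most of the time.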
We believe our system can detect fake likes given by genuine looking entities
Error Analysis
We randomly sample 100 undetected fake likes and manually inspect them.
We find that in 27 fake like instances, the likers were followers of the poster, potentially leading our model to misclassify such like instances as genuine
It suggests that some posters have fake followers and the fake likes are from such followers, something our current methodology is unable to capture.
However, our approach can be modularly applied in a cascade, after detecting fake followers using previous techniques [6].
Furthermore, we found that 61 likers had a high topical interest overlap with the posts they had liked.
A more thorough analysis showed that this was happening due to the small set of interests (just one or two) of the liker, which results in a high affinity value
Background
Problems
This emerging market has led to users artificially bolstering the likes they get to project an inflated social worth
Objectives
Detecting Fake Likes on Instagram
Serves as an important first step in reducing the effect of fake likes on the Instagram influencer market
The number of likes on posts serves as a proxy for social reputation of the users
In some cases, social media influencers with an extensive reach are compensated by marketers to promote products
Result
Limitation
Our affinity metric has unpredictable behaviour when user interest tuples are small
For collecting our ground truth data, we restrict ourselves to videos with likes but no views. In the future, we plan to explore other sources such as trading web services and mobile apps