Worth its Weight in Likes: Towards Detecting Fake Likes on Instagram…

- - - - Sources
        
        paid web services or apps, trading platforms where a user gives likes in exchange for likes,
        
        and bots which are triggered based on hashtags
      - We assume that if a video has received likes, but has zero views, then the like instances are fake, because they were generated without properly seeing the content.
      - We capture 16,448 such like instances (information about the liker, post, and source user), and add them to FakeLike_data
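
The zero-view heuristic above can be sketched as a simple filter. This is a minimal illustration, not the authors' collection code: the `Post` record and its field names are hypothetical, chosen only to make the rule concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    # Hypothetical record layout; the field names are assumptions.
    post_id: str
    is_video: bool
    views: int
    liker_ids: list = field(default_factory=list)


def fake_like_instances(posts):
    """Return (liker_id, post_id) pairs for videos that have likes but zero views."""
    instances = []
    for post in posts:
        # A like on a never-viewed video cannot come from someone who watched it.
        if post.is_video and post.views == 0 and post.liker_ids:
            instances.extend((liker, post.post_id) for liker in post.liker_ids)
    return instances
```

Photos are skipped because the heuristic relies on a view count, which only videos expose in this sketch.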
    - - This gives us a sample of 1 million Instagram users, from which we take a smaller subset of users and extract their posts, and likes on each of those posts. In this manner, we obtain a dataset of 134,669 like instances
      - It is hard to obtain a true positive dataset of genuine likes. Instead, we collect a much larger random set of like instances to compare against fake likes, and to use as the negative class when building a machine learning model to identify fake likes.
      - Since Instagram does not provide a direct way to sample random users/posts, we obtain a seed set of Instagram users and extract their follower and followee connections in a breadth-first-search manner
      - This sample is much larger (more than 8 times) than the fake like instance dataset. Therefore, despite the noise, we assume that the like instances in RandLike_data are predominantly genuine
      - The noisy dataset is one of our current limitations, but with a clean negative dataset, our results showing differences between fake and other likes, and the supervised identification of fake likes, would only improve.
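
The breadth-first expansion from a seed set described above can be sketched as follows. This is an illustration under stated assumptions: `neighbors` is a hypothetical callable that returns a user's follower and followee IDs, standing in for whatever crawler or API access is actually used.

```python
from collections import deque


def bfs_sample(seeds, neighbors, max_users):
    """Collect up to max_users user IDs by breadth-first expansion from seeds.

    neighbors(user) is a hypothetical callable that returns the follower
    and followee IDs of a user.
    """
    visited = set()
    queue = deque()
    for seed in seeds:
        if seed not in visited:
            visited.add(seed)
            queue.append(seed)

    sampled = []
    while queue and len(sampled) < max_users:
        user = queue.popleft()
        sampled.append(user)
        for other in neighbors(user):
            if other not in visited:
                visited.add(other)
                queue.append(other)
    return sampled
```

The `visited` set keeps each user from being queued twice, which matters on a social graph where follow relations often point back to earlier users.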
  - - - H1: A liker L is more likely to genuinely like S’s post if L
        is a follower of S
      - H2: A liker L is more likely to genuinely like S’s post if
        L is a follower of S’s followers
      - H1: L will receive the content posted by S in her home feed, and hence there is a higher chance of L genuinely liking that post. In addition, if L is following S, we can assume that L is interested in S’s content
      - H2: Instagram also lets the users follow the activities of the users being followed. Therefore, a liker L can also come across the liked post p if it is liked by one of the users which L follows. We also consider such an instance to indicate a higher level of confidence in the genuineness of the like instance
      - For fake like engagements, a significantly smaller proportion of likers are followers of the poster: only 16.8% of the likers of a post are followers of the poster, compared with a much higher 39.1% for random like engagements
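
The follower proportion behind H1 can be computed per post as below. This is a minimal sketch; the function name and the use of plain ID collections are assumptions for illustration.

```python
def follower_fraction(liker_ids, follower_ids):
    """Fraction of a post's distinct likers who also follow the poster."""
    likers = set(liker_ids)
    if not likers:
        return 0.0
    return len(likers & set(follower_ids)) / len(likers)
```

For example, `follower_fraction(["u1", "u2", "u3", "u4"], ["u2", "u9"])` counts one follower among four likers.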
    - - We observe that legitimate likers keep coming back to the same poster
      - 90% of posters with fake likes get 7% repeated likers on their posts, whereas the same fraction of posters with other likes get 42% repeated likers
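
One way to measure repeated likers for a poster is sketched below. The definition used here is an assumption: a liker counts as repeated if they liked more than one of the poster's posts, and the result is the share of such likers among all distinct likers.

```python
from collections import Counter


def repeated_liker_fraction(likers_per_post):
    """Share of a poster's distinct likers who liked more than one of their posts.

    likers_per_post holds one collection of liker IDs per post.
    """
    posts_liked = Counter()
    for likers in likers_per_post:
        # set() so a duplicated ID within one post is counted once.
        posts_liked.update(set(likers))
    if not posts_liked:
        return 0.0
    repeated = sum(1 for count in posts_liked.values() if count > 1)
    return repeated / len(posts_liked)
```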
    - - Hashtags have been shown to play an important role in Instagram in spreading the reach of posts and attracting more likes
      - A user S is more likely to attract fake likes if she uses link farming hashtags in her posts
      - We curate a list of 112 such hashtags and find that 20.8% of posts with fake likes have at least one link farming hashtag, compared with 1.8% of posts with random likes
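
Checking a caption against a link farming list can be sketched as follows. The tag set below is a tiny hypothetical stand-in for the curated list of 112 hashtags; only `like4like` is taken from this document, the other entries are assumptions.

```python
import re

# Hypothetical subset; the curated list in the source has 112 entries.
LINK_FARMING_TAGS = frozenset({"like4like", "follow4follow", "l4l"})

HASHTAG_RE = re.compile(r"#(\w+)")


def has_link_farming_hashtag(caption, farming_tags=LINK_FARMING_TAGS):
    """Return True if the caption contains at least one link farming hashtag."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(caption)}
    return not tags.isdisjoint(farming_tags)
```

Matching is case-insensitive, so `#Like4Like` and `#like4like` are treated as the same tag.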
    - - User L will have a higher chance of genuinely liking S’s photo if S is an ‘influential’ user or a celebrity
      - We use the Instagram verification badge as a proxy for celebrity users
    - - User S with genuine likes will have topical hashtags in their posts
      - Two-step process to detect topical hashtags
        
        First filter out all link farming hashtags as well as popular
        non-topical hashtags
        
        Next, we segment these hashtags and use Wikifier to see what proportion of hashtags pertain to a topic
      - Topical hashtags used in a post, instead of occasional (#throwbackthursday, #ootd), trending (#mayweather) or link farming hashtags (#like4like)
    - - A user L will have a higher chance of genuinely liking S’s post if L and S share interests
      - Topic Matching
        
        To match topics we utilize word2vec similarities [18] between two tuples of interests
      - Affinity
        
        We consider topical affinity as one of the distinguishing
        features to identify fake liking engagement
      - We extract a user’s interest profile from her bio, and by converting the post image into relevant text using Densecap
      - Topic Extraction
        
        Topics are inferred from user's bio and posts
        
        We infer topics from textual sources such as bio and post captions using Wikification
        
        For post images, we use Densecap captioning [14] to obtain meaningful captions. Wikification is applied on these captions to extract fine-grained topics
      - This metric is not commutative and therefore penalizes likers with
        a very wide variety of interests, which is an indicator of suspicious
        behavior
      - We found that 60% of fake likers have an affinity value of 0.475, compared with an affinity of 0.58 for the same proportion of the random set of likers
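
This outline does not give the affinity formula, so the following is only a sketch of one asymmetric score that behaves as described: each liker topic is matched to its most similar poster topic, and the matches are averaged over the liker's topics. The formula itself is an assumption, and the toy vectors passed in stand in for word2vec embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity(liker_topics, poster_topics, vectors):
    """Assumed affinity: mean over liker topics of the best match among poster topics.

    vectors maps a topic to its embedding (toy vectors here, word2vec in the source).
    """
    if not liker_topics or not poster_topics:
        return 0.0
    best_matches = [
        max(cosine(vectors[lt], vectors[pt]) for pt in poster_topics)
        for lt in liker_topics
    ]
    return sum(best_matches) / len(best_matches)
```

Because the average runs over the liker's topics only, swapping the arguments changes the result, and a liker with many unrelated interests scores lower. A liker with just one or two interests can score high from a single match, which is the failure mode noted in the error analysis.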
  - - - While the actual ratio of fake to genuine likes on Instagram is unknown, based on previous literature on spam detection [4], we maintain a ratio of roughly 1:8
      - This proportion ensures that any machine learning model trained on such a dataset can perform well ‘in-the-wild’ where the ratio of likes would be highly imbalanced
      - Therefore, we obtain the aforementioned features from FakeLike_data and RandLike_data, and train a supervised model on these features with fake likes as the positive class
    - - Logistic Regression,
      - Random Forest,
      - SVM (RBF kernel),
      - AdaBoost (with Random Forest as base estimator), and
      - XGBoost
    - - We use Badri et al.’s method to detect fake likers on Facebook. As the source code was unavailable, we implemented this method on our own based on the features detailed in the paper
      - Supervised method for the detection of fake likes based on
        
        Profile: length of biography, lifespan of account, number of bidirectional connections
        
        Posting activity: average number of total posts, maximum posts per day, skewness of posting
        
        Page liking: category entropy of pages liked, proportion of verified pages
        
        Social attention of the liker: average number of likes, comments received
        
        Discarded features: proportion of shared photos, and average number of shares received, since there is no concept of sharing posts on Instagram
    - - We use Precision, Recall and Area under the ROC curve (AUC) to measure the performance of all models in detecting fake likes
      - We achieve the highest performance using the MLP, with an average precision of 83% and recall of 81% (AUC of 89%) in detecting fake likes
      - We believe our system can detect fake likes given by genuine looking entities
      - Error Analysis
        
        We randomly sample 100 undetected fake likes and manually inspect them.
        
        We find that in 27 fake like instances, the likers were followers of the poster, potentially leading our model to misclassify such like instances as genuine.
        
        This suggests that some posters have fake followers and the fake likes are from such followers, something our current methodology is unable to capture.
        
        However, our approach can be modularly applied in a cascade, after detecting fake followers using previous techniques [6].
        
        Furthermore, we found that 61 likers had a high topical interest overlap with the posts they had liked.
        
        A more thorough analysis showed that this was due to the liker having a small set of interests (just one or two), which results in a high affinity value
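
The evaluation metrics named in this branch (precision, recall and AUC, with fake likes as the positive class) can be sketched in plain Python. This is an illustrative implementation, not the evaluation code used in the source; the AUC is computed from pairwise comparisons of positive and negative scores, which is simple but quadratic in the number of instances.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1 = fake like)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With a roughly 1:8 class ratio, accuracy alone would be misleading, which is why precision and recall on the fake class are the more informative numbers here.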