Profile characteristics of fake Twitter accounts

Information

Meta

Result

Goals

Year

Author

Supraja Gurajala

Joshua S White

Brian Hudson

Brian R Voter

Jeanna N Matthews

2016

In online social networks, the audience size commanded by an organization or an individual is a critical measure of that entity’s popularity

Problem

Efforts to measure the popularity of users or to exploit knowledge about their audience are complicated by the presence of fake profiles

Using a combination of a pattern-matching algorithm on screen names and an analysis of update times, a reasonable number of fake user accounts (0.1% of the 62 million users in total) were identified with high reliability

Analysis of profile creation times and URLs of these fake accounts revealed their distinct behavior relative to a ground truth data set

The characteristics of friends and followers of users in the two data sets further revealed the very different nature of the two groups

An analysis of the temporal evolution of accounts over 2 years showed that the friends-to-followers ratio increased over time for fake profiles while it decreased for ground truth users

Fake account

The presence of fake profiles (or Sybils/Socialbots) generated by cyber-opportunists (or cyber-criminals) that are nearly indistinguishable from real profiles complicates authentication of user accounts (Douceur, 2002).

Some fake accounts are created to mimic a specific person’s account while others are created simply as a general account to serve as a fake follower.

Many are created to serve as ‘‘followers for hire’’ and are used for inflating follower numbers for other accounts. The presence of fake followers can: affect the popularity rating of individuals or organizations measured based on follower number count (Kwak et al., 2010); alter the characteristics of the audience (Stringhini et al., 2013); or create a legitimacy problem for individuals/organizations (Parmelee and Bichard, 2011).

Related Works

Twitter-specific approaches to identify spammers

Use of tweet/tweeter characteristics such as ‘‘reputation score’’, ‘‘number of duplicate Tweets’’, and ‘‘number of URLs’’ (Wang, 2010).

Comparison of tweet links (URLs) to publicly blacklisted URLs/domains (Grier et al., 2010).

Detection based on tweet content, e.g. ‘‘number of hashtags per word of each tweet, number of followers and number of followings’’ (Benevenuto et al., 2010).

Thomas et al. (2013) used a multivariable pattern-recognition approach based on user-profile-name, screen-name, and email parameters.

Most recent approaches to detect fake accounts in Twitter and other OSNs have focused on detection of clustered fake accounts based on their activity patterns.

Objective

Examine the possibility of identifying fake accounts merely based on profile information to enable early classification of clustered fake accounts in Twitter

Method

We use an approach based on user-profile pattern detection, combined with user-activity timestamp information, to develop a new process for detecting fake profiles with high reliability

Crawling

Twitter API

The crawler obtained 33 different attributes for each Twitter profile. Patterns among combinations of these attributes were then analyzed to identify a highly reliable core set of fake profiles, which provided the basis for identifying key distinguishing characteristics of fake accounts based on their publicly available profile information

The key attributes were those that were either user-selected or varied with account usage:

id

followers_count

friends_count

verified

created_at

description

location

updated

profile_image_url

screen_name

Algorithm

1. Filter by name

2. Filter by name, description, and location

3. Filter by screen name patterns
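The first two filtering steps can be sketched as a simple grouping on shared profile fields. This is a minimal illustration, not the authors' code: the field names `name`, `description`, `location`, and `screen_name` are assumed to follow the Twitter API, and dropping single-profile groups is an assumption made here.

```python
from collections import defaultdict


def group_profiles(profiles):
    """Group screen names of profiles sharing the same name, description, and location."""
    groups = defaultdict(list)
    for profile in profiles:
        key = (profile["name"], profile["description"], profile["location"])
        groups[key].append(profile["screen_name"])
    # Assumption: a group of one profile cannot indicate a cluster, so drop it.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical sample data for illustration only.
sample_profiles = [
    {"name": "A", "description": "d", "location": "x", "screen_name": "a1"},
    {"name": "A", "description": "d", "location": "x", "screen_name": "a2"},
    {"name": "B", "description": "d", "location": "x", "screen_name": "b1"},
]
sample_groups = group_profiles(sample_profiles)
```

Each resulting group is a candidate set of screen names for the screen name pattern analysis.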

Shannon entropy-based analysis of the screen names in each group was conducted

Steps

One of the screen names in a group was selected as a base screen name and its Shannon entropy was determined

The next screen name in the group was concatenated with the selected base name and the Shannon entropy of the concatenated string was calculated

If the entropy of the concatenated string was greater than that of the base name by a threshold value (0.1), then the concatenated screen name was added to a collection list

This was repeated with all screen names in the group

All screen names that were not accumulated in the collection list associated with the first screen name were then re-grouped and analyzed with the same procedure until all screen names were either placed in a collection list or identified as not being a part of any collection.
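The entropy steps above can be sketched as follows. This is a reading of the steps as worded in these notes, not the authors' implementation: entropy is assumed to be computed per character in bits, the base name is assumed to be the first name in the remaining list, and a base name that collects nothing is treated as not belonging to any collection.

```python
import math
from collections import Counter

THRESHOLD = 0.1  # threshold value quoted in the notes


def shannon_entropy(text):
    """Shannon entropy of the character distribution of text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def build_collections(screen_names, threshold=THRESHOLD):
    """Split one group of screen names into collection lists and leftovers."""
    remaining = list(screen_names)
    collections, unassigned = [], []
    while remaining:
        base, rest = remaining[0], remaining[1:]
        base_entropy = shannon_entropy(base)
        collected, leftover = [], []
        for name in rest:
            # Comparison direction follows the wording of the steps above.
            if shannon_entropy(base + name) - base_entropy > threshold:
                collected.append(name)
            else:
                leftover.append(name)
        if collected:
            collections.append([base] + collected)
        else:
            unassigned.append(base)
        # Names not collected are re-grouped and analyzed with a new base name.
        remaining = leftover
    return collections, unassigned
```

The loop always removes at least the base name from the remaining list, so it terminates for any input group.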

A regular expression pattern (more than four characters long) search was then conducted within each collection list to obtain any pattern(s) that might exist in their screen names

Screen names associated with a pattern formed a ‘‘pattern list’’. This procedure was able to identify and group mass fake profiles with screen names that were seemingly generated automatically
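The notes do not say how the patterns were derived, so the sketch below substitutes a simple stand-in: it treats any substring of five characters (that is, more than four) shared by at least two screen names as a pattern and uses a regular expression to build its ‘‘pattern list’’. The function name `pattern_lists` and the `min_names` parameter are hypothetical.

```python
import re
from collections import defaultdict

MIN_PATTERN_LEN = 5  # "more than four characters long"


def pattern_lists(collection, min_len=MIN_PATTERN_LEN, min_names=2):
    """Map each shared substring of length min_len to the screen names containing it."""
    owners = defaultdict(set)
    for name in collection:
        for start in range(len(name) - min_len + 1):
            owners[name[start:start + min_len]].add(name)
    result = {}
    for fragment, names in owners.items():
        if len(names) >= min_names:
            # Escape the fragment so it is matched literally.
            regex = re.compile(re.escape(fragment))
            result[fragment] = sorted(n for n in collection if regex.search(n))
    return result


# Hypothetical screen names for illustration only.
example = pattern_lists(["promo_bot_01", "promo_bot_02", "alice"])
```

A fixed-length substring keeps the sketch short; a real search would likely also merge overlapping fragments into longer patterns.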

FINDING FAKE ACCOUNTS

The designed features are divided into 3 groups: those describing the profile information of the accounts, those describing the relationships between an account and others, and lastly those describing the account’s behaviour and messages.

The friends-to-followers ratio for ground truth users was 1, consistent with past observations, while the fake profiles had a median ratio of 30, indicating that the fake users we identified were primarily focused on gathering friends
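As a small illustration, the median friends-to-followers ratio reported in the results can be computed from the `friends_count` and `followers_count` attributes listed earlier. The handling of accounts with zero followers is an assumption made here; the notes do not say how such accounts were treated.

```python
from statistics import median


def friends_to_followers_ratio(friends_count, followers_count):
    """Return friends/followers, or None when the account has no followers."""
    if followers_count == 0:
        return None  # assumption: skip accounts for which the ratio is undefined
    return friends_count / followers_count


def median_ratio(profiles):
    """Median friends-to-followers ratio over profiles with a defined ratio."""
    ratios = [
        friends_to_followers_ratio(p["friends_count"], p["followers_count"])
        for p in profiles
    ]
    defined = [r for r in ratios if r is not None]
    return median(defined) if defined else None


# Hypothetical profiles for illustration only.
sample = [
    {"friends_count": 30, "followers_count": 1},
    {"friends_count": 60, "followers_count": 2},
    {"friends_count": 10, "followers_count": 0},
    {"friends_count": 5, "followers_count": 5},
]
sample_median = median_ratio(sample)
```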

Fake profiles (or their operators) send requests to ‘‘follow’’ or ‘‘friend’’ OSN users and these requests are often accepted (80% probability when there are several common ‘‘friends’’) by unsuspecting users (Yang et al., 2014).

Several detection strategies have been developed to tackle the problem of spam in social networks. These techniques have largely relied on a graph-theory approach to characterize the social graph properties of Sybil accounts (Danezis and Mittal, 2009).

In response, ‘‘spammers’’ have worked to integrate Sybils into authentic user communities by creating accounts with full profiles and background information similar to authentic users (Yang et al., 2011).

Similarly, Clark et al. (2016) used a classification scheme based on natural language trained on organic users to then identify messages from automated accounts and detect fake accounts. Such activity-based techniques can identify fake accounts after they establish their tweet-history.

A different approach to identifying fake accounts is based on analysis of a combination of features, including tweets and user profiles, for early and efficient identification of fake profiles (Xiao et al., 2015).

El Azab et al. (2015) showed that fake accounts on Twitter could be identified with high efficiency based on an established minimum set of factors, including the number of followers, the availability of geo-information, and whether a hashtag was used in a tweet.

Xiao et al. (2015) used a supervised machine-learning pipeline to compare text frequencies in features such as name, email address, etc. to classify LinkedIn accounts as either malicious or legitimate.

What is

Shannon entropy
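Shannon entropy measures the average uncertainty of a distribution. For a string whose k distinct characters occur with relative frequencies p_1, ..., p_k, the standard definition is given below. The base-2 logarithm (entropy in bits) is the usual convention and is an assumption here, since these notes do not state which base was used.

```latex
H = -\sum_{i=1}^{k} p_i \log_2 p_i
```

For example, the string "aabb" has two characters, each with frequency 1/2, so H = -(1/2 log2 1/2 + 1/2 log2 1/2) = 1 bit, while "aaaa" has a single character with frequency 1, so H = 0.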