Profile characteristics of fake Twitter accounts (Information (Method…
In online social networks, the audience size commanded by an organization or an individual is a critical measure of that
Using a combination of a pattern-matching algorithm on screen names and an analysis of update times, a reasonable number (0.1% of total users out of 62 million) of highly reliable fake user accounts were identified
Analysis of profile creation times and URLs of these fake accounts revealed their distinct behavior relative to a ground truth data set
The characteristics of friends and followers of users in the two data sets further revealed the very different nature of the two groups
An analysis of the temporal evolution of accounts over 2 years showed that the friends-to-followers ratio increased over time for fake profiles while it decreased for ground truth users
The presence of fake profiles (or Sybils/Socialbots) generated by cyber-opportunists (or cyber-criminals) that are nearly indistinguishable from real profiles complicates authentication of user accounts (Douceur, 2002).
Some fake accounts are created to mimic a specific person’s account while others are created simply as a general account to serve as a fake follower.
Many are created to serve as ‘‘followers for hire’’ and are used for inflating follower numbers for other accounts. The presence of fake followers can: affect the popularity rating of individuals or organizations as measured by follower count (Kwak et al., 2010); alter the characteristics of the audience (Stringhini et al., 2013); or create a legitimacy problem for individuals/organizations (Parmelee and Bichard, 2011).
Fake profiles (or their operators) send requests to ‘‘follow’’ or ‘‘friend’’ OSN users and these requests are often accepted (80% probability when there are several common ‘‘friends’’) by unsuspecting users (Yang et al., 2014).
Twitter-specific approaches to identify
Use of tweet/tweeter characteristics such as ‘‘reputation score’’, ‘‘number of duplicate Tweets’’, and ‘‘number of URLs’’ (Wang, 2010).
Comparison of tweet links (URLs) to publicly blacklisted URLs/domains (Grier et al., 2010).
Detection based on tweet content (e.g. ‘‘number of hashtags per word of each tweet, number of followers and number of followings’’) (Benevenuto et al., 2010).
Thomas et al. (2013) used a multivariable pattern-recognition approach based on user-profile-name, screen-name, and email parameters.
Most recent approaches to detect fake accounts in Twitter and other OSNs have focused on detection of clustered fake accounts based on their activity patterns.
Similarly, Clark et al. (2016) used a natural-language-based classification scheme, trained on organic users, to identify messages from automated accounts and detect fake accounts. Such activity-based techniques can identify fake accounts only after they have established a tweet history.
A different approach to identifying fake accounts is based on analysis of a combination of features, including tweets and user profiles, for early and efficient identification of fake profiles (Xiao et al., 2015).
El Azab et al. (2015) showed that fake accounts on Twitter could be identified with high efficiency based on an established minimum set of factors, including number of followers, availability of geo-information, used a hashtag in a tweet,
Several detection strategies have been developed to tackle the problem of spam in social networks. These techniques have largely relied on graph-theoretic approaches that characterize social graph properties of Sybil accounts (Danezis and Mittal, 2009).
In response, ‘‘spammers’’ have worked to integrate Sybils into authentic user communities by creating accounts with full profiles and background information similar to authentic users (Yang et al., 2011).
Xiao et al. (2015) used a supervised machine-learning pipeline to compare text frequencies in features such as name, email address, etc. to classify LinkedIn accounts as either malicious or legitimate.
We use an approach based on user-profile pattern detection, combined with user-activity timestamp information, to develop a new process for detecting fake profiles with high reliability
A crawler obtained 33 different attributes for each Twitter profile. Patterns among combinations of these attributes were then analyzed to identify a highly reliable core set of fake profiles, which provided the basis for identifying key distinguishing characteristics of fake accounts based on their publicly available profile information
The key attributes that were either user-selected or varied with account-usage
1.Filter by name
Filter by name, description,
Filter by screen name patterns
Shannon entropy-based analysis of the screen names in each group was conducted
One of the screen names in a group was selected as a base screen name and its Shannon entropy was determined
Another screen name in the group was concatenated with the selected base name and the Shannon entropy of the concatenated string was calculated
If the entropy of the concatenated string was greater than that of the base name by a threshold value (0.1), then the concatenated screen name was added to a collection list
This entropy comparison with the base name was repeated for all screen names in the group
All screen names that were not accumulated in the collection list associated with the first screen name were then re-grouped and analyzed with the same procedure until all screen names were either placed in a collection list or identified as not being a part of any collection.
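The entropy-based grouping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `shannon_entropy` and `build_collections` are hypothetical, entropy is computed over the character distribution in bits, the base name is stored alongside the names it collects, and the comparison is read literally as "entropy gain exceeds the threshold". Because the wording of that comparison is open to interpretation, it is exposed as the `belongs` parameter so a different reading can be substituted.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def build_collections(screen_names, threshold=0.1,
                      belongs=lambda gain, t: gain > t):
    """Group screen names by entropy gain relative to a base name.

    Assumption: `belongs` encodes the acceptance test; the default is a
    literal reading of the description (gain greater than the threshold).
    Returns (collections, unassigned).
    """
    remaining = list(screen_names)
    collections, unassigned = [], []
    while remaining:
        # The first name still unprocessed becomes the base name.
        base, rest = remaining[0], remaining[1:]
        base_entropy = shannon_entropy(base)
        collected, leftover = [], []
        for name in rest:
            gain = shannon_entropy(base + name) - base_entropy
            if belongs(gain, threshold):
                collected.append(name)
            else:
                leftover.append(name)
        if collected:
            # Assumption: the base name is kept with its collection list.
            collections.append([base] + collected)
        else:
            unassigned.append(base)
        # Names not collected are re-grouped and analyzed again.
        remaining = leftover
    return collections, unassigned
```

Calling `build_collections(names)` on one group returns the collection lists plus the names that ended up in no collection; passing a different `belongs` function changes only the acceptance test, not the re-grouping loop.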
A regular expression pattern (more than four characters long) search was then conducted within each collection list to obtain any pattern(s) that might exist in their screen names
Screen names associated with a pattern formed a ‘‘pattern list’’. This procedure was able to identify and group mass fake profiles with screen names that were seemingly generated automatically
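The source does not say how the patterns themselves were derived, so the following is only one plausible sketch of the pattern search within a collection list. It assumes that a "pattern" is a literal substring of at least five characters (that is, more than four) shared by at least two screen names, and then uses a compiled regular expression to gather the matching names into a pattern list. The function name `pattern_lists` and the parameters `min_len` and `min_names` are hypothetical.

```python
import re
from collections import defaultdict


def pattern_lists(collection, min_len=5, min_names=2):
    """Map each shared substring of length `min_len` to the names containing it.

    Assumption: patterns are literal substrings; the original work may have
    used richer regular expressions.
    """
    # Record which names contain each candidate substring.
    candidates = defaultdict(set)
    for name in collection:
        for i in range(len(name) - min_len + 1):
            candidates[name[i:i + min_len]].add(name)

    result = {}
    for substring, names in candidates.items():
        if len(names) >= min_names:
            # Escape the substring so it is matched literally.
            regex = re.compile(re.escape(substring))
            result[substring] = sorted(n for n in collection if regex.search(n))
    return result
```

Overlapping substrings of the same stem produce several near-identical pattern lists; a fuller implementation would likely merge them, which is left out here for brevity.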
FINDING FAKE ACCOUNTS
The designed features fall into three groups: those describing the profile information of the accounts, those describing the relationships between an account and others, and those describing the account’s behaviour and messages.
The ratio of the number of friends to followers for ground truth users was 1, consistent with past observations, while the fake profiles had a median ratio of 30, indicating that the fake users we identified were primarily focused on gathering friends
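A median ratio of the kind compared above can be computed directly from per-profile friend and follower counts. The sketch below is illustrative only: the function name `median_ratio` is hypothetical, the field names `friends_count` and `followers_count` are assumed keys of each profile record, and profiles with a zero denominator are skipped, which is a choice of this sketch and not something the source specifies.

```python
from statistics import median


def median_ratio(profiles, numerator, denominator):
    """Median of numerator/denominator across profiles.

    Profiles whose denominator is zero are skipped (an assumption of this
    sketch). Returns None when no profile has a usable ratio.
    """
    ratios = [
        p[numerator] / p[denominator]
        for p in profiles
        if p[denominator] > 0
    ]
    return median(ratios) if ratios else None


# Hypothetical toy records, not data from the study.
sample = [
    {"friends_count": 30, "followers_count": 1},
    {"friends_count": 60, "followers_count": 2},
    {"friends_count": 10, "followers_count": 1},
    {"friends_count": 5, "followers_count": 0},
]
fake_ratio = median_ratio(sample, "friends_count", "followers_count")
```

Taking the ratio per profile and then the median, instead of dividing aggregate totals, keeps a few very large accounts from dominating the result.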
Joshua S White
Brian R Voter
Jeanna N Matthews
Efforts to measure the popularity of users or exploit knowledge about their audience are complicated by the presence of fake profiles
Examine the possibility of identifying fake accounts merely based on profile information to enable early classification of clustered fake accounts in Twitter