Big Data and Social Science
-book review-
Chapter 6: Machine Learning
Originates in computer science
Became popular as a way to automate rule-based systems: the rules are learned from data instead of being written by hand
Commercial applications
speech recognition
autonomous cars
fraud detection
personalized ads
face recognition
Process
problem and goal
formulate machine learning problem
data exploration and preparation
feature engineering
method selection
Unsupervised learning: understanding natural clusters and patterns in the data
clustering
group data points that are similar to each other
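A minimal sketch of clustering, using k-means (Lloyd's algorithm) as one common method; the chapter does not prescribe a specific algorithm here. It is plain Python rather than scikit-learn so it runs without dependencies. The function name `kmeans`, the farthest-point initialization and the toy points are illustrative assumptions.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    # Farthest-point initialization (an assumption; random or k-means++ starts are also common)
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep the old centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical data
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters `k` has to be chosen by the analyst; k-means only finds a grouping for the `k` it is given.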
principal components analysis
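finding patterns and structure in data by summarizing many correlated variables with a few uncorrelated components that capture most of the variance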
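A sketch of principal components analysis for the first component only: center the data, build the covariance matrix and find its leading eigenvector by power iteration. Power iteration, the helper name `first_principal_component` and the toy data (points on the line y = 2x) are assumptions for illustration; in practice a library routine such as scikit-learn's `PCA` would be used.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, variance along it, total variance) for the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by cov converges to the eigenvector
    # with the largest eigenvalue (assumes the start vector is not orthogonal to it)
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv = variance of the data projected onto v
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, variance, total_variance


rows = [(1, 2), (2, 4), (3, 6), (4, 8)]  # hypothetical points on the line y = 2x
v, variance, total = first_principal_component(rows)
print([round(x, 3) for x in v])  # [0.447, 0.894]
print(round(variance / total, 3))  # 1.0
```

Here all variance lies along one direction, so one component explains all of it; with real data the ratio `variance / total` shows how much structure a single component captures.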
association rules
goal is to find items that occur together more often than you would expect by chance
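A sketch of the support, confidence and lift measures behind association rules, restricted to rules between pairs of items (full algorithms such as Apriori also handle larger itemsets). The basket data, the helper name `pair_rules` and the `min_support` cutoff are hypothetical. A lift above 1 means two items co-occur more often than they would if they were independent.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.2):
    """Rules X -> Y between single items, with support, confidence and lift."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Share of baskets that contain every item in `itemset`
        return sum(1 for basket in baskets if itemset <= basket) / n

    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue  # too rare to be interesting
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x})  # estimate of P(y | x)
            lift = confidence / support({y})  # > 1: together more often than chance
            rules.append((x, y, s_ab, confidence, lift))
    return rules


baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"],
           ["beer", "chips"], ["beer", "chips", "milk"]]
rules = pair_rules(baskets)
for x, y, sup, conf, lift in rules:
    print(f"{x} -> {y}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```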
Supervised learning: goal is to train an algorithm on existing labeled data points to build a model that can make predictions for new data while minimizing generalization error
train the model
use model to score new data
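A sketch of the train-then-score workflow: fit on one part of the data, then score held-out rows that play the role of new data, so the held-out accuracy estimates generalization error. The nearest-centroid classifier, the helper names (`train_test_split`, `fit_nearest_centroid`, `score`) and the toy data are assumptions chosen for brevity, not methods taken from the chapter.

```python
import math
import random


def train_test_split(rows, labels, test_fraction=0.3, seed=42):
    """Shuffle once, then hold out a share of rows as stand-in 'new' data."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def fit_nearest_centroid(rows, labels):
    """Train: remember the mean feature vector of every class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        model[cls] = [sum(col) / len(members) for col in zip(*members)]
    return model


def score(model, rows):
    """Score: assign each new row to the class with the closest centroid."""
    return [min(model, key=lambda cls: math.dist(r, model[cls])) for r in rows]


# Hypothetical toy data: two well-separated groups of 20 points each
rows = ([(i * 0.1, j * 0.1) for i in range(5) for j in range(4)]
        + [(5 + i * 0.1, 5 + j * 0.1) for i in range(5) for j in range(4)])
labels = [0] * 20 + [1] * 20
X_train, y_train, X_test, y_test = train_test_split(rows, labels)
model = fit_nearest_centroid(X_train, y_train)
predictions = score(model, X_test)
accuracy = sum(p == y for p, y in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The held-out rows are never seen during training; accuracy measured on the training rows themselves would overstate how well the model generalizes.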
Classification techniques
support vector machines: best-performing classification method at the time of writing (2017)
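A sketch of a linear support vector machine trained by Pegasos-style stochastic sub-gradient descent on the hinge loss. This training procedure is a swapped-in choice for illustration (libraries such as scikit-learn use other solvers), the bias is regularized along with the weights as a simplification, and `lam`, `n_epochs`, the helper names and the toy data are assumed values.

```python
import random


def train_linear_svm(X, y, lam=0.1, n_epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1/-1. The bias is folded in as a constant feature, so it is
    regularized too (a simplification)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(n_epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularization term
            if margin < 1:  # point is misclassified or inside the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict_svm(w, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1
            for x in X]


X = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]  # hypothetical data
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print(predict_svm(w, [(3, 3), (-3, -3)]))
```

Only points that violate the margin move the weights, which is why the fitted boundary is determined by the points closest to it (the support vectors). Kernels, which give SVMs non-linear boundaries, are not shown.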
decision trees
Minus: small changes in data can result in very different trees
Plus: trees are easy to interpret
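A sketch of a small classification tree that greedily picks the split with the lowest weighted Gini impurity. The Gini criterion, the `max_depth` cap, the helper names and the toy rows are assumptions; the point is that the fitted tree is a nested set of readable if/else rules.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, higher for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold) with the lowest weighted impurity, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(rows, labels, max_depth=3):
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return {
        "feature": f, "threshold": thr,
        "left": build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    }


def predict_tree(node, row):
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


rows = [(22, 1), (25, 3), (30, 2), (45, 1), (50, 3), (61, 2)]  # hypothetical (age, x)
labels = [0, 0, 0, 1, 1, 1]
tree = build_tree(rows, labels)
print(tree)  # {'feature': 0, 'threshold': 30, 'left': 0, 'right': 1}
print(predict_tree(tree, (40, 2)))  # 1
```

Because every split is chosen greedily from the rows at hand, dropping or relabeling a single row can change which split wins and therefore the whole subtree below it.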
k-nearest neighbors
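A sketch of k-nearest neighbors classification: find the `k` training points closest to a query and take a majority vote of their labels. Euclidean distance, `k=3`, the helper name `knn_predict` and the toy points are assumptions; in practice features are usually rescaled first so that no single variable dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    # Sort training points by Euclidean distance to the query and keep the k closest
    neighbors = sorted(zip(train_rows, train_labels),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote among the neighbors' labels
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


train_rows = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]  # hypothetical data
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_rows, train_labels, (1.5, 1.5)))  # a
print(knn_predict(train_rows, train_labels, (6.5, 6.2)))  # b
```

There is no training step beyond storing the data; all the work happens at prediction time, which makes scoring slow on large training sets.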
random forests: one of the most accurate methods
stacking: used for regression and classification
Ensemble (combined) techniques
boosting
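A sketch of boosting using AdaBoost with one-feature threshold rules (decision stumps) as weak learners, one common form of boosting chosen here for illustration. Each round re-weights the data so the next stump concentrates on points the previous ones got wrong. The helper names and toy data are hypothetical; on this data the best single stump misclassifies two of the six points.

```python
import math


def fit_stump(xs, ys, weights):
    """Best rule 'sign if x > thr else -sign' under the current sample weights."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x > thr else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best


def adaboost(xs, ys, n_rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, thr, sign = fit_stump(xs, ys, weights)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, thr, sign))
        # Re-weight: misclassified points get heavier, correct ones lighter
        weights = [w * math.exp(-alpha * y * (sign if x > thr else -sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict_boost(ensemble, x):
    score = sum(alpha * (sign if x > thr else -sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single threshold separates these
ensemble = adaboost(xs, ys, n_rounds=3)
print([predict_boost(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

Three rounds are enough for the weighted vote to fit this training data; in practice the number of rounds is tuned on held-out data.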
bagging: works well with decision trees, because averaging over bootstrap samples reduces the variance caused by their instability
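A sketch of bagging (bootstrap aggregating): fit the same learner on many bootstrap samples drawn with replacement and combine them by majority vote. To keep it short, a one-feature decision stump stands in for the full decision trees that bagging is usually paired with; the helper names, `n_estimators=25` and the toy data are assumptions.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Unweighted rule 'sign if x > thr else -sign' with the fewest training errors."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            errors = sum(1 for x, y in zip(xs, ys) if (sign if x > thr else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, sign)
    _, thr, sign = best
    return lambda x: sign if x > thr else -sign


def bagging(xs, ys, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    # Majority vote across all bootstrap models
    return Counter(m(x) for m in models).most_common(1)[0][0]


xs = list(range(1, 11))  # hypothetical data
ys = [-1] * 5 + [1] * 5
models = bagging(xs, ys)
print(predict_bagged(models, 0), predict_bagged(models, 11))
```

Each bootstrap sample leaves out some rows, so the individual models differ; the vote smooths out those differences, which helps most when the base learner is unstable, as deep trees are.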
neural networks and deep learning: an active research area, but so far less applied in the social sciences
evaluation
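A sketch of evaluating a classifier on held-out data with a confusion matrix and three common metrics. The choice of precision, recall and accuracy, the helper name `evaluate` and the example labels are assumptions; which metric matters depends on the problem formulated at the start of the process.

```python
def evaluate(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus precision, recall and accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged cases that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of true cases that are found
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "accuracy": accuracy}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical labels
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(evaluate(y_true, y_pred))
# {'tp': 3, 'fp': 2, 'fn': 1, 'tn': 2, 'precision': 0.6, 'recall': 0.75, 'accuracy': 0.625}
```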
deployment
Applicability for the Social Sciences
Prediction: machine learning offers better prediction methods and methodology than traditional statistics
Better text analysis
Adaptive surveys: respondents type in their job or occupation and the survey suggests which occupational category they might belong to
Estimating heterogeneous treatment effects (a method for causal inference)
Variable selection: choosing relevant predictors when working with large amounts of data
Further reading
semi-supervised learning
active learning
reinforcement learning
streaming data
anomaly detection
recommender systems
Software
Python
scikit-learn
pandas
R
cloud-based
AzureML
Amazon ML
Commercial
IBM Modeler
SAS Enterprise Miner
Matlab