Reading for 10/18; chapters 2-7
Chapter 2- Machine Learning
ML Life Cycle
not a linear process
define project objectives
Acquire and Explore Data
Model Data
Interpret and communicate
implement, document, maintain
Auto ML
any machine learning system that automates the repetitive tasks required for effective ML
What is it?
true potential is that it enables the democratization of data science; it makes data science available and understandable to most people, and makes subject matter expertise more important, because it may now be faster to train an expert in the use of AutoML than it is to train a data scientist to understand the subject matter at hand
what it's not
IS NOT automatic ML
several decisions must still be made by the analyst, and a certain skill set is needed for evaluating the results of applying ML to any data set
tools and platforms
two types
context specific tools
implemented within another system or for a specific purpose
general platforms
designed for general purpose ML; splits into two types
open source
tools tend to be developed by and for computer and data scientists and generally require knowledge of programming languages (Python, R)
commercial
provided by a commercial vendor, presumably for a price; several also require coding skills (Google Prediction API, Amazon ML)
8 criteria for AutoML excellence
accuracy; most important, w/o this there's no reason for AutoML
productivity
ease of use
understanding and learning
resource availability
process transparency
generalizable across contexts
recommend actions
Chapter 3- Specify Business Problem
problem and opportunity are equally important
by specifying problem, can evaluate options in a precise manner
should we proceed to address problem
is there a better problem to invest time and resources in
is ground truth for problem available
once we have requisite data, project must be described to be shared w/stakeholders
any proposed project evaluated against these criteria
is the project statement presented in the language of business
does the project statement specify actions that should result from the project
how could solving this problem impact bottom line
Chapter 4- acquire subject matter expertise
importance of subject matter expertise
without it, the analyst will not be capable of providing complete insights; it constitutes deep experience in a specific domain; important for early identification of potential obstacles or opportunities
helps to set realistic expectations for model performance; no model will ever be completely wrong or completely right; typically it will perform somewhere in between
expect the SME to suggest ideas for data collection and to know where relevant data is, including external data; the SME should suggest alternatives as well
if there is no SME available....
discuss the domain to be modeled w/data science colleagues, interview SMEs, read trade journals, read internet sources, read definitions
Chapter 5- decide on unit of analysis
what is a unit of analysis
is the what, who, where, and when of analysis
for each unit of analysis, there can be numerous outcomes that we might want to predict
how to determine unit of analysis
think about what the prediction target is; Lending Club example
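A minimal sketch of what "one row per unit of analysis" can look like in practice. It assumes, purely for illustration, that the unit is a single loan and that the raw data arrives as individual payment records; the field names (loan_id, amount) and the sample values are hypothetical and do not come from the reading.

```python
from collections import defaultdict

# Hypothetical raw records: several payments can belong to the same loan.
payments = [
    {"loan_id": "A", "amount": 100.0},
    {"loan_id": "A", "amount": 100.0},
    {"loan_id": "B", "amount": 250.0},
]

def one_row_per_loan(records):
    """Collapse payment-level records so each loan (the assumed unit) has one row."""
    totals = defaultdict(lambda: {"n_payments": 0, "total_paid": 0.0})
    for rec in records:
        row = totals[rec["loan_id"]]
        row["n_payments"] += 1
        row["total_paid"] += rec["amount"]
    return dict(totals)

loans = one_row_per_loan(payments)
```

Each key in loans is then one unit of analysis, and any of several outcomes could be attached to it as the prediction target.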
Chapter 6- Define Prediction Target
what is a prediction target
the future behavior of a thing that we need to know about
how is it important for ML
without a target, there is no way for humans or machines to learn what associations drive an outcome
types
classification- predicts the category to which a new case belongs
regression- predicts a numeric target value; the values can be simplified into 'buckets'
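The "buckets" idea can be sketched as follows: a numeric (regression) target is mapped to a small set of categories, which turns the task into classification. The bucket edges, labels, and sample interest rates are assumptions made up for this illustration.

```python
from bisect import bisect_right

# Hypothetical bucket edges and labels, chosen only for illustration.
EDGES = [10.0, 20.0]                 # boundaries between buckets
LABELS = ["low", "medium", "high"]   # one more label than there are edges

def to_bucket(value, edges=EDGES, labels=LABELS):
    """Map a numeric target value to a category label.

    A value equal to an edge falls into the higher bucket.
    """
    return labels[bisect_right(edges, value)]

interest_rates = [7.5, 12.0, 25.3, 10.0]   # hypothetical numeric targets
buckets = [to_bucket(r) for r in interest_rates]
```

Bucketing discards some information in exchange for a simpler target, so where the edges go is a judgment call that subject matter expertise can inform.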
Chapter 7- Success, risk and continuation
identify success criteria
start small with clear questions and goals
foresee risks
difficult to calculate
to get at risks, need to be creative and play devil's advocate
a model that makes bad recommendations is far worse than no model at all
model risks may relate to the model being insufficiently predictive, or to simple mistakes such as target-leakage features in the model
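One simple way to screen for target leakage is to record, for every candidate feature, whether its value is known before or after the moment the prediction is made, and to drop anything known only afterwards. The sketch below assumes such a catalog exists; the feature names and their "before"/"after" tags are hypothetical.

```python
# Hypothetical feature catalog: is each column known "before" or "after"
# the moment the prediction has to be made?
FEATURE_AVAILABILITY = {
    "loan_amount": "before",
    "annual_income": "before",
    "total_payments_received": "after",
    "collection_recovery_fee": "after",
}

def split_leaky_features(availability):
    """Separate features that are safe to use from likely target-leakage features."""
    safe = sorted(name for name, when in availability.items() if when == "before")
    leaky = sorted(name for name, when in availability.items() if when != "before")
    return safe, leaky

safe_features, leaky_features = split_leaky_features(FEATURE_AVAILABILITY)
```

A check like this only catches leakage that has been documented in the catalog, so it complements playing devil's advocate rather than replacing it.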
decide whether to continue
after weighing the risks against the rewards, evaluate whether to move forward with your project