CHAPTER 2: AUTOMATING MACHINE LEARNING
define project objectives
acquire & explore data
interpret & communicate
implement, document & maintain
not altogether a linear process
Airbnb uses automated machine learning for customer lifetime value models
allows them to make decisions about individual hosts as well as aggregated markets
Areas where repetitive tasks hurt data scientists' productivity and where AutoML improved it:
exploratory data analysis
algorithm selection and hyper-parameter tuning
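A minimal sketch of the kind of repetitive hyper-parameter search that AutoML tools take over. This is a plain exhaustive grid search, not the method of any particular AutoML product; `grid_search`, `toy_score`, and the parameter names `depth` and `learning_rate` are all hypothetical, and `toy_score` only stands in for "train a model and return its validation score".

```python
from itertools import product


def grid_search(score_fn, grid):
    """Try every combination in `grid` and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for training and validating a model.
# Higher is better; the peak is placed at depth=3, learning_rate=0.1.
def toy_score(params):
    return -(params["depth"] - 3) ** 2 - (params["learning_rate"] - 0.1) ** 2


best_params, best_score = grid_search(
    toy_score,
    {"depth": [1, 3, 5], "learning_rate": [0.01, 0.1, 1.0]},
)
```

Doing this by hand for every algorithm and every data set is the repetitive part; deciding which problem is worth modeling is not something the loop can do.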
AutoML is NOT automatic ML
"subject matter expert must decide which problems are worth solving, determine which ideas are worthy of testing, and develop a solid understanding of common pitfalls and model evaluation skills"
8 criteria essential for AutoML to have significant impact
accuracy, productivity, ease of use, understanding & learning, resource availability, process transparency, generalizable across contexts, recommend actions
CHAPTER 4: ACQUIRE SUBJECT MATTER EXPERTISE
provides knowledge of features and what they mean
setting realistic expectations for model performance
suggest ideas for data collection
If subject matter expertise is NOT available
"discussing the domain to be modeled with data science colleagues, interviewing SMEs, reading trade journals, reading internet sources on the subject, or even reading the definitions of available features found in data dictionaries around the web"
indication of obstacles and opportunities
CHAPTER 5: DECIDE ON UNIT OF ANALYSIS
what, who, where, and when of our analysis
identify prediction target
Lending Club: the loan itself is the unit of analysis
lean on your subject matter expert and work with him or her to share knowledge of the problem context - figure out unit of analysis
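A rough sketch of what "the loan is the unit of analysis" means for the data: every row in the modeling table should describe exactly one loan. The example assumes the raw data arrives at a finer grain (one row per payment), which is an assumption for illustration only; the field names `loan_id`, `amount_paid`, and `late` are hypothetical.

```python
from collections import defaultdict

# Hypothetical payment-level records; values are made up for illustration.
payments = [
    {"loan_id": "A1", "amount_paid": 100.0, "late": False},
    {"loan_id": "A1", "amount_paid": 100.0, "late": True},
    {"loan_id": "B2", "amount_paid": 250.0, "late": False},
]


def to_loan_level(rows):
    """Collapse payment rows so that each output entry describes one loan."""
    loans = defaultdict(lambda: {"n_payments": 0, "total_paid": 0.0, "n_late": 0})
    for row in rows:
        loan = loans[row["loan_id"]]
        loan["n_payments"] += 1
        loan["total_paid"] += row["amount_paid"]
        loan["n_late"] += int(row["late"])
    return dict(loans)


loan_table = to_loan_level(payments)
```

If the unit of analysis were the borrower instead, the same rows would be grouped by a borrower identifier, which is why the choice has to be settled with the subject matter expert first.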
CHAPTER 6: DEFINE PREDICTION TARGET
behavior of a “thing” we need to know about the future
Lending Club: "loan is bad" is the prediction target
without a target - no way for humans or machines to learn what associations drive an outcome
kinds of targets
classification - predicts the category to which a new case belongs
regression - predicts a numeric target value
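The two kinds of targets can be sketched on the same loan records. Everything here is illustrative: the field names `status` and `amount_recovered`, the sample values, and the set of statuses treated as "bad" are assumptions, not definitions from the book.

```python
# Hypothetical loan records; values are made up for illustration.
loans = [
    {"loan_id": "A1", "status": "Charged Off", "amount_recovered": 1200.0},
    {"loan_id": "B2", "status": "Fully Paid", "amount_recovered": 5000.0},
]

# Assumed definition of a bad loan; the real one should come from the SME.
BAD_STATUSES = {"Charged Off", "Default"}


def classification_target(loan):
    """Categorical target: is the loan bad (True) or not (False)?"""
    return loan["status"] in BAD_STATUSES


def regression_target(loan):
    """Numeric target: how much money was recovered on the loan?"""
    return loan["amount_recovered"]


class_targets = [classification_target(loan) for loan in loans]
reg_targets = [regression_target(loan) for loan in loans]
```

The unit of analysis stays the same in both cases; only the question asked about each loan changes.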
CHAPTER 7: SUCCESS, RISK & CONTINUATION
Who will use the model?
Is management on board with the project?
Can the model drivers be visualized?
How much value can the model produce?
"management support is especially important at the end of a project when decisions are made about model implementation into the information workflow"
model being insufficiently predictive
target leakage features in the model - models that are too predictive
data may be missing or be of insufficient quality
model risks, ethical risks, cultural risks, and environmental risks
ethical risk = privacy
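One rough way to screen for the target leakage risk noted above: check whether any single feature predicts the target almost perfectly on its own. This is a simple heuristic sketch, not a complete leakage test; the helper names, the 0.95 threshold, and the feature names `grade` and `collection_fee_charged` are all hypothetical.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(rows, feature, target):
    """Accuracy of guessing the majority target class within each feature value."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[target]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)


def flag_possible_leakage(rows, features, target, threshold=0.95):
    """Return features that alone reach `threshold` accuracy (assumed cutoff)."""
    return [
        feature
        for feature in features
        if single_feature_accuracy(rows, feature, target) >= threshold
    ]


# Made-up rows: a collection fee is only charged after a loan has gone bad,
# so that feature would not be known at prediction time.
rows = [
    {"grade": "A", "collection_fee_charged": False, "loan_is_bad": False},
    {"grade": "A", "collection_fee_charged": True, "loan_is_bad": True},
    {"grade": "B", "collection_fee_charged": False, "loan_is_bad": False},
    {"grade": "B", "collection_fee_charged": True, "loan_is_bad": True},
]

suspects = flag_possible_leakage(
    rows, ["grade", "collection_fee_charged"], "loan_is_bad"
)
```

A flagged feature is only a candidate. Features with many distinct values can also score high on small samples, so whether the value is really known before the outcome is still a question for the subject matter expert.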
decide whether to continue
weighing the risks against the rewards
evaluate whether to move forward with your project
CHAPTER 3: SPECIFY BUSINESS PROBLEM
Any proposed project should be evaluated against three criteria
Is the project statement presented in the language of business?
Does the project statement specify actions that should result from the
How could solving this problem impact the bottom line?
includes specifics - # of customers affected, costs, etc.