Larsen - Ch. 2-7 (p.30-70)
Machine Learning Life Cycle
Define Project Objectives
Acquire and explore data
Interpret & communicate
Implement, Document, & maintain
Not a linear process - Future discoveries or interpretations may upset the original objectives
Key productivity increasers
Exploratory data analysis - Return all stats + relationships
Feature engineering - Clean the data, deal with NULLS, etc
Choose an accurate algorithm - Then parameter tuning
Model diagnostics - Evaluate the top models and probability cutoffs
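A minimal sketch of the "probability cutoffs" part of model diagnostics, in plain Python. The labels and scores are invented for illustration, and the helper names `confusion_at_cutoff` and `precision_recall` are hypothetical, not taken from the book or from any AutoML tool.

```python
# Hypothetical true labels and predicted probabilities (invented for illustration).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.92, 0.40, 0.65, 0.30, 0.55, 0.10, 0.80, 0.20]


def confusion_at_cutoff(y_true, y_prob, cutoff):
    """Count (TP, FP, FN, TN) when probabilities >= cutoff are labeled positive."""
    tp = fp = fn = tn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Compare a few candidate cutoffs side by side.
for cutoff in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_at_cutoff(y_true, y_prob, cutoff)
    precision, recall = precision_recall(tp, fp, fn)
    print(f"cutoff={cutoff}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

Lowering the cutoff catches more true positives at the cost of more false alarms. Which trade-off is right depends on the business problem, not on the model alone.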
This is not automatic analysis
Worthy questions and model evaluation skills are required
Python is used for general ML platforms
8 Criteria for AutoML Excellence
Understanding and Learning
Explain your findings, front and back
Instill trust in the system - what manipulated/cleaned the data?
Ease of Use
Generalizable across Contexts
Sample down, handle any amount of data, across any field
Accurate and quick
Program context in so a decision can be made
Critical. This must have smart selection and ranking algorithms
Integrate with already fetched/obtainable data
UNIT OF ANALYSIS
The WWWW [Nix the Why and the How]
Who was readmitted into hospice?
Where will the crime take place?
What ad did the user click on?
When will the machine break down?
Examine the prediction target
Discover the logical unit of analysis - what the target goal hinges on
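One way to picture the unit of analysis: if the target is "Who was readmitted?", each training row should describe one patient, not one visit. The sketch below collapses hypothetical visit-level records into patient-level rows. The field names (`patient_id`, `days_in_care`, `readmitted`) and the helper `to_patient_level` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical visit-level records: several rows can belong to one patient.
visits = [
    {"patient_id": "A", "days_in_care": 4, "readmitted": 0},
    {"patient_id": "A", "days_in_care": 2, "readmitted": 1},
    {"patient_id": "B", "days_in_care": 7, "readmitted": 0},
    {"patient_id": "C", "days_in_care": 1, "readmitted": 0},
    {"patient_id": "C", "days_in_care": 3, "readmitted": 0},
]


def to_patient_level(visits):
    """Collapse visit rows into one row per patient (the assumed unit of analysis)."""
    grouped = defaultdict(list)
    for visit in visits:
        grouped[visit["patient_id"]].append(visit)

    rows = []
    for patient_id, records in sorted(grouped.items()):
        rows.append({
            "patient_id": patient_id,
            "n_visits": len(records),
            "total_days_in_care": sum(r["days_in_care"] for r in records),
            # Target: 1 if the patient was readmitted on any visit.
            "ever_readmitted": int(any(r["readmitted"] for r in records)),
        })
    return rows


patients = to_patient_level(visits)
for row in patients:
    print(row)
```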
Start with a Business Problem
"I want to predict....."
Grounded in requisite data, backed and approved by stakeholders
How can solving this issue impact the bottom line? It's all about that profit
Does the project statement specify actions that should result from the project?
Is the project statement presented in the language of business?
Subject Matter Expertise - And why it's so crucial
The business problem can be about anything - any subject in biz
Having knowledge about [accounting/supply chain/medicine] can prevent unseen obstacles
Or can lead to logical problems being overcome without ML - thus cleaning the data/results
Also you want to educate yourself on the company/workings of the sector
Especially for when it comes time to present the data!
"Hospice care" example
An expert can also assist with data collection
No model will be on the nose
Finalized column that shows the predicted target behavior
Interesting: check the data date and the time since collection for the set
Perhaps omit incomplete rows - EX: Loan payback
Targets are required
As this is how computers (and humans!) 'learn' the data through associations
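A small sketch of the loan payback case: loans whose outcome is not yet known have no target, so they are left out of the training rows. The records, the `status` values, and the helper `build_training_rows` are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical loan records; "open" and None mean the outcome is not yet known.
loans = [
    {"loan_id": 1, "amount": 5000, "status": "paid"},
    {"loan_id": 2, "amount": 12000, "status": "defaulted"},
    {"loan_id": 3, "amount": 8000, "status": "open"},
    {"loan_id": 4, "amount": 3000, "status": None},
    {"loan_id": 5, "amount": 9500, "status": "paid"},
]

# Only resolved loans can supply a target value (1 = repaid, 0 = not repaid).
RESOLVED = {"paid": 1, "defaulted": 0}


def build_training_rows(loans):
    """Keep loans with a known outcome and attach a numeric target column."""
    rows = []
    for loan in loans:
        status = loan["status"]
        if status not in RESOLVED:
            continue  # no target yet, so nothing to learn from
        rows.append({**loan, "target_repaid": RESOLVED[status]})
    return rows


training = build_training_rows(loans)
print([(row["loan_id"], row["target_repaid"]) for row in training])
```

The dropped rows are not useless: once a model exists, they are exactly the cases it would be asked to score.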
Predicts the category an entry will be placed into
Target numerical values
How many years were they together?
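The two target types above (a category versus a numeric value) can be told apart with a crude check like the one below. This is only a heuristic sketch under stated assumptions: the function `target_type` and its "more than two distinct numeric values" rule are invented here, and a numeric code for several categories would fool it.

```python
def target_type(values):
    """Heuristic guess at the problem type for a target column.

    Returns "regression" when every value is numeric and there are more than
    two distinct values; otherwise returns "classification".
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric and len(set(values)) > 2:
        return "regression"
    return "classification"


# Hypothetical target columns, invented for illustration.
readmitted = [0, 1, 1, 0]               # category: yes / no
years_together = [3.5, 12, 0.8, 27]     # numeric amount
ad_clicked = ["ad_a", "ad_b", "ad_a"]   # category: which ad

print(target_type(readmitted))
print(target_type(years_together))
print(target_type(ad_clicked))
```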
Is management supporting the project?
Can the model drivers be visualized?
Who will use the model?
How much value can the model produce?
Value can only be found after data has been cleaned and a MODEL has been considered, evaluated
Try playing Devil's advocate
Consider facing issues in obtaining data
or any biases at play during the collection
Then utilize AutoML
Be wary of black swan data changes
"Flash crash" in the May 2010 market
And then always consider if the project should rightfully move forward