Data Science and Analytics
Data Types
Collectible, factual, descriptive; of little use in raw form
Structured
Unstructured
Sources
People
Software
Logs / transactions
Databases
Sensors
PIPEDA
Legislation that protects how personal data is used
Personal Information Protection and Electronic Documents Act
Information
Contextual
Aggregated data
Structured data
Visualized
Interpreted
Actionable
Analytics: Measuring Stuff that matters
Inputs: raw data collected and stored in databases
Data is computed to produce ratios / other indicators
Outputs: often called KPIs (Key Performance Indicators)
e.g. 70 km/h instead of 70 km and 1 h
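As a minimal sketch of the input, compute, output flow above: two raw measurements (distance and time) are combined into one ratio that can serve as an indicator. The function name `speed_kmh` and the sample trips are assumptions made for illustration only.

```python
def speed_kmh(distance_km: float, duration_h: float) -> float:
    """Combine two raw measurements into a single indicator (km/h)."""
    if duration_h <= 0:
        raise ValueError("duration must be positive")
    return distance_km / duration_h


# Hypothetical raw inputs as they might sit in a database:
# (distance in km, time in hours).
trips = [(70.0, 1.0), (35.0, 0.5), (120.0, 2.0)]

# The computed ratio is the output that would be reported as a KPI.
kpis = [speed_kmh(distance, duration) for distance, duration in trips]
print(kpis)
```

The raw pair "70 km and 1 h" carries the same information, but the ratio is what can be compared across trips of different lengths.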
Analytics maturity table (Appendix 2, table 1)
Donald Rumsfeld'lytics (Appendix 2, table 2)
things we know
Business Intelligence
we know - facts to be continually checked
we don't know - questions we can answer by reporting and should automate
things we don't know
Predictive Analytics
we know - are intuitions we should try to quantify
we don't know - are explorations where unfair advantages and interesting epiphanies lie
Metrics
Qualitative vs. quantitative
Vanity vs. actionable
Exploratory vs. reporting
Leading vs. lagging
Correlated vs. causal
For more detail
http://onstartups.com/tabid/3339/bid/96738/Measuring-What-Matters-How-To-Pick-A-Good-Metric.aspx
Tests
A/B - Changing one thing then measuring result
Segment - Cross-sectional analysis of all people divided by an attribute (e.g. age)
Multi-variate Analysis - Changing several things at once to check which correlates with a result
Cohort - Comparison of a similar group along a timeline
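To make the A/B test in the list above concrete, here is a hedged sketch of one common way to measure the result: a two-proportion z-test on conversion counts. The notes do not name a specific statistical test, so this choice, the function name `two_proportion_z`, and the visitor counts are all assumptions.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant B changes exactly one thing relative to A.
z, p = two_proportion_z(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the measured difference is unlikely to be noise, but it says nothing about why the change worked.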
Data Science: Using measurement to Predict stuff that matters
Collect data from various systems
Process it with industry-relevant algorithms
Outputs include predicted performance
Methodology
OODA Loop
Scientific Method
The cyclical system
Explore data - visualizing trends, removing outliers, normalizing ...
Preliminary Analysis (mean, median ...)
Statistical Data Analysis (variance between different tests, combining data where possible, t-test ...)
Model Development
Model Testing
Operationalize (start applying)
Start over since last step produces more data
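The first three steps of the cycle above (explore, preliminary analysis, statistical analysis) can be sketched with the Python standard library. This is an illustration under assumptions: the 1.5 x IQR outlier rule and Welch's t statistic are common choices rather than ones named in the notes, and the sample values are made up.

```python
import math
import statistics


def remove_outliers(values, k=1.5):
    """Drop points further than k * IQR outside the quartiles (a rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical measurements; 95 looks like a recording error.
group_a = [10, 11, 12, 11, 10, 12, 11, 95]
group_b = [13, 14, 13, 15, 14, 13, 14]

# Explore: remove the outlier before summarizing.
cleaned_a = remove_outliers(group_a)

# Preliminary analysis: mean and median of each group.
print(statistics.mean(cleaned_a), statistics.median(cleaned_a))
print(statistics.mean(group_b), statistics.median(group_b))

# Statistical analysis: compare the two groups.
print(welch_t(cleaned_a, group_b))
```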
What is it really - code that can learn to differentiate between data - a form of trainable ML / AI
Requires
Training set
Test Set
Automation
Appendix 2, table 2 for "underfitting and overfitting problems"
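A minimal sketch of the training set / test set requirement, assuming a toy one-feature dataset and a deliberately simple "model" (a single learned cut-off). All names here (`train_test_split`, `fit_threshold`, `accuracy`) are hypothetical helpers written for this example, not a library API.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction that the model never sees."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """'Learn' a cut-off: the midpoint between the two class means."""
    zeros = [x for x, label in train if label == 0]
    ones = [x for x, label in train if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where 'x above the threshold' matches the label."""
    correct = sum((x > threshold) == (label == 1) for x, label in rows)
    return correct / len(rows)


# Hypothetical, cleanly separated data: (feature value, class label).
rows = [(x, 0) for x in range(1, 11)] + [(x, 1) for x in range(21, 31)]

train, test = train_test_split(rows)
threshold = fit_threshold(train)
print("train accuracy:", accuracy(threshold, train))
print("test accuracy:", accuracy(threshold, test))
```

Comparing the two scores is the point of the split: a model that does much better on the training set than on the test set is typically overfitting, while one that does poorly on both is typically underfitting.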
Visualizations - important; they should be appropriate and tell a story
Lean Analytics
Predictive Analytics and Biz systems
Iterative Process and Decision making
Tools
Data cubes, databases
Programming languages
Data Storage
Flat file, e.g. CSV
Relational - multiple tables, connect by a key
In-memory - data stored within a processing device
Data warehouse / mart / store
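To illustrate the relational option above (multiple tables connected by a key), here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, and rows are invented for the example.

```python
import sqlite3

# A throwaway database that lives only for the duration of this script.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- the connecting key
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# Join the two tables on the key and total the order amounts per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
conn.close()

print(rows)
```

A flat file would have to repeat the customer name on every order row; the key lets each fact be stored once and combined on demand.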
Tiered Storage
Tier 1 - data currently being used
Tier 2 - accessed less often than Tier 1 (1/2 weeks)
Tier 3 - archived, takes longer to retrieve (cold data)
Spinning vs. solid state - e.g. CD vs. USB
USB is faster; on a CD, Tier 1 data sits closer to the center
If you can't measure it, you can't manage it
If you can't measure it, you can't improve it.
Peter Drucker