Practitioner Session (Machine Learning and ensemble methods
(session 2 …
Practitioner Session
Rule System (session 1)
Thin Files: When not enough data is available
Ways to adjust for thin files:
- Alternative data:
  - Utility bills
  - Psychological profiling (psychographics)
  - Social media data
  - SMS data
- Companies using these alternatives: Lenddo, Beewise, and Hello SODA
IoT: Internet of Things
Real-time data: Hadoop and Spark
Real-time analytics
NLP and RegEx are used to convert unstructured text data into structured data
For this conversion, NLP gives an accuracy of 60%
RegEx performs better in this regard
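As a rough illustration of turning unstructured text into structured records with RegEx, here is a minimal sketch. The SMS wording, the pattern, and the field names (`type`, `currency`, `amount`) are all hypothetical; real messages would need patterns written for their actual formats.

```python
import re

# Hypothetical SMS format; real messages vary by sender and need their own patterns.
SMS_PATTERN = re.compile(
    r"(?P<type>debited|credited)\s+(?:with|by)\s+(?P<currency>[A-Z]{3})\s*"
    r"(?P<amount>\d+(?:\.\d{1,2})?)",
    re.IGNORECASE,
)

def parse_sms(text):
    """Return a structured record for one message, or None if it does not match."""
    match = SMS_PATTERN.search(text)
    if match is None:
        return None
    return {
        "type": match.group("type").lower(),
        "currency": match.group("currency").upper(),
        "amount": float(match.group("amount")),
    }

# Made-up example messages.
messages = [
    "Your a/c XX1234 was debited with INR 2500.00 on 01-Jan",
    "A/c XX1234 credited by INR 12000 via transfer",
    "Happy new year from your bank!",
]
records = [parse_sms(m) for m in messages]
```

Messages that do not fit the pattern come back as `None`, which makes the coverage of the rules easy to measure.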
Miscellaneous
Feature Reduction: the space (coordinate system) does not change; a subset of the original features is kept
Generally an information-based technique
Dimension Reduction: the coordinate system changes, so the data are mapped from one system into a different one
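The difference between the two can be sketched on toy data. The points and the 45-degree projection axis below are made up for illustration; a method such as PCA would estimate the new axis from the data instead of fixing it by hand.

```python
import math

# Toy data: each row is (x, y); values are made up for illustration.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]

# Feature reduction: keep a subset of the original columns.
# The remaining axis is still the original x axis.
selected = [(x,) for x, _ in points]

# Dimension reduction: project onto a new axis (here the 45-degree diagonal,
# chosen by hand), so each value is expressed in a different coordinate system.
angle = math.pi / 4
direction = (math.cos(angle), math.sin(angle))
projected = [(x * direction[0] + y * direction[1],) for x, y in points]
```

Both results have one column, but only `selected` can still be read in the original units of `x`.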
Cross Validation
- Hold out some data for testing
- Repeat the hold-out method, testing each time, as many times as possible
Combine the results across repetitions, as in an ensemble.
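The repeated hold-out idea can be sketched as k-fold cross-validation. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the "model" is a placeholder that predicts the training mean, the error is mean absolute error, and the folds are contiguous with no shuffling.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(values, k):
    """Hold out each fold in turn, fit on the rest, and average the test errors."""
    errors = []
    for test_idx in k_fold_indices(len(values), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(values) if i not in held_out]
        test = [values[i] for i in test_idx]
        prediction = mean(train)  # placeholder "model": predict the training mean
        errors.append(mean([abs(v - prediction) for v in test]))
    return errors, mean(errors)

# Made-up data; each of the 3 folds is held out exactly once.
fold_errors, cv_error = cross_validate([2.0, 4.0, 6.0, 8.0, 10.0, 12.0], k=3)
```

Reporting `fold_errors` alongside the averaged `cv_error` shows how much the estimate varies from one hold-out set to the next.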
Distributions
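A distribution is a plot of the frequencies of the values.
Parameters: for the normal distribution, the parameters are the mean and the standard deviation.
For the t-distribution, the key parameter is the degrees of freedom; its location-scale form also takes a mean (location) and a scale.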
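To make the role of the parameters concrete, the sketch below evaluates both densities at their centre. `NormalDist` is from the Python standard library; `student_t_pdf` is a hypothetical helper written here from the standard density formula, with `loc` and `scale` as assumed names for the location-scale form.

```python
import math
from statistics import NormalDist

# Normal distribution: fully described by its mean and standard deviation.
normal = NormalDist(mu=0.0, sigma=1.0)
normal_peak = normal.pdf(0.0)

def student_t_pdf(x, df, loc=0.0, scale=1.0):
    """Density of the location-scale Student's t with df degrees of freedom."""
    z = (x - loc) / scale
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + z * z / df) ** (-(df + 1) / 2) / scale

t_peak_low_df = student_t_pdf(0.0, df=1)     # heavy tails, lower peak
t_peak_high_df = student_t_pdf(0.0, df=100)  # close to the normal curve
```

As the degrees of freedom grow, the t density at the centre approaches the normal one, which is why the degrees of freedom matter most for small samples.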