CH 3
probability estimation
underfitting
each leaf needs an estimate of the probability of class membership
overfitting
frequency-based estimate
use instance counts at each leaf to compute class probability
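The frequency-based estimate can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `leaf_class_probabilities` and the example labels are assumptions made up for the sketch.

```python
from collections import Counter


def leaf_class_probabilities(labels):
    """Frequency-based estimate: share of each class among the instances at a leaf."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a leaf needs at least one training instance")
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical leaf holding 3 "yes" and 1 "no" training instances.
leaf_labels = ["yes", "yes", "no", "yes"]
probs = leaf_class_probabilities(leaf_labels)
print(probs)
```

With very few instances at a leaf the estimate can be extreme (a single instance gives probability 1.0), which is one way such estimates can overfit.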
models, induction and prediction
predictive
forecast of future event
induction
creates the model from the data
input data = training/labeled data
value of target variable is known
formula for estimating the unknown value of interest
deduction
starts from general rules and specific facts
creates other specific facts from them
descriptive
gain insight into underlying phenomenon or process
a less accurate model may be preferred if it is easier to understand
judged on intelligibility
analyzing subset of data
tried and true method
use informative attributes to select informative subset
attribute selection before data-driven modeling can increase the accuracy of the model
Supervised Segmentation
selecting informative attributes
complications
not every attribute is binary
attributes very rarely split a group perfectly
some attributes take on numeric values
formula that evaluates how well each attribute has split the data
with respect to the target variable
based on purity measures
most common splitting criteria
information gain
based on purity measure = entropy
Information gain measures the
improvement (decrease) in
entropy produced by a segmentation
measure of disorder
Disorder corresponds to how impure
the segment is w/ respect to the target variable
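Entropy as a measure of disorder can be sketched in a few lines. The sketch assumes the usual Shannon definition, entropy = -p1*log2(p1) - p2*log2(p2) - ..., where each p is the share of one class in the segment; the function name and example labels are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one segment."""
    total = len(labels)
    counts = Counter(labels)
    # p * log2(1/p) equals -p * log2(p); written this way a pure segment gives 0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


pure = entropy(["yes"] * 4)             # no disorder
mixed = entropy(["yes", "yes", "no", "no"])  # maximal disorder for two classes
skewed = entropy(["yes", "yes", "yes", "no"])
print(pure, mixed, skewed)
```

A pure segment scores 0 and an evenly mixed two-class segment scores 1 bit; everything else falls in between.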
A split that produces one pure subset
is not necessarily better than a split into two large
relatively pure subsets
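A rough sketch of information gain shows why the size of the subsets matters: each child's entropy is weighted by its share of the instances. The parent segment and both candidate splits below are made-up data, and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one segment."""
    total = len(labels)
    counts = Counter(labels)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical parent segment: 6 "yes" and 6 "no".
parent = ["yes"] * 6 + ["no"] * 6

# Split A peels off a single pure instance; the remainder stays mixed.
split_a = [["yes"], ["yes"] * 5 + ["no"] * 6]
# Split B yields two large, relatively pure subsets.
split_b = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 5]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(gain_a, gain_b)
```

In this toy data split B has the larger gain even though neither of its subsets is pure, because split A leaves almost all instances in a segment that is nearly as disordered as the parent.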
how informative attributes are about the target
numeric variable
categorize the numbers
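One common way to categorize a numeric attribute is to pick a split point and treat "at or below" and "above" as two categories. The sketch below assumes candidate split points are the midpoints between adjacent distinct values and scores each by information gain; the ages and churn labels are invented, and `best_threshold` is a hypothetical helper name.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one segment."""
    total = len(labels)
    counts = Counter(labels)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the midpoint split with the highest information gain."""
    distinct = sorted(set(values))
    parent_entropy = entropy(labels)
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent_entropy - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical data: customer age and whether the customer churned.
ages = [22, 25, 31, 38, 45, 52]
churned = ["yes", "yes", "yes", "no", "no", "no"]
threshold, gain = best_threshold(ages, churned)
print(threshold, gain)
```

Real data rarely separates this cleanly; the point is only that the numeric attribute becomes a two-valued one once a threshold is chosen.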
supervised segmentation with tree structured model
First node = test
of an attribute
Followed by branches which are
distinct values of attribute
leaves segment the data
An instance with unknown classification can be
predicted by finding the segment (leaf) it falls into
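Prediction with a tree-structured model amounts to walking from the root test down the matching branches until a leaf is reached. The nested-dictionary representation, the attribute names (`employed`, `balance`) and the class labels below are all assumptions chosen for illustration.

```python
# Hypothetical tree: each test node names an attribute and has one branch per
# distinct value; each leaf stores the class of its segment.
tree = {
    "attribute": "employed",
    "branches": {
        "yes": {"leaf": "no write-off"},
        "no": {
            "attribute": "balance",
            "branches": {
                "low": {"leaf": "no write-off"},
                "high": {"leaf": "write-off"},
            },
        },
    },
}


def predict(node, instance):
    """Follow the branch matching the instance's value at each test until a leaf is reached."""
    while "leaf" not in node:
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node["leaf"]


new_customer = {"employed": "no", "balance": "high"}
print(predict(tree, new_customer))
```

An instance whose attribute value has no matching branch raises a `KeyError` in this sketch; a fuller implementation would need a policy for unseen values.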
target variable
helps find informative attributes
reduce uncertainty
provides insight into business problems
visualizing segmentations
decision trees
decision lines and hyperplanes
decision line equals line separating regions
hyperplane equals separating surface
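As a small sketch of the geometry, a hyperplane can be written as w.x + b = 0, and the region a point belongs to is given by the sign of w.x + b. A single tree test on a numeric attribute is the special case where only one weight is nonzero, so the decision line is perpendicular to that attribute's axis. The attribute order (balance, age) and the threshold 45 are invented for the example.

```python
def side_of_hyperplane(weights, bias, point):
    """Return +1 or -1 for the side of w.x + b = 0 the point lies on, 0 if it is on the surface."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return (score > 0) - (score < 0)


# Hypothetical tree test "age > 45" over points given as (balance, age):
# weight 0 on balance, weight 1 on age, bias -45.
age_weights, age_bias = [0.0, 1.0], -45.0

older = side_of_hyperplane(age_weights, age_bias, (50_000, 52))
younger = side_of_hyperplane(age_weights, age_bias, (50_000, 30))
print(older, younger)
```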
trees as sets of rules
if then rules
for larger models, people may prefer either the tree or the rule set
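Reading a tree as a rule set can be sketched by tracing every path from the root to a leaf and joining the tests along the path with AND. The nested-dictionary tree, its attribute names and the rule wording are assumptions for illustration.

```python
# Hypothetical tree in a nested-dictionary form: test nodes have an attribute
# and branches, leaves store a class.
tree = {
    "attribute": "employed",
    "branches": {
        "yes": {"leaf": "no write-off"},
        "no": {
            "attribute": "balance",
            "branches": {
                "low": {"leaf": "no write-off"},
                "high": {"leaf": "write-off"},
            },
        },
    },
}


def tree_to_rules(node, conditions=()):
    """Return one IF-THEN rule per leaf, built from the tests on the path to that leaf."""
    if "leaf" in node:
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {node['leaf']}"]
    rules = []
    for value, child in node["branches"].items():
        test = f"{node['attribute']} = {value}"
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules


rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

The tree and the rule set describe the same segmentation; which form is easier to read is a matter of preference.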