Classification and Regression Tree
basic
aka Decision Tree / Tree
Objective
classify
predict
set of rules
tree diagram
recursive partitioning
maximize homogeneity
pruning
overfitting
simplify
steps
Recursive Partitioning
procedure
select predictor
select the value
numerical
midpoint
categorical
possible combination
calculate purity
a pure node contains records of mostly one class
measure
Gini index
minimum 0: most pure (all records in one class)
maximum (m - 1)/m for m classes: records equally distributed across classes
Entropy index
minimum 0: most pure (all records in one class)
maximum log2(m) for m classes: records equally distributed across classes
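The two impurity measures and the midpoint search for a numerical predictor can be sketched as follows. This is a minimal illustration, not a full tree builder; the function names and the toy data (`income`, `owner`) are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum(p_k^2). 0 means every record has the same class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum(p_k * log2(p_k)). 0 means every record has the same class."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels, impurity=gini):
    """Try the midpoint between each pair of consecutive distinct values and
    return (threshold, weighted impurity) of the purest split."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weight each child's impurity by its share of the records.
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: two equally frequent classes, separable at income = 50.
income = [20, 30, 40, 60, 80, 90]
owner = ["no", "no", "no", "yes", "yes", "yes"]

print(gini(owner), entropy(owner))        # impurity before splitting
print(best_numeric_split(income, owner))  # purest midpoint split
```

With two equally frequent classes both measures sit at their two-class maximum (0.5 for Gini, 1 for entropy), and the split at the midpoint 50 drives the weighted impurity to 0.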
rules
IF (condition) AND (condition) THEN (class)
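Each path from the root to a leaf reads as one rule. A minimal sketch, assuming the tree is stored as a nested dict; the keys `feature`, `threshold`, `left`, `right`, `label` and the example tree are hypothetical:

```python
def extract_rules(node, conditions=()):
    """Return one IF ... AND ... THEN rule per leaf of a nested-dict tree."""
    if "label" in node:  # leaf: emit the conditions collected along the path
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN class = {node['label']}"]
    test_left = f"{node['feature']} <= {node['threshold']}"
    test_right = f"{node['feature']} > {node['threshold']}"
    return (extract_rules(node["left"], conditions + (test_left,))
            + extract_rules(node["right"], conditions + (test_right,)))

# Hypothetical two-level tree.
tree = {
    "feature": "income", "threshold": 50,
    "left": {"label": "non-owner"},
    "right": {
        "feature": "lot_size", "threshold": 19,
        "left": {"label": "non-owner"},
        "right": {"label": "owner"},
    },
}

for rule in extract_rules(tree):
    print(rule)
```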
Pruning
stopping growth
plot the error rate on training and validation data as the tree grows; stop where the validation error starts to rise
CHAID (chi-squared automatic interaction detection)
chi-square test of independence
split only while the decrease in impurity is statistically significant
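The idea behind the chi-square stopping test can be sketched for the simplest case: a binary split with two classes (a 2x2 table, 1 degree of freedom). This is only an illustration of the test, not the full CHAID procedure; the function name and the counts are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table
    (rows = child nodes, columns = classes), 1 degree of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the split were unrelated to the class.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical split: left child has 30/10 of the two classes, right child 10/30.
stat, p = chi_square_2x2([[30, 10], [10, 30]])
print(stat, p)
if p < 0.05:
    print("split is statistically significant: keep growing")
```

A split whose children have the same class mix as the parent gives a statistic of 0, so growth would stop at that node.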
pruning
procedure
grow the tree to its full extent
prune branches back successively
when several pruned trees are possible
cost complexity
CC(T) = ERR(T) + alpha * LEAF(T)
minimum error tree
best pruned tree
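A sketch of how the cost complexity formula and the two tree choices could be applied to a sequence of pruned trees. Two assumptions that the outline does not state: the minimum error tree is taken to be the one with the lowest validation error, and the best pruned tree is taken to be the smallest tree within one standard error of that minimum. All numbers are hypothetical.

```python
import math

def cost_complexity(err, leaves, alpha):
    """CC(T) = ERR(T) + alpha * LEAF(T)."""
    return err + alpha * leaves

# Hypothetical pruning sequence: (number of leaves, training error, validation error).
candidates = [
    (1, 0.40, 0.41),
    (3, 0.22, 0.25),
    (5, 0.15, 0.20),
    (9, 0.08, 0.19),
    (15, 0.02, 0.23),
]
n_valid = 400  # hypothetical validation set size

def pick_by_cost_complexity(alpha):
    """Tree minimizing CC(T) on training error for a given penalty alpha."""
    return min(candidates, key=lambda t: cost_complexity(t[1], t[0], alpha))

# Assumption: minimum error tree = lowest validation error.
min_error_tree = min(candidates, key=lambda t: t[2])

# Assumption: best pruned tree = smallest tree within one standard error.
std_err = math.sqrt(min_error_tree[2] * (1 - min_error_tree[2]) / n_valid)
best_pruned_tree = min(
    (t for t in candidates if t[2] <= min_error_tree[2] + std_err),
    key=lambda t: t[0],
)

print(pick_by_cost_complexity(0.0), pick_by_cost_complexity(0.02))
print(min_error_tree, best_pruned_tree)
```

With alpha = 0 the full tree wins because only training error counts; raising alpha penalizes leaves and favors smaller trees.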
Regression Tree
leaf prediction: majority vote -> average of the outcome values in the leaf
impurity: Gini index, entropy index -> sum of squared deviations from the leaf mean
performance: error rate -> RMSE
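The three substitutions for a regression tree can be sketched as follows; the function names and the leaf values are hypothetical.

```python
import math

def leaf_prediction(values):
    """A regression leaf predicts the average outcome of its records."""
    return sum(values) / len(values)

def sum_squared_deviation(values):
    """Impurity of a leaf: sum of squared deviations from the leaf mean."""
    mean = leaf_prediction(values)
    return sum((v - mean) ** 2 for v in values)

def rmse(actual, predicted):
    """Root mean squared error, used in place of the error rate."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical outcome values falling into two leaves after one split.
left_leaf = [10.0, 12.0, 14.0]
right_leaf = [30.0, 34.0]

split_impurity = sum_squared_deviation(left_leaf) + sum_squared_deviation(right_leaf)
actual = left_leaf + right_leaf
predicted = ([leaf_prediction(left_leaf)] * len(left_leaf)
             + [leaf_prediction(right_leaf)] * len(right_leaf))

print(split_impurity, rmse(actual, predicted))
```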
Evaluation
pros
simple
produces explicit rules
easy to understand
no statistical assumptions
automatic variable selection and reduction
can handle missing data without imputation
cons
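works poorly when the classes cannot be separated by horizontal or vertical splits of the predictor space (e.g. a diagonal boundary)
each split uses a single predictor, so relationships that depend on a combination of predictors (e.g. linear structure) can be missed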
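The diagonal-boundary weakness can be seen on a small hypothetical data set where the class depends on whether x1 > x2. The sketch below counts the fewest errors any single horizontal or vertical split can achieve; the function name and the points are assumptions made for illustration.

```python
def min_errors_single_split(points, labels):
    """Fewest misclassified records achievable with one axis-parallel split,
    predicting the majority class on each side."""
    best = len(labels)
    for axis in range(len(points[0])):
        values = sorted({p[axis] for p in points})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            errors = 0
            for side in (True, False):
                group = [y for p, y in zip(points, labels)
                         if (p[axis] <= threshold) == side]
                # Records outside the majority class of this side are errors.
                errors += len(group) - max(group.count(c) for c in set(group))
            best = min(best, errors)
    return best

# Hypothetical points on either side of the diagonal x1 = x2.
points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
labels = [x1 > x2 for x1, x2 in points]

print(min_errors_single_split(points, labels))
```

The rule x1 > x2 classifies every record correctly, yet the best single split on either predictor still misclassifies some of them, so a tree needs many splits to approximate such a boundary.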