Chapter 3
Supervised segmentation
Segment groups that differ from each other
So that we can predict or estimate something
e.g. which customers will leave or respond to an ad
Informative variables are attributes that carry information about the target
Information is a quantity that reduces uncertainty
Attributes correlate with the target value
Model
A simplified representation of reality to serve a purpose
Data science predictive model: estimates the value of the target
Prediction: estimating an unknown value
Supervised learning
Describes the relationship between selected variables and the target
instance (row): a fact or data point
Descriptive modeling: gain insight into the underlying process
Many names for same things
Dataset = Table = Spreadsheet
Examples or Instances = Rows
Independent Variable = Explanatory Variable
Input data used in induction is known as training data
A.K.A. labeled data, because the value of the target variable is known
Attribute Selection with Information Gain
Example: Are mushrooms Edible or poisonous?
Dataset is slightly unbalanced: entropy is 0.96 (quick numeric check below)
Want to reduce entropy, i.e. the shaded area in the book's entropy chart
Odor reduces entropy by 0.1, so it is an informative attribute
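A quick sanity check of the 0.96 figure, as a hedged Python sketch. The exact class counts are not in these notes, so a roughly 62/38 edible/poisonous split is assumed purely for illustration:

```python
import math

def entropy(p):
    """Entropy of a two-class set, given the proportion p of one class."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A slightly unbalanced split of roughly 62% / 38% gives entropy close to 0.96.
print(round(entropy(0.62), 2))  # ~0.96
```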
Informative Attributes
Want resulting groups to be pure
Pure: homogeneous with respect to the target variable
Attributes rarely split groups perfectly
Some attributes are non-binary and some are numeric
Can evaluate splits with a purity measure
The most common splitting criterion is Information Gain
It is based on a purity measure: entropy
Entropy: A measure of disorder applied to a set
Disorder corresponds to how mixed the segment is with respect to the target variable (sketched in code below)
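A minimal Python sketch of entropy and information gain as described above; the labels and the toy split are made up for illustration, not taken from the book's data:

```python
from collections import Counter
import math

def entropy(labels):
    """Disorder of a set of target labels: 0 = pure, 1 = maximally mixed (binary case)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_groups):
    """Entropy of the parent set minus the size-weighted entropy of the child segments."""
    n = len(parent_labels)
    children = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - children

# Toy example: 10 instances split into two groups by some attribute's values.
parent = ['yes'] * 6 + ['no'] * 4
groups = [['yes'] * 5 + ['no'] * 1, ['yes'] * 1 + ['no'] * 3]
print(round(information_gain(parent, groups), 3))  # purer children -> larger gain
```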
Supervised Segmentation with Tree-Structured Models
Segmentation of Data is like a tree
Root is the top
Made up of interior and terminal nodes
Each node contains a distinct value of an attribute
Each "branch" ends in a terminal node
Each leaf is a segment
Tree induction is easy to understand
Included in most data mining packages
Takes a divide and conquer approach
Start with the whole data set
Then apply variable selection to create the purest subgroups, and repeat within each subgroup (sketch below)
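A hedged sketch of the divide-and-conquer induction using scikit-learn's DecisionTreeClassifier with the entropy criterion; the synthetic data and the attribute names attr_0..attr_2 are assumptions for illustration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: 200 instances, 3 numeric attributes, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Divide and conquer: each split is chosen by information gain (entropy criterion).
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)
print(export_text(tree, feature_names=["attr_0", "attr_1", "attr_2"]))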
Visualizing Segmentation
Lines separating regions are Decision surfaces
Each node of the classification tree tests a variable against a fixed value (visual sketch below)
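A small illustrative sketch, using assumed synthetic data rather than the book's, showing the axis-parallel decision surfaces a classification tree produces over two numeric attributes:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

# Two numeric attributes so the segmentation can be drawn in the plane.
rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 2))
y = ((X[:, 0] > 0.4) & (X[:, 1] > 0.6)).astype(int)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)

# Evaluate the tree on a grid; the colour boundaries are the decision surfaces,
# and they are axis-parallel because each node tests one variable against a fixed value.
xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
zz = tree.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
plt.xlabel("attribute 1")
plt.ylabel("attribute 2")
plt.show()
```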
Probability Estimation
Want models to predict more than just a classification
Might want to see the probabilities
Problems with Probabilities
Some leaves might give a 100% probability because they contain only a single instance
This is a case of overfitting (illustrated below)
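A tiny sketch of frequency-based probability estimation at a leaf, showing how a single-instance leaf claims 100%. The Laplace-corrected variant shown alongside is a common smoothing fix; it is included as an aside and uses the standard binary-class formula:

```python
def leaf_probability(pos, neg, laplace=False):
    """Class-probability estimate at a leaf from its instance counts.

    laplace=True applies Laplace correction (add-one smoothing), a common fix
    for tiny leaves; it is an aside here, not something covered in these notes.
    """
    if laplace:
        return (pos + 1) / (pos + neg + 2)
    return pos / (pos + neg)

# A leaf holding a single positive instance claims 100% probability...
print(leaf_probability(1, 0))                 # 1.0  (overfit to one example)
# ...while the smoothed estimate is far less extreme.
print(leaf_probability(1, 0, laplace=True))   # ~0.67
```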
Churn Example
Start by measuring the information gain of each variable
Place the variable with the highest information gain at the root
The other nodes are still important, because their information gains depend on the splits made above them