Chapter 12: Data Reduction and Splitting (Filtering (Filtering is an often…

- - - - Complete match removal: remove entire rows whose values are identical in every column (exact duplicates).
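A minimal sketch of complete match removal in plain Python. It assumes each row is a tuple holding the values of every column. The sample rows and the function name `remove_complete_matches` are hypothetical. In pandas, `DataFrame.drop_duplicates()` with its default arguments (all columns compared, first occurrence kept) has the same effect.

```python
# Hypothetical rows; each tuple holds the values of every column in one row.
rows = [
    ("d1", "2020-01-01", 40.71, -74.00),
    ("d1", "2020-01-01", 40.71, -74.00),  # identical in all columns: a complete match
    ("d2", "2020-01-01", 34.05, -118.24),
]

def remove_complete_matches(rows):
    """Drop rows that are identical in every column, keeping the first occurrence."""
    seen = set()
    kept = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept

deduped = remove_complete_matches(rows)
print(len(deduped))  # 2
```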
- - - - select only the device id and date columns
        
        use the unique function, retaining only the first row for each device on each day
        
        Using the summarize function, each unique device id can be
        grouped with each unique configuration of longitude and latitude into discrete
        buckets
        
        the most frequent call location is likely to be where the person lives
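The steps above (keep one row per device per day, group each device with its location buckets, pick the most frequent bucket) can be sketched in plain Python. The call records, the function names `first_call_per_day` and `likely_home`, and the choice to form buckets by rounding coordinates to two decimals are all assumptions for illustration; the source groups by unique longitude/latitude configurations and does not specify a rounding rule.

```python
from collections import Counter

# Hypothetical call records in time order: (device_id, date, latitude, longitude)
calls = [
    ("d1", "2020-01-01", 40.7121, -74.0041),
    ("d1", "2020-01-01", 40.7306, -73.9352),  # second call that day: dropped
    ("d1", "2020-01-02", 40.7133, -74.0038),
    ("d1", "2020-01-03", 40.7306, -73.9352),
    ("d1", "2020-01-04", 40.7119, -74.0044),
    ("d2", "2020-01-01", 34.0522, -118.2437),
]

def first_call_per_day(records):
    """Keep only the first record for each (device_id, date) pair."""
    seen = set()
    kept = []
    for device, date, lat, lon in records:
        if (device, date) not in seen:
            seen.add((device, date))
            kept.append((device, date, lat, lon))
    return kept

def likely_home(records, decimals=2):
    """Bucket coordinates by rounding (an assumption), then return each device's most frequent bucket."""
    counts = Counter(
        (device, (round(lat, decimals), round(lon, decimals)))
        for device, _date, lat, lon in records
    )
    home = {}
    # most_common() lists pairs by descending count, so the first bucket seen per device wins
    for (device, bucket), _n in counts.most_common():
        home.setdefault(device, bucket)
    return home

daily = first_call_per_day(calls)
print(len(daily))  # 5
print(likely_home(daily))
```

Rounding is only one way to turn nearby coordinates into shared buckets; coarser or finer `decimals` values trade location precision for more robust counts.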
- - - - calculate the mean age for these people, then use this value as the
        age for every row in the test dataset
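A minimal sketch of this mean-age baseline, assuming "these people" are training rows whose age is known. The ages and test rows below are hypothetical. Every test row receives the same value, which makes this a naive reference point against which real models can be compared.

```python
from statistics import mean

# Hypothetical ages of the people in the training data
train_ages = [34, 41, 29, 52, 44]

# Hypothetical test rows that still need an age value
test_rows = [{"device_id": "d7"}, {"device_id": "d8"}, {"device_id": "d9"}]

mean_age = mean(train_ages)  # (34 + 41 + 29 + 52 + 44) / 5 = 40

# Assign the same mean age to every row in the test dataset
for row in test_rows:
    row["age"] = mean_age

print(mean_age)  # 40
```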
  - - - DataRobot is a “living” tool that is
        constantly updated and improved
- - - - an unbalanced dataset occurs when one value of a binary target
        is underrepresented relative to the other
        
        use a filtering tool to create two tables, one for each target class, before downsampling
        the majority class
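The filter-then-downsample steps can be sketched with Python's `random` module. The rows are hypothetical (class 1 is made rare on purpose), and downsampling the majority class to exactly the size of the minority class (a 1:1 ratio) is an assumption; other target ratios are equally possible.

```python
import random

# Hypothetical rows: (row_id, target) with a binary target where class 1 is rare
rows = [(i, 1 if i % 5 == 0 else 0) for i in range(100)]

# Filter into two tables, one per target class
minority = [r for r in rows if r[1] == 1]
majority = [r for r in rows if r[1] == 0]

# Downsample the majority class (without replacement) to the minority class size
rng = random.Random(42)  # fixed seed so the sample is reproducible
majority_sampled = rng.sample(majority, k=len(minority))

# Recombine the two tables and shuffle so classes are interleaved
balanced = minority + majority_sampled
rng.shuffle(balanced)

print(len(minority), len(majority), len(balanced))  # 20 80 40
```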
        
        Holdback Sample: the first set of data that is extracted
        
        used for the final evaluation of the model
        
        Randomize the order of the data and select a holdout sample
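A minimal sketch of randomizing the row order and setting aside a holdout sample. The row ids and the 20% holdout fraction are assumptions for illustration; the holdout rows are kept out of model building and used only for the final evaluation.

```python
import random

rows = list(range(1000))  # hypothetical row ids

# Randomize the order of the data (on a copy, leaving `rows` untouched)
rng = random.Random(0)
shuffled = rows[:]
rng.shuffle(shuffled)

holdout_fraction = 0.2  # assumption; pick a fraction suited to the dataset size
n_holdout = int(len(shuffled) * holdout_fraction)

holdout = shuffled[:n_holdout]    # reserved for the final evaluation of the model
remaining = shuffled[n_holdout:]  # available for model building and cross-validation

print(len(holdout), len(remaining))  # 200 800
```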
        
        First, set aside Fold 1 (illustrated in yellow), combine the rows in the
        remaining four folds (2–5), and use these rows to create a model of which
        features (columns) drive (explain) the target
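The fold procedure can be sketched as standard k-fold cross-validation with k = 5. The row ids and the function name `make_folds` are hypothetical, and the model fitting itself is left as a comment. The first pass of the loop matches the step described above (Fold 1 set aside, Folds 2 to 5 combined); in standard cross-validation the same step is then repeated with each remaining fold as the validation set.

```python
import random

def make_folds(row_ids, k=5, seed=0):
    """Shuffle the rows and deal them round-robin into k folds of near-equal size."""
    ids = list(row_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]

rows = list(range(800))  # hypothetical rows left after the holdout is removed
folds = make_folds(rows, k=5)

for i, validation in enumerate(folds, start=1):
    # Combine the rows of the other k - 1 folds into one training set
    training = [r for j, fold in enumerate(folds, start=1) if j != i for r in fold]
    # A real pipeline would fit a model on `training` and score it on `validation`
    print(f"Fold {i}: train on {len(training)} rows, validate on {len(validation)}")
```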
        