Ch16: Understanding the Process
Will more data improve predictive ability?
Additional Data:
Additional Features or Additional Cases
Diminishing marginal improvement: the greater the amount of data available at the start, the less likely additional data will improve predictive ability
Learning Curves: validation score on Y axis and percent of available data on X axis
lower scores on the Y axis are preferable because LogLoss is a loss measure
calculate the cost of acquiring and using additional data
when adding data, use cross-validation results
performance is assessed at the 16%, 32%, and 64% sample-size levels
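The learning-curve idea above can be sketched in code: train on growing fractions of the data (16%, 32%, 64%, 100%) and track validation LogLoss at each level. The dataset and model here are illustrative stand-ins, not the book's exact setup.

```python
# Sketch of a learning curve: validation LogLoss vs. fraction of
# training data used. Lower LogLoss is better; improvements usually
# shrink as the fraction grows (diminishing marginal improvement).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

scores = {}
for frac in (0.16, 0.32, 0.64, 1.0):
    n = int(len(X_train) * frac)  # use only the first n training rows
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    scores[frac] = log_loss(y_val, model.predict_proba(X_val))
print(scores)
```

Plotting `scores` (fraction on the x-axis, LogLoss on the y-axis) reproduces the learning curve described above.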
Speed vs. Accuracy
how fast will the model evaluate (score) new cases
efficient frontier line: traced when two criteria are negatively related; a model on the line is not beaten on both criteria by any other model
y-axis = LogLoss score, x-axis = time to score more records
Blueprint pane shows the processing steps of the model that did better than the others
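The efficient frontier can be computed directly: keep every model that is not dominated, i.e. no other model is at least as good on both criteria and strictly better on one. The model names and (LogLoss, scoring-time) values below are made up for illustration.

```python
# Sketch: efficient frontier over two negatively related criteria,
# LogLoss and time to score (lower is better for both).
models = {
    "A": (0.45, 120.0),  # (LogLoss, ms per 1k records)
    "B": (0.40, 300.0),
    "C": (0.50, 100.0),
    "D": (0.42, 500.0),  # dominated by B (worse on both criteria)
}

def on_frontier(name):
    ll, t = models[name]
    # Dominated if some other model is <= on both criteria and < on one.
    return not any(
        (ll2 <= ll and t2 <= t) and (ll2 < ll or t2 < t)
        for other, (ll2, t2) in models.items() if other != name
    )

frontier = sorted(n for n in models if on_frontier(n))
print(frontier)  # → ['A', 'B', 'C']
```

Model D falls off the frontier because B is both more accurate and faster; A, B, and C each win on at least one criterion.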
convert categorical features to ordinal encoding rather than one-hot encoding (useful when a feature has many categories)
Imputation
blueprint shows method and missing values imputed
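A minimal sketch of an imputation step like the one a blueprint records, using scikit-learn's `SimpleImputer` with median imputation plus an indicator column marking which values were imputed (the choice of median and the indicator are illustrative assumptions).

```python
# Sketch: median imputation with a missing-value indicator column.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0], [np.nan], [3.0], [5.0]])
imp = SimpleImputer(strategy="median", add_indicator=True)
X_out = imp.fit_transform(X)
# Column 0: values with NaN replaced by the median (3.0).
# Column 1: 1.0 where the value was imputed, else 0.0.
print(X_out)
```

The indicator column preserves the information that a value was originally missing, which can itself be predictive.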
standardization: set mean to 0 and standard deviation to 1 so all features are on the same scale
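Standardization as described above is one line with scikit-learn's `StandardScaler`, which subtracts each feature's mean and divides by its standard deviation.

```python
# Sketch: standardize a feature to mean 0 and standard deviation 1.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])
Xs = StandardScaler().fit_transform(X)
print(Xs.mean(), Xs.std())  # → ~0.0 and 1.0
```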
One-hot encoding: for categorical variables that meet the requirements, a new 0/1 feature is created for every category in the original feature
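One-hot encoding can be sketched with pandas (the `color` column is a made-up example): each category of the original feature becomes its own indicator column.

```python
# Sketch: one-hot encode a categorical column with pandas.
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})
encoded = pd.get_dummies(df, columns=["color"])
print(list(encoded.columns))  # → ['color_blue', 'color_green', 'color_red']
```

Three categories in `color` produce three indicator columns; exactly one of them is set per row.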
Advanced Tuning: see the model's parameters and change them
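Outside DataRobot, the same tune-the-parameters idea can be sketched with scikit-learn's `GridSearchCV`; the model and parameter grid below are illustrative choices, not the tool's defaults.

```python
# Sketch: try several hyperparameter settings via cross-validation
# and keep the combination with the best (neg) LogLoss.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="neg_log_loss",  # higher (closer to 0) is better
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

`best_params_` plays the role of the settings you would pick in the Advanced Tuning view after comparing cross-validation results.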