ch. 12 data reduction and splitting (12.3 sampling (benefit for sampling…

- - - - Example: finding a cell phone user's home address. To find the user's home address, sort the data by device ID and then by date and time. Then select only the device ID and date columns and apply a unique function (it keeps only unique rows, discarding rows already encountered).
        
        Then apply a summarize function: each unique device ID can be grouped with each unique configuration of longitude and latitude into buckets.
- - - - S2: Build the model with folds 1+3+4+5 combined and validate on fold 2. This step only occurs if cross-validation is deemed appropriate and valuable for the given model. Steps 2-5 are conducted as a single cross-validation process. The validation sample (fold 2) is now hidden from the algorithm. Once the algorithm has created a model from folds 1+3+4+5, the model is applied to the validation sample, predicting the state of the target variable for each row, and an accuracy score is calculated.
        
        S3 (step four in the book): Validation follows the same actions as in step 2, with the validation sample moved down to fold 3. Training is conducted on folds 1+2+4+5 combined, and validation against fold 3 gives an accuracy of .25, i.e. 25 percent (1/4 rows correctly assigned).
        
        S4 (step five in the book): The validation sample moves down to fold 4. Training is conducted on folds 1+2+3+5 combined, and validation against fold 4 gives an accuracy of .75 (3/4 rows correctly assigned).
        
        S5 (step six in the book): The validation sample moves down to fold 5. Training is conducted on folds 1+2+3+4 combined, and validation against fold 5 gives an accuracy of .25 (1/4 rows correctly assigned).
        
        S6 (presumably step seven in the book): The overall cross-validation accuracy is calculated. Every non-holdout row is used five times: four times to help construct a model and once to be evaluated by the model built from the other folds. Model accuracy is calculated by checking the true target value for all 20 rows against their predicted values.
        Example: the number of correct predictions was 3+2+1+3+1 = 10 out of 20, an accuracy of 50 percent.
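The home-address example above can be sketched in Python with only the standard library. This is a minimal sketch under several assumptions that are not in the notes. The ping records and the helper names `bucket` and `home_bucket_per_device` are hypothetical. Latitude and longitude are kept alongside the device ID and date during the unique pass, so there is something to bucket. Buckets are formed by rounding coordinates to three decimal places. Finally, the bucket seen on the most distinct days is taken as the home candidate.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, made-up ping records: (device_id, timestamp, latitude, longitude).
pings = [
    ("dev-B", datetime(2024, 5, 1, 22, 5), 40.71281, -74.00601),
    ("dev-A", datetime(2024, 5, 2, 23, 40), 34.05223, -118.24369),
    ("dev-A", datetime(2024, 5, 1, 23, 10), 34.05221, -118.24372),
    ("dev-A", datetime(2024, 5, 1, 23, 55), 34.05224, -118.24368),
    ("dev-A", datetime(2024, 5, 2, 12, 30), 34.10000, -118.30000),
    ("dev-B", datetime(2024, 5, 2, 21, 50), 40.71279, -74.00598),
]


def bucket(lat, lon, places=3):
    """Hypothetical helper: round coordinates so nearby pings share a bucket.

    The rounding precision is an assumption, not taken from the notes.
    """
    return (round(lat, places), round(lon, places))


def home_bucket_per_device(records, places=3):
    """Hypothetical helper returning one candidate home bucket per device."""
    # Sort by device ID, then by date and time.
    ordered = sorted(records, key=lambda r: (r[0], r[1]))

    # Unique pass: keep (device, date, location bucket) and drop any row
    # already encountered, so repeated pings on one day count once.
    seen = set()
    unique_rows = []
    for device_id, ts, lat, lon in ordered:
        row = (device_id, ts.date(), bucket(lat, lon, places))
        if row not in seen:
            seen.add(row)
            unique_rows.append(row)

    # Summarize: group each device with each location bucket and count
    # the distinct days the device was seen in that bucket.
    days_per_bucket = defaultdict(Counter)
    for device_id, _day, loc in unique_rows:
        days_per_bucket[device_id][loc] += 1

    # Assumption: the bucket seen on the most distinct days is "home".
    return {dev: counts.most_common(1)[0][0] for dev, counts in days_per_bucket.items()}


print(home_bucket_per_device(pings))
# {'dev-A': (34.052, -118.244), 'dev-B': (40.713, -74.006)}
```

Deduplicating before counting means that a phone pinging many times from one spot on a single evening still counts as only one day there. The final count therefore reflects how many days a device returns to a location rather than how often it reports.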
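The fold rotation and the final accuracy calculation in the cross-validation walkthrough above can be checked with a short Python sketch. It assumes 20 rows split into five consecutive folds of four rows each; real tools usually shuffle rows before assigning folds. The helper names `make_folds`, `fold_schedule` and `cv_accuracy` are hypothetical. No model is trained: the per-fold correct counts come from the notes, assuming the sum 3+2+1+3+1 lists folds 1 through 5 in order.

```python
from typing import Dict, List, Tuple


def make_folds(n_rows: int, k: int) -> List[List[int]]:
    """Split row indices 0..n_rows-1 into k consecutive folds of (nearly) equal size.

    Assumption: no shuffling, so fold membership is easy to read.
    """
    size, extra = divmod(n_rows, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


def fold_schedule(k: int) -> List[Tuple[List[int], int]]:
    """For each pass, list the 1-based training folds and the held-out validation fold."""
    return [([f for f in range(1, k + 1) if f != v], v) for v in range(1, k + 1)]


def cv_accuracy(correct_per_fold: Dict[int, int],
                folds: List[List[int]]) -> Tuple[Dict[int, float], float]:
    """Return per-fold validation accuracy and pooled accuracy over all rows."""
    per_fold = {f: n / len(folds[f - 1]) for f, n in correct_per_fold.items()}
    overall = sum(correct_per_fold.values()) / sum(len(rows) for rows in folds)
    return per_fold, overall


folds = make_folds(20, 5)
for train, val in fold_schedule(5):
    print(f"train on folds {'+'.join(map(str, train))}, validate on fold {val}")

# Correct predictions per validation fold, from the notes (folds 1..5 assumed in order).
correct = {1: 3, 2: 2, 3: 1, 4: 3, 5: 1}
per_fold, overall = cv_accuracy(correct, folds)
print(per_fold)  # {1: 0.75, 2: 0.5, 3: 0.25, 4: 0.75, 5: 0.25}
print(overall)   # 0.5
```

Every fold here holds the same number of rows, so the pooled accuracy (10/20) equals the mean of the five fold accuracies. With unequal folds, the pooled count is the one that matches checking all 20 rows against their predicted values.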