Larson Chapter 12
Filtering
necessary for ensuring that changes are made uniformly to training and test rows
only the training set should be used to train a machine learning model; letting test rows into training distorts the measurement of the model's predictive accuracy
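A minimal sketch of the "uniform change, training-only fit" idea, assuming numeric values and a simple standardization step standing in for the change (the helper names and sample values are hypothetical, not from Larson). The parameters are learned from training rows only, then the identical transformation is applied to both sets.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(train_values):
    """Learn centering/scaling parameters from training rows only."""
    return statistics.mean(train_values), statistics.stdev(train_values)

def apply_scaler(values, params):
    """Apply the same transformation to any set of rows."""
    mean, std = params
    return [(v - mean) / std for v in values]

values = [12.0, 18.5, 21.0, 9.75, 30.0, 14.0, 22.5, 19.0]  # hypothetical data
train, test = train_test_split(values)
params = fit_scaler(train)  # test rows never influence these parameters
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)  # same change, applied uniformly
```

Fitting the scaler on all rows would let test values shape the transformation, which is the kind of leakage the note warns about.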
EXAMPLE
using the filter tool to partition the Quantity field
a search for "kg" incorrectly matches the "kg" inside the "pkgs" suffix
a search for "oz" (ounces) still yields a table of cans and bottles
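The "kg"/"pkgs" problem is a substring match. A minimal sketch contrasting a naive substring filter with a whole-word regular-expression filter; the quantity strings are hypothetical values written in the style of the Quantity field, not rows from the book.

```python
import re

# Hypothetical Quantity-style strings (illustrative only).
quantities = [
    "1 kg pkg.",
    "10 - 500 g pkgs.",
    "24 - 12 oz bottles",
    "12 - 12 oz cans",
    "5 kg pkg.",
]

def substring_filter(values, unit):
    """Naive filter: matches the unit anywhere, even inside another word."""
    return [v for v in values if unit in v]

def word_filter(values, unit):
    """Match the unit only as a whole word (bounded by non-word characters)."""
    pattern = re.compile(rf"\b{re.escape(unit)}\b")
    return [v for v in values if pattern.search(v)]

print(substring_filter(quantities, "kg"))
# ['1 kg pkg.', '10 - 500 g pkgs.', '5 kg pkg.']  <- "pkgs" wrongly included
print(word_filter(quantities, "kg"))
# ['1 kg pkg.', '5 kg pkg.']
print(word_filter(quantities, "oz"))
# ['24 - 12 oz bottles', '12 - 12 oz cans']
```

"oz" happens not to occur inside any other word here, which is why even the naive filter returns the cans and bottles correctly for ounces.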
Removing rows
Necessary when there are duplicate rows
Partial match removal
removing full rows based on identical content found in a few columns
Data must be sorted so that the row to keep comes first, ahead of the rows that will be checked as duplicates
EXAMPLE
Determining user home address
sorting the data so each day's first app interaction comes first
removing any matching rows that follow
results in a list with a single use per day: the first one at the beginning of each day
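A minimal sketch of partial match removal for the home-address example, assuming each event is a (user, timestamp, location) tuple and that "matching" means same user and same calendar day (both assumptions, with made-up data). Sorting first puts the row to keep at the front; later rows with the same key are dropped whole.

```python
from datetime import datetime

# Hypothetical app-interaction log: (user_id, timestamp, location).
events = [
    ("u1", "2024-03-01 12:30", "office"),
    ("u1", "2024-03-01 07:05", "home"),
    ("u1", "2024-03-02 06:50", "home"),
    ("u1", "2024-03-02 18:20", "gym"),
    ("u2", "2024-03-01 08:15", "home"),
]

def first_use_per_day(rows):
    # Order rows so the earliest interaction per user comes first.
    ordered = sorted(
        rows, key=lambda r: (r[0], datetime.strptime(r[1], "%Y-%m-%d %H:%M"))
    )
    seen = set()
    kept = []
    for user, ts, location in ordered:
        key = (user, ts[:10])  # partial match: compare only user and date
        if key in seen:
            continue  # a later row for the same user/day: drop the full row
        seen.add(key)
        kept.append((user, ts, location))
    return kept

result = first_use_per_day(events)
```

Without the sort, whichever row happened to appear first in the log would be kept, so the ordering step is what makes "first interaction of the day" well defined.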
Complete match removal
removing full rows whose content is identical across all columns
Northwind Data example
Goal: to count the # of orders containing a dairy product
use a unique function based on the OrderID column
the remaining product fields should be removed, so that rows from the same order become complete duplicates and collapse to one
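A minimal sketch of the Northwind count, assuming order lines already joined with their category (field names follow Northwind, but the rows and product names are made up). Dropping the product fields leaves only OrderID, so lines from the same order become identical and complete match removal keeps one per order.

```python
# Hypothetical joined order lines (values are illustrative only).
order_lines = [
    {"OrderID": 10248, "ProductName": "Cheese A", "CategoryName": "Dairy Products"},
    {"OrderID": 10248, "ProductName": "Cheese B", "CategoryName": "Dairy Products"},
    {"OrderID": 10248, "ProductName": "Noodles", "CategoryName": "Grains/Cereals"},
    {"OrderID": 10249, "ProductName": "Tofu", "CategoryName": "Produce"},
    {"OrderID": 10250, "ProductName": "Cheese C", "CategoryName": "Dairy Products"},
]

def count_orders_with_category(rows, category):
    # Filter to order lines in the category of interest.
    matching = [r for r in rows if r["CategoryName"] == category]
    # Remove product fields: keep only OrderID in each row.
    projected = [(r["OrderID"],) for r in matching]
    # Complete match removal: identical rows collapse to one.
    return len(set(projected))

naive = sum(1 for r in order_lines if r["CategoryName"] == "Dairy Products")
orders = count_orders_with_category(order_lines, "Dairy Products")
print(naive, orders)  # 3 2
```

Counting dairy lines directly gives 3 because order 10248 contains two dairy products; removing the product fields before taking unique rows gives the intended count of 2 orders.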