Chapter 12: Data Reduction and Splitting
12.1 Unique Rows
delete duplicate rows
partial match removal
removal of full rows based on identical content in a subset of columns
data must first be sorted so that the rows to keep are listed first
followed by a specification of which columns must be identical for later rows to be removed as duplicates
Example
estimating a home address from cell phone data
make an assumption (e.g., that the day's first phone use happens at home), then sort by date and time
keep the longitude/latitude of each day's first use
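A minimal sketch of partial match removal for the cell phone example, in plain Python with no libraries. Assumptions: the pings, their column names (timestamp, lat, lon) and the coordinates are made-up illustrations, and drop_partial_duplicates / first_use_per_day are hypothetical helper names, not functions from any particular tool. Sorting puts each day's earliest use first; the duplicate check on the date column then keeps that row and drops the later ones.

```python
from datetime import datetime

# Toy cell phone records (made up for illustration): one row per phone use.
pings = [
    {"timestamp": "2023-05-01 12:30", "lat": 40.7580, "lon": -73.9855},
    {"timestamp": "2023-05-01 07:05", "lat": 40.6782, "lon": -73.9442},
    {"timestamp": "2023-05-02 18:45", "lat": 40.7128, "lon": -74.0060},
    {"timestamp": "2023-05-02 06:50", "lat": 40.6782, "lon": -73.9442},
]


def drop_partial_duplicates(rows, key_columns):
    """Hypothetical helper: keep the first row for each combination of
    key_columns and drop any later row with the same values there."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def first_use_per_day(rows):
    """Hypothetical helper: return each day's first phone use."""
    # Parse the timestamp and derive a "date" column to use as the duplicate key.
    with_date = []
    for row in rows:
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
        with_date.append({**row, "ts": ts, "date": ts.date()})
    # Sort by date and time so the row to keep (earliest use) comes first ...
    with_date.sort(key=lambda r: r["ts"])
    # ... then drop rows whose date repeats, keeping only that first use.
    return drop_partial_duplicates(with_date, ["date"])


for row in first_use_per_day(pings):
    print(row["date"], row["lat"], row["lon"])
```

Under the stated assumption, the surviving coordinates are the candidate home location. In pandas, for instance, the same two steps correspond to sort_values followed by drop_duplicates(subset=[...], keep="first").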
complete match removal
removal of full rows based on identical content in all columns
for a row to be removed, all values in all columns must match the same values in a prior row
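Complete match removal in the same plain-Python style, again only a sketch: drop_exact_duplicates is a hypothetical helper and the records are toy data. It assumes every row has the same columns and that all values are hashable.

```python
def drop_exact_duplicates(rows):
    """Hypothetical helper: remove a row only if every column matches
    the values of a row that appeared earlier."""
    seen = set()
    kept = []
    for row in rows:
        # Use all columns as the key; sorting the items makes the key
        # independent of the order in which the columns were written.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Toy records (made up for illustration).
records = [
    {"name": "Ana", "city": "Lima", "age": 31},
    {"name": "Ana", "city": "Lima", "age": 31},  # all columns match row 1: removed
    {"name": "Ana", "city": "Lima", "age": 32},  # differs in one column: kept
]
print(drop_exact_duplicates(records))
```

The contrast with partial match removal is the key: here the third row survives because a single differing column is enough. In pandas, drop_duplicates() called without a subset likewise compares all columns.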
12.2 Filtering
split data into two separate tables: rows that meet a condition and rows that do not
Titanic dataset: age is missing for some passengers
after a union combining the training and test sets
a filter can be placed to move the rows with missing age into a new set
advanced alternative: train a machine-learning model to predict passengers' ages from the other features
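A sketch of the union-then-filter steps, assuming toy rows that borrow Titanic-style column names (PassengerId, Pclass, Age, Survived) with made-up values; union and split_by are hypothetical helpers, not a specific tool's API. The test rows have no Survived column, so the union fills it with None, and an Age of None stands for a missing age.

```python
# Toy rows with Titanic-style column names (values made up for illustration).
train = [
    {"PassengerId": 1, "Pclass": 3, "Age": 22.0, "Survived": 0},
    {"PassengerId": 2, "Pclass": 1, "Age": None, "Survived": 1},
]
test = [
    {"PassengerId": 3, "Pclass": 2, "Age": 30.0},
    {"PassengerId": 4, "Pclass": 3, "Age": None},
]


def union(*tables):
    """Hypothetical helper: stack tables vertically; a column missing
    from a table is filled with None for that table's rows."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for table in tables for row in table]


def split_by(rows, condition):
    """Hypothetical helper acting as a filter: return
    (rows that meet the condition, rows that do not)."""
    true_rows = [r for r in rows if condition(r)]
    false_rows = [r for r in rows if not condition(r)]
    return true_rows, false_rows


combined = union(train, test)
known_age, missing_age = split_by(combined, lambda r: r["Age"] is not None)
print([r["PassengerId"] for r in known_age])    # [1, 3]
print([r["PassengerId"] for r in missing_age])  # [2, 4]
```

Under the advanced alternative, known_age could serve as training data for an age model and missing_age as the set of rows whose ages it would predict.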
split data to examine quantity type