Machine Learning: Chapter 8: Accessing and Storing Data
Track Down Relevant Data
The data needed depends on clearly defined project objectives
What is the business problem?
What is the unit of analysis and prediction target?
Evaluate potential sources of data
Perform additional research on accessing data
Access one or more data sources and turn each into a matrix
Remove unhelpful features of data
Reduces processing cost and time
Data generally have an "identity" field that enables integration with other tables
Many start with internal data
Often found in a database table
Can purchase additional data
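A minimal sketch of the steps in this branch, using only Python's standard library: two sources are read, integrated through a shared "identity" field, stripped of an unhelpful column, and turned into a matrix. The column names (customer_id, fax_number, credit_band) and all values are hypothetical and are not taken from the chapter.

```python
import csv
import io

# Hypothetical internal data: one row per customer (the unit of analysis).
internal_csv = """customer_id,name,region,fax_number
1,Ana,West,555-0101
2,Ben,East,
3,Cy,West,555-0103
"""

# Hypothetical purchased data, keyed by the same identity field.
purchased_csv = """customer_id,credit_band
1,A
3,C
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


internal = read_rows(internal_csv)
purchased = {row["customer_id"]: row for row in read_rows(purchased_csv)}

# Left join on the identity field: keep every internal row and attach
# the purchased column where a match exists (None otherwise).
joined = []
for row in internal:
    extra = purchased.get(row["customer_id"], {})
    joined.append({**row, "credit_band": extra.get("credit_band")})

# Drop a column judged unhelpful for the project objective.
unhelpful = {"fax_number"}
columns = [c for c in joined[0] if c not in unhelpful]

# The matrix: one list per row, one position per remaining column.
matrix = [[row[c] for c in columns] for row in joined]
```

Customer 2 has no purchased record, so its credit_band stays None rather than the row being dropped; whether to keep such rows is a project decision.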
Examine Data and Remove Columns
First need a tool to interact
Excel: if fewer than 200,000 rows
Alteryx: if more rows and not a programmer
Python (first choice) or R: if a programmer
SQL: if data is in a database
Initial data exploration
Examine the following:
Column names
Data types
Number of rows
Understand content of data in relation to project objectives
Remove any irrelevant columns
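The exploration checklist above could be sketched as follows for a programmer working in Python. This uses only the standard library; the sample data, the infer_type helper, and the choice of internal_note as the irrelevant column are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data; every value here is made up for illustration.
sample_csv = """order_id,order_date,freight,ship_country,internal_note
101,2024-01-05,32.5,France,n/a
102,2024-01-06,11.25,Germany,n/a
103,2024-01-09,65.0,Brazil,n/a
"""


def infer_type(values):
    """Guess a column type by trying int, then float, else string."""
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "string"


rows = list(csv.DictReader(io.StringIO(sample_csv)))

# The three things to examine: column names, data types, number of rows.
column_names = list(rows[0].keys())
row_count = len(rows)
data_types = {
    name: infer_type([row[name] for row in rows]) for name in column_names
}

# Remove a column judged irrelevant to the project objectives.
irrelevant = ["internal_note"]
kept = [{k: v for k, v in row.items() if k not in irrelevant} for row in rows]
```

The type guess is deliberately crude: dates come back as "string" here and would need separate parsing before being treated as date/time data.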
Example Dataset
Microsoft Northwind Database
tables tied together with relationships
"one-to-many" relationship
The infinity sign denotes the "many" side
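A one-to-many relationship can be sketched with an in-memory SQLite database. The two tables below are a simplified, hypothetical stand-in loosely modeled on the customers/orders idea, not the actual Northwind schema, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One customer (the "one" side) can have many orders (the "many" side).
# orders.customer_id is the identity field that ties the tables together.
conn.executescript("""
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    company_name TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id TEXT REFERENCES customers(customer_id),
    freight REAL
);
INSERT INTO customers VALUES ('C1', 'Alpha Foods'), ('C2', 'Beta Market');
INSERT INTO orders VALUES (1, 'C1', 10.0), (2, 'C1', 5.5), (3, 'C2', 7.25);
""")

# Collapse the "many" side to one row per customer, so the result
# matches a customer-level unit of analysis.
order_counts = conn.execute("""
    SELECT c.customer_id, COUNT(o.order_id), SUM(o.freight)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

conn.close()
```

Aggregating before modeling matters here: joining without GROUP BY would repeat each customer once per order, changing the unit of analysis from customer to order.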
Exercises
What are the types of data discussed in this chapter?
Strings
Numeric
Continuous: infinite number of possible responses
Discrete: finite number of options
Categorical (Nominal): groups are different, but no meaningful ranking
Binary: Boolean
Ordinal: meaningful order, but distance is unequal
Interval: meaningful order, space between groups is even
Dates
Ratio
Date/Time
Spatial objects
Arbitrary data