Chapter 11: Summarization
Summarize
AKA group by
Level of analysis
any column that does not change after level of analysis can be used as a bucket
above the desired level of analysis
examine whether the column's value ever changes across rows for the same OrderID
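The level-of-analysis check can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the rows, the column names other than OrderID and EmployeeID, and the helper `constant_within` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical order-detail rows: one row per product line on an order.
rows = [
    {"OrderID": 1, "EmployeeID": 5, "ProductID": 11, "Quantity": 12},
    {"OrderID": 1, "EmployeeID": 5, "ProductID": 42, "Quantity": 10},
    {"OrderID": 2, "EmployeeID": 6, "ProductID": 14, "Quantity": 9},
    {"OrderID": 2, "EmployeeID": 6, "ProductID": 51, "Quantity": 40},
]

def constant_within(rows, key, column):
    """Return True if `column` never takes two different values for the same `key`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[column])
    return all(len(values) == 1 for values in seen.values())

# EmployeeID is the same on every row of an order, so it sits above the order-detail level.
employee_is_constant = constant_within(rows, "OrderID", "EmployeeID")
# ProductID differs between rows of the same order, so it belongs to the detail level.
product_is_constant = constant_within(rows, "OrderID", "ProductID")
```

In this made-up data, EmployeeID passes the check and ProductID does not.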
Can summarize by more than one column
EX: if the number of rows referring to a given employee on each day is needed
data summarized by two columns, EmployeeID and order date
many more buckets would then be created
functions that can be applied to data in each bucket
count, sum, min, max, first, last, average, median, mode, standard deviation
One or more columns by which to group data is selected
creating a virtual "bucket" for each unique group
EX: if interested in employees, summarize by EmployeeID; each unique EmployeeID would be assigned a bucket
all rows belonging to each employee are placed in that employee's bucket
once all relevant rows are in the bucket, data can be summarized
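The bucket idea above can be sketched with plain Python. This is an illustrative sketch under assumptions: the rows, the Amount column, and the helper `summarize` are hypothetical, and only some of the listed functions are shown.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows: one row per order handled by an employee.
rows = [
    {"EmployeeID": 5, "OrderDate": "2024-07-04", "Amount": 440.0},
    {"EmployeeID": 6, "OrderDate": "2024-07-04", "Amount": 1863.4},
    {"EmployeeID": 5, "OrderDate": "2024-07-05", "Amount": 100.0},
    {"EmployeeID": 5, "OrderDate": "2024-07-04", "Amount": 60.0},
]

def summarize(rows, group_cols, value_col):
    """Place each row's value in the bucket for its group, then summarize every bucket."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_cols)  # one bucket per unique combination
        buckets[key].append(row[value_col])
    return {
        key: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "first": values[0],
            "last": values[-1],
            "average": mean(values),
            "median": median(values),
        }
        for key, values in buckets.items()
    }

# One grouping column: one bucket per employee.
by_employee = summarize(rows, ["EmployeeID"], "Amount")
# Two grouping columns: one bucket per employee per day, so more buckets.
by_employee_day = summarize(rows, ["EmployeeID", "OrderDate"], "Amount")
```

With this sample data, grouping by EmployeeID alone gives two buckets, while grouping by EmployeeID and OrderDate gives three.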
Crosstab
uses content inside columns to create new columns
used to take data in "skinny" form and transform what is currently listed in rows to column form
Skinny happens with sales when one or more customers make purchases over time
Also, with internet of things devices reporting status back all the time
makes data available in an intuitive and readable fashion
sometimes creating new features for ML to better predict a target
because it is unknown at database design time how data will be combined, each "event" is stored as its own row in a database or file
means every customer, device, or user on the web is involved in a number of different types of events
EX: grocery store customer purchases products from multiple categories
The info exists at order detail level, not individual customer level
Reduces both number of columns and rows
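A crosstab of the grocery store example can be sketched as follows. The data, the column names, and the helper `crosstab` are hypothetical, and summing the amounts per cell is an assumed choice of summary function, not something the notes specify.

```python
from collections import defaultdict

# Hypothetical "skinny" data: one row per purchase event.
skinny = [
    {"CustomerID": "C1", "Category": "Produce", "Amount": 4.0},
    {"CustomerID": "C1", "Category": "Dairy", "Amount": 3.0},
    {"CustomerID": "C2", "Category": "Produce", "Amount": 2.5},
    {"CustomerID": "C1", "Category": "Produce", "Amount": 1.0},
]

def crosstab(rows, row_key, col_key, value_col):
    """Turn the distinct values of `col_key` into columns, summing `value_col` per cell."""
    categories = sorted({row[col_key] for row in rows})
    # Every customer starts with a 0.0 entry for each category.
    totals = defaultdict(lambda: dict.fromkeys(categories, 0.0))
    for row in rows:
        totals[row[row_key]][row[col_key]] += row[value_col]
    return [{row_key: key, **cells} for key, cells in totals.items()]

# One row per customer, one column per product category.
wide = crosstab(skinny, "CustomerID", "Category", "Amount")
```

Each customer ends up on a single row, with the category names that used to sit inside the Category column now acting as column headers.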