Chapter 11: Summarization
11.1: Summarize
buckets
one or more columns are selected by which to group the data
functions that can be applied to the data in each bucket
first
last
min
average
max
median
sum
mode
count
standard deviation
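As a rough illustration of the functions listed above, the sketch below applies each one to a single bucket using only Python's standard library. The `bucket` values are made up for the example, and `statistics.stdev` is the sample standard deviation, which is an assumption since the notes do not say which variant is meant.

```python
import statistics

# Hypothetical bucket: the values of one column for rows sharing a group key.
bucket = [12.0, 7.5, 7.5, 20.0, 3.0]

summary = {
    "first": bucket[0],
    "last": bucket[-1],
    "min": min(bucket),
    "average": statistics.mean(bucket),
    "max": max(bucket),
    "median": statistics.median(bucket),
    "sum": sum(bucket),
    "mode": statistics.mode(bucket),
    "count": len(bucket),
    # Assumption: sample standard deviation (n - 1 in the denominator).
    "standard deviation": statistics.stdev(bucket),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that first and last depend on the order of rows within the bucket, while the other functions do not.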
Northwind
group by employee: every row belonging to a given employee is placed inside that employee's bucket.
group by OrderId, meaning each row with the same OrderId ends up in the same “bucket”
a function can now be applied to each bucket
aka group or group by.
provides summary and aggregate information on existing columns
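The bucket idea can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular tool implements Summarize: the `summarize` helper and the sample rows are hypothetical, and the rows only mimic the shape of Northwind order details rather than reproducing real data.

```python
from collections import defaultdict

# Made-up rows shaped like order-detail records; not real Northwind data.
rows = [
    {"OrderId": 1, "Quantity": 5},
    {"OrderId": 1, "Quantity": 1},
    {"OrderId": 2, "Quantity": 3},
    {"OrderId": 3, "Quantity": 2},
    {"OrderId": 2, "Quantity": 1},
]

def summarize(rows, key, column, func):
    """Group rows by `key`, then apply `func` to `column` within each bucket."""
    buckets = defaultdict(list)
    for row in rows:
        # Rows with the same key value end up in the same bucket.
        buckets[row[key]].append(row[column])
    return {group: func(values) for group, values in buckets.items()}

total_quantity = summarize(rows, "OrderId", "Quantity", sum)
line_count = summarize(rows, "OrderId", "Quantity", len)

print(total_quantity)  # {1: 6, 2: 4, 3: 2}
print(line_count)      # {1: 2, 2: 2, 3: 1}
```

Five input rows become three output rows, one per OrderId, which is the sense in which summarizing reduces the number of rows.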
11.2: Crosstab
crosstab takes data in “skinny” form and transforms rows into columns
skinny: one row per event, e.g. when customers make many purchases over time
Example
what drink choices suggest about employees
drink-name column to become the new row
specifies drink-type column
coffee
tea
energy drink
unknown how data will be combined = ind. “events”
OrderId: rows of interest and CategoryName: content
count: how many of each product were in each order
uses content inside columns to create new columns
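A crosstab with OrderId as the rows of interest, CategoryName as the content, and count as the function could be sketched as follows. The `crosstab_count` helper is hypothetical and the sample rows are invented for illustration; only the column names come from the notes.

```python
from collections import Counter

# Made-up "skinny" rows: one row per product line in an order.
rows = [
    {"OrderId": 1, "CategoryName": "Beverages"},
    {"OrderId": 1, "CategoryName": "Beverages"},
    {"OrderId": 1, "CategoryName": "Dairy Products"},
    {"OrderId": 2, "CategoryName": "Seafood"},
    {"OrderId": 2, "CategoryName": "Beverages"},
]

def crosstab_count(rows, row_key, column_key):
    """Turn distinct values of `column_key` into columns, counting rows per cell."""
    counts = Counter((r[row_key], r[column_key]) for r in rows)
    new_columns = sorted({r[column_key] for r in rows})
    row_ids = sorted({r[row_key] for r in rows})
    # Missing combinations count as 0 because Counter returns 0 for unseen keys.
    return {
        rid: {col: counts[(rid, col)] for col in new_columns}
        for rid in row_ids
    }

wide = crosstab_count(rows, "OrderId", "CategoryName")
for order_id, cells in wide.items():
    print(order_id, cells)
```

Each distinct CategoryName value becomes a new column, so the set of output columns is not known until the data has been read.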
Introduction
available data often does not focus on the right unit
must be aggregated/summarized
Summarization
reduces both the number of columns and rows
join data
data is summarized to get a new column
ex: Northwind Dataset