Chapter 11 Summarization
when the available data is not organized at the right unit of analysis, it must be aggregated or summarized
ex: if a customer called support 5 times, that info won't exist in the customer table, but in a customer support table. after joining the two, the data is summarized into a new column stating how many times each customer called the support line
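the support-call example can be sketched in plain Python. the table layouts, column names (customer_id, support_calls) and values below are assumptions made up for illustration, not taken from the chapter; the sketch counts first and then attaches the count to each customer, which gives the same result as joining first and summarizing afterwards.

```python
from collections import Counter

# Hypothetical customer table: one row per customer.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Ben"},
]

# Hypothetical customer support table: one row per call.
support_calls = [
    {"call_id": 101, "customer_id": 1},
    {"call_id": 102, "customer_id": 1},
    {"call_id": 103, "customer_id": 1},
    {"call_id": 104, "customer_id": 1},
    {"call_id": 105, "customer_id": 1},
]

# Summarize: count the number of support rows per customer.
calls_per_customer = Counter(call["customer_id"] for call in support_calls)

# Join the summary back to the customer table as a new column.
# A Counter returns 0 for customers who never called.
customers_with_calls = [
    {**customer, "support_calls": calls_per_customer[customer["customer_id"]]}
    for customer in customers
]

for row in customers_with_calls:
    print(row)
```

the result has one row per customer, which is the unit the analysis needs, instead of one row per call.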
11.1 Summarize
one or more columns are selected to group the data, which creates a virtual "bucket" for each unique group
to understand whether a column is above the desired level of analysis, examine whether its value ever changes across rows that share the same "OrderID"
you may notice that a specific order is always made for the same customer and is always sold by the same employee, which is why these additional columns can be included in the grouping without affecting the summarize result
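a minimal sketch of the "bucket" idea and of the constant-within-order check, in plain Python. the order-line rows, the ProductID and Quantity columns, and the helper names is_constant_within and summarize are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical order-line data: several rows per order.
order_lines = [
    {"OrderID": 1, "CustomerID": "C1", "EmployeeID": 5, "ProductID": 11, "Quantity": 12},
    {"OrderID": 1, "CustomerID": "C1", "EmployeeID": 5, "ProductID": 42, "Quantity": 10},
    {"OrderID": 1, "CustomerID": "C1", "EmployeeID": 5, "ProductID": 72, "Quantity": 5},
    {"OrderID": 2, "CustomerID": "C2", "EmployeeID": 6, "ProductID": 14, "Quantity": 9},
    {"OrderID": 2, "CustomerID": "C2", "EmployeeID": 6, "ProductID": 51, "Quantity": 40},
]


def is_constant_within(rows, key, column):
    """Return True if `column` never changes among rows sharing the same `key`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[column])
    return all(len(values) == 1 for values in seen.values())


def summarize(rows, group_columns, sum_column):
    """Sum `sum_column` into one bucket per unique combination of group columns."""
    buckets = defaultdict(int)
    for row in rows:
        bucket_key = tuple(row[column] for column in group_columns)
        buckets[bucket_key] += row[sum_column]
    return dict(buckets)


# CustomerID and EmployeeID never change within an order; ProductID does.
print(is_constant_within(order_lines, "OrderID", "CustomerID"))
print(is_constant_within(order_lines, "OrderID", "ProductID"))

# Adding the constant columns to the grouping leaves the buckets unchanged.
by_order = summarize(order_lines, ["OrderID"], "Quantity")
by_order_plus = summarize(order_lines, ["OrderID", "CustomerID", "EmployeeID"], "Quantity")
print(by_order)
print(by_order_plus)
```

grouping on ProductID as well would split each order into one bucket per product, because that column sits below the order level.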
11.2 Crosstab
where summarize provides summary and aggregate info on existing columns, crosstab uses the content inside columns to create new columns
a way to deal with data that is in "skinny" form by transforming what is currently listed in rows into columns
this "skinny" data is seldom at the right level for analysis, and crosstab makes data available in an intuitive and readable fashion, sometimes also creating new features for machine learning to better predict a target
skinny data is common in sales data and in data from Internet of Things devices that report their status back to their owner at sub-second intervals
at the time of database design it is usually unknown how specifically the data will be combined, so each "event" is stored as its own row in a database table or file
this means a single customer, device, or Web user is involved in many different types of events, each recorded as a separate row
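a crosstab over event data could look roughly like the sketch below, in plain Python. the device events, the column names device_id and event, and the helper crosstab are assumptions for illustration; counting is only one possible aggregation, and a sum or average of a value column would follow the same pattern.

```python
from collections import defaultdict

# Hypothetical "skinny" event log: one row per event.
events = [
    {"device_id": "A", "event": "online"},
    {"device_id": "A", "event": "error"},
    {"device_id": "A", "event": "online"},
    {"device_id": "B", "event": "online"},
    {"device_id": "B", "event": "reboot"},
]


def crosstab(rows, row_key, column_key):
    """Turn the distinct values of `column_key` into columns holding counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row[row_key]][row[column_key]] += 1

    # Every distinct event type becomes a new column.
    column_names = sorted({row[column_key] for row in rows})
    return [
        {row_key: key, **{name: counts[key][name] for name in column_names}}
        for key in sorted(counts)
    ]


wide = crosstab(events, "device_id", "event")
for row in wide:
    print(row)
```

the output has one row per device, with the former row values (error, online, reboot) as columns that can serve as features.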