Chapter 11: Summarization
When the available data is not organized around the right "unit of analysis", it must be aggregated, or summarized.
This operation is also known as "group" or "group by".
When summarizing, one or more columns by which to group the data are selected, essentially creating a virtual “bucket” for each distinct value in those columns, for example one bucket per employee.
Once all the relevant rows for that employee are in the bucket, the data within are available to be summarized.
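The bucket idea above can be sketched in a few lines of plain Python. This is only an illustration: the `rows` data, the `employee` and `amount` column names, and the `buckets` variable are all hypothetical and do not come from the chapter.

```python
from collections import defaultdict

# Hypothetical sales rows, one row per sale (illustrative data only).
rows = [
    {"employee": "Ann", "amount": 120.0},
    {"employee": "Bob", "amount": 80.0},
    {"employee": "Ann", "amount": 200.0},
    {"employee": "Bob", "amount": 50.0},
    {"employee": "Cy", "amount": 75.0},
]

# Group by the "employee" column: each distinct value gets its own bucket,
# and every row is dropped into the bucket that matches its employee.
buckets = defaultdict(list)
for row in rows:
    buckets[row["employee"]].append(row["amount"])

# Each bucket now holds all the amounts for one employee, ready to summarize.
for employee, amounts in buckets.items():
    print(employee, amounts)
```

Nothing has been summarized yet at this point; the rows have only been sorted into buckets, which is the step that makes a later sum, count, or average per employee possible.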
List of typical functions that can be applied to data inside each bucket
Summarize can be run with any information above the level of analysis.
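As a rough sketch of applying functions inside each bucket, the example below assumes count, sum, mean, minimum, and maximum as representative functions; these are common choices, not necessarily the chapter's own list. The data, the column names, and the `summarize` helper are hypothetical. Grouping the same sale-level rows by `employee` or by `department` is one reading of summarizing "above the level of analysis".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sale-level rows (illustrative data only).
rows = [
    {"department": "East", "employee": "Ann", "amount": 120.0},
    {"department": "East", "employee": "Bob", "amount": 80.0},
    {"department": "West", "employee": "Cy", "amount": 75.0},
    {"department": "East", "employee": "Ann", "amount": 200.0},
    {"department": "West", "employee": "Dee", "amount": 25.0},
]

def summarize(rows, group_col, value_col):
    """Bucket rows by group_col, then apply several functions per bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_col]].append(row[value_col])
    return {
        key: {
            "count": len(values),
            "sum": sum(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in buckets.items()
    }

# The same rows summarized at two different levels.
by_employee = summarize(rows, "employee", "amount")
by_department = summarize(rows, "department", "amount")
print(by_department["West"])
```

The output has one entry per bucket rather than one per sale, so the choice of grouping column determines the unit of analysis of the result.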
Crosstab uses the content inside columns to create new columns.
Crosstab is a way to deal with data that is currently in “skinny” form by transforming what is currently listed in rows into columns.
This kind of “skinny” data is seldom at the right level for analysis, and crosstab makes the data available in an intuitive and readable fashion, sometimes also creating new features that help machine learning better predict a target.
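A minimal sketch of a crosstab in plain Python follows. The `skinny` rows and the `customer`, `product`, and `quantity` column names are hypothetical, and summing repeated customer/product pairs is an assumed choice of aggregation.

```python
# Hypothetical "skinny" data: one row per customer/product purchase.
skinny = [
    {"customer": "C1", "product": "apples", "quantity": 3},
    {"customer": "C1", "product": "pears", "quantity": 1},
    {"customer": "C2", "product": "apples", "quantity": 2},
    {"customer": "C1", "product": "apples", "quantity": 4},
    {"customer": "C2", "product": "plums", "quantity": 5},
]

# The new column names come from the distinct values of the "product" column.
columns = sorted({row["product"] for row in skinny})

wide = {}
for row in skinny:
    # One output row per customer, with every product column starting at 0.
    out = wide.setdefault(row["customer"], {c: 0 for c in columns})
    # Repeated customer/product pairs are summed (an assumed aggregation).
    out[row["product"]] += row["quantity"]

for customer, values in wide.items():
    print(customer, values)
```

Each customer now occupies a single row, and each product has become its own column, which is the kind of per-customer feature a model could use directly.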
The skinny table is a common shape for data to take.