Chapter 11: Summarization
When the available data does not focus directly on the right "unit of analysis", it must be aggregated or summarized.
11.1 Summarize
AKA group or group by
When summarizing, one or more columns by which to group the data are selected, essentially creating a virtual “bucket” for each unique group.
Once all the relevant rows for a group (for example, one employee) are in its bucket, the data within are available to be summarized.
List of typical functions that can be applied to data inside each bucket
Sum
Count
Min
Max
First
Last
Average
Median
Mode
Standard Deviation
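The bucket idea and the functions listed above can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the employee names, the sale amounts, and the `rows`, `buckets`, and `summary` names are hypothetical and do not come from the chapter.

```python
from collections import defaultdict
from statistics import mean, median, mode, stdev

# Hypothetical input rows: (employee, sale_amount). Illustrative data only.
rows = [
    ("Ann", 100.0),
    ("Bob", 50.0),
    ("Ann", 300.0),
    ("Bob", 50.0),
    ("Ann", 100.0),
]

# Step 1: group by the employee column, creating one bucket per unique value.
buckets = defaultdict(list)
for employee, amount in rows:
    buckets[employee].append(amount)

# Step 2: apply the typical summary functions to the data inside each bucket.
summary = {}
for employee, amounts in buckets.items():
    summary[employee] = {
        "sum": sum(amounts),
        "count": len(amounts),
        "min": min(amounts),
        "max": max(amounts),
        "first": amounts[0],         # first row seen for this group
        "last": amounts[-1],         # last row seen for this group
        "average": mean(amounts),
        "median": median(amounts),
        "mode": mode(amounts),
        "std_dev": stdev(amounts),   # sample standard deviation; needs >= 2 rows
    }

for employee, stats in summary.items():
    print(employee, stats)
```

The result has one row per employee instead of one row per sale, which is the change in unit of analysis that summarizing is meant to achieve. First and Last depend on row order, so the input should be sorted beforehand if that order matters.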
Summarize can be run with any information above the level of analysis.
11.2 Crosstab
Crosstab uses the content inside columns to create new columns
Crosstab deals with data that is currently in “skinny” form by transforming what is currently listed in rows into columns.
This kind of “skinny” data is seldom at the right level for analysis. Crosstab makes the data available in an intuitive and readable fashion, and sometimes also creates new features that help machine learning better predict a target.
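A minimal sketch of the crosstab idea in plain Python follows. The customer and product values and the names `skinny`, `columns`, and `wide` are hypothetical. Summing the quantities within each cell and filling missing combinations with 0 are assumptions made for this example, not rules stated in the chapter.

```python
from collections import defaultdict

# Hypothetical skinny table: one row per (customer, product, quantity).
skinny = [
    ("C1", "apples", 2),
    ("C1", "pears", 1),
    ("C2", "apples", 5),
    ("C1", "apples", 3),
]

# The unique values found inside the product column become the new columns.
columns = sorted({product for _, product, _ in skinny})

# One output row per customer; each cell starts at 0 and accumulates a sum.
wide = defaultdict(lambda: dict.fromkeys(columns, 0))
for customer, product, quantity in skinny:
    wide[customer][product] += quantity

for customer, cells in sorted(wide.items()):
    print(customer, cells)
```

Each customer now occupies a single row with one column per product, so the table sits at the customer level and the new product columns can serve as candidate features.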
The skinny table is a common shape for data to take.