Statistics for Business
Descriptive Statistics
(1.1 p8-10)
Provide methods of describing a set of data in a convenient and informative way
Measures of Central Tendency
Mean
(1.2 p8-10)
Average
best used when distribution is symmetric
Median
(1.2 p11-13)
Middle value
best used when distribution is skewed
Mode
(1.2 p14-15)
Most frequently occurring
best used for categorical data
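The three measures above can be compared on a small example. This is a minimal sketch using Python's standard `statistics` module; the `sales` and `payment_methods` values are made-up illustration data, not figures from the course material.

```python
import statistics

# Hypothetical monthly sales figures; 95 is a deliberate outlier (skewed data).
sales = [12, 15, 15, 18, 20, 22, 95]

mean_value = statistics.mean(sales)      # arithmetic average, pulled up by the outlier
median_value = statistics.median(sales)  # middle value of the sorted data
mode_value = statistics.mode(sales)      # most frequently occurring value

# The mode also works on categorical data, where mean and median are undefined.
payment_methods = ["card", "cash", "card", "e-wallet", "card"]
top_method = statistics.mode(payment_methods)

print(mean_value, median_value, mode_value, top_method)
```

Because of the outlier, the mean lands well above the median here, which is why the median is the preferred summary for skewed data.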
Variability or Spread
Range
(1.2 p21)
Total spread of values (largest value minus smallest value)
Variance
(1.2 p22)
How much the data deviate from the mean (average of the squared deviations)
Standard Deviation
(1.2 p23)
square root of the variance
Coefficient of Variation
(1.2 p25-26)
ratio of the standard deviation to the mean
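The four spread measures can be computed as follows. This is a sketch on made-up data; it assumes the sample formulas (dividing by n - 1), which may differ from the convention used in the course slides.

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical sample

data_range = max(data) - min(data)     # largest value minus smallest value
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # square root of the sample variance
coeff_of_variation = std_dev / statistics.mean(data)  # SD relative to the mean

print(data_range, variance, std_dev, coeff_of_variation)
```

Use `statistics.pvariance` and `statistics.pstdev` instead if the data represent the whole population rather than a sample.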
Shape
Skewness
Degree of distortion from the symmetrical bell curve
Types
(1.2 p30)
Positive skew
Negative skew
No skew
Normal distribution
(1.2 p24)
Given the mean and standard deviation (SD), we can find the probability that x falls within a given range.
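As an illustration of finding probabilities from the mean and SD, here is a sketch using `statistics.NormalDist`. The delivery-time scenario and its parameters (mean 30, SD 5) are assumptions made up for the example.

```python
from statistics import NormalDist

# Hypothetical assumption: delivery times are normal with mean 30 min and SD 5 min.
delivery = NormalDist(mu=30, sigma=5)

# Probability that a delivery takes at most 35 minutes.
p_under_35 = delivery.cdf(35)

# Probability that a delivery takes between 25 and 35 minutes (within 1 SD of the mean).
p_between = delivery.cdf(35) - delivery.cdf(25)

print(round(p_under_35, 4), round(p_between, 4))
```

For a continuous variable, probabilities are read off for ranges of x (differences of the cumulative distribution), not for a single exact value.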
Interpretation
(1.2 p31)
Highly Skewed
skewness < -1 or skewness > +1
Moderately skewed
-1 < skewness < -0.5
+0.5 < skewness < +1
Approximately symmetric
-0.5 < skewness < +0.5
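The interpretation bands above can be applied in code. This sketch assumes the moment-based (population) skewness formula, which may differ slightly from the adjusted formula used by spreadsheet software; the function names are hypothetical. Values exactly on a boundary (±0.5, ±1) are not covered by the bands, so the sketch assigns them to the milder category as an assumption.

```python
def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def interpret_skewness(s):
    """Map a skewness value to the bands listed above."""
    if s < -1 or s > 1:
        return "highly skewed"
    if abs(s) > 0.5:
        return "moderately skewed"
    return "approximately symmetric"  # includes the boundary values (assumption)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 10]  # long tail to the right, so positive skew

for data in (symmetric, right_skewed):
    s = skewness(data)
    print(s, interpret_skewness(s))
```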
Kurtosis
(1.2 p33-35)
Measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution
High Kurtosis
Data has heavy tails or outliers
Low Kurtosis
Data has light tails or a lack of outliers
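Kurtosis relative to a normal distribution is often reported as excess kurtosis, where a normal distribution scores 0. The sketch below assumes the moment-based (population) formula; the function name and data are made up for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


evenly_spread = [1, 2, 3, 4, 5]        # no extreme values
with_outliers = [0] * 8 + [-10, 10]    # mostly identical values plus two outliers

print(excess_kurtosis(evenly_spread))  # negative: lighter tails than normal
print(excess_kurtosis(with_outliers))  # positive: heavier tails than normal
```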
Inferential Statistics
(1.1 p15-20)
Provide methods for drawing conclusions about a larger group (the population) based on data from a smaller sample
2 forms
Estimation
Hypothesis testing
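Both forms can be illustrated on one sample. This is a rough sketch under stated assumptions: the data are made up, the hypothesised mean of 50 is arbitrary, and a normal approximation is used for simplicity (for a small sample like this, a t-distribution would usually be preferred).

```python
import math
import statistics
from statistics import NormalDist

sample = [52, 48, 55, 50, 47, 53, 51, 49, 54, 51]  # hypothetical sample
n = len(sample)
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# Estimation: approximate 95% confidence interval for the population mean.
z_crit = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z_crit * std_error
ci_high = sample_mean + z_crit * std_error

# Hypothesis testing: H0 says the population mean equals 50 (two-sided test).
hypothesised_mean = 50
z_stat = (sample_mean - hypothesised_mean) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print((ci_low, ci_high), z_stat, p_value)
```

Here the interval contains 50 and the p-value is above 0.05, so this sample gives no grounds to reject the hypothesised mean at the 5% level.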
Sources of data
(1.1 p8-10)
Primary data
Direct methods of data collection involve collecting new data for a specific study
Examples
Surveys
Interviews
Experiments
Secondary data
Indirect methods of data collection involve sourcing and accessing existing data that were not originally collected for the purpose of the study
Examples
Customer records
Online transactions
Data types
(1.1 p29-31)
Categorical
Nominal
No specific ordering
Examples
True, false
Chinese, Malay, Tamil
Ordinal
Strict ordering
Examples
Agree, neutral, disagree
Good, better, best
Numerical
Interval
No true zero
Examples
Date
Temperature (in Celsius or Fahrenheit)
Ratio
With a true zero
Examples
Distance
Weights
Data Preparation
(1.3 p4-10)
Data cleaning
(1.3 p7)
Data transformation
(1.3 p8)
Data standardization
(z-score)
Data normalization
(min-max scaling)
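The two transformations above can be sketched as follows. The function names are hypothetical, the data are made up, and the z-score version assumes the population standard deviation; both functions assume the values are not all identical (otherwise they divide by zero).

```python
import statistics


def z_score_standardize(values):
    """Rescale to mean 0 and SD 1: (x - mean) / SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    return [(v - mean) / sd for v in values]


def min_max_normalize(values):
    """Rescale to the range [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


incomes = [10, 20, 30, 40, 50]  # hypothetical values

print(z_score_standardize(incomes))
print(min_max_normalize(incomes))
```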
Data construction
Data integration
Data reduction
Data Visualization
(2.1 p32)
Summary reports
Histograms
(2.1 p14-16)
Bar Charts
(2.1 p17-20)
Pie Charts
(2.1 p21-23)
Line Plots
(2.1 p24-26)
Scatter Plots
(2.1 p27-30)
Combo Charts
(2.1 p31)
Analysis Methods
Simple Regression Analysis
(2.3 p3-12)
Goodness of Fit Measure
(R-squared)
As a rule of thumb, an R-squared greater than 0.7 indicates a good model fit.
The closer R-squared is to 1, the better the model fit
Estimates the relationship between a dependent variable and one independent variable (multiple regression extends this to several independent variables)
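To make the R-squared measure concrete, here is a minimal sketch of simple regression by ordinary least squares, followed by the goodness-of-fit calculation. The function names and the data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the model: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


ad_spend = [1, 2, 3, 4, 5]  # hypothetical independent variable
sales = [2, 4, 5, 4, 5]     # hypothetical dependent variable

slope, intercept = fit_simple_regression(ad_spend, sales)
r2 = r_squared(ad_spend, sales, slope, intercept)
print(slope, intercept, r2)
```

For this data the R-squared comes out below 0.7, so by the rule of thumb above the fit would not be considered good.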
Correlation Analysis
(2.2 p5-17)
Measures the association between two interval-scaled or ratio-scaled variables
Correlation does not imply causation
Correlation coefficient
Between -1 and 1
Zero - no correlation
Negative sign - inverse or negative correlation
Usually, for the correlation to be considered significant, the coefficient must be 0.5 or above in either direction (at most -0.5 or at least +0.5)
Positive sign - direct or positive correlation
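The coefficient and its reading can be sketched as below. This assumes the Pearson correlation coefficient, which is the usual choice for interval or ratio data; the function names and data are hypothetical, and the 0.5 threshold is the rule of thumb from the notes above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r, threshold=0.5):
    """Return the direction and whether |r| meets the rule-of-thumb threshold."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r) >= threshold


hours = [1, 2, 3, 4, 5]        # hypothetical data
output = [2, 4, 5, 4, 5]       # tends to rise with hours
defects = [10, 8, 6, 4, 2]     # falls as hours rise

r_pos = pearson_r(hours, output)
r_neg = pearson_r(hours, defects)
print(r_pos, describe_correlation(r_pos))
print(r_neg, describe_correlation(r_neg))
```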
Decision Trees
(3.1 p6-31)
Decision support tool that uses a tree-like model of decisions and their possible consequences
(3.1 p10)
Involve a model-building process
Splitting data: the best split is the one that gives the highest node purity.
(3.1 p16-29)
Tree pruning: cutting back the tree to reduce overfitting
(3.1 p30)
Can be used to identify critical factors
Advantages
(3.1 p31)
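The splitting step can be illustrated by scoring candidate splits on node purity. This sketch assumes Gini impurity as the purity measure (entropy is another common choice; the notes do not say which one the course uses). The function names and the churn data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """0 for a perfectly pure node; higher values mean a more mixed node."""
    n = len(labels)
    counts = Counter(labels)
    return 1 - sum((c / n) ** 2 for c in counts.values())


def weighted_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


def best_threshold(values, labels):
    """Try each split 'value <= t' and return (t, impurity) for the purest one."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = weighted_impurity(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best


income = [20, 25, 30, 45, 50, 60]                  # hypothetical feature
churned = ["yes", "yes", "yes", "no", "no", "no"]  # hypothetical target

print(gini_impurity(churned))           # mixed node before splitting
print(best_threshold(income, churned))  # split that separates the classes best
```

A full tree repeats this search on each child node, across all features, until a stopping rule is met.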
Cluster Analysis
(3.2 p4-28)
Multivariate exploratory technique that uncovers natural patterns in data
Grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups
Types
Hierarchical Clustering
Agglomerative
Divisive
Partitional Clustering
Hard Clustering
K-means
(3.2 p22-26)
Soft Clustering
Fuzzy C-means
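As an illustration of hard partitional clustering, here is a minimal sketch of K-means using the standard assign-then-update loop (Lloyd's algorithm). The function name, the data points, and the starting centroids are assumptions for the example; real tools usually pick starting centroids at random and rerun several times.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Hard clustering: each point belongs to exactly one cluster.

    `centroids` is a list of starting guesses (tuples), one per cluster.
    """
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster's centroid as is
        if new_centroids == centroids:  # nothing moved, so the result is stable
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers described by two numeric features.
customers = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = k_means(customers, centroids=[(1, 1), (9, 9)])
print(labels, centroids)
```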
Classification
Grouping observations into known categories.
Supervised learning
Clustering
Grouping observations into unknown categories.
Unsupervised learning
Interpreting Output
(3.2 p15-20)
Explain in practical terms
Look at distinguishing characteristics
Look at cluster quality
(3.2 p16-20)
Silhouette Score
Measures the quality of a clustering result
ranges from -1 to 1
More than 0.4 is considered acceptable
Look for clusters of anomalies or outliers
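The silhouette score mentioned above can be computed by hand on a small data set. This sketch assumes Euclidean distance, at least two clusters, and the common convention that a point alone in its cluster scores 0; the function name and data are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points.

    a = mean distance to the other points in the same cluster
    b = mean distance to the points of the nearest other cluster
    """
    scores = []
    for i, p in enumerate(points):
        same = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not same:
            scores.append(0.0)  # singleton cluster (convention, an assumption)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


customers = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
poor_labels = [0, 1, 0, 1, 0, 1]  # mixes the two groups together

good = silhouette_score(customers, good_labels)
poor = silhouette_score(customers, poor_labels)
print(good, poor)
```

The well-separated labelling scores well above the 0.4 threshold, while the mixed labelling scores much lower.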