Chapter 4 - Numerical Descriptive Techniques
Measures of relative standing
Non-central tendency
Quartiles, percentiles
Provide information about the position of particular values relative to the entire data set
Quartiles
25th, 50th and 75th percentiles are referred to as quartiles
Lower quartile -> Q1 = 25th percentile
Middle quartile -> Q2 = 50th percentile, also the median
Upper quartile -> Q3 = 75th percentile
Quartiles split the ranked data into 4 segments with an equal number of values per segment
Q1 = value at rank 0.25(n+1)
Q2 = value at rank 0.5(n+1)
Q3 = value at rank 0.75(n+1)
Interquartile range (IQR) = Q3 - Q1
Outliers have little effect on the interquartile range, so it is a useful measure of spread when outliers are present
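The ranked-value locations above can be sketched in Python. This is a minimal illustration, not a definitive method: the helper names `ranked_value` and `quartiles` are hypothetical, the data are made up, and linear interpolation between neighbouring ranks is an assumption for the case where 0.25(n+1), 0.5(n+1) or 0.75(n+1) is not a whole number.

```python
def ranked_value(sorted_data, position):
    """Value at a 1-based, possibly fractional, rank (linear interpolation assumed)."""
    lower = int(position)          # whole part of the rank
    fraction = position - lower    # fractional part of the rank
    if lower < 1:
        return sorted_data[0]
    if lower >= len(sorted_data):
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)


def quartiles(data):
    """Locate Q1, Q2 and Q3 at the 0.25(n+1), 0.5(n+1) and 0.75(n+1) ranked values."""
    ordered = sorted(data)
    n = len(ordered)
    return tuple(ranked_value(ordered, p * (n + 1)) for p in (0.25, 0.5, 0.75))


sample = [2, 4, 6, 8, 10, 12, 14]   # made-up data, n = 7, so the ranks are 2, 4 and 6
q1, q2, q3 = quartiles(sample)
iqr = q3 - q1
print(q1, q2, q3, iqr)   # 4.0 8.0 12.0 8.0
```

Other textbooks and software packages use slightly different quartile conventions, so results can differ a little for small data sets.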
Identifying outliers
Z-score
A value with a z-score above 3 or below -3 is considered an outlier
A measure of distance in standard deviations from the mean: z = (value - mean) / standard deviation
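A small Python sketch of the z-score rule follows. The function names and the data are made up for illustration, and the sample standard deviation (n - 1 divisor) is assumed. Note that in very small samples no value can reach a z-score of 3, so the example uses 20 observations.

```python
import statistics


def z_scores(data):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (assumption)
    return [(x - mean) / sd for x in data]


def z_outliers(data, threshold=3.0):
    """Values whose z-score is above +threshold or below -threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]


data = [9, 10, 11] * 6 + [10, 100]   # 19 typical values and one extreme value
print(z_outliers(data))              # [100]
```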
Tukey's rule
Calculate the upper bound -> Q3 + 1.5 x IQR
Calculate the lower bound -> Q1 - 1.5 x IQR
An outlier is a value that lies more than 1.5 times the IQR below Q1 or above Q3 (outside the box of a box and whisker plot)
Values larger than the upper bound or smaller than the lower bound are outliers
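Tukey's rule can be sketched as follows. The function names and data are illustrative only. The quartiles come from `statistics.quantiles`, whose default "exclusive" method places them at the 0.25(n+1) and 0.75(n+1) ranked values with linear interpolation.

```python
import statistics


def tukey_bounds(data):
    """Lower and upper bounds: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default method is "exclusive"
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def tukey_outliers(data):
    """Values smaller than the lower bound or larger than the upper bound."""
    lower, upper = tukey_bounds(data)
    return [x for x in data if x < lower or x > upper]


data = [2, 4, 6, 8, 10, 12, 14, 40]   # made-up data; Q1 = 4.5, Q3 = 13.5, IQR = 9
print(tukey_bounds(data))             # (-9.0, 27.0)
print(tukey_outliers(data))           # [40]
```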
Measures of central tendency
Sample mean
is a sample statistic
describes the sample
Mode
Not affected by extreme values
There may be no mode or there may be several modes
Value that occurs the most often
population mean
is a population parameter
describes the population
Which measure to choose
Median -> not sensitive to extreme values
Mean -> not valid for ordinal and nominal data
Mean -> generally the first choice for numerical data, unless extreme values (outliers) exist
Median -> appropriate for ordinal data
mean = sum of observations / number of observations
Arithmetic mean
Measure of central tendency
Affected by extreme values (outliers)
Mean, median, mode
Median
Number of values is odd -> median is the middle value
Number of values is even -> median is average of the two middle numbers
In an ordered data set, the median is located at the 0.5(n+1)th ranked value
Not affected by extreme values (outliers)
Give an indication of what a typical value from the sample or population is likely to be
Geometric mean
The nth root of the product of n observations
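The measures above are all available in Python's standard `statistics` module. The sketch below uses made-up data to show how one extreme value pulls the mean much more than the median.

```python
import statistics

data = [2, 3, 3, 5, 7, 10, 40]            # made-up sample; 40 is an extreme value

mean = statistics.mean(data)              # sum of observations / number of observations
median = statistics.median(data)          # middle value of the ordered data
modes = statistics.multimode(data)        # most frequent value(s); may be several
geo = statistics.geometric_mean([2, 8])   # square root of 2 * 8, roughly 4

print(mean, median, modes)                # 10 5 [3]

# Dropping the extreme value halves the mean but barely moves the median.
trimmed = data[:-1]
print(statistics.mean(trimmed), statistics.median(trimmed))   # 5 4.0
```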
Measure of variation
Range
Range = largest observation - smallest observation
Ignores the way in which data is distributed
Simplest measure of variability
Sensitive to outliers
Variance
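Variance of a population: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
Variance of a sample: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
Measures variability
Sample variance calculation formula: $s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$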
most important measure of variability
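As a rough sketch, the range and the variance can be computed directly from their definitions. The function names and data are made up. The population version divides by N, the sample version divides by n - 1, and the calculation (shortcut) formula should agree with the sample definition.

```python
def population_variance(data):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Calculation formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]       # made-up data with mean 5
data_range = max(data) - min(data)    # largest observation - smallest observation

print(data_range)                     # 7
print(population_variance(data))      # 4.0
print(sample_variance(data))          # 32/7, about 4.571
```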
Range, variance, standard deviation, coefficient of variation, interquartile range (IQR)
Standard deviation
Square root of the variance; roughly the typical distance of the observations from the mean
Has the same unit as the observed data
Empirical rule
Applies to bell-shaped distributions
approximately 68% of all observations fall within 1 standard deviation of the mean
approximately 95% of all observations fall within 2 standard deviations of the mean
approximately 99.7% of all observations fall within 3 standard deviations of the mean
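The empirical rule can be checked on simulated bell-shaped data. This is only an illustration: the helper `share_within`, the seed, and the mean of 50 and standard deviation of 10 are arbitrary choices, and the shares will be close to, not exactly, 68%, 95% and 99.7%.

```python
import random
import statistics


def share_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


rng = random.Random(0)                                   # fixed seed for repeatability
simulated = [rng.gauss(50, 10) for _ in range(10_000)]   # bell-shaped data

shares = {k: share_within(simulated, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```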
Indicate to what extent the values of the sample or population are dispersed, scattered or spread
Coefficient of variation
Population coefficient of variation -> $CV = \sigma / \mu$
Sample coefficient of variation -> $cv = s / \bar{x}$
Divide the standard deviation by the mean
Measure of relative spread
Can be expressed as a %
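A short sketch of the sample coefficient of variation, using a hypothetical helper and made-up data. Because it is a relative measure, rescaling the data (for example from rands to cents) leaves it unchanged.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(data) / statistics.mean(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up data with mean 5
rescaled = [100 * x for x in data]       # same data in a unit 100 times smaller

cv = coefficient_of_variation(data)
print(f"{cv:.1%}")                       # 42.8%
print(f"{coefficient_of_variation(rescaled):.1%}")   # 42.8%
```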
Approximating the mean from a frequency distribution
Raw data is not available
Approximating the standard deviation from a frequency distribution
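When only a frequency distribution is available, a common approach is to treat every observation in a class as if it sat at the class midpoint. The sketch below assumes that approach and the sample (n - 1) divisor; the function name, the `(lower, upper, frequency)` layout and the classes are made up for illustration.

```python
def grouped_mean_and_sd(classes):
    """Approximate mean and sample standard deviation from (lower, upper, frequency) classes."""
    n = sum(freq for _, _, freq in classes)
    midpoints = [((lower + upper) / 2, freq) for lower, upper, freq in classes]
    total = sum(freq * mid for mid, freq in midpoints)            # sum of f * m
    total_sq = sum(freq * mid * mid for mid, freq in midpoints)   # sum of f * m^2
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)
    return mean, variance ** 0.5


# Hypothetical frequency distribution: 2 values in 0-10, 5 in 10-20, 3 in 20-30.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
mean, sd = grouped_mean_and_sd(classes)
print(mean)           # 16.0
print(round(sd, 2))   # 7.38
```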
Box and whisker plot
Graphical display of the 5 number summary
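The five numbers behind the plot are the minimum, Q1, the median, Q3 and the maximum. A minimal sketch with a hypothetical helper and made-up data, using `statistics.quantiles` (default "exclusive" method, which matches the (n+1) ranked-value positions):

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median (Q2), Q3 and maximum."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)


data = [2, 4, 6, 8, 10, 12, 14]     # made-up data
print(five_number_summary(data))    # (2, 4.0, 8.0, 12.0, 14)
```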
Measures of linear relationship
Fixed and variable costs
Variable costs are costs that vary directly with the number of products produced
Fixed costs are costs that must be paid whether or not any units are produced
Coefficient of determination
Calculated by squaring the coefficient of correlation: R^2 = r^2
Measures the amount of variation in the dependent variable that is explained by the variation in the independent variable
Covariance
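Measures the direction of the linear relationship between two numerical variables (bivariate data); its size depends on the units of measurement, so strength is easier to judge from the coefficient of correlation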
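Population covariance -> $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance -> $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Calculation formula -> $s_{xy} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}\right]$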
When 2 variables move in the opposite direction -> covariance is a large negative number
When there is no particular pattern, the covariance is a small number
When 2 variables move in the same direction -> covariance will be a large positive number
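The three cases above can be illustrated with a small sketch of the sample covariance. The function name and all data are made up.

```python
def sample_covariance(xs, ys):
    """Sum of (x - xbar)(y - ybar) over all pairs, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]       # y rises as x rises
opposite_direction = [10, 8, 6, 4, 2]   # y falls as x rises
no_pattern = [3, 1, 4, 1, 3]            # no linear pattern

print(sample_covariance(xs, same_direction))       # 5.0
print(sample_covariance(xs, opposite_direction))   # -5.0
print(round(sample_covariance(xs, no_pattern), 6))
```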
Coefficient of correlation
Population coefficient of correlation -> $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Sample coefficient of correlation -> $r = \frac{s_{xy}}{s_x s_y}$
Covariance divided by the product of the standard deviations of the two variables
When two variables have a strong negative linear relationship, the coefficient is close to -1
When two variables have a strong positive linear relationship, the coefficient is close to 1
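A sketch of the sample coefficient of correlation, and of the coefficient of determination obtained by squaring it. The function name and data are made up for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                      # made-up data, fairly strong positive relationship

r = sample_correlation(xs, ys)
r_squared = r ** 2                        # coefficient of determination

print(round(r, 4))                        # 0.7746
print(round(r_squared, 4))                # 0.6
print(round(sample_correlation(xs, [10, 8, 6, 4, 2]), 4))   # -1.0
```

Here about 60% of the variation in y is explained by the variation in x.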
Covariance, correlation, coefficient of determination, least squares line
Least squares method
Produces a straight line drawn through the points so that the sum of squared deviations between the points and the line is minimized
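This line is represented by -> $\hat{y} = b_0 + b_1 x$, where $b_1 = s_{xy} / s_x^2$ is the slope and $b_0 = \bar{y} - b_1 \bar{x}$ is the y-intercept
Equation of a line -> y = mx + b (here m = b_1 and b = b_0)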
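A minimal sketch of fitting the least squares line, using the standard result that the slope is the covariance of x and y divided by the variance of x, and the intercept is the mean of y minus the slope times the mean of x. The function name and data are made up.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line that minimises the sum of squared deviations."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))   # (n - 1) cancels below
    s_xx = sum((x - xbar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = ybar - slope * xbar
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]              # made-up data

slope, intercept = least_squares_line(xs, ys)
print(round(slope, 4), round(intercept, 4))   # 0.6 2.2
```

In the fixed and variable costs setting, with x as the number of units produced and y as total cost, the intercept estimates the fixed costs and the slope estimates the variable cost per unit.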
Objective of the scatter diagram is to measure the strength and direction of the linear relationship
Measures the relationship between two numerical variables