Chapter 4 - Numerical Descriptive Techniques

Measures of relative standing

Identifying outliers

Measures of central tendency

Measures of variation

Approximating the mean from a frequency distribution

Approximating the standard deviation from a frequency distribution

Box and whisker plot

Measures of linear relationship

Sample mean

Mode

Population mean

Which measure to choose

mean = sum of observations / number of observations

Arithmetic mean

Mean, median, mode

Median

Give an indication of what a typical value from the sample or population is likely to be

Geometric mean

Range

Variance

Range, variance, standard deviation, coefficient of variation, interquartile range (IQR)

Standard deviation

Indicate to what extent the values of the sample or population are dispersed, scattered or spread

Coefficient of variation

Non-central tendency

Quartiles, percentiles

Provide information about the position of particular values relative to the entire data set

Quartiles

Fixed and variable costs

Coefficient of determination

Covariance

Coefficient of correlation

Covariance, correlation, coefficient of determination, least squares line

Least squares method

Measures the relationship between two numerical variables

Graphical display of the 5 number summary

Z-score

Tukey's rule

Raw data is not available
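When only a frequency table is available, a common approach is to treat every observation in a class as if it sat at the class midpoint. The sketch below assumes that approach and uses a hypothetical table (the midpoints and frequencies are made up for illustration); it approximates the mean as sum(f * m) / n and the sample variance with the shortcut formula applied to the midpoints.

```python
import math

# Hypothetical frequency table: class midpoints and class frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

n = sum(frequencies)                                        # total observations
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
sum_fm2 = sum(f * m * m for f, m in zip(frequencies, midpoints))

# Approximate mean: every observation is assumed to equal its class midpoint.
approx_mean = sum_fm / n

# Approximate sample variance and standard deviation from the same sums.
approx_variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
approx_sd = math.sqrt(approx_variance)

print(approx_mean, round(approx_sd, 2))
```

The results are approximations only: the wider the classes, the further they can drift from the values the raw data would give.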

The population mean (μ) is a population parameter

describes the population

The sample mean (x̄) is a sample statistic

describes the sample

Measure of central tendency

Affected by extreme values (outliers)

Number of values is odd -> median is the middle value

Number of values is even -> median is average of the two middle numbers

Of an ordered set of data, median is located at the 0.5(n+1) ranked value

Not affected by extreme values (outliers)
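A short sketch of how one extreme value moves the mean but barely moves the median. The data values are hypothetical; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]       # hypothetical values, n = 5 (odd)
with_outlier = incomes + [400]       # same data plus one extreme value, n = 6 (even)

# Odd n: the median is the value at rank 0.5 * (5 + 1) = 3.
print(mean(incomes), median(incomes))

# Even n: rank 0.5 * (6 + 1) = 3.5, so the median is the average
# of the 3rd and 4th ranked values.
print(mean(with_outlier), median(with_outlier))
```

In this example the mean jumps from 35 to roughly 95.8 once the outlier is added, while the median only moves from 35 to 36.5.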

Not affected by extreme values

There may be no mode or there may be several modes

Value that occurs the most often
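The mode can be found by counting how often each value occurs. The helper below is a minimal sketch; `modes` is a hypothetical name, and returning an empty list when no value repeats is one possible convention for "no mode", not the only one.

```python
from collections import Counter

def modes(data):
    """Return every value that shares the highest frequency (data must be non-empty)."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []                    # no value repeats: treated here as "no mode"
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([1, 2, 2, 3, 3, 4]))     # two values tie for most frequent
print(modes([1, 2, 3]))              # nothing repeats
```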

Median -> not sensitive to extreme values

Mean -> not valid for ordinal and nominal data

Mean -> generally the first choice, unless extreme values (outliers) exist

Median -> appropriate for ordinal data

The nth root of the product of n observations
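The definition above translates directly into code. This is a sketch under the assumption that all observations are positive; the growth factors are hypothetical (for example 1.10 standing for +10% in a period), which is a typical use of the geometric mean.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive observations."""
    n = len(values)
    return math.prod(values) ** (1 / n)

growth_factors = [1.10, 0.95, 1.20]      # hypothetical period-by-period growth factors
average_factor = geometric_mean(growth_factors)
print(round(average_factor, 4))
```

Python 3.8 and later also ship `statistics.geometric_mean`, which should give the same result for positive data.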

Range = largest observation - smallest observation

Ignores the way in which data is distributed

Simplest measure of variability

Sensitive to outliers

Variance of a population: σ^2 = Σ(x_i - μ)^2 / N

Variance of a sample: s^2 = Σ(x_i - x̄)^2 / (n - 1)

Measures variability

Sample variance calculation formula: s^2 = [Σx_i^2 - (Σx_i)^2 / n] / (n - 1)
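The following sketch computes the population variance (divide by N), the sample variance (divide by n - 1), and the shortcut calculation formula on one small, hypothetical data set, then checks the result against the standard library.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical observations
n = len(data)
mean = sum(data) / n

squared_deviations = sum((x - mean) ** 2 for x in data)

population_variance = squared_deviations / n         # treats data as the whole population
sample_variance = squared_deviations / (n - 1)       # treats data as a sample

# Shortcut form: needs only the sum of x and the sum of x squared.
shortcut_variance = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(population_variance, sample_variance, shortcut_variance)
print(pvariance(data), variance(data))                # library equivalents
```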

Most important measure of variability

Square root of the variance; roughly the typical distance of the observations from the mean

Has the same unit as the observed data

Empirical rule

approximately 68% of all observations fall within 1 standard deviation of the mean

approximately 95% of all observations fall within 2 standard deviations of the mean

Bell-shaped distributions

approximately 99.7% of all observations fall within 3 standard deviations of the mean
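As an illustration of the empirical rule, the sketch below builds the three intervals for a bell-shaped distribution. The function name and the example mean and standard deviation are hypothetical.

```python
def empirical_intervals(mean, sd):
    """Map k = 1, 2, 3 to (lower, upper, approximate share of observations)."""
    shares = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, share) for k, share in shares.items()}

# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
for k, (lower, upper, share) in empirical_intervals(100, 15).items():
    print(f"within {k} sd: {lower} to {upper} (about {share:.1%})")
```

The shares are approximations that apply to bell-shaped distributions only; skewed data can deviate noticeably.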

Population coefficient of variation -> CV = σ / μ

Sample coefficient of variation -> cv = s / x̄

Divide the standard deviation by the mean

Measure of relative spread

Can be expressed as a %
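A small sketch of why the coefficient of variation is a relative measure: the two hypothetical samples below have the same standard deviation but very different means, so their relative spread differs.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    return stdev(sample) / mean(sample)

small_values = [10, 12, 14]          # hypothetical sample, mean 12
large_values = [100, 102, 104]       # hypothetical sample, mean 102

# Both samples have a standard deviation of 2, but the relative spread differs.
print(f"{coefficient_of_variation(small_values):.1%}")
print(f"{coefficient_of_variation(large_values):.1%}")
```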

25th, 50th and 75th percentiles are referred to as quartiles

lower quartile -> Q1 = 25th percentile

Q2 = 50th percentile, also the median

upper quartile -> Q3 = 75th percentile

Quartiles split the ranked data into 4 segments with an equal number of values per segment:

Q3 = 0.75(n+1) ranked value

Interquartile range = Q3 - Q1

Q1 = 0.25(n+1) ranked value

Q2 = 0.5(n+1) ranked value

The effect of outliers can be reduced by using the interquartile range
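The ranked-value positions above can be sketched in code as follows. The helper names are hypothetical, and using linear interpolation when a position falls between two ranks is an assumption; other conventions exist.

```python
def ranked_value(ordered, position):
    """Value at a 1-based, possibly fractional rank in sorted data."""
    lower = int(position)                    # whole part of the rank
    fraction = position - lower              # how far toward the next rank
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

def quartiles(data):
    """Q1, Q2, Q3 located at the 0.25(n+1), 0.5(n+1), 0.75(n+1) ranked values."""
    ordered = sorted(data)
    n = len(ordered)
    return tuple(ranked_value(ordered, p * (n + 1)) for p in (0.25, 0.5, 0.75))

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]      # hypothetical observations, n = 9
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

With n = 9 the positions are 2.5, 5 and 7.5, so Q1 lies halfway between the 2nd and 3rd ranked values. Python's `statistics.quantiles` with its default "exclusive" method appears to follow the same (n + 1) convention.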

A value with a Z-score above 3 or below -3 is considered an outlier

A measure of distance in standard deviations from the mean
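A minimal sketch of the Z-score, computed as (value - mean) / standard deviation, together with the outlier check described above. The function names and the example numbers are hypothetical.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean, measured in standard deviations."""
    return (value - mean) / sd

def is_outlier(value, mean, sd, threshold=3):
    """Flag values whose Z-score is above +threshold or below -threshold."""
    return abs(z_score(value, mean, sd)) > threshold

# Hypothetical data set with mean 100 and standard deviation 15.
print(z_score(130, 100, 15), is_outlier(130, 100, 15))
print(z_score(150, 100, 15), is_outlier(150, 100, 15))
```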

Calculate the upper bound -> Q3 + 1.5 x IQR

Calculate the lower bound -> Q1 - 1.5 x IQR

An outlier is a value that is at a distance of more than 1.5 times the IQR outside the box

Values larger than the upper bound or smaller than the lower bound are outliers
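Tukey's rule can be sketched as below. The quartiles are passed in as given numbers; the function names and all example values are hypothetical.

```python
def tukey_bounds(q1, q3):
    """Lower and upper bounds: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def tukey_outliers(values, q1, q3):
    """Values that fall outside the Tukey bounds."""
    lower, upper = tukey_bounds(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical quartiles Q1 = 6 and Q3 = 16, so IQR = 10.
print(tukey_bounds(6, 16))
print(tukey_outliers([3, 12, 21, 40], 6, 16))
```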

Measures the strength of the linear relationship between two numerical variables (bivariate data)

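Population covariance -> σ_xy = Σ(x_i - μ_x)(y_i - μ_y) / N

Sample covariance -> s_xy = Σ(x_i - x̄)(y_i - ȳ) / (n - 1)

Calculation formula -> s_xy = [Σx_i y_i - (Σx_i)(Σy_i) / n] / (n - 1)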

When 2 variables move in the opposite direction -> covariance is a large negative number

When there is no particular pattern, the covariance is a small number

When 2 variables move in the same direction -> covariance will be a large positive number
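The sketch below computes the sample covariance two ways, from the deviations and from the shortcut sums, on hypothetical paired data. It also shows the sign behaviour described above by reversing one variable.

```python
def sample_covariance(x, y):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_covariance_shortcut(x, y):
    """Same quantity from sum(xy), sum(x) and sum(y) only."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / (n - 1)

x = [1, 2, 3, 4, 5]                  # hypothetical paired observations
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y), sample_covariance_shortcut(x, y))
print(sample_covariance(x, y[::-1]))  # reversing y flips the direction here
```

Because covariance depends on the units of both variables, "large" and "small" are hard to judge on their own, which is what the coefficient of correlation addresses.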

Population coefficient of correlation -> ρ = σ_xy / (σ_x σ_y)

Sample coefficient of correlation -> r = s_xy / (s_x s_y)

Covariance divided by the product of the standard deviations of the two variables

When two variables have a strong negative linear relationship, the coefficient value is close to -1

When two variables have a strong positive linear relationship, the coefficient value is close to 1
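A sketch of the sample coefficient of correlation, computed as the sample covariance divided by the product of the two sample standard deviations. The function name and the data are hypothetical.

```python
import math

def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

x = [1, 2, 3, 4, 5]                          # hypothetical observations
print(round(sample_correlation(x, [2, 4, 5, 4, 5]), 4))   # positive, moderately strong
print(round(sample_correlation(x, [10, 8, 6, 4, 2]), 4))  # points lie exactly on a falling line
```

Unlike covariance, the result always lies between -1 and 1, so its size can be compared across data sets.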

Produces a straight line drawn through the points so that the sum of squared deviations between the points and the line is minimized

This line is represented by -> ŷ = b0 + b1 x, where b1 = s_xy / s_x^2 is the slope and b0 = ȳ - b1 x̄ is the intercept

Equation of a line -> y = mx + b
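A minimal sketch of the least squares method, assuming the usual textbook formulas: the slope is the sample covariance divided by the sample variance of x, and the intercept is the mean of y minus the slope times the mean of x. In terms of y = mx + b, the slope is m and the intercept is b. The function name and the data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x2 = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    slope = s_xy / s_x2
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5]                  # hypothetical independent variable
y = [2, 4, 5, 4, 5]                  # hypothetical dependent variable
intercept, slope = least_squares_line(x, y)
print(round(intercept, 2), round(slope, 2))

# Predicted value of y for a new x, read off the fitted line.
print(round(intercept + slope * 6, 2))
```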

Objective of the scatter diagram is to measure the strength and direction of the linear relationship

Variable costs are costs that vary directly with the number of products produced

Fixed costs are costs that must be paid whether or not any units are produced

Calculated by squaring the coefficient of correlation: R^2 = r^2

Measures the amount of variation in the dependent variable that is explained by the variation in the independent variable
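As a brief illustration, the coefficient of determination follows directly from a given coefficient of correlation. The function name and the example values of r are hypothetical.

```python
def coefficient_of_determination(r):
    """Square of the coefficient of correlation."""
    return r ** 2

# Hypothetical correlations: the sign of r does not affect R^2.
for r in (0.8, -0.8, 0.3):
    share = coefficient_of_determination(r)
    print(f"r = {r:+.1f} -> about {share:.0%} of the variation in y is explained by x")
```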
