Description and classical measures
Robustness and fragility
Robustness - measures that are resilient to outliers
Fragility - measures unduly troubled by unusually large or small values (outliers) that are not really part of the distribution
Variability
Coefficient of variation
Used where two data sets have the same SD but one clearly has more variation relative to its mean, e.g. A: mean = 20, B: mean = 20000. Both can have a standard deviation of 10, but A has much more variation relative to its mean
The coefficient (SD divided by the mean) gives the spread relative to the mean
Variance - the average of the squared deviations from the mean
Variability can be checked using SD, coefficient of variation and variance - for all of them, the larger the number, the larger the variability
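A minimal sketch of the A/B comparison above, using Python's standard `statistics` module. The two data sets are made up for illustration, and the population forms (`pvariance`, `pstdev`) are an assumption; the sample forms would give slightly different numbers but the same conclusion.

```python
import statistics

def coefficient_of_variation(values):
    """SD divided by the mean (assumes a non-zero mean)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical data chosen so that both sets have an SD of 10.
a = [10, 30]          # mean = 20
b = [19990, 20010]    # mean = 20000

var_a = statistics.pvariance(a)   # mean of the squared deviations
sd_a = statistics.pstdev(a)       # square root of the variance
sd_b = statistics.pstdev(b)

cv_a = coefficient_of_variation(a)
cv_b = coefficient_of_variation(b)

print(f"SD:  A = {sd_a:.1f}, B = {sd_b:.1f}")
print(f"CoV: A = {cv_a:.4f}, B = {cv_b:.4f}")
```

The SDs match, but the coefficient of variation for A is a thousand times larger than for B, which is the point of using a relative measure.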
Boxplots and IQR
Boxplots show distribution
Boxplots show: middle, extremes, IQR and identify outliers
Quartile 1: bottom of IQR, Quartile 3: top of IQR, Quartile 2: median
Whiskers extend to the highest and lowest data points, excluding outliers
Outliers = either above UQ + 1.5 × IQR or below LQ - 1.5 × IQR
Used to suggest hypotheses about causation/ trends
Can be used comparatively
Boxplots are good summary measures, especially when comparing batches of data
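The numbers a boxplot draws can be sketched as below. The batch of values is invented for illustration, and `statistics.quantiles` with its default method is only one of several quartile conventions, so other tools may give slightly different quartiles for the same data.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, IQR, whisker ends and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # LQ, median, UQ
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are not outliers.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical batch with one unusually large value.
batch = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = boxplot_summary(batch)
print(summary)
```

Here the value 30 falls above the upper fence, so it is reported as an outlier and the upper whisker stops at 9.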
Spread
variability around the middle
IQR (midspread)
UQ - LQ, i.e. Q3 - Q1
IQR shows spread of the 'middle 50%' of data
half of the observations lie within the IQR, around the median
median = middle quartile
mean - variability
measured using: variance, SD, coefficient of variation. Larger numbers = more variability
SD is most often used because it is 'theoretically privileged': in 'well-behaved' data, about 2/3 of the data should lie within +/- 1 SD of the mean and about 95% within +/- 2 SD of the mean
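The 2/3 and 95% figures can be checked with a short sketch, assuming that 'well-behaved' means normally distributed (an assumption, since the notes do not say so explicitly). `NormalDist` is part of Python's standard `statistics` module.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

within_1sd = share_within(1)  # roughly two-thirds
within_2sd = share_within(2)  # roughly 95%

print(f"within 1 SD: {within_1sd:.1%}")
print(f"within 2 SD: {within_2sd:.1%}")
```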
types of data
nominal - qualitative - no coherent order - e.g. mode
ordinal - percentiles - median
Ratio - doesn't have negative data - coefficient of variation
Interval data - can have negatives - standard deviation
GIGO
GIGO (garbage in, garbage out) is a concept common to computer science and mathematics: the quality of output is determined by the quality of the input. So, for example, if a mathematical equation is improperly stated, the answer is unlikely to be correct.
The machine follows instructions, which may or may not be appropriate
Follows BIDMAS - assumes a linear scale
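A small illustration of an improperly stated calculation, with made-up numbers: the machine applies BIDMAS exactly as written, so a mean typed without brackets gives a confident but wrong answer.

```python
# Intended calculation: the mean of 10, 20 and 30.

# Improperly stated: division happens before addition, so only 30 is divided.
mean_wrong = 10 + 20 + 30 / 3

# Properly stated: brackets force the sum to be taken first.
mean_right = (10 + 20 + 30) / 3

print(mean_wrong, mean_right)
```

Neither line raises an error; only the second states what was actually meant.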
Stem and Leaf plot
Shows frequency distribution
Works well for moderately sized data sets
can be drawn back to back to compare 2 data sets
1 - create the stem - integers in order between the extremes of the dataset
2 - add the leaves - each leaf corresponds to the digit after the decimal point of a value
3 - sort the leaves - put them in order (conventionally ascending, moving away from the stem)
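The three steps can be sketched as follows, assuming non-negative values recorded to one decimal place. The function names and the data are illustrative only, and the leaf order is left as a parameter (`reverse`) because the direction is a matter of convention.

```python
from collections import defaultdict

def stem_and_leaf(values, reverse=False):
    """Return {stem: ordered leaves} for non-negative values with one decimal place."""
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # e.g. 3.7 -> 37
        stem, leaf = divmod(tenths, 10)   # 37 -> stem 3, leaf 7
        leaves[stem].append(leaf)         # step 2: add the leaves
    lo, hi = min(leaves), max(leaves)
    # Step 1: every integer between the extremes gets a stem, even if empty.
    # Step 3: sort the leaves on each stem.
    return {s: sorted(leaves.get(s, []), reverse=reverse) for s in range(lo, hi + 1)}

def render(plot):
    """Format the plot as text lines such as '1 | 0 2 5'."""
    return [f"{stem} | {' '.join(map(str, lv))}".rstrip() for stem, lv in plot.items()]

# Hypothetical data set.
plot = stem_and_leaf([1.2, 1.5, 3.1, 1.0, 3.9])
lines = render(plot)
print("\n".join(lines))
```

Keeping the empty stem (2 in this data) preserves the shape of the frequency distribution, which is the reason for listing every integer between the extremes.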
Discrete data
mean cannot be used
mode is needed
scale of measurement determines the summary measure: discrete (nominal) data need a different average
mode shows the most commonly occurring value - report the % of observations in that category
e.g. with 50 males and 20 females, an average would suggest 35 of each; the mode is male, with 50 of 70 observations (71%)
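The male/female example can be reproduced with a short sketch using `collections.Counter`; the function name `mode_with_share` is hypothetical.

```python
from collections import Counter

def mode_with_share(observations):
    """Return the most common category, its count, and its percentage share."""
    counts = Counter(observations)
    category, count = counts.most_common(1)[0]
    return category, count, 100 * count / len(observations)

# The example from the notes: 50 males and 20 females.
sample = ["male"] * 50 + ["female"] * 20
category, count, percent = mode_with_share(sample)

print(f"mode = {category} ({count} of {len(sample)}, {percent:.0f}%)")
```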