Lecture 2 Descriptive Statistics
Measures of central tendency
Mode
most frequently occurring value
unimodal, bimodal
Not sensitive to outliers
Median
the midpoint
data must be ordered before calculating
position of the median = (n + 1) / 2
odd n: the value at that position
even n: the position falls between two values; add the values below and above it and divide by 2
Mean
Sensitive to outliers
Arithmetic mean
= sum of all values / number of values
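The three measures above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the sample values are hypothetical and include one outlier (40) to show that the mean moves while the mode and median stay put.

```python
import statistics

# Hypothetical sample (not from the lecture); 40 is a deliberate outlier.
values = [2, 3, 3, 5, 7, 40]

mode = statistics.mode(values)      # most frequently occurring value
median = statistics.median(values)  # orders the data; for even n averages the two middle values
mean = statistics.mean(values)      # sum of all values / number of values

# Median position = (n + 1) / 2 = 3.5, so the 3rd and 4th ordered values are averaged.
print("mode:", mode, "median:", median, "mean:", mean)
```

Dropping the outlier from `values` and re-running shows the mean falling sharply while the mode is unchanged.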
Measures of variability (Dispersion/ deviation/ spread)
Range
Max value - Min value
Most limited
affected by the furthest outliers
only about extreme values
Variance
How spread out
average of
squared deviation from mean
Standard Deviation
How spread out
Roughly the average
distance from any point in the data set to the mean
Square root of variance
Inference
Small SD - values close to middle
Large SD - values farther away from middle
Properties
never negative
smallest = 0
same unit of variable
Standard deviation is, roughly, the typical distance of the values in a data set from the mean (strictly, the square root of the average squared distance).
Quantile
IQR
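A short sketch of range, variance and standard deviation, assuming the "average of squared deviations" definition in these notes (the population form, dividing by n). The data values are hypothetical. Many textbooks and packages use the sample form instead, which divides by n - 1 (`statistics.variance` and `statistics.stdev`).

```python
import statistics

# Hypothetical data set with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)   # only looks at the two extreme values
variance = statistics.pvariance(values)  # average of squared deviations from the mean
sd = statistics.pstdev(values)           # square root of the variance, same unit as the data

print("range:", data_range, "variance:", variance, "SD:", sd)
```

The variance is in squared units, which is why the standard deviation is usually the figure reported alongside the mean.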
Normal Distribution
Characteristics
Bell-shaped symmetrical curve
tails -
asymptotic
never touch the horizontal axis
mean=median=mode
Area = 1
68-95-99.7 rule
About 68%, 95% and 99.7% of values lie within +/- 1, 2 and 3 SD of the mean
Almost all values (99.7%) within +/- 3 SD
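The 68-95-99.7 rule can be checked numerically. This sketch uses `statistics.NormalDist` from the standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Area under the normal curve between -k SD and +k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {share_within(k) * 100:.1f}%")
```

The printed shares are close to 68%, 95% and 99.7%. The share never reaches 100% at any finite number of SDs, because the tails never touch the axis.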
Z score
- Standard Normal Distribution
Relative standing - how many SDs a value lies from the mean
z = (x - mean) / standard deviation
useful to compare values from different data sets
Z score significance
p < 0.05 if |z| > 1.96
p < 0.01 if |z| > 2.58
p < 0.001 if |z| > 3.29
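A minimal sketch of the z score formula and the cut-offs above, assuming the cut-offs refer to two-tailed probabilities under the standard normal distribution. The function names and the exam figures are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

def two_tailed_p(z):
    """Probability of a value at least |z| SDs from the mean, in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Comparing values from two different (hypothetical) data sets:
z_exam_a = z_score(82, mean=70, sd=8)  # 82 on an exam with mean 70, SD 8
z_exam_b = z_score(75, mean=60, sd=5)  # 75 on an exam with mean 60, SD 5

print(z_exam_a, z_exam_b)
for z in (1.96, 2.58, 3.29):
    print(z, round(two_tailed_p(z), 4))
```

Although 82 is the higher raw score, the second score is further above its own mean in SD units, which is the comparison the z score makes possible.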
Non-normal Distribution
Kurtosis
Types
Leptokurtic - above 3 - thin bell with a high peak and heavy tails
Platykurtic - below 3 - flat bell with a low peak and thin tails
Mesokurtic - equal to 3 - same as the normal distribution
Degree of peakedness
Methods to deal
Transformation - to make normal
log or square root
moderate right skewness
Box-Cox transformation
techniques not based on normal distribution
exponential
Weibull
Lognormal
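To illustrate the log and square-root transformations listed above, the sketch below measures skewness before and after transforming a right-skewed data set. The data are hypothetical (each value doubles the previous one), and `moment_skewness` is an illustrative helper using the moment-based definition, not a function from the lecture.

```python
import math
import statistics

def moment_skewness(values):
    """Moment-based skewness: the average of cubed z scores."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - m) / sd) ** 3 for x in values) / len(values)

# Hypothetical right-skewed data: each value doubles the previous one.
raw = [1, 2, 4, 8, 16, 32, 64]
rooted = [math.sqrt(x) for x in raw]  # milder transformation
logged = [math.log(x) for x in raw]   # stronger transformation

print("raw:", moment_skewness(raw))
print("square root:", moment_skewness(rooted))
print("log:", moment_skewness(logged))
```

The square root reduces the skewness and the log removes it here, because the logged values are evenly spaced. Real data will rarely transform this cleanly.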
Skewness
one tail extends further from the middle than the other
Types
Positively skewed
elongated tail at the right (most data clustered at the left)
mean > median > mode
Negatively skewed
elongated tail at the left (most data clustered at the right)
Mean < median < mode
asymmetry in the distribution
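Skewness and kurtosis can both be computed from z scores. The sketch below assumes the moment-based definitions (average cubed and average fourth-power z score), under which a normal distribution has kurtosis 3. The helper names and data are hypothetical.

```python
import statistics

def z_scores(values):
    """z score of every value, using the population standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - m) / sd for x in values]

def moment_skewness(values):
    """Positive for a long right tail, negative for a long left tail."""
    return sum(z ** 3 for z in z_scores(values)) / len(values)

def moment_kurtosis(values):
    """About 3 for normal data; above 3 leptokurtic, below 3 platykurtic."""
    return sum(z ** 4 for z in z_scores(values)) / len(values)

# Hypothetical positively skewed data: one long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 20]
left_skewed = [-x for x in right_skewed]  # mirror image: tail to the left

print("mode, median, mean:", statistics.mode(right_skewed),
      statistics.median(right_skewed), statistics.fmean(right_skewed))
print("skewness:", moment_skewness(right_skewed), moment_skewness(left_skewed))
print("kurtosis of a flat data set:", moment_kurtosis([1, 2, 3, 4, 5]))
```

For the right-skewed data the ordering mean > median > mode holds, and mirroring the data flips the sign of the skewness.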
Histogram
Population & Sample
Population
Parameter
Sample - randomly selected
Represent population
Statistic
Variable
- characteristic of interest which can vary
Qualitative variable
or
Categorical
Nominal
e.g. hair colour - percentage of gray-haired individuals
Features
Names or numbers
Not used to calculate
No order
Report
Frequency
Percentage
Ordinal
e.g. disease severity
Features
Rank ordered
Distance between ranks has no meaning
Calculate
median or mode
Quantitative variable
Continuous
Decimals possible
e.g. height
Further divided
Interval
Features
e.g. temperature, credit score
Ratio
Features
e.g. Height, weight
Discrete
Whole number
e.g. number of children
Descriptive statistics
Describe sample characteristics - summary
Single number
Graphically
Choice of measure
Central tendency & variation
depend on
Shape of distribution
Scale
Mean & SD
Numerical data (Quantitative Data)
symmetrical distribution
Median & range/ IQR
Ordinal data
skewed numerical data
Relative standing
Feature
location of values relative to other values
Compare values
within data set
different data set
Quantiles
Quartiles
k = 4 groups, Quantiles = 3 cut-off points
Q1 Lower quartile - lowest 25% cut-off
Q2 Median - 50% cut-off
Q3 Upper quartile - lowest 75% cut-off
IQR= Q3 - Q1
divides the data into groups of equal observations
Percentile
the value below which a given percentage of the observations fall
order from lower to higher
k = 100, Quantiles = 99
Median
k = 2, Quantiles = 1
50% below & above
Quintile
k=5, Quantile = 4
20% below
Deciles
k=10, Quantile =9
10% below
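The "k groups, k - 1 cut-off points" pattern can be sketched with `statistics.quantiles` from the standard library. The data are hypothetical, and the cut-off values depend on the interpolation method (the default `"exclusive"` method is assumed here), so other software may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical ordered data: the whole numbers 1 to 11.
values = list(range(1, 12))

# n is the number of equal groups (k); the result holds the k - 1 cut-off points.
median_cut = statistics.quantiles(values, n=2)     # 1 cut-off
quartiles = statistics.quantiles(values, n=4)      # 3 cut-offs: Q1, Q2, Q3
quintiles = statistics.quantiles(values, n=5)      # 4 cut-offs
deciles = statistics.quantiles(values, n=10)       # 9 cut-offs
percentiles = statistics.quantiles(values, n=100)  # 99 cut-offs

q1, q2, q3 = quartiles
iqr = q3 - q1  # spread of the middle 50% of the data

print("quartiles:", quartiles, "IQR:", iqr)
print(len(median_cut), len(quintiles), len(deciles), len(percentiles))
```

Q2 equals the median, and the IQR ignores the extreme values, which is why it is paired with the median for skewed data.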