Statistics :chart_with_upwards_trend:
Representations of Data
How do you compare data?
Key Terms :bookmark_tabs:
Measure of location
: Any value that represents a position in a data set.
Measure of central tendency
: A method where a single value at the middle of a distribution can tell us something about the whole set of data.
lower quartile
: 1/4 of the way through the data set. If the position is not a whole number
round up
!
Deviation
:How far a measurement is from the mean.
upper quartile
: 3/4 of the way through the data set. If the position is not a whole number
round up
!
Interpolation
: A technique used to estimate the median, quartiles and percentiles in a grouped frequency table.
Interquartile range
: Looks at the spread of the middle 50% of values, which is useful because it is not affected by extreme values.
Measure of spread
: Shows how spread out the data is.
When comparing data you always need to compare two things, which are the
measure of location
, and the
measure of spread
.
Measures of
location
/central tendency
.
Mean
: For
quantitative data
and this gives a true measure of the data because it uses all the data. It is affected by extreme values. (x bar, x̅)
You can use your calculator to help you find all these!
Go to statistic mode (6) and select 1 (as we are using 1 variable only). If the frequency column does not show up, press shift, menu and then 3. If using grouped data enter the midpoints into x (variable column). Now press OPTN and 3 for 1-variable calculations.
Median, Q2
: The middle value when the data values are put in order. Used for
quantitative data
and
extreme values
as it gives outliers less influence on the final result. Arrange the data in order, work out the position (n+1)/2 and find that value in the list. If this position falls between two numbers, find the mean of those two numbers.
Mode
: The value/ class that occurs the most often. This can be
both qualitative and quantitative
, with a single mode or two modes (bimodal). It is the most popular value :dress: and is most useful when you have a large data set.
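The three measures above can be checked quickly in code. This is a minimal sketch using Python's standard `statistics` module; the data set is made up for illustration and includes one extreme value to show how the mean reacts compared with the median.

```python
from statistics import mean, median, multimode

# Hypothetical data set with one extreme value (40)
data = [2, 3, 3, 5, 7, 8, 40]

data_mean = mean(data)        # uses every value, so the 40 pulls it upwards
data_median = median(data)    # middle value of the ordered list, position (n + 1) / 2
data_modes = multimode(data)  # list of the most common value(s)

print(data_mean, data_median, data_modes)
```

Here the median stays at a typical value while the mean is dragged towards the extreme value, which is why the median is preferred when outliers are present.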
Measures of
spread
Variance and Standard Deviation :strawberry::lollipop: (for grouped data in a frequency table)
Standard deviation σ
is the square root of the variance. It helps us visualise how spread measurements are from the mean.
σ = √(Sxx / n)
Variance σ²
is used to work out the spread of a data set using all the data given. It can help investors work out the risk of a product :money_with_wings:.
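As a rough illustration, the variance and standard deviation of a frequency table can be worked out as below. The table is hypothetical, and the sketch uses the common shortcut variance = Σfx²/n − mean², which gives the same result as Sxx/n.

```python
from math import sqrt

# Hypothetical frequency table: value x -> frequency f
# (for grouped data, x would be the class midpoint)
table = {1: 2, 2: 3, 3: 5}

n = sum(table.values())                             # total frequency
sum_fx = sum(f * x for x, f in table.items())       # sum of f * x
sum_fx2 = sum(f * x * x for x, f in table.items())  # sum of f * x squared

mean_x = sum_fx / n
variance = sum_fx2 / n - mean_x ** 2  # equivalent to Sxx / n
std_dev = sqrt(variance)              # standard deviation is the square root

print(mean_x, variance, std_dev)
```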
Range
: Difference between smallest and largest values in the data set.
Interquartile range
: The difference between the upper quartile and the lower quartile (Q3 − Q1), that is, between the 75th and 25th percentiles.
Exam Questions
The pixies collect 100g of mushrooms one day and 600g another.
:mushroom: It will increase the mean as 600 > old mean
:mushroom: Mode will stay the same.
:mushroom: The median will be smaller because there is an even number of values and the new mean will be...
The mode is used when planning production numbers for dresses because it is an
actual data value
and gives the most common size.
Some wrecking balls take different times to smash walls. What is the best way to work out the average time if there are extreme values and if the mode is close to the median?
:hammer_and_wrench: mode
coding :alien:
Coding is a way of simplifying statistical calculations.
There are 3 different ways of presenting coding.
Regular coding:
Mean of coded data:
Standard Deviation of coded data:
Sometimes you will need to un-code data
To find the
mean
of the original data when you are given the coded data:
To find the
standard deviation
of the original data when you are given the coded data:
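The formulas for this branch are not shown here, so the sketch below assumes the usual linear coding y = (x − a) / b. Under that assumption, un-coding gives mean of x = b × (mean of y) + a and standard deviation of x = b × (standard deviation of y); the subtraction of a shifts the data but does not change its spread. The values of `a`, `b` and the data are made up for illustration.

```python
from statistics import mean, pstdev

# Assumed coding: y = (x - a) / b, with hypothetical a and b
a, b = 100, 10
x = [110, 120, 130, 150]

y = [(value - a) / b for value in x]  # coded data: 1.0, 2.0, 3.0, 5.0

# Un-code the summary statistics
mean_x = b * mean(y) + a  # the mean is scaled and shifted
sd_x = b * pstdev(y)      # the standard deviation is only scaled

print(mean_x, sd_x)
```

Comparing `mean_x` and `sd_x` with `mean(x)` and `pstdev(x)` computed directly on the original data is a quick way to confirm the un-coding.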
How to do any grouped frequency table question
The first thing you should always do is find the midpoints, and make a
CUMULATIVE FREQUENCY TABLE
(which adds frequencies)!
An exam could ask you to find the median, quartiles or percentiles of a grouped frequency table. This is how to solve these questions and get 100% of the marks.
Use
Interpolation
! Note that by doing this, you are assuming that the data values are evenly distributed within each class.
For this example, we will find the median
Find how many values there are (the sum of all the frequencies)
Halve this number to give the position of the median (the n'th value).
Find which group (x column) the n'th value belongs to.
Put all the relevant values in your special, magical diagram that was invented to save your life in the exam.
Substitute the values into the equation (Q2 − LB) / (q2 − lf) = (UB − LB) / (uf − lf)
Rearrange this equation to find Q2.
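The steps above can be sketched in code. This example reads LB and UB as the class boundaries, q2 as the position of the median, and lf and uf as the cumulative frequencies at the lower and upper boundaries; that reading, the function name and the table are assumptions for illustration. Rearranging the equation gives Q2 = LB + (q2 − lf) × (UB − LB) / (uf − lf).

```python
def interpolate(position, lower_bound, upper_bound, cf_below, cf_upto):
    """Estimate the value at a position inside a class,
    assuming the values are evenly spread within the class."""
    fraction = (position - cf_below) / (cf_upto - cf_below)
    return lower_bound + fraction * (upper_bound - lower_bound)

# Hypothetical grouped table: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

total = sum(f for _, _, f in classes)  # sum of all the frequencies
target = total / 2                     # position of the median

cumulative = 0  # cumulative frequency below the current class
for lb, ub, f in classes:
    if cumulative + f >= target:       # the median lies in this class
        q2 = interpolate(target, lb, ub, cumulative, cumulative + f)
        break
    cumulative += f

print(q2)
```

The same function works for quartiles and percentiles by changing `target`, for example `total / 4` for the lower quartile.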
Representations of Data
Key Terms
Extreme
: A value that lies outside the overall pattern of data.
Outliers
Outliers are either more than Q3 + k(Q3 − Q1) or less than Q1 − k(Q3 − Q1).
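A minimal sketch of this rule is shown below. The function name, the data and the quartile values are hypothetical, and the default k = 1.5 is only an assumption; use whatever value of k the question gives.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return the values outside Q1 - k(Q3 - Q1) and Q3 + k(Q3 - Q1)."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return [x for x in data if x < lower_limit or x > upper_limit]

# Hypothetical data with quartiles already worked out
data = [1, 12, 13, 14, 15, 16, 17, 40]
outliers = find_outliers(data, q1=12.5, q3=16.5)

print(outliers)
```

With these numbers the limits are 6.5 and 22.5, so 1 and 40 are flagged.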
Key terms
Outliers
:smiling_imp: An extreme value that lies outside the overall pattern of the data.
Anomalies
: Outliers that should be removed from the data because they are clearly errors and would be misleading to keep in.
Cleaning the data
:wastebasket: The process of removing anomalies from the data.
Histogram
:two_women_holding_hands:: A graph that presents
grouped
,
continuous
data. It helps us to visualise how data is distributed, by letting us see general locations, the general shape and how spread out the data is. A key thing to note about histograms is that there are no gaps between the bars.
You need to learn these 3 ways of representing data
Box Plots
Comparing Box plots
Compare the quartiles
Compare the minimum and maximum
location: median
spread: IQR
Advantages:
:check: helps us see the spread of data easily
:check: easy to compare stratified samples
Disadvantages:
:green_cross: Original data not shown in box plot
:green_cross: Mean and mode are not represented (easily misunderstood).
Histograms
Frequency density is needed to calculate the height. :straight_ruler:
area of bar = k × frequency, where k is a constant (k = 1 when the height is the frequency density)
Frequency density = frequency / class width
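As a quick check of the frequency density formula, the sketch below works out bar heights for a made-up table with unequal class widths and confirms that each bar's area matches its frequency (the k = 1 case).

```python
# Hypothetical grouped table: (lower boundary, upper boundary, frequency)
classes = [(0, 5, 10), (5, 10, 20), (10, 20, 30)]

# Height of each bar: frequency density = frequency / class width
densities = [f / (ub - lb) for lb, ub, f in classes]

# Area of each bar: height * class width, which recovers the frequency
areas = [d * (ub - lb) for d, (lb, ub, _) in zip(densities, classes)]

print(densities)
print(areas)
```

The widest class has the largest frequency but not the tallest bar, which is why height alone should not be read as frequency on a histogram.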
Comparing data
When comparing two or more sets of data you must always comment on
1) The measure of location
2) The measure of spread
When comparing data that has no extreme values compare:
:slightly_smiling_face: mean
:slightly_smiling_face: standard deviation
When comparing data that has extreme values compare:
:slightly_smiling_face: interquartile range
:slightly_smiling_face:median
Grouped Frequency Tables
The first thing you always have to do is find the
midpoints
(x)- so easy!
If you need to
estimate the mean
just create another column called
fx
. Divide Σfx by the total frequency (Σf) :smile:! Remember that the answer is only an estimate because the exact data values are unknown.
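The midpoint method can be sketched as follows, using a made-up grouped frequency table. Each class is represented by its midpoint x, so the result is an estimate rather than the exact mean.

```python
# Hypothetical grouped table: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

# The fx column: midpoint of each class multiplied by its frequency
fx = [f * (lb + ub) / 2 for lb, ub, f in classes]

total_frequency = sum(f for _, _, f in classes)
estimated_mean = sum(fx) / total_frequency  # sum of fx divided by sum of f

print(estimated_mean)
```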
Interpolation
Standard Deviation & Variance
Mean, Mode and Median
Coding