Data handling and Analysis
Quantitative and Qualitative data
difference between collection techniques
Quantitative data
closed questions in questionnaires
tally of behavioural categories in an observational study
psychologists develop measures of psychological variables
looking at averages and differences between groups
qualitative data
open questions in questionnaires
researchers in an observational study can describe what they see
Quantitative data
numerical
produces objective, less detailed, more reliable data
behaviour is measured in numbers or quantities
Strengths
easy to analyse, using descriptive statistics
conclusions are easily drawn
weaknesses
may oversimplify reality
therefore conclusions may be meaningless
Qualitative data
non-numerical (more descriptive)
produces subjective, detailed, less reliable data
can be converted into quantitative data through content analysis
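To illustrate the idea (not part of the original notes), a minimal Python sketch of a very simple content analysis, tallying made-up coding categories in open-question answers:

# minimal sketch: turning qualitative answers into counts by tallying coding categories
answers = [
    "I felt fear before the exam but hope afterwards",   # made-up answers
    "mostly hope, with a little fear",
]
categories = ["fear", "hope"]                             # hypothetical coding categories
tallies = {c: sum(a.lower().count(c) for a in answers) for c in categories}
print(tallies)                                            # {'fear': 2, 'hope': 2}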
strengths
provides rich and detailed information about people's experiences
can provide unexpected insights into thoughts and behaviours because the answers are not restricted by previous expectations
concerned with attitudes, fears, beliefs and emotions
weaknesses
complexity makes it more difficult to analyse such data and draw conclusions
Primary and secondary data
secondary data
meta analysis
combines findings from several studies in a certain research area
allows identification of trends and relationships that could not be seen in the individual studies
helpful when individual studies have contradictory findings to give clearer insight
data originally collected towards another research aim
could be from the researcher or another person
e.g. could use government statistics
strengths
can be collected from several sources giving a clearer insight into research area
simpler and cheaper to access someone else's data
some data may already have gone through statistical testing, so its statistical significance is already known
weaknesses
may not fit the exact needs of the study
Primary data
strengths
has not been manipulated making it more reliable and valid
researcher has control over the data
weaknesses
gathering primary data is a lengthy and expensive process
you have to design a study, recruit participants, etc.
original data that has been collected specifically for the research aim and has not been published before
study could be experiment, questionnaire or observation
descriptive statistics
measures of central tendency
mean
what it is
the arithmetic average of a set of data
can only be used with ratio or interval level data
strengths
most sensitive measure of central tendency because it takes account of the exact values of all the data
weaknesses
because it is sensitive it can be easily distorted by extreme values
then is misrepresentative of the data as a whole
cannot be used with nominal data
doesn't make sense with discrete values, e.g. an average of 2.4 children
how to do it
add all the values together and divide the total by how many there are
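A minimal Python sketch of this calculation, using made-up scores:

scores = [2, 4, 4, 5, 10]           # made-up interval-level data
mean = sum(scores) / len(scores)    # add them all up, divide by how many there are
print(mean)                         # 5.0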
median
what it is
middle value in an ordered list
can be used for ratio, interval and ordinal data
how to do it
if there is an odd number of values, the median is the middle value
if there is an even number of values, use the mid-point of the two central values
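A minimal Python sketch covering both cases, using made-up scores:

scores = [3, 1, 4, 2, 5, 6]       # made-up data, even number of values
ordered = sorted(scores)          # put the values in order first
n = len(ordered)
if n % 2 == 1:
    median = ordered[n // 2]                              # odd: the middle value
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # even: mid-point of the two central values
print(median)                     # 3.5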
strengths
not affected by extreme scores
appropriate for ordinal data
easier to calculate than the mean
weaknesses
not as sensitive as the mean because exact values are not reflected in the final calculation
can be unrepresentative of small sets of data because it doesn't take account of all scores
mode
what is it
most common number in a set of scores
for nominal data it is the category with the highest frequency count
for ordinal and interval data it is the value that occurs most frequently
if there are two most common values the data set is bi-modal
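A minimal Python sketch using made-up nominal data (collections.Counter does the tallying):

from collections import Counter

colours = ["red", "blue", "red", "green", "blue", "red"]    # made-up nominal data
counts = Counter(colours)
top = max(counts.values())
modes = [value for value, c in counts.items() if c == top]  # more than one entry means bi-modal
print(modes)                                                # ['red']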
strengths
unaffected by extreme values
more useful than the median when the data contain extreme values
the only measure that can be used with categories, i.e. nominal data
Weaknesses
not useful when there are several modes
doesn't tell anything about other values in a distribution
does not use all the scores
measures of dispersion
range
what it is
distance between the top and bottom values in the set of data
how to do it
highest value minus lowest value (some textbooks add 1 to allow for rounding of measured values)
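A minimal Python sketch showing both conventions, using made-up scores:

scores = [5, 9, 3, 12, 7]                       # made-up data
data_range = max(scores) - min(scores)          # 12 - 3 = 9
data_range_plus_one = data_range + 1            # version used by textbooks that add 1
print(data_range, data_range_plus_one)          # 9 10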
strengths
easy to calculate
takes full account of extreme values
weaknesses
can be distorted by extreme values
does not show how data is spread around the mean - fails to take account of distribution of the numbers
standard deviation
what is it
shows the spread of the scores around the mean
large standard deviation = larger spread of scores
how to do it
calculate the mean: add all scores and divide by the total number of scores
subtract the mean from each individual score
square each of these differences
add the squared differences together
divide the sum of squares by the number of scores minus 1 (this gives the variance)
take the square root of the variance to get the standard deviation
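The same steps as a minimal Python sketch, using made-up scores (statistics.stdev would give the same answer):

scores = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up data
mean = sum(scores) / len(scores)               # step 1: calculate the mean
deviations = [x - mean for x in scores]        # step 2: subtract the mean from each score
squared = [d ** 2 for d in deviations]         # step 3: square each difference
variance = sum(squared) / (len(scores) - 1)    # step 4: sum of squares divided by n - 1
sd = variance ** 0.5                           # step 5: square root of the variance
print(round(sd, 2))                            # 2.14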
strengths
more sensitive than range because all the scores are used
allows for interpretation of individual scores
weaknesses
more complicated to calculate
less meaningful if data is not normally distributed
These provide a measure of variability, i.e. the spread of the scores
correlations
What is it
association between two continuous variables
Positive correlation: one co-variable increases as the other increases
negative correlation: one co-variable decreases as the other increases
Zero correlation: no relationship between co-variables
Correlational hypothesis
states expected association between co-variables
Correlation co-efficient
calculated to see extent of correlation between two co-variables
perfect positive correlation = +1
perfect negative correlation = -1
ranges from -1 to +1
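A minimal Python sketch with made-up co-variables (statistics.correlation gives Pearson's r and needs Python 3.10 or later):

from statistics import correlation

hours_revised = [1, 2, 3, 4, 5]      # made-up co-variables
test_score = [40, 45, 55, 60, 70]
r = correlation(hours_revised, test_score)
print(round(r, 2))                   # 0.99, a strong positive correlation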
Strengths
allows predictions to be made based on the relationship found
allows quantification of relationship between the two co-variables
no manipulation of variables is required, so correlations can be used where manipulation would be unethical
procedure can usually be easily repeated
Weaknesses
correlations that appear weak can still be statistically significant if the sample size is large
cannot draw a causal conclusion because there is no manipulation of variables
cannot infer causality
there may be intervening (third) variables creating the apparent relationship
only works for linear relationships, e.g. a curvilinear relationship (where co-variables are related up to a point and then not) will produce a misleadingly low coefficient
Presentation and display of quantitative data
line graphs
continuous data on x axis
a dot is drawn at the midpoint of the top of each bar of the corresponding histogram (i.e. the midpoint of each class interval)
each dot connected by a line
two or more frequency distributions can be compared on the same graph
tables
raw data can be set out in table
raw data can be summarised using measures of central tendency and dispersion in a table
good for interpreting the data
scattergrams
used when doing a correlational analysis
bar charts
height of each bar represents the frequency of each item
especially useful for non-continuous data
e.g. nominal data
space between bars to show lack of continuity
histograms
scores placed along x axis and frequency on y axis
area within bars must be proportional to the frequencies represented
used for continuous data
continuous data on x axis
no space between the bars
each column has the same width, representing equal category intervals
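A minimal plotting sketch, assuming matplotlib is available (the notes do not name any particular tool), with made-up continuous scores:

import matplotlib.pyplot as plt

scores = [12, 15, 15, 16, 18, 18, 18, 20, 21, 23, 25, 25]   # made-up continuous data
plt.hist(scores, bins=5, edgecolor="black")   # bars touch: no gaps for continuous data
plt.xlabel("Score")                           # continuous data on the x axis
plt.ylabel("Frequency")
plt.show()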
Distributions
normal
bell shaped curve
it is the distribution predicted when results are equally likely to fall above or below the mid-point
34.13% of the population lie within one standard deviation above the mean and 34.13% lie within one standard deviation below it
features
mean, median and mode are in the exact mid-point
distribution is symmetrical around the mid-point
dispersion either side is consistent and can be expressed in standard deviations
How to check if it is normal distribution
check visually
work out measures of central tendency to see if they are the same
plot a histogram of the data to see if it forms a bell-shaped curve
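A minimal Python sketch of the central-tendency check, using made-up scores:

from statistics import mean, median, mode

scores = [4, 5, 5, 6, 6, 6, 7, 7, 8]                 # made-up data
print(mean(scores), median(scores), mode(scores))    # all roughly equal suggests a normal distribution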
skewed
data not distributed equally around the mean
positively skewed
caused by extremely high scores
contains more low scores than high scores
mode is lower than the mean
negatively skewed
caused by extremely low scores
more high scores than low scores
mode and median are higher than the mean
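A minimal Python sketch illustrating a positive skew with made-up scores:

from statistics import mean, median, mode

scores = [1, 2, 2, 2, 3, 3, 4, 9, 10]               # made-up data with a few extreme high scores
print(mode(scores), median(scores), mean(scores))   # 2, 3, 4: mode < median < mean, a positive skew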
Levels of measurement
nominal
data in categories, e.g. types of trees
ordinal
results that can be placed in rank order along a scale, e.g. scores on an IQ test
interval
this is a higher level of measurement than the other two
like ordinal data but the gaps between values are standardised (equal units), e.g. race times measured in seconds