Personality - Research Methods (Y1)
Research designs
Obvious nature of findings and the power of hindsight: plenty of evidence that people regard findings as true / obvious after the event irrespective of their truth
Psychology's role is to continually generate and provide evidence to support theories about how the social world works
Using different methodologies and concept definitions can often yield different results
Proving our theories - in science we cannot prove a theory is true with data, as we cannot be sure that the next observation will not simply disprove the theory
There is probably no truth, certainty or proof to be found in empirical science
Instead in typical hypothetico-deductive science we generate hypotheses that could be false, and then give the world every chance to falsify our theories
If theories are not falsified we can continue to believe them
Popper - black swan: "all swans are white" can never be proven by observing white swans, but a single black swan falsifies it
Popper and deductive reasoning - inductive reasoning involves observing the world, noticing a pattern and generating a theory
Deductive reasoning starts with a theory, derives a hypothesis and tests it, allowing us to reject or fail to reject the null hypothesis
Personality research
Should generate testable hypotheses, and research evidence can support but not prove a theory
Typical research designs - correlational, longitudinal, experimental and case study - independent variable -> dependent variable
Causality requires - Hume (1739-40, 1748) -
Corroborating evidence - cause and effect should occur close together in time (contiguity)
Cause must come first, temporal precedence
Effect shouldn't occur without the cause
John Stuart Mill (1865) -
Third or confounding variables - ideally compare conditions with and without the cause i.e. experiments
Third variables - when two variables are measured at the same time, the association between them may be an artefact of a third variable
Confounding variables - extraneous variables that are not relevant to the hypothesis but correlate with both the independent and dependent variable, or which differ between conditions
- These may or may not have been measured
A high quality study should measure and control for confounds in analyses
Bobo Doll Study (Bandura, 1961) - suggested causation of aggression through observing role models acting aggressively
-> Could have been demand characteristics, Hawthorne Effect, increased authority of the actor, other effects from being in a lab, genes, previous experience with dolls, innate hatred of clowns, impact of home life
Correlational studies - concurrent associations between personality traits and other variables
Does not indicate causation - directionality and third variables remain alternative explanations
Case studies - in-depth evaluation of single individuals - often therapy clients; history, current behaviour, changes during study; an idiographic approach to personality
Limitations - generalizability, causality and subjectivity
Can be useful - if interested in rare experiences or cases, if the participant is no different from others and to illustrate a therapy treatment
Longitudinal studies - collect repeated measurements over several occasions
Retains naturalistic relationships but allows directional inference
Third variable still a problem if not measured
Causality with observational methods - observational studies are more ecologically valid
Often the only option for many personality characteristics (we cannot manipulate them)
Attempts at causality often focus on - eliminating the possibility of other confounds or causes
Finding change in IV predicting change in the DV
Causality with experimental methods -
Lab v field experiment
Less useful for manipulating traits but can be very useful for looking at antecedents / influences on personality
More controlled but lack ecological validity
Randomisation and active control groups boost confidence in causal effects
Often difficult to execute in real world settings - how do we manipulate personality?
Great in the lab - but not generalisable to the real world
Personality experiments - independent variables are manipulated or non-manipulated but measured
Personality x situation interactions (moderation effects)
Confounding variables - might affect both IV and DV e.g. family size
Assess these so they can be statistically controlled for as covariates (see the sketch below)
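A minimal sketch of the covariate idea in Python; the variables (tv_hours as IV, aggression as DV, family_size as the confound) and the data are made up for illustration:

```python
# Compare a naive regression of DV on IV with one that controls for a measured
# confound as a covariate. The confound drives both IV and DV, so omitting it
# inflates the apparent effect of the IV.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
family_size = rng.integers(1, 6, size=n)                               # confound
tv_hours = 0.5 * family_size + rng.normal(size=n)                      # IV, partly driven by confound
aggression = 0.3 * tv_hours + 0.4 * family_size + rng.normal(size=n)   # DV

df = pd.DataFrame({"aggression": aggression, "tv_hours": tv_hours,
                   "family_size": family_size})

naive = smf.ols("aggression ~ tv_hours", data=df).fit()
adjusted = smf.ols("aggression ~ tv_hours + family_size", data=df).fit()
print("Naive IV coefficient:   ", round(naive.params["tv_hours"], 2))
print("Adjusted IV coefficient:", round(adjusted.params["tv_hours"], 2))
```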
3 ways to think about personality - Kluckhohn and Murray (1953); each person is
Like all other people (universal aspects of human nature)
Like some other people (nomothetic)
Like no other people (idiographic)
Replication and fraud in personality research
No single study can provide a definite answer - the more a result can be replicated, the more confidence we have in it and the more the theory is supported
Barriers to replication - different populations, designs and measures; difficulty publishing non-significant replication attempts; limited access to replications
Replication explanations - Colom et al (2000) found Lynn's statistical approaches to be questionable - re-analysis shows negligible differences
Colom and Garcia-Lopez (2002) - found sex differences in the opposite direction
Burt's fraud and Eysenck v Kamin debate:
1960 - Cyril Burt (eugenicist who invented the 11+) placed the estimate of heritability of intelligence very high at 0.8
1970s - US psychologist Leon Kamin claims that Burt's work is a case of fraud; Burt is defended by his former student Hans Eysenck; Burt's notes and records were burned, and he died in 1971
Kamin and Eysenck debate the fraud case as well as the nature-nurture question about intelligence and personality
25 of Eysenck's publications have been deemed unsafe since his death - papers published around the idea of a cancer-prone personality
Burt's students who studied intelligence testing were also charged with scientific racism, as were Eysenck's
This shocked many in the field and sped up the adoption of pre-registration and open data practices
Lynn and Irwing - two meta-analyses of intelligence tests - no consistent sex differences among children aged 6-14 (d = 0.02)
In adults, men show an advantage (d = 0.33) of about 4-5 IQ points
Goes against almost all previous reviews, which suggested no differences
Male university students obtained a higher mean than females (d = 0.22-0.33), equivalent to 3.3-5 IQ points (see the conversion below)
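Converting these standardised effect sizes into IQ points, assuming the conventional IQ standard deviation of 15:

```latex
\Delta_{\mathrm{IQ}} = d \times SD_{\mathrm{IQ}}, \qquad
0.22 \times 15 \approx 3.3 \text{ points}, \qquad
0.33 \times 15 \approx 5 \text{ points}
```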
Dishonesty explanations -
Blinkhorn (2005) - found Irwing and Lynn did not weight their meta-analysis - they did initially, but decided the unweighted result was better, and they also used medians rather than means, a flawed and suspect tactic to try to find a difference
Also argued that they knowingly ignored the file drawer problem (null findings on sex differences in intelligence are less likely to be published, biasing the available literature) - and questioned the samples used (no UK sample, even though the test originated there)
Studies in meta analysis did not necessarily test for sex differences, so studies were not designed to look for sex differences
Replication crisis - Open Science Collaboration (2015) conducted a project in which they redid 100 major studies in psychological science from three high ranking journals
Mean effect size was half of what was originally reported - and only 36% of the replications were successful
Need for pre-registration to show people are engaging in deductive reasoning
Open data practice so others can see data exists and how it is being analysed
Use pre registrations to log null effects
Reduce journal bias against null effects and replications
Reliability and Validity
Assessing personality properly is vital for carrying out meaningful research and application
Assessment tools must be reliable - give consistent results regardless of when, where and by whom they are scored - and valid - measure what they are supposed to measure, and nothing else
Assessment used depends on the researcher's approach to personality
Measurement errors - errors are inevitable, and as psychological scientists we aim to limit and control error to the best of our ability
What problems are there with simple questions - lack of detail in responses, global versus specific assessment, timing, and we cannot really tell if they measure what is intended (not operationalised)
Types of reliability - indicates the amount of measurement error in responses - consistent results regardless of:
Inter-rater reliability - do self and informant reports positively correlate
Test-retest - is there correlation over two different time points, ideally r > 0.7
Internal consistency - are individual items correlated with each other - are they working together to measure the same thing
Types of validity - face validity, convergent validity, discriminant validity and predictive validity
Reliability - two forms in psychometric testing: internal reliability and reliability over time (test-retest)
Internal reliability (internal consistency) - refers to whether all aspects of the psychometric test are generally working together to measure the same thing and we would expect these aspects to be positively correlated with each other
These aspects are commonly the individual questions in the scale, so all questions should correlate with one another if they are measuring a single personality construct
Cronbach's alpha (Cronbach, 1951) - the most common technique for assessing this - the figure ranges between -1 and +1, and a value above +0.7 is seen as an acceptable level of internal reliability, although a higher level (around 0.8) is preferred when a more exact measurement is wanted
Alpha can also be used to select items, by identifying the weakest ones and assessing how the scale can be improved
The corrected item-total correlation tells us how much each item is related to the overall score, and "Cronbach's alpha if item deleted" indicates what alpha would be if that item were removed, allowing us to see whether each item actually contributes to the scale
Kline (1986) - good items should correlate above 0.3 and not be below 0.2
If some items are unsatisfactory and the score could be improved, remove one item at a time, starting with the worst performing, and continue until there is no further improvement
Computing the scale - once we are satisfied with the internal reliability of the scale, we add all the items together to produce an overall score, where higher scores represent higher levels of the construct (see the sketch below)
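A minimal sketch of this item analysis in Python, using simulated responses rather than real questionnaire data; the trait and items are invented for illustration:

```python
# Cronbach's alpha, corrected item-total correlations and "alpha if item deleted",
# computed on simulated data (a single latent trait plus noise).
import numpy as np

rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))                       # one underlying trait
items = latent + rng.normal(scale=1.0, size=(200, 10))   # 10 noisy indicators of it

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)
    total_var = data.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

for i in range(items.shape[1]):
    rest = np.delete(items, i, axis=1)
    # Corrected item-total correlation: the item against the sum of the *other* items
    r_corrected = np.corrcoef(items[:, i], rest.sum(axis=1))[0, 1]
    alpha_if_deleted = cronbach_alpha(rest)
    print(f"Item {i + 1:2d}: corrected item-total r = {r_corrected:.2f}, "
          f"alpha if deleted = {alpha_if_deleted:.2f}")

print(f"Overall alpha = {cronbach_alpha(items):.2f} (aim for > .7; Kline: item r above .3)")

# Once satisfied with internal reliability, the scale score is simply the item sum
scale_scores = items.sum(axis=1)
```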
Test-retest reliability - researchers interested in constructs are concerned with individuals being relatively consistent in their attitudes and behaviours over time
In the personality, intelligence and individual differences (trait) literature, the stability of scores over time is routinely tested
Aim is a consistent positive correlation between scores at the two time points - usually r above 0.7 is satisfactory (see the sketch below)
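A minimal sketch of test-retest reliability, again with simulated scores standing in for real data collected at two time points:

```python
# Correlate the same respondents' scale scores at two time points.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
time1 = rng.normal(loc=50, scale=10, size=100)         # scale scores at time 1
time2 = time1 + rng.normal(loc=0, scale=5, size=100)   # time 2: mostly stable, some change

r, p = pearsonr(time1, time2)
print(f"Test-retest r = {r:.2f} (p = {p:.3g}); r above .7 is usually taken as satisfactory")
```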
Validity - concerned with whether or not a test is measuring what we claim it is measuring
Types of validity criteria -
Convergent validity - a psychometric test's convergent validity is assessed by the extent to which it shows associations with measures to which it should be related
Concurrent validity - when it shows acceptable correlations with known and accepted standard measures of the construct; slightly different from convergent validity because the comparison is not with related criteria but with criteria that reportedly measure the same thing
Discriminant validity - when it is not related to things it should not be related to - sometimes difficult to assess because findings need to be useful; a measure of a new construct should not correlate with measures of unrelated constructs
Face validity - concerned with what the measure appears to measure - test looks like it measures what it was designed to measure
Content validity - extent to which a measure represents all facets of the phenomenon being measured
Predictive validity - assesses whether a measure can accurately predict something
Psychometrics have developed some of these ideas into wider terms - criterion related validity and construct validity:
Criterion-related validity - assesses the value of the test by respondents' performance on other measures - encompasses concurrent and predictive validity
Construct validity - seeks to establish a clear relationship between the construct at a theoretical level and the measure that has been developed - encompasses convergent validity and discriminant validity
Campbell and Fiske (1959) developed the multitrait-multimethod matrix, which assesses construct validity by balancing evidence of convergent and discriminant validity (illustrated in the sketch below)
Total validity can never be demonstrated, but the more tests that are run and the more findings that replicate, the more confidence we can have in the measure
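A minimal sketch of the multitrait-multimethod logic with simulated data; the traits (extraversion, anxiety) and methods (self-report, informant report) are chosen purely for illustration:

```python
# Two traits, each measured by two methods. Convergent validity: same trait /
# different method should correlate highly; discriminant validity: different
# traits should correlate weakly regardless of method.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
extraversion = rng.normal(size=n)
anxiety = rng.normal(size=n)                 # unrelated to extraversion by construction

df = pd.DataFrame({
    "extra_self":      extraversion + rng.normal(scale=0.5, size=n),
    "extra_informant": extraversion + rng.normal(scale=0.5, size=n),
    "anx_self":        anxiety + rng.normal(scale=0.5, size=n),
    "anx_informant":   anxiety + rng.normal(scale=0.5, size=n),
})

# High same-trait (convergent) correlations, near-zero cross-trait (discriminant) ones
print(df.corr().round(2))
```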
Third person rating of the individual - getting other people to rate the individual on the items of the questionnaire is a helpful way to potentially assess the questionnaire's validity because ideally, ratings given by participants and the other person should be similar - gold standard of validity testing
Data collection - sources of personal data; Cattell (1965) -
L data (life record) - observable, objective behavioural measurements, other reports often substituted
T-data - tests, standardised and cannot be faked
Q data - questionnaire - self reported responses or descriptions
Have to collect data to test the items and decide if it is a good measure - a minimum of around 150 respondents should be administered to, and you should aim for at least 2-3 participants per item
Observation of behaviour - everyday life, lab experiments coded with checklists, interviews (structured and unstructured), reliability (subjective interpretation and coding) and validity (not biased by participants' self-perceptions)
Behaviour in lab / interview may not generalise to other situations
Projective tests - Rorschach Inkblot Test (1921) - e.g. the "father card" inkblot, a second card of nothing of note, and a third abstract card allowing the individual to reveal anything they wanted to state but had not done so already
Designed to be ambiguous - the individual unconsciously projects his/her personality onto the stimulus
Developed and used by psychodynamic psychologists
Reliability - administration and interpretation are subjective, scores obtained on different days are inconsistent, and the tester may project their own personality onto the interpretation of the ink
Validity - supposed to identify disorders, but there is no difference between college students and people in mental hospitals, and people can fake their responses
Self report questionnaires - questionnaires using fixed option responses - Likert scales, multiple choice
Reliability - objective scoring, several items related per scale, test-retest reliability tends to be high
Validity - relies on introspection, which may not be accurate, and people can fake or bias responses; nevertheless, self-reports significantly predict behaviour and clinical diagnoses
Good practice for self report measurement - psychometrics; write multiple good quality items that have clarity, avoid leading questions, avoid double-barrelled questions, include reverse-worded items, and avoid embarrassing or hypothetical questions
-> Minimise or control for response bias - acquiescence and social desirability
Writing items for a psychometric test
Clarity - question wording should be clear, short and unambiguous; make sure the meaning will not change between people, otherwise participants are effectively answering different questions, which muddies results because you cannot tell which interpretation has been answered
Leading questions - questions that steer the respondent to a particular answer - can reflect unconscious bias, can arise unintentionally, and often come down to the exact phrasing of the question
Embarrassing questions - questions dealing with personal matters should be avoided - making participants feel bad leads to inaccurate information, attrition, low response rates and low completion rates
Hypothetical questions - questions that place the individual in a situation they may never experience and ask for their opinion on it
Questions with reverse wording - included to make participants read each item carefully rather than responding to all items in the same way, and a good way to check for people not taking the questionnaire seriously (their answers come out contradictory) -> include a reasonable proportion of reverse-worded items in your scale (see the sketch below)
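A minimal sketch of reverse-scoring before computing scale totals, assuming a 1-5 Likert scale; the responses and the choice of which items are reverse-worded are hypothetical:

```python
# Recode reverse-worded items so that high scores always mean "more of the trait",
# then sum the items into an overall scale score.
import numpy as np

responses = np.array([
    [5, 1, 4, 2],   # respondent 1, items 1-4
    [4, 2, 5, 1],   # respondent 2
    [3, 3, 3, 3],   # respondent 3
])
reverse_items = [1, 3]          # zero-based indices of the reverse-worded items
scale_min, scale_max = 1, 5

scored = responses.copy()
scored[:, reverse_items] = (scale_max + scale_min) - scored[:, reverse_items]

totals = scored.sum(axis=1)     # overall scale score per respondent
print(scored)
print("Totals:", totals)
```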
Response formats - all closed format questions give a series of choices and there is good practice in terms of the response choices to use
Traditional - yes or no / true or false
Frequency of behaviour measurement
Likert scale
Indication of how much a statement describes you
Try to assess the extent of certain feelings or behaviours, and therefore the responses will assess the extent to which the respondent feels about something
There are fewer and fewer hard rules around response formats - but the main point is that the response format must make sense in terms of the questions you are asking
Instructions - the instructions that precede the scale are crucial, and should be simple
Eysenck - yes or no, do not spend too long dwelling on each question, no trick questions and work quickly
If you are looking for more general traits reflecting typical behaviours and attitudes, the instructions can be brief
Can also specify a time period to complete the test
May also want the participant to think very carefully about their responses, such as thinking about a set of responses in relation to a particular circumstance
Think carefully about the instructions, because they can be used not only to make administering the questionnaire easier but also to direct the respondent to look at the questions in a particular way if required
Ethical issues in personality research - Declaration of Helsinki principles (World Medical Association, 1989)
Do good - research of value and include debriefing
Do no harm - sensitive Qs, mood repair, support, right to withdraw
Informed consent - sufficient information, no coercion
Confidentiality / anonymity - identifying information
Avoid deception - use it only when necessary and justifiable and use it sparingly
Evaluation -
Does the design test the theory-based hypothesis?
Is the sampling appropriate?
Is the measure of personality reliable and valid?
Can causal associations be inferred from the results?
Ethical issues
Are participants fully informed in their consent?
Are they making a voluntary choice or being coerced to take part?
Is there any deception?
Is data kept confidential?
Might the procedure be distressing for participants?
Personality in real life - use of personality testing in political advertisement (Cambridge Analytica scandal)
Advertising based on extraversion and openness
TikTok algorithms
Scientology personality tests
Matz et al (2017) - social media footprints can provide a good insight into personality and preferences
Judge et al (2008) - organisational behaviour in jobs, social situations etc. is linked to personality traits
McCrae and Mottus (2019) - argue that a more complex model of personality is better, one that measures trait variance at multiple levels of a hierarchy of traits
Types and uses of psychometric tests
Most common measures - personality, ability, motivation and attitude, educational and work (occupational) psychology, and clinical assessment
Personality measures - measure the psychological traits or characteristics of the person that remain relatively stable over time
Ability measures - measure abilities such as intelligence
Motivation and attitude measures - measuring a belief towards something, such as work
Neuropsychological tests use measures of sensory, perceptual and motor performance to assess different aspects of psychophysiological activity and neurological functioning in the brain
Personality tests are widely used in occupational testing, ability tests in education
Psychometric testing is used in clinical settings to diagnose conditions and distinguish between clinical groups - compare to general population to test possible treatments
Neuropsychological tests are used to assess the consequences of medical illnesses or conditions
Developing a psychometric test - Kline (1986) the key is good questions
Distinction between two question types, open format or closed format - open format questions ask for some written detail but have no predetermined set of responses, and these create qualitative data
However, this can be time consuming for a psychometric researcher as they can receive multiple qualitatively different answers they have to analyse
As a result, closed format questions tend to be preferred - a short question with a number of response options, such as a Likert scale
Kline suggested the best place to start is to write as many items as possible, using the following sources - experts, colleagues and theoretical literature
Phrasing should then be double-checked by others and improved
Suggested the optimal length for any scale measuring one construct should be about 15-20 items - may need to make more initially to ensure you have enough
When you administer a scale, you should have a certain number of participants for each item - at least 5 participants per item, and most aim for 10, since statistical testing needs enough responses to capture variation across respondents (e.g. a 20-item scale would need roughly 100-200 respondents)
This also allows the number of items to be reduced so that it is proportionate to the number of participants
If the psychometric test is for clinical or child settings, it can be useful to reduce the number of items
Best way to create is to have a group of experts rate the items in terms of potential effectiveness of measuring the construct - determine those to retain and those to eliminate