4: Ubiquitous Computing Field Studies

4.1 Introduction

4.2 Three Common Types of Field Studies

  • The type of study you are conducting and why you are conducting a study
    will help you determine the research question for your study
    • Studies of current behavior: What are people doing now?
    • Proof-of-concept studies: Does my novel technology function in the
    real world?
    • Experience using a prototype: How does using my prototype change
    people’s behavior or allow them to do new things?

4.2.1 Current Behavior

4.2.2 Proof of Concept

4.2.3 Experience Using a Prototype

4.3 Study Design

  • It is important to realize that there are very few “right” decisions about how
    a field study should be run. Instead, there are many decisions that you
    will need to justify to yourself and your audience (e.g., other researchers,
    reviewers, funding agencies, etc.) as appropriate and sensible in order to
    gather the data needed to address your research question.
  • In determining your study design, the three important questions to
    consider are:
    • What will your participants do during the study?
    • What data will you collect?
    • How long will the study be?

4.3.1 What Will Participants Do?

  • To study current behavior, you might interview participants or log their behavior, whereas in other
    studies participants typically use a prototype.
  • Given that field studies are a choice to sacrifice control of the participant’s
    experience for realism, experimental design techniques used in
    laboratory studies are typically less appropriate for a field study
  • To test the hypothesis, researchers identify a variable,
    called the independent variable, that they will vary between different
    values, called conditions, during the experiment in order to understand the
    effect of variation on the dependent variables they are measuring (e.g., task
    time or user preference).
  • The two main laboratory study designs are within-subjects
    and between-subjects. In a within-subjects design, also called
    repeated-measures design, each participant experiences all conditions. So,
    if your independent variable was versions of an interface and you have two
    versions (A and B), in a within-subjects study each participant would use
    both versions.
  • However, in within-subjects
    designs you need to worry about whether there will be any learning
    effects. For example, participants might favor version B or be faster in
    using it just because it was the second version they used. Counterbalancing
    or varying the order that different participants experience the conditions is
    used to mitigate any potential learning effects.
  • In a between-subjects design, you divide your participants into different
    groups, typically randomly, and each participant experiences only one
    condition of the independent variable. So half of your participants would
    use version A of the interface and the other half would use version B. This
    approach avoids any potential learning effects, but you generally need to
    have more participants because you cannot directly compare the behavior
    of a single user across the conditions. Finally, some studies use a mixed
    design where some independent variables are within-subjects and some are between-subjects.
  • A within-subjects design also allows you to ask participants for their qualitative
    comparisons between conditions (e.g., different versions of the same
    interface). One particularly useful type of within-subjects condition to
    consider having in studies that involve a prototype is a control condition.
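
The two assignment schemes described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function names, participant IDs, and the condition labels "A" and "B" are assumptions made for the example. Rotating the starting condition gives full counterbalancing for two conditions; with three or more conditions it only balances which condition comes first.

```python
import random


def counterbalanced_orders(participants, conditions=("A", "B")):
    """Within-subjects: vary the order in which each participant sees the conditions."""
    conditions = tuple(conditions)
    orders = {}
    for index, participant in enumerate(participants):
        # Rotate the condition list so starting conditions are spread evenly.
        shift = index % len(conditions)
        orders[participant] = list(conditions[shift:] + conditions[:shift])
    return orders


def random_groups(participants, conditions=("A", "B"), seed=None):
    """Between-subjects: shuffle, then deal each participant into one condition."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}
```

For example, `counterbalanced_orders(["P1", "P2"])` has P1 use version A first and P2 use version B first, while `random_groups` places each participant in exactly one group of (nearly) equal size.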

4.3.1.1 Control Condition

4.3.2 What Data Will You Collect?

  • During a field study, you can collect quantitative and qualitative data.
  • For field studies, it is valuable to collect both quantitative and qualitative
    data. If you collect only quantitative data you have insight into how people
    behaved, but may have trouble understanding why. If you collect only qualitative
    data you will have insight into why participants did certain things, but
    may have trouble comparing participants or understanding how closely what
    participants thought they did mapped to what they really did.
  • It may also be helpful to look at evaluation metrics
    used by others doing related research. Scholtz and Consolvo (2004) put forth an
    evaluation framework for ubicomp applications that proposes the evaluation
    areas of attention, adoption, trust, conceptual models, interaction, invisibility,
    impact and side effects, appeal, and application robustness.
  • Finally, no matter what data collection methods you choose for your study,
    you must pilot them before the study starts to make sure that you are collecting
    the data you need and that you know how you will analyze the data.
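
On the quantitative side, one common approach is for the prototype to append a timestamped record to a data file whenever something of interest happens. The sketch below assumes a JSON-lines file; the function names, file layout, and event names are hypothetical choices for illustration. Reading the file back during a pilot is a quick way to confirm that the fields you plan to analyze are actually being captured.

```python
import json
import time


def log_event(path, event, **details):
    """Append one timestamped event as a single JSON line."""
    record = {"timestamp": time.time(), "event": event, **details}
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record


def read_events(path):
    """Load every logged event back into a list of dictionaries."""
    with open(path, encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]
```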

4.3.2.1 Logging

4.3.2.2 Surveys

4.3.2.3 Experience Sampling Methodology

4.3.2.4 Diaries

4.3.2.5 Interviews

4.3.3 How Long Is Your Study?

4.4 Participants

intro: A key part of any field study is the participants.

4.4.1 Ethical Treatment of Participants

4.4.2 Participant Profile

4.4.3 Number of Participants

4.4.4 Compensation

4.5 Data Analysis

intro: For example, in a proof-of-concept study, the
analysis may be a very straightforward account of whether the technology
worked in the field and participants’ reactions collected through surveys
or interviews.

4.5.1 Statistics

  • To analyze numeric data, there are two main types of statistics: descriptive
    statistics, which describe the data you have collected, and inferential
    statistics, which are used to draw conclusions from the data.
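
To make the distinction concrete, the sketch below reports a descriptive statistic (each group's median) and then runs one inferential test on the same numbers. The test shown is an exact Mann-Whitney U test, a nonparametric test that does not assume a normal distribution; it is only one of several reasonable choices. The function names are illustrative, and the exhaustive enumeration is practical only for the small samples typical of field studies.

```python
from itertools import combinations
from statistics import median


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(group_a, group_b):
    """Return (U for group_a, exact two-sided p value)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))

    def u_from(rank_sum):
        return rank_sum - n_a * (n_a + 1) / 2

    observed = u_from(sum(ranks[:n_a]))
    centre = n_a * n_b / 2  # expected U when the groups do not differ
    extreme = total = 0
    # Enumerate every way the pooled ranks could have been split into groups.
    for combo in combinations(range(n_a + n_b), n_a):
        u = u_from(sum(ranks[i] for i in combo))
        total += 1
        if abs(u - centre) >= abs(observed - centre) - 1e-9:
            extreme += 1
    return observed, extreme / total


def describe_and_test(group_a, group_b):
    """Descriptive medians plus the inferential test result."""
    u, p = mann_whitney_exact(group_a, group_b)
    return {"median_a": median(group_a), "median_b": median(group_b), "U": u, "p": p}
```

The p value here is the fraction of possible group splits that produce a U at least as far from its expected value as the observed one.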

4.5.1.1 Descriptive Statistics

4.5.1.2 Inferential Statistics: Significance Tests

4.5.2 Unstructured Data

  • Most qualitative data, with the exception of some survey data, are unstructured.
    This type of data includes free response questions on surveys,
    answers to interview questions, and any field notes you take down while
    observing participants.
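
One simple way to work with unstructured responses is to assign each one a short code, count how often each code occurs, and, when two raters have coded the same items, check their agreement with Cohen's kappa. The sketch below assumes each rater's codes are given as a list in the same item order; the function names and example codes are hypothetical.

```python
from collections import Counter


def code_frequencies(codes):
    """Count how many times each code was applied."""
    return Counter(codes)


def cohens_kappa(rater_1, rater_2):
    """Agreement between two raters on the same items, corrected for chance."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_1)
    # Proportion of items on which the raters chose the same code.
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    # Agreement expected by chance, from each rater's code proportions.
    expected = sum(counts_1[code] * counts_2[code] for code in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance.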

4.5.2.1 Simple Coding Techniques

4.5.2.2 Deriving Themes and Building Theory

4.6 Steps to a Successful Study

intro

4.6.1 Study Design Tips

4.6.1.1 Have a Clear Research Goal

4.6.1.2 Create a Study Design Document

  • A study design document should capture the decisions you make when
    planning your study.

4.6.1.3 Make Scripts for Participant Visits

  • If you are interacting with participants, create a script document for each visit.

4.6.1.4 Pilot Your Study

  • In a pilot study, you run a group of people through the entire study from
    the beginning to end as if they were real participants.

4.6.2 Technology Tips

4.6.2.1 Make Your Technology Robust Enough

  • The “enough” part of “robust enough” is very important in managing the
    effort involved in the study

4.6.2.2 Consider Other Evaluation Methods

  • Before taking the large step of deploying your technology in the wild,
    consider other evaluation methods to identify as many usability problems
    as possible. In heuristic evaluation, developed by Nielsen and Molich
    (1990), a set of evaluators (which could be you and your colleagues) uses a
    small set of heuristics to critique your technology and identify problems.
  • Laboratory studies before your field study
    can also be very valuable to ensure that your technology is usable.

4.6.2.3 Use Existing Technology

4.6.2.4 Get Reassuring Feedback

  • Once your technology has gone into the field, look for means to reassure
    yourself it is working as you expect
  • As mentioned previously, if your technology is not logging data to a central server, consider having it send you periodic “everything’s fine” messages so you can detect problems as soon as possible.
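
A check on those periodic “everything’s fine” messages might look like the sketch below. It assumes you record, per device, the time the last message arrived; the device IDs, the function name, and the 24-hour threshold are illustrative assumptions rather than recommendations.

```python
from datetime import datetime, timedelta


def overdue_devices(last_heartbeat, now, max_silence=timedelta(hours=24)):
    """Return IDs of devices whose last status message is older than max_silence."""
    return sorted(
        device
        for device, seen in last_heartbeat.items()
        if now - seen > max_silence
    )
```

Running this once a day against the latest timestamps gives a short list of participants whose devices may need attention.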

4.6.2.5 Negative Results

  • Do not plan a study
    that relies on adoption and usage as the only dependent variable, because
    you will be in trouble if people do not adopt your technology.

4.6.3 Running the Study

4.6.3.1 Have a Research Team

4.6.3.2 Make Participants Comfortable

4.6.3.3 Safety

4.6.3.4 Be Flexible

4.6.4 Data Collection and Analysis

4.6.4.1 Be Objective

4.6.4.2 The Participant Is Always Right

4.6.4.3 Do Not Make Inappropriate Claims

  • A limitations section in a paper or
    presentation that acknowledges potential limitations (e.g., a small number
    of participants from a limited geographic region) of the study helps make
    clear to the audience that you are not making inappropriate claims

4.7 Conclusion

a. Although using a variety of methods to incorporate user needs and feedback
throughout the process of designing technology is critical, this chapter
describes how to plan and conduct a field study, also referred to as an in situ
study.

b. As other researchers have argued (e.g., Consolvo et al., 2007; Rogers et al.,
2007), field studies are often the most appropriate method for studying people’s
use of ubicomp technologies.

c. The trade-off
for increased realism is a loss of control over the participant’s experience, so
field studies are not appropriate for all evaluations; indeed, for many research
questions, a laboratory study where you have complete control over the environment
may be more appropriate.

d. You should not undertake a field
study because you think it is a requirement to get a paper accepted to a conference
or because you would just like to see how people use your ubicomp
application, but rather because your research question requires it.

  • This type of
    field study explores how people use existing technology. The contributions
    of this type of study are an understanding of current behavior and implications
    for future technology.
  • For this type of study, technological
    advance is the primary contribution of research rather than field
    study. However, it may be important to conduct a field study to validate the
    feasibility of an approach or prototype in a real-world environment. These
    field studies may be shorter than the other two types and the research
    questions generally focus on whether the prototype or algorithm functions
    appropriately in a real environment.
  • The main contribution of this type of study is the experience
    of the people using the prototype. Although the technology deployed
    is typically not commercially available, it may not be a novel contribution. In
    some cases, researchers may conduct a Wizard of Oz study, where aspects of
    a prototype or system are simulated in order to understand the participants’
    reactions to systems that are too expensive to fully build and deploy.
  • It is particularly important to take care in specifying your research question.
  • So, rather than focusing specifically on how
    participants will use a prototype, better research questions focus on the concept
    the prototype embodies or tests, for example, “Does sharing location
    information lead to privacy concerns?” or “Will peripheral displays enhance
    family awareness?”
  • In a control condition, you measure the dependent variables for a certain
    period before you introduce the technology (e.g., logging for a week a behavior
    that you think might change), then introduce your technology and measure
    the dependent variables again
  • However, collecting control data is not appropriate for all ubicomp studies,
    because your prototype may afford a behavior that was impossible without
    it and thus there is no meaningful control condition to compare against.
  • For example, if you wanted to give the location-based mobile application to
    people that had never used a mobile phone before, you could not compare
    against previous use of mobile phones, but you might try to collect data
    about how often the participant communicated using landline phones or
    other communication methods to compare against.
  • In addition to deciding on your study method, if you are introducing
    a new technology in your study, there are a number of pragmatic
    considerations
    a. Will participants use the technology as they wish or to complete
    specific tasks?
    b. Will you give the participant technology to use or augment the technology
    the participant already owns?
    c. Should you simulate any part of the participant’s experience?
  • In field studies, logging is often the main method for collecting quantitative
    data about usage, either of existing technology or your novel technology.
    When logging data, your prototype typically writes information to
    a data file when things occur that you want to know about.
  • Surveys are often used to gather data before a field study begins (presurvey),
    after any changes of condition in a within-subjects study (postcondition),
    and at the end of the study (postsurvey).
  • In ESM, participants are
    asked to fill out short questionnaires at various points throughout their day,
    asking about their experience at that time.
  • ESM allows the researcher to collect
    qualitative data throughout the study, which has advantages over asking
    participants later to try to recall what they were thinking or feeling, or why
    they took some action
  • Participants can be asked to complete a survey either
    randomly throughout the day, at scheduled times, or based on an event.
  • Although it
    is most often used to gather qualitative data, you can also use ESM to gather
    quantitative data based on events, for example, recording the location of a
    participant every time he or she answers a call on their mobile phone.
  • Similar in spirit to ESM, some studies gather data by asking participants to
    record information about what they do, typically referred to as a “diary.”
  • This method is frequently used when participants are making diary entries
    about something that would not be possible to sense using an ESM tool, thereby rendering event-based ESM inappropriate.
  • Many of the considerations for diary studies are similar to those for ESM, such as what you will ask your participant to record in each diary entry.
  • However, diary studies typically raise greater concerns about participation, because participants are usually not carrying a device that interrupts them as
    in an ESM study.
  • Another option is asking participants to retrospectively construct a diary, as the Phone Proximity study had participants do at the weekly interview for the previous day using the Day Reconstruction Method.
  • During field studies, researchers frequently conduct “semistructured
    interviews.”
  • Retrospective interviews using video can be
    a valuable method to use for asking participants about situations in which
    they cannot be interrupted (e.g., playing basketball, performing surgery).
  • What type of study is it? Proof-of-concept studies may be on the
    shorter side if less time is needed to prove the feasibility of the prototype.
  • Studies of experience using a prototype are usually
    longer because the study is the contribution, whereas studies
    of current behavior vary widely.
  • Do you expect novelty effects to be an issue? Often, when using new
    technology, people start out very enthusiastically using it and then
    decrease their usage. Unfortunately, there is no guarantee about how
    long novelty effects last. If you are worried about novelty effects, try
    to make your study as long as possible and be wary of basing too
    many of your findings on usage from the beginning of the study
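
One rough way to look for a novelty effect is to compare average usage week by week instead of relying on a single overall figure. The sketch below assumes you have one usage count per study day (for example, derived from your logs); the function name and the seven-day grouping are illustrative choices.

```python
from statistics import mean


def weekly_mean_usage(daily_counts, days_per_week=7):
    """Average daily usage for each consecutive week of the study."""
    weeks = [
        daily_counts[start:start + days_per_week]
        for start in range(0, len(daily_counts), days_per_week)
    ]
    # A final partial week is averaged over the days it actually contains.
    return [mean(week) for week in weeks]
```

A sharp drop after the first week or two suggests that early usage reflects novelty and should be interpreted with care.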

4.3.2.6 Unstructured Observation

  • Participants should receive a consent form at the beginning of the study
    to review and sign to signify that they have consented to participate
  • Identifying the participants you would like to recruit for your field study
    depends on the research goals of your study
  • It is also best to have participants
    who are not involved in any way with your research. This reduces
    the chances that they are biased by knowledge they might have of your
    study or goals. Finally, recruiting different types of participants and comparing
    between them is a common type of independent variable
  • recognize that your research question will help you decide how
    to rank the importance of different aspects of your participant profile.
    Depending on what is important for your study, you may be forced to make
    trade-offs in other criteria.
  • Are there any conditions in your study (e.g., between or within
    subjects)?
  • What claims are you trying to make? Is this a proof-of-concept
    study?
  • Plan for participants to drop out.
  • Time to recruit participants
  • However, as researchers in
    that study noted, when you do not compensate your participants you need
    to consider bias.
  • Will the compensation method affect the data collected?
  • The statistics that are appropriate
    to use depend on how a variable was measured, referred to as its
    level of measurement
  • The three common levels of measurement for field study variables
    are described below
    i. Nominal: Variables where the possible answers represent unordered
    categories are referred to as nominal, or sometimes categorical. For
    nominal variables, you can only report the frequency with which each category
    occurred. For example, gender is a nominal variable where the count of
    responses can be reported (e.g., the Phone Proximity study had 10 male and
    10 female participants), but there is no concept of ordering between the
    response categories.

ii. Ordinal: Variables measured on an ordinal scale represent a rank
order preference without a precise numeric difference between different
categories. For example, a survey question with possible responses of
daily, weekly, monthly, and almost never is measured on an ordinal scale,
because the response options can be ordered from more to less frequent,
but not added or subtracted. For ordinal variables, both the frequency with which
each category occurred and the median value can be reported.

  • Answers to Likert scale questions on a survey are the most common
    example of ordinal variables collected during a field study. You can compare
    whether one participant’s answers are more positive or less positive
    than another’s, but the answers cannot be added or subtracted.

iii. Interval: For variables measured on an interval scale, the difference
between any two values is numerically meaningful. Interval variables
can be added and subtracted—for example, a person’s age in years,
the number of times someone performed a particular action, how long
an action took, or the number of ESM surveys a participant answered.
Descriptive statistics valid for interval data include sum, mean, and
median.
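
The statistics that fit each level of measurement can be summarized in a short sketch. The function names and example response options are assumptions for illustration. The sample standard deviation is included for interval data because it is commonly reported alongside the mean, and it requires at least two values.

```python
from collections import Counter
from statistics import mean, median, stdev


def describe_nominal(values):
    """Nominal: only category frequencies are meaningful."""
    return dict(Counter(values))


def describe_ordinal(values, ordered_levels):
    """Ordinal: frequencies plus the median category."""
    rank = {level: position for position, level in enumerate(ordered_levels)}
    ranked = sorted(values, key=rank.__getitem__)
    # Take the lower middle response so the median is always a real category.
    middle = ranked[(len(ranked) - 1) // 2]
    return {"frequencies": dict(Counter(values)), "median": middle}


def describe_interval(values):
    """Interval: sums, means, and medians are all valid."""
    return {
        "sum": sum(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }
```

For ordinal data, the caller supplies the category order (for example, from "almost never" up to "daily"), since that ordering cannot be inferred from the labels themselves.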

  • It is important to examine interval data for outliers. Outliers affect the
    mean, so always report the standard deviation if you report the mean value
    for a variable.
  • Once you have computed descriptive statistics for a variable, one type of
    inferential statistics, significance tests, allows you to determine whether the
    results found in your sample of participants are statistically significant or
    might be due to sampling errors.
  • The use of inferential statistics in analyzing
    field study data is rare since the small number of participants typically
    feasible to have in a field study makes it difficult to collect enough data for
    many statistical tests to be appropriate
  • To conduct a significance test comparing descriptive statistics, you first
    determine the variable you wish to compare and the appropriate groups of
    participants or different conditions to compare between.
  • For example, an independent samples
    t-test is appropriate to use when comparing the mean of a variable
    with a normal distribution between two groups, whereas analysis of variance
    (ANOVA) tests are used for comparing across more than two groups.
  • However, many of the nonparametric
    equivalents (e.g., Mann-Whitney U, Kruskal-Wallis), which do not assume
    a variable has a normal distribution, may be more appropriate for field
    study data since they make fewer assumptions about the data that have
    been collected.
  • Regardless of what statistical test you use, significance tests start with
    the assumption that there is no difference between the groups for the variable
    being examined (referred to as the null hypothesis). If a difference
    is observed (e.g., the means or medians are different), there are two possibilities:
    either there is a real difference between the groups, or the observed
    difference is due to sampling error in the data.
  • The p value indicates how likely it would be to observe a difference at
    least this large if the null hypothesis were true, that is, if only sampling
    error were at work. Researchers often use a cutoff of either p < 0.01 or
    p < 0.05 to determine if the test results are statistically significant. If
    p < 0.01, a difference this large would be expected less than 1% of the time
    from sampling error alone (or less than 5% of the time for p < 0.05). This
    is not the same as a 99% (or 95%) chance that the difference is real.
  • Strauss and Corbin (1998, p. 3) broadly define coding as “the analytic
    processes through which data are fractured, conceptualized and turned
    into theory.”
  • However, the simplest coding techniques consist of closely
    examining your data and counting the number of times a concept or
    theme reoccurs, essentially turning qualitative data into quantitative data.
  • Depending on your study, it may be appropriate to have one person code
    the data. However, multiple coders, sometimes referred to as raters, are
    often used if there is a large amount of data to code. When multiple raters
    code, it is necessary to check for interrater reliability, agreement between
    the raters, to make sure different people are coding the data consistently.
  • More typically, multiple
    raters each code the same subset of data (in addition to mutually exclusive
    subsets), and then a test such as Cohen’s kappa is used to report interrater
    reliability on the overlapping set of data and show that the raters are coding
    consistently.
  • Two common methods used are affinity diagramming
    and grounded theory.
  • Based on the affinity process introduced by Kawakita, Beyer and
    Holtzblatt (1998) created an affinity diagramming process, as part of their
    Contextual Design process to develop user-centered systems. Their affinity
    diagram process is designed to organize a large number of notes captured
    during observations or interviews into a hierarchy to understand
    common issues and themes present in the data.
  • The philosophy of affinity diagrams where issues and themes are derived
    from the data using a bottom-up approach was influenced by the grounded
    theory method, developed by Glaser and Strauss (1967).
  • Grounded theory emphasizes building theories from the observed data rather than starting from preconceived hypotheses or theories. Researchers begin by conducting
    a microscopic examination of a subset of their data to identify concepts
    and categories in the data and relationships between them. These
    categories are then used and adjusted as needed while coding the rest of
    the data.
  • Although the amount of effort involved in conducting a field study may
    seem a bit daunting, there is really no substitute for the inspiration and
    understanding you will gain from interacting with participants in the
    field. Regardless of whether your field study involves observing people’s
    current behavior, conducting a proof-of-concept study, or deploying your
    technology to participants for a long period, you will learn something that
    surprises you and helps you to move your research forward.