4: Ubiquitous Computing Field Studies

4.1 Introduction

4.2 Three Common Types of Field Studies

The type of study you are conducting, and why you are conducting it,
will help you determine the research question for your study:
• Studies of current behavior: What are people doing now?
• Proof-of-concept studies: Does my novel technology function in the
real world?
• Experience using a prototype: How does using my prototype change
people’s behavior or allow them to do new things?

4.2.1 Current Behavior

4.2.2 Proof of Concept

4.2.3 Experience Using a Prototype

4.3 Study Design

It is important to realize that there are very few “right” decisions about how
a field study should be run. Instead, there are many decisions that you
will need to justify to yourself and your audience (e.g., other researchers,
reviewers, funding agencies, etc.) as appropriate and sensible in order to
gather the data needed to address your research question.
In determining your study design, the three important questions to consider are:
• What will your participants do during the study?
• What data will you collect?
• How long will the study be?

4.3.1 What Will Participants Do?

To study current behavior, you might interview participants or log their behavior, whereas in other
studies participants typically use a prototype.
Given that field studies are a choice to sacrifice control of the participant’s
experience for realism, experimental design techniques used in
laboratory studies are typically less appropriate for a field study.
To test the hypothesis, researchers identify a variable,
called the independent variable, that they will vary between different
values, called conditions, during the experiment in order to understand the
effect of variation on the dependent variables they are measuring (e.g., task
time or user preference).
The two main laboratory study designs are within-subjects
and between-subjects. In a within-subjects design, also called
repeated-measures design, each participant experiences all conditions. So,
if your independent variable was versions of an interface and you have two
versions (A and B), in a within-subjects study each participant would use
both versions.
However, in within-subjects
designs you need to worry about whether there will be any learning
effects. For example, participants might favor version B or be faster in
using it just because it was the second version they used. Counterbalancing
or varying the order that different participants experience the conditions is
used to mitigate any potential learning effects.
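As a rough illustration of counterbalancing, the sketch below cycles participants through every possible order of the conditions so that each order is used about equally often. The function name `counterbalanced_orders` and the participant IDs are hypothetical; only the two interface versions (A and B) come from the text.

```python
from itertools import permutations

def counterbalanced_orders(participants, conditions):
    """Give each participant one ordering of the conditions, cycling through
    all possible orderings so each is used about equally often."""
    orders = list(permutations(conditions))
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}

# Hypothetical participant IDs; versions A and B as in the example above.
assignments = counterbalanced_orders(["P1", "P2", "P3", "P4"], ["A", "B"])
for participant, order in assignments.items():
    print(participant, "uses", " then ".join(order))
```

With more than a few conditions the number of orderings grows quickly, so a subset of orders may be more practical than the full set used here.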
In a between-subjects design, you divide your participants into different
groups, typically randomly, and each participant experiences only one
condition of the independent variable. So half of your participants would
use version A of the interface and the other half would use version B. This
approach avoids any potential learning effects, but you generally need to
have more participants because you cannot directly compare the behavior
of a single user across the conditions. Finally, some studies use a mixed
design where some independent variables are within-subjects and some are between-subjects.
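The random division into groups used in a between-subjects design can be sketched as follows. This is only one reasonable way to do it: the helper `assign_between_subjects`, the participant IDs, and the fixed seed are assumptions made for illustration.

```python
import random

def assign_between_subjects(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn so the
    groups end up as equal in size as possible."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(participant)
    return groups

# Hypothetical study with eight participants and interface versions A and B.
participant_ids = [f"P{i}" for i in range(1, 9)]
groups = assign_between_subjects(participant_ids, ["A", "B"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Recording the seed in your study design document lets you show later how the assignment was produced.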
A within-subjects design also allows you to ask participants for their qualitative
comparisons between conditions (e.g., different versions of the same
interfaces). One particularly useful type of within-subjects condition to
consider having in studies that involve a prototype is a control condition.

4.3.1.1 Control Condition

4.3.2 What Data Will You Collect?

During a field study, you can collect quantitative and qualitative data.
For field studies, it is valuable to collect both quantitative and qualitative
data. If you collect only quantitative data you have insight into how people
behaved, but may have trouble understanding why. If you collect only qualitative
data you will have insight into why participants did certain things, but
may have trouble comparing participants or understanding how closely what
participants thought they did mapped to what they really did.
It may also be helpful to look at evaluation metrics
used by others doing related research. Scholtz and Consolvo (2004) put forth an
evaluation framework for ubicomp applications that proposes the evaluation
areas of attention, adoption, trust, conceptual models, interaction, invisibility,
impact and side effects, appeal, and application robustness.
Finally, no matter what data collection methods you choose for your study,
you must pilot them before the study starts to make sure that you are collecting
the data you need and that you know how you will analyze the data.

4.3.2.1 Logging

4.3.2.2 Surveys

4.3.2.3 Experience Sampling Methodology

4.3.2.4 Diaries

4.3.2.5 Interviews

4.3.3 How Long Is Your Study?

4.4 Participants

intro: A key part of any field study is the participants.

4.4.1 Ethical Treatment of Participants

4.4.2 Participant Profile

4.4.3 Number of Participants

4.4.4 Compensation

4.5 Data Analysis

intro: For example, in a proof-of-concept study, the
analysis may be a very straightforward account of whether the technology
worked in the field and participants’ reactions collected through surveys
or interviews.

4.5.1 Statistics

To analyze numeric data, there are two main types of statistics: descriptive
statistics, which describe the data you have collected, and inferential
statistics, which are used to draw conclusions from the data.

4.5.1.1 Descriptive Statistics

4.5.1.2 Inferential Statistics: Significance Tests

4.5.2 Unstructured Data

Most qualitative data, with the exception of some survey data, are unstructured.
This type of data includes free response questions on surveys,
answers to interview questions, and any field notes you take down while
observing participants.

4.5.2.1 Simple Coding Techniques

4.5.2.2 Deriving Themes and Building Theory

4.6 Steps to a Successful Study

intro

4.6.1 Study Design Tips

4.6.1.1 Have a Clear Research Goal

4.6.1.2 Create a Study Design Document

A study design document should capture the decisions you make when
planning your study.

4.6.1.3 Make Scripts for Participant Visits

If you are interacting with participants, create a script document for each visit.

4.6.1.4 Pilot Your Study

In a pilot study, you run a group of people through the entire study from
the beginning to end as if they were real participants.

4.6.2 Technology Tips

4.6.2.1 Make Your Technology Robust Enough

The “enough” part of “robust enough” is very important in managing the
effort involved in the study.

4.6.2.2 Consider Other Evaluation Methods

Before taking the large step of deploying your technology in the wild,
consider other evaluation methods to identify as many usability problems
as possible. In heuristic evaluation, developed by Nielsen and Molich
(1990), a set of evaluators (which could be you and your colleagues) uses a
small set of heuristics to critique your technology and identify problems.
Laboratory studies before your field study
can also be very valuable to ensure that your technology is usable.

4.6.2.3 Use Existing Technology

4.6.2.4 Get Reassuring Feedback

Once your technology has gone into the field, look for means to reassure
yourself it is working as you expect.
As mentioned previously, if your technology is not logging data to a central server, consider having it send you periodic “everything’s fine” messages so you can detect problems as soon as possible.
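One way to act on such “everything’s fine” messages is to record when each device last reported in and flag any that have been silent for too long. The sketch below assumes you already store a last-seen timestamp per device; the device IDs, the timestamps, and the 24-hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta

def overdue_devices(last_seen, now, max_silence=timedelta(hours=24)):
    """Return the IDs of devices whose latest heartbeat is older than max_silence."""
    return sorted(
        device for device, seen in last_seen.items() if now - seen > max_silence
    )

# Hypothetical heartbeat records: device ID -> time of the last message received.
last_seen = {
    "device-01": datetime(2024, 3, 4, 9, 0),
    "device-02": datetime(2024, 3, 2, 18, 30),
}
now = datetime(2024, 3, 4, 12, 0)
silent = overdue_devices(last_seen, now)
print(silent)  # ['device-02']
```

Running a check like this daily gives you a short list of participants to contact before much data is lost.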

4.6.2.5 Negative Results

Do not plan a study
that relies on adoption and usage as the only dependent variable, because
you will be in trouble if people do not adopt your technology.

4.6.3 Running the Study

4.6.3.1 Have a Research Team

4.6.3.2 Make Participants Comfortable

4.6.3.3 Safety

4.6.3.4 Be Flexible

4.6.4 Data Collection and Analysis

4.6.4.1 Be Objective

4.6.4.2 The Participant Is Always Right

4.6.4.3 Do Not Make Inappropriate Claims

A limitations section in a paper or
presentation that acknowledges potential limitations (e.g., a small number
of participants from a limited geographic region) of the study helps make
clear to the audience that you are not making inappropriate claims.

4.7 Conclusion

a. Although using a variety of methods to incorporate user needs and feedback
throughout the process of designing technology is critical, this chapter
describes how to plan and conduct a field study, also referred to as an in situ
study.

b. As other researchers have argued (e.g., Consolvo et al., 2007; Rogers et al.,
2007), field studies are often the most appropriate method for studying people’s
use of ubicomp technologies.

c. The trade-off
for increased realism is a loss of control over the participant’s experience, so
field studies are not appropriate for all evaluations; indeed, for many research
questions, a laboratory study where you have complete control over the environment
may be more appropriate.

d. You should not undertake a field
study because you think it is a requirement to get a paper accepted to a conference
or because you would just like to see how people use your ubicomp
application, but rather because your research question requires it.

This type of
field study explores how people use existing technology. The contributions
of this type of study are an understanding of current behavior and implications
for future technology.

For this type of study, technological
advance is the primary contribution of research rather than field
study. However, it may be important to conduct a field study to validate the
feasibility of an approach or prototype in a real-world environment. These
field studies may be shorter than the other two types and the research
questions generally focus on whether the prototype or algorithm functions
appropriately in a real environment.

The main contribution of this type of study is the experience
of the people using the prototype. Although the technology deployed
is typically not commercially available, it may not be a novel contribution. In
some cases, researchers may conduct a Wizard of Oz study, where aspects of
a prototype or system are simulated in order to understand the participants’
reactions to systems that are too expensive to fully build and deploy.
It is particularly important to take care in specifying your research question.
So, rather than focusing specifically on how
participants will use a prototype, better research questions focus on the concept
the prototype embodies or tests, for example, “Does sharing location
information lead to privacy concerns?” or “Will peripheral displays enhance
family awareness?”

In a control condition, you measure the dependent variables for a certain
period before you introduce the technology (e.g., spending a week logging a behavior
that you think might change), then introduce your technology and measure
the dependent variables again.
However, collecting control data is not appropriate for all ubicomp studies,
because your prototype may afford a behavior that was impossible without
it and thus there is no meaningful control condition to compare against.
For example, if you wanted to give the location-based mobile application to
people that had never used a mobile phone before, you could not compare
against previous use of mobile phones, but you might try to collect data
about how often the participant communicated using landline phones or
other communication methods to compare against.
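When a control condition is feasible, the comparison comes down to looking at each participant’s measurements before and after the technology is introduced. The sketch below uses made-up daily counts of a logged behavior; the participant IDs, the numbers, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical daily counts of a logged behavior for each participant.
baseline = {"P1": [4, 6, 5], "P2": [2, 2, 2]}          # before the prototype
with_prototype = {"P1": [7, 8, 9], "P2": [2, 1, 3]}    # after introducing it

# Change in the daily average for each participant (positive = increase).
change = {p: mean(with_prototype[p]) - mean(baseline[p]) for p in baseline}
for participant, delta in change.items():
    print(participant, "average daily change:", delta)
```

Keeping the comparison per participant matters because each person acts as his or her own control.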
In addition to deciding on your study method, if you are introducing
a new technology in your study, there are a number of pragmatic
considerations:
a. Will participants use the technology as they wish or to complete
specific tasks?
b. Will you give the participant technology to use or augment the technology
the participant already owns?
c. Should you simulate any part of the participant’s experience?

In field studies, logging is often the main method for collecting quantitative
data about usage, either of existing technology or your novel technology.
When logging data, your prototype typically writes information to
a data file when things occur that you want to know about.
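A minimal sketch of this kind of event logging is shown below. It writes one timestamped row per event in CSV form; the field names, event names, and participant ID are hypothetical, and an in-memory buffer stands in for the data file a real prototype would append to.

```python
import csv
import io
from datetime import datetime, timezone

FIELDS = ["timestamp", "participant", "event", "detail"]

def log_event(writer, participant, event, detail="", when=None):
    """Write one row describing something that happened in the prototype."""
    when = when or datetime.now(timezone.utc)
    writer.writerow({
        "timestamp": when.isoformat(),
        "participant": participant,
        "event": event,
        "detail": detail,
    })

log_file = io.StringIO()  # stands in for a real file opened in append mode
writer = csv.DictWriter(log_file, fieldnames=FIELDS)
writer.writeheader()
log_event(writer, "P07", "app_opened")
log_event(writer, "P07", "location_shared", detail="recipient=family")
print(log_file.getvalue())
```

Logging a timestamp and participant ID with every event makes it much easier to line the log up with survey and interview data later.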

Surveys are often used to gather data before a field study begins (presurvey),
after any changes of condition in a within-subjects study (postcondition),
and at the end of the study (postsurvey).

In ESM, participants are
asked to fill out short questionnaires at various points throughout their day,
asking about their experience at that time.
ESM allows the researcher to collect
qualitative data throughout the study, which has advantages over asking
participants later to try to recall what they were thinking or feeling, or why
they took some action.
Participants can be asked to complete a survey either
randomly throughout the day, at scheduled times, or based on an event.
Although it
is most often used to gather qualitative data, you can also use ESM to gather
quantitative data based on events, for example, recording the location of a
participant every time he or she answers a call on their mobile phone.
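For the random option, one possible way to pick prompt times is sketched below. It draws times within a participant’s waking hours while keeping a minimum gap between prompts; the function name, the 9:00 to 21:00 window, the six prompts per day, and the 30-minute gap are all assumptions rather than recommendations from the text.

```python
import random
from datetime import datetime, timedelta

def random_prompt_times(day_start, day_end, n_prompts,
                        min_gap=timedelta(minutes=30), seed=None):
    """Pick n_prompts random times in [day_start, day_end], at least min_gap apart."""
    rng = random.Random(seed)
    # Reserve the required gaps, sample in the remaining window, then add them back.
    window = (day_end - day_start) - min_gap * (n_prompts - 1)
    if window < timedelta(0):
        raise ValueError("not enough time in the day for that many prompts")
    offsets = sorted(rng.uniform(0, window.total_seconds()) for _ in range(n_prompts))
    return [day_start + timedelta(seconds=off) + min_gap * i
            for i, off in enumerate(offsets)]

# Hypothetical schedule: six prompts between 9:00 and 21:00.
start = datetime(2024, 3, 4, 9, 0)
end = datetime(2024, 3, 4, 21, 0)
times = random_prompt_times(start, end, 6, seed=1)
for t in times:
    print(t.strftime("%H:%M"))
```

Scheduled prompts would replace the random draw with fixed times, and event-based prompts would be triggered by the prototype itself.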

Similar in spirit to ESM, some studies gather data by asking participants to
record information about what they do, typically referred to as a “diary.”
This method is frequently used when participants are making diary entries
about something that would not be possible to sense using an ESM tool, thereby rendering event-based ESM inappropriate.
Many of the considerations for diary studies are similar to those for ESM, such as what you will ask your participant to record in each diary entry.
However, for diary studies there are typically greater concerns about participation, because participants are typically not carrying a device that interrupts them as
in an ESM study.
Another option is asking participants to retrospectively construct a diary, as the Phone Proximity study had participants do at the weekly interview for the previous day using the Day Reconstruction Method.

During field studies, researchers frequently conduct “semistructured
interviews.”
Retrospective interviews using video can be
a valuable method to use for asking participants about situations in which
they cannot be interrupted (e.g., playing basketball, performing surgery).

• What type of study is it? Proof-of-concept studies may be on the
shorter side if less time is needed to prove the feasibility of the prototype.
Studies of experience using a prototype are usually
longer because the study is the contribution, whereas studies
of current behavior vary widely.
• Do you expect novelty effects to be an issue? Often, when using new
technology, people start out very enthusiastically using it and then
decrease their usage. Unfortunately, there is no guarantee about how
long novelty effects last. If you are worried about novelty effects, try
to make your study as long as possible and be wary of basing too
many of your findings on usage from the beginning of the study.

4.3.2.6 Unstructured Observation

Participants should receive a consent form at the beginning of the study
to review and sign to signify that they have consented to participate.

Identifying the participants you would like to recruit for your field study
depends on the research goals of your study.
It is also best to have participants
who are not involved in any way with your research. This reduces
the chances that they are biased by knowledge they might have of your
study or goals. Finally, recruiting different types of participants and comparing
between them is a common type of independent variable.
Recognize that your research question will help you decide how
to rank the importance of different aspects of your participant profile.
Depending on what is important for your study, you may be forced to make
trade-offs in other criteria.

• Are there any conditions in your study (e.g., between or within
subjects)?
• What claims are you trying to make? Is this a proof-of-concept
study?
• Plan for participants to drop out.
• Time to recruit participants

However, as researchers in
that study noted, when you do not compensate your participants you need
to consider bias.
Will the compensation method affect the data collected?

The statistics that are appropriate
to use depend on how a variable was measured, referred to as its
level of measurement.
The three common levels of measurement for field study variables
are described below.
i. Nominal. Variables where the possible answers represent unordered
categories are referred to as nominal, or sometimes categorical. For
nominal variables, you can only report how frequently each category
occurred. For example, gender is a nominal variable where the count of
responses can be reported (e.g., the Phone Proximity study had 10 male and
10 female participants), but there is no concept of ordering between the
response categories.

ii. Ordinal. Variables measured on an ordinal scale represent a rank
order preference without a precise numeric difference between different
categories. For example, a survey question with four possible responses of
daily, weekly, monthly, and almost never is measured on an ordinal scale,
because the response options can be ordered from more to less frequent,
but not added or subtracted. For ordinal variables, both the frequency with which
each category occurred and the median value can be reported.

Answers to Likert scale questions on a survey are the most common
example of ordinal variables collected during a field study. You can compare
whether one participant’s answers are more positive or less positive
than another’s, but the answers cannot be added or subtracted.

iii. Interval. For variables measured on an interval scale, the difference
between any two values is numerically meaningful. Interval variables
can be added and subtracted—for example, a person’s age in years,
the number of times someone performed a particular action, how long
an action took, or the number of ESM surveys a participant answered.
Descriptive statistics valid for interval data include sum, mean, and
median.

It is important to examine interval data for outliers. Outliers affect the
mean, so always report the standard deviation if you report the mean value
for a variable.
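The three levels of measurement map onto different summary calculations, which the sketch below illustrates. Only the 10 male and 10 female participants come from the text; the Likert answers and ESM survey counts are invented, and the last ESM value is a deliberate outlier.

```python
from collections import Counter
from statistics import mean, median, stdev

# Nominal: only category frequencies are meaningful.
gender = ["male"] * 10 + ["female"] * 10
gender_counts = Counter(gender)
print("gender:", dict(gender_counts))

# Ordinal: frequencies and the median (hypothetical 5-point Likert answers).
likert = [4, 5, 3, 4, 2, 5, 4]
print("likert counts:", dict(Counter(likert)), "median:", median(likert))

# Interval: sum, mean, and median; report the standard deviation with the mean.
# Hypothetical numbers of ESM surveys answered; 40 is an outlier.
esm_answered = [12, 15, 9, 14, 40]
print("sum:", sum(esm_answered))
print("mean:", mean(esm_answered), "median:", median(esm_answered))
print("standard deviation:", round(stdev(esm_answered), 2))
```

Here the single outlier pulls the mean (18) well above the median (14), which is why the standard deviation should accompany any reported mean.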

Once you have computed descriptive statistics for a variable, one type of
inferential statistics, significance tests, allows you to determine whether the
results found in your sample of participants are statistically significant or
might be due to sampling error.
The use of inferential statistics in analyzing
field study data is rare, since the small number of participants typically
feasible to have in a field study makes it difficult to collect enough data for
many statistical tests to be appropriate.
To conduct a significance test comparing descriptive statistics, you first
determine the variable you wish to compare and the appropriate groups of
participants or different conditions to compare between.
For example, an independent samples
t-test is appropriate to use when comparing the mean of a variable
with a normal distribution between two groups, whereas analysis of variance
(ANOVA) tests are used for comparing across more than two groups.
However, many of the nonparametric
equivalents (e.g., Mann-Whitney U, Kruskal-Wallis) that do not assume
a variable has a normal distribution, may be more appropriate for field
study data since they make fewer assumptions about the data that have
been collected.
Regardless of what statistical test you use, significance tests start with
the assumption that there is no difference between the groups for the variable
being examined (referred to as the null hypothesis). If a difference
is observed (e.g., the means or medians are different), there are two possibilities:
either there is a real difference between the groups or the observed difference is due to sampling
error in the data.
The p value indicates how likely it is that a difference at least as large as
the one observed would arise from sampling error alone if the null hypothesis
were true. Researchers often use a cutoff of either p < 0.01 or p < 0.05 to
determine if the test results are statistically significant. If p < 0.01, a
difference this large would be expected less than 1% of the time when there is
no real difference between the groups (or less than 5% of the time for p < 0.05).
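To make the idea of a p value concrete, the sketch below computes the Mann-Whitney U statistic for two small groups and an exact two-sided p value by enumerating every way the pooled values could have been split into groups of the same sizes. This brute-force enumeration is an illustrative choice that only suits very small samples; in practice a statistics package would normally be used. The function names and the usage counts are hypothetical.

```python
from itertools import combinations

def u_statistic(a, b):
    """Mann-Whitney U for group a: count pairs where a's value exceeds b's,
    with ties counting one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def exact_two_sided_p(a, b):
    """Fraction of all possible group splits whose U is at least as far from
    its expected value under the null hypothesis as the observed U."""
    pooled = list(a) + list(b)
    centre = len(a) * len(b) / 2  # expected U when there is no difference
    observed = abs(u_statistic(a, b) - centre)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(u_statistic(group_a, group_b) - centre) >= observed:
            extreme += 1
    return extreme / total

# Hypothetical daily usage counts for two groups of five participants.
group_a = [3, 5, 4, 6, 2]
group_b = [8, 7, 9, 10, 6]
print("U =", u_statistic(group_a, group_b))
print("p =", exact_two_sided_p(group_a, group_b))
```

With only three participants per group the smallest possible two-sided p value from this procedure is 0.1, which illustrates why significance tests are often out of reach for small field studies.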

Strauss and Corbin (1998, p. 3) broadly define coding as “the analytic
processes through which data are fractured, conceptualized and turned
into theory.”
However, the simplest coding techniques consist of closely
examining your data and counting the number of times a concept or
theme reoccurs, essentially turning qualitative data into quantitative data.
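Counting coded concepts is straightforward once each response has been tagged. The sketch below tallies both how often each code was applied and how many participants mentioned it at least once; the codes and participant IDs are invented for illustration.

```python
from collections import Counter

# Hypothetical codes assigned to each participant's interview answers.
coded_responses = {
    "P1": ["privacy", "convenience", "privacy"],
    "P2": ["privacy"],
    "P3": ["battery", "privacy", "convenience"],
}

# Total number of times each code was applied across all responses.
mentions = Counter(code for codes in coded_responses.values() for code in codes)

# Number of distinct participants who raised each code at least once.
participants_per_code = Counter(
    code for codes in coded_responses.values() for code in set(codes)
)

print(mentions.most_common())
print(dict(participants_per_code))
```

Reporting the number of participants alongside the number of mentions keeps one talkative participant from dominating the counts.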
Depending on your study, it may be appropriate to have one person code
the data. However, multiple coders, sometimes referred to as raters, are
often used if there is a large amount of data to code. When multiple raters
code, it is necessary to check for interrater reliability, agreement between
the raters, to make sure different people are coding the data consistently.
More typically, multiple
raters each code the same subset of data (in addition to mutually exclusive
subsets), and then a test such as Cohen’s kappa is used to report interrater
reliability on the overlapping set of data and show that the raters are coding
consistently.
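For two raters, Cohen’s kappa compares the agreement actually observed with the agreement expected by chance given how often each rater used each code. A minimal sketch follows; the function name and the ten coded items are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater1)
    # Proportion of items on which the raters chose the same code.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's code frequencies.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[code] * counts2[code] for code in counts1) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes given by two raters to the same ten interview excerpts.
rater1 = ["privacy", "privacy", "convenience", "battery", "privacy",
          "convenience", "battery", "privacy", "convenience", "privacy"]
rater2 = ["privacy", "privacy", "convenience", "battery", "convenience",
          "convenience", "battery", "privacy", "privacy", "privacy"]
kappa = cohens_kappa(rater1, rater2)
print(round(kappa, 2))  # 0.68
```

In this example the raters agree on 8 of 10 items, but because 38% agreement would be expected by chance, kappa is lower than the raw agreement rate.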

Two common methods used are affinity diagramming
and grounded theory.
Based on the affinity process introduced by Kawakita, Beyer and
Holtzblatt (1998) created an affinity diagramming process as part of their
Contextual Design process to develop user-centered systems. Their affinity
diagram process is designed to organize a large number of notes captured
during observations or interviews into a hierarchy to understand
common issues and themes present in the data.
The philosophy of affinity diagrams, where issues and themes are derived
from the data using a bottom-up approach, was influenced by the grounded
theory method, developed by Glaser and Strauss (1967).
Grounded theory emphasizes building theories from the observed data rather than starting from preconceived hypotheses or theories. Researchers begin by conducting
a microscopic examination of a subset of their data to identify concepts
and categories in the data and relationships between them. These
categories are then used and adjusted as needed while coding the rest of
the data.

Although the amount of effort involved in conducting a field study may
seem a bit daunting, there is really no substitute for the inspiration and
understanding you will gain from interacting with participants in the
field. Regardless of whether your field study involves observing people’s
current behavior, conducting a proof-of-concept study, or deploying your
technology to participants for a long period, you will learn something that
surprises you and helps you to move your research forward.