An Iterative Design Methodology
for User-Friendly Natural Language Office Information Applications
INTRODUCTION
a. Computerized office applications employ one of three modes of input: menus, command languages, or natural language.
b. According to one view, the suitability of a particular mode of input depends on the "semantic" and "syntactic" knowledge of the user.
c. Menus might best serve a person who is not familiar with
the computer's command structure (low syntactic knowledge) and who is uncertain how to proceed in solving his particular problem (low semantic knowledge);
d. Command languages are good for people who know what steps they want to undertake to solve a problem and are familiar with the computer's syntax for accomplishing each step.
e. This view holds that a natural language interface might be appropriate for people who have a
high level of semantic knowledge in a problem domain, but aren't familiar with any special computer syntax for achieving their goals.
f. The principal purpose of the research reported here was to design and test a systematic, empirical methodology for developing context-dependent natural language computer applications.
METHODOLOGY
(1) Task analysis: Twenty-three business professionals were interviewed extensively to discover how they keep their appointment calendars. That information provided a starting point for the functional specification of a computerized calendar [4].
(2) Deep structure development: In this second step of program development, the database-manipulating functions were written in APL.
(3) First run of OZ (simulation): Here, no language processing components were in place. The experimenter simulated the system in toto. This simulation is similar to the ones used in [2] and [12].
(4) First-approximation language processor: The corpus of inputs obtained in step three was used to develop a first approximation of the language processing subroutines (described in [6]).
(5) Second run of OZ (intervention): This was the iterative design phase of program development. Fifteen participants used the program, and the experimenter intervened as necessary to keep the dialog flowing. As this step progressed, and as the dictionaries and functions were augmented, the experimenter was phased out of the communications loop.
(6) Cross-validation: The final program was tested with six additional participants to see how well it performed. In this step the program ran without any assistance from the experimenter. Various measures of program speed, "understanding," and efficiency were combined with the results of postsession interviews to evaluate CAL's success.
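The intervention loop at the heart of this methodology can be sketched as follows. This is a minimal illustration, not CAL's actual APL code: the function names are mine, and the "recognizer" is a toy that merely checks dictionary membership.

```python
def try_process(text, dictionary):
    """Toy recognizer: succeeds only if every word is already known."""
    words = text.lower().split()
    if all(w in dictionary for w in words):
        return "OK: stored '%s'" % text, True
    return "", False

def oz_session(inputs, dictionary, wizard_reply):
    """One intervention session: the experimenter answers whatever the
    fledgling system cannot, and those inputs are logged so the
    dictionaries can be augmented afterward."""
    failures = []
    for text in inputs:
        reply, understood = try_process(text, dictionary)
        if not understood:
            reply = wizard_reply(text)   # experimenter steps in
            failures.append(text)        # queued for augmentation
    return failures

vocab = {"lunch", "with", "bob", "at", "noon"}
failures = oz_session(
    ["lunch with bob at noon", "dentist tuesday at 3pm"],
    vocab,
    lambda t: "[experimenter's reply]",
)
# After the session, the failures drive dictionary augmentation,
# gradually phasing the experimenter out of the loop:
vocab |= {w for t in failures for w in t.lower().split()}
```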
2.1 Apparatus
Participants and experimenter communicated via IBM 3277 displays and keyboards with an APLSV program residing in an IBM 370/168 host system.
Both participant and experimenter, working in separate rooms, communicated with the host system via a Bell System 4800-baud modem and an IBM 3271 controller.
The keyboard used by the participants in the iterative design phase of OZ and in the cross-validation was modified by abbreviating the available editing functions to include only backspace, forward cursor motion, and deletion of one character at a time. Aside from the RESET and ENTER keys, all other function keys were masked and disabled.
The slave terminal was useful in two ways. Its primary use was to prepare the experimenter by giving him an advance view of the message being composed by the participant. On a few occasions, the slave terminal allowed the experimenter to rescue the participant from certain difficult situations surreptitiously.
2.2 Problem Solving Task
Participants were asked to tell the computer about whatever routine and nonroutine appointments they had in the next two weeks or so.
Pilot work showed that this minimal goal was sufficient to provide some focus for the participant in "trying out" the system.
To control for the possibility that material in the overview might affect the language generated by the participants, questions about the form and frequency of manual use were included in the postsession interviews.
2.3 Participants
Examples of professions included in the participant pool were Jesuit priest, symphony conductor, auto repair manager, real estate saleswoman, clothing store owner, clinical psychologist, architect, dental assistant, flight instructor, homemaker, bank manager, attorney, and an appointments secretary to a US senator.
2.4 Procedure
Each participant took part in a single experimental session. After a background questionnaire was filled out, a short (5 minute) interactive keyboard tutorial was run by the experimenter and the participant together.
After the introductory tutorial, the participants were advised that the experimenter would be in the next room "keeping more or less of an eye on the printout of the session," and that they could call him on the intercom if they had any questions.
During the iterative design (intervention) phase of OZ, the experimenter intervened in the session when the fledgling system made mistakes. After each of the intervention iterations (sessions), the dictionaries and programming of CAL were augmented and enhanced to accommodate the grammatical structures and functional requirements of the inputs from that session.
In addition, a batch type program was written allowing CAL to be subjected to a large corpus of difficult inputs from previous sessions to make sure that changes made in the programming or dictionaries would not interfere with the previous capabilities of the system.
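A regression harness of the kind described might look like the following sketch. The corpus format and the `process` stand-in are my assumptions, not the paper's APL implementation.

```python
def regression_check(process, corpus):
    """Replay a corpus of (input, expected) pairs recorded from earlier
    sessions; return every case whose current output no longer matches."""
    failures = []
    for text, expected in corpus:
        got = process(text)
        if got != expected:
            failures.append((text, expected, got))
    return failures

# A change to the processor is accepted only if the old corpus still passes.
corpus = [("6:30", (6, 30)), ("630", (6, 30))]
strict = lambda t: tuple(int(x) for x in t.split(":")) if ":" in t else None
failures = regression_check(strict, corpus)   # the colon-less form regresses
```

Running the full corpus after every dictionary or programming change is what guarantees that new capabilities do not silently break old ones.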
After each of the 15 participants had taken part in an intervention session during the iterative design phase of CAL's development, the experimenter decided that development had reached the point of diminishing returns (a judgment borne out by the approach to asymptote of dictionary growth, reported in the Results).
The development process for CAL comprised six steps [5]. Central to the
methodology is an experimental simulation which I call the OZ paradigm, in which experimental participants are given the impression that they are interacting with a program that understands English as well as another human would.
RESULTS
3.1 Participants' Performance
A straightforward extension of the development methodology would have made CAL able to handle most of the examples.
[Figure 1. Word dictionary size as a function of time (total number of word types entered into the system). The vertical divisions represent participant boundaries.]
The participants took an average of 58 seconds to compose and enter a message; CAL responded within a few seconds.
3.2 Program Growth
The figure shows the growth in number of unique, recognized words (types) in the master word dictionary as a function of time (as measured by the total number of word types entered into the system) as the development phase progressed through its iterations. The vertical divisions in the chart of Figure 1 represent participant (session, iteration) boundaries.
A chart of the growth of recognized word synonym categories would show a similar quick approach to asymptote.
Another perspective on dictionary growth comes from the analysis of overlaps among the sets of words used in each session. Each participant contributed, on the average, 1.91 unique words to the total pool (mode = 1). This represents a measure of the acceleration of dictionary growth at asymptote; it means that most people used only one or two words that no one else used during the development and cross-validation phases.
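The overlap analysis can be reproduced in a few lines. The sessions below are invented toy data for illustration; the paper's actual mean was 1.91 unique words per participant.

```python
from collections import Counter

def unique_contributions(sessions):
    """For each session, count the word types used in no other session."""
    counts = Counter(w for words in sessions for w in set(words))
    return [sum(1 for w in set(words) if counts[w] == 1)
            for words in sessions]

sessions = [
    {"lunch", "dentist", "noon"},
    {"lunch", "noon", "staff"},
    {"lunch", "recital", "tuesday"},
]
contributions = unique_contributions(sessions)   # [1, 1, 2]
```

A small, stable per-participant contribution like this is exactly the asymptotic behavior that justified stopping the iterations.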
3.3 Program Performance
a. Errors.
During that process, there were three occasions on which the program's failure to correctly process an unambiguous input resulted in the storage of an incorrect appointment. These all occurred when one participant typed in times without colons (e.g., "630" instead of "6:30"). I consider this a program error rather than a user error because I feel that this input would be unambiguous to a human being. As such, it should have been clear to CAL.
On two occasions the program failed to correctly process the time description in an unambiguous input, recognized that it was confused, and abandoned the attempt to store the appointment. In the first case, the user left out a space between two words: the month and the date. In the second case, CAL caught its own error when a participant entered a multiday appointment in an unrecognized format.
There were five occasions when CAL made errors of varying magnitude, but, accommodating the possibility that there might be something wrong in its interpretation, engaged the user in a successful clarification dialog.
b. Efficiency.
A simplistic view of these numbers is as follows: about half of the times that an appointment is entered, CAL engages the user in further dialog in order to confirm its understanding of the input or, in a few cases, to warn of a potential scheduling conflict. Also, a change of an appointment usually requires three inputs, except for the few times (seven, to be precise) that users figured out how to skip a step, taking only two inputs, and the fewer times (four) that the change became complicated, requiring four or five steps.
3.4 Postsession Interviews
The interview results are presented here with several goals in mind.
First, comments on reaction to "bugs," response time, and experimenter interference could act as pointers for others contemplating use of the simulation techniques reported here.
Second, individual comments on CAL's style and mode of operation might prove useful in the design of other office applications.
Third, some of the comments point to potential strengths and weaknesses of natural language as a mode of input.
a. Effectiveness of simulation: participants quite readily accept the low-level deception inherent in the OZ paradigm. In spite of an occasional spelling error or other human fault in the "computer output" simulated by the experimenter, no subject ever seriously questioned the proposition that there was a computer acting alone on the other end of the line. This relates to Weizenbaum's observation [13] that human parties to a communication interaction attribute all sorts of world knowledge and understanding to their partners. It almost seems to require a positive effort to convince participants that there is less to the computer program than meets the eye.
b. Utility: Three respondents felt that their old paper and pencil way of doing things was better than a computerized approach could ever be ("I think my little calendar that I have at home is much easier."). Seven saw advantages to CAL, but would want changes made (such as portable terminals and/or daily printouts) before using it for themselves. Nine participants were unequivocal in their praise of CAL's potential.
c. Keyboard and display: Aside from this, and a problem a few people had in hitting the carriage return (next line key) instead of the ENTER key, most participants found the pared-down keyboard/display system "real easy."
d. Output language style: The consensus was that CAL's output language was "very polite" and "friendly." "The friendliness level was appropriate, not overly unctuous." One computer-unsophisticated participant thought that CAL "was kind."
e. Input flexibility: The prevailing view was that CAL was flexible in terms of the variations it would accept. A few people (none in the cross-validation group) found the program "fairly demanding."
f. Perceived comprehension: In this category (which interacts somewhat with the previous one), three respondents (including one cross-validation participant) felt that CAL's comprehension of the English language left something to be desired
g. Perceived accuracy: One cross-validation participant and four others suggested that they would prefer a little more experience with the program first.
h. Use of "manual"/examples: While there was no manual provided per se, the two-page overview of the system contained some examples of language.
i. Estimated training time: Most participants acknowledged that they were learning about the system as the one-hour session progressed. No one thought that remembering how to use CAL after an absence of a week or so would pose any problems.
j. Reaction to "bugs": Seven participants commented that the "bugs" or unexplainable problems that sometimes crop up with computers didn't affect their attitudes much.
k. Response time: Half of the cross-validation participants and ten others found CAL's response time "too slow," several feeling that the response "really should be instantaneous." (The response times during the iterative development phase were longer than in the cross-validation, owing to the time lag involved in interventions.)
l. Program assumptions: Human parties to communication are able to make inferences and assumptions, thus filling in missing pieces of information. CAL is endowed with some minor examples of the same ability. In order to lighten the load on the user, CAL can proceed with an incomplete specification of a time interval, inferring the missing pieces on the basis of knowledge in its own world
model (e.g., one such item of knowledge is that an appointment from "11 till 2 tomorrow" is probably from 11 am till 2 pm).
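A hedged sketch of this sort of inference: with no am/pm markers given, pick the 24-hour reading under which the appointment runs forward for the shortest plausible duration. The heuristic is mine; the paper does not specify CAL's actual rule.

```python
def infer_interval(start_h, end_h):
    """Choose am/pm for two bare clock hours so that the interval runs
    forward and is as short as possible (e.g., "11 till 2" -> 11:00-14:00)."""
    best = None
    for s in (start_h % 12, start_h % 12 + 12):      # am or pm start
        for e in (end_h % 12, end_h % 12 + 12):      # am or pm end
            if e > s and (best is None or e - s < best[1] - best[0]):
                best = (s, e)
    return best
```

Under this rule `infer_interval(11, 2)` yields `(11, 14)`, i.e., 11 am till 2 pm, matching the example above.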
m. Dialog initiative: Many people noticed the places where CAL would assume more control over the dialog (when it needed
specific, mandatory pieces of information, for instance), but that was not felt to be intimidating.
n. Experimenter interference: The participants were told at the beginning of the session that the experimenter would be keeping an eye on the printout, but none of the participants maintained that awareness, or, if they were conscious of the indirect presence of the experimenter watching their progress, it did not affect them:
o. Ease of use: "It was very comfortable to use." "Once they know what they're doing with it, I think it would be very easy to use."
p. Grammatical quirks: Due to the simplicity of CAL's grammatical model, it sometimes makes minor errors in extracting the descriptions of an appointment once the time references are removed (i.e., the occasional stray comma or word finds its way into the description). In addition, there is sometimes a confusion of plurality when lists are printed ("Here are appointments number 1 ...").
q. General favorable comments: "I didn't feel tense at all." "I love it. At least it listens; it's better than most people!" "I really enjoyed it, thoroughly enjoyed it." "In fact," one computer-naive psychiatrist commented, "I was sitting here thinking ... for the first time I thought it might be sort of fun to have a home computer."
DISCUSSION
Despite the simplicity of its processing model, CAL did well in a controlled test of many aspects of its performance. This success is directly attributable to the empirical nature of the design process that gave birth to the program.
How did the key phases of the design process contribute to this success?
a. The task analysis was indispensable for everything that followed. Its purpose was to determine what functions the computer application must have (i.e., what exactly the program is supposed to do).
b. A key role of the simulation phase was to provide a basis upon which to build the initial grammar. In contrast to previous natural language programs (e.g., LUNAR [15], SHRDLU [14]), CAL was not built on a model of a prespecified grammar. Rather, CAL uses what I have chosen to describe as an empirically derived grammar.
It was surprising that a point of diminishing returns was reached after so few iterations (i.e., that each participant used only one to two words that no one else used).
Among the many potential explanations for this difference (e.g., substantive differences between their problem domain and the calendar-keeping domain), two are compelling: first, the participants in the Michaelis et al. study were probably operating in a less familiar problem-solving area and, having lower levels of "semantic knowledge," tended to be more erratic in their language behaviors; second, when people are (or think they are) communicating with a machine, as they were in the CAL study, they might tend to "normalize" their language (i.e., use fewer uncommon words).
A natural consequence of the use of an empirical grammar is the inability to generalize the obtained language model beyond the context in which it was developed. However, though CAL's grammar cannot be generalized, the systematic approach used to generate it can. The six steps of program development used here can just as easily, and presumably with comparably little investment in participant-hours, be applied to other office applications where natural language is appropriate.
c. During the iterative design phase, breakdowns in communications were not
blamed on "user error," but were thought of as failures of CAL (or, more
appropriately, of CAL's designer) to anticipate all the necessary variations in
input structure. If people find it natural to express times with semi-colons rather
than colons (and thus avoid the SHIFT function of the keyboard), and if that
usage doesn't generate any unresolvable ambiguity (it doesn't), why force them
to use colons? It doesn't cost anything to add that flexibility to the program.
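The flexibility argued for here is cheap to implement. A sketch (the pattern and function name are mine): accept a colon, a semicolon, or no separator at all for an unambiguous time token.

```python
import re

def parse_time(token):
    """Accept "6:30", "6;30" (no SHIFT needed), or "630" as the same time."""
    m = re.fullmatch(r"(\d{1,2})[:;]?(\d{2})", token)
    if m:
        return int(m.group(1)), int(m.group(2))
    if re.fullmatch(r"\d{1,2}", token):
        return int(token), 0     # a bare hour
    return None                  # genuinely unrecognizable

# All three spellings of the same time are treated identically:
parse_time("630"), parse_time("6:30"), parse_time("6;30")
```

Since every accepted spelling maps to one interpretation, the added leniency introduces no ambiguity, which is exactly the condition stated above.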
While there were some informal long-term users of CAL, a more formal
longitudinal study would be necessary to shed more light on how this natural
language interface holds up with dedicated users over time. This would also
provide an opportunity to give more thorough consideration to information
retrieval in a realistic setting (e.g., the ways in which people check their calendars
at the beginning of each day).
CONCLUSION
This study has shown several things about calendars, natural language, and software design.
The task analysis [4] showed that calendars are indispensable in the office environment and that they are good candidates for computerization.
While no controlled comparison was made with other modes of input, the interview results do indicate that natural language does hold some promise as an input mode, at least for the semantically knowledgeable, computer-naive business professionals represented in this study.
Finally, the objective program performance results show that the object of CAL's design was met. Computer-naive users were indeed able to sit down at a terminal and have meaningful interactions with a computer, in their own natural language, from the very outset.
Most of the participants in this experiment, including those who had expressed much trepidation over the prospect of dealing with computers, went out of their way to tell me how enthusiastic they were about the program and their accomplishments with it.
Abstract
i. A six-step, iterative, empirical human factors design methodology was used to develop CAL (Calendar Access Language), a natural language computer application to help computer-naive business professionals manage their personal calendars.
ii. Input language is processed by a simple, nonparsing algorithm with limited storage
requirements and a quick response time.
iii. CAL allows unconstrained English inputs from users with
no training (except for a five minute introduction to the keyboard and display) and no manual (except
for a two-page overview of the system).
iv. In a controlled test of performance, CAL correctly responded to between 86 and 97 percent of the storage and retrieval requests it received, according to various criteria.
v. This level of performance could never have been achieved with such a simple processing model were it not for the empirical approach used in the development of the program and its dictionaries.
vi. The key is to elicit the cooperation of such users as partners in an iterative, empirical development process.