VUI / Conversational Interface Design
Design Principles
Theory / Building Block: Gricean Maxims
Maxim of quality
truthful and accurate communication
news: bad if too much info
Maxim of quantity
right amount of info
Maxim of relevance
appropriate and relevant info
YouTube: get info you want
Maxim of manner
clear, cooperative communication
Multimodality
include visual info, speech, touch/vibration
most conversational interfaces are multimodal
unimodal
Alexa
Principle
take advantage of other modalities when appropriate and effective
provide breaks for decision making and interruptions
all components should be consistent to fit in conversation
Does my interface still support a speech-only interaction?
Interaction Paradigm
Command-and-control interfaces (always-on voice assistants)
Alexa
Google Home
speech input mapped to specific system functions, called immediately
Express user intent with wake word / a button
OK Google
home button on iPhone
Indicate listening and understanding
light on Alexa
Siri sound wave
Execute mapped function (e.g. show stocks)
Siri
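The command-and-control steps above (express intent with a wake word, then execute a mapped function immediately) can be sketched as a lookup table. This is a minimal illustration only: the wake word, the command phrases, and the handler functions are hypothetical and do not come from any real assistant.

```python
from typing import Callable, Dict, Optional

WAKE_WORD = "ok assistant"  # hypothetical wake word, not a real product's


def show_stocks() -> str:
    return "Showing stocks"


def set_timer() -> str:
    return "Timer set"


# Each recognized phrase maps directly to one system function.
COMMANDS: Dict[str, Callable[[], str]] = {
    "show stocks": show_stocks,
    "set a timer": set_timer,
}


def handle_utterance(utterance: str) -> Optional[str]:
    """Return the command result, or None if no wake word was spoken."""
    text = utterance.lower().strip()
    if not text.startswith(WAKE_WORD):
        return None  # intent not expressed, so the system stays idle
    command = text[len(WAKE_WORD):].strip(" ,")
    action = COMMANDS.get(command)
    if action is None:
        return "Sorry, I can't do that yet."
    return action()  # the mapped function is called immediately
```

A real system would match on recognized intents rather than exact strings; exact matching is used here only to keep the mapping visible.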
Conversational interfaces
chatbots
task assistants
social robots
characteristics of human conversations
take turns
core, cooperative
one speaker at a time
explicit change of tokens
Principles
one speaker at a time
transparent who is speaking
turn exchanges
explicit signaling of who will speak next
interruption handling
difficult
while talking, bad at listening
lot of ambiguity
infinite depth
conversational markers
cues that indicate state / direction of conversation
Acknowledgement
thanks, got it, alright, sorry about that
Positive feedback
good job, nice to hear that
Timeline
First, halfway, finally
Confirmation
explicit confirmation to improve usability and transparency
explicit vs. implicit
explicit
I think ... Is that right?
require user to confirm
implicit
let user know what was understood
Ok, setting a reminder to ...
speech-based vs. non-speech based (visual, action)
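A small sketch of the two speech-based confirmation styles above. The function name and the exact phrasing are assumptions for illustration; the wording follows the example prompts in the notes.

```python
def confirmation_prompt(action: str, explicit: bool) -> str:
    """Build a confirmation for an understood action, e.g. 'set a reminder'."""
    if explicit:
        # Explicit: the user must answer before anything happens.
        return f"I think you want me to {action}. Is that right?"
    # Implicit: say what was understood and carry on.
    return f"Ok, I will {action}."


print(confirmation_prompt("set a reminder for 9 am", explicit=True))
print(confirmation_prompt("set a reminder for 9 am", explicit=False))
```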
Error-handling
due to technical mistakes, unexpected user behavior, environment
Types
no speech detected
speech detected, nothing recognized
recognized, system does wrong thing
recognized incorrectly (user talk to others)
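The four error types above can be written down as labels, for example to annotate interaction logs during testing. This is a hedged sketch: the enum names and the `classify` helper are hypothetical, and the last two flags would usually come from a human annotator, since the system cannot detect them itself.

```python
from enum import Enum
from typing import Optional


class VuiError(Enum):
    NO_SPEECH = "no speech detected"
    NOT_RECOGNIZED = "speech detected, nothing recognized"
    WRONG_ACTION = "recognized, system does wrong thing"
    MISRECOGNIZED = "recognized incorrectly"


def classify(
    speech_detected: bool,
    transcript: Optional[str],
    addressed_to_system: bool = True,
    action_correct: bool = True,
) -> Optional[VuiError]:
    """Return the error type for one logged turn, or None if it went well."""
    if not speech_detected:
        return VuiError.NO_SPEECH
    if not transcript:
        return VuiError.NOT_RECOGNIZED
    if not addressed_to_system:
        return VuiError.MISRECOGNIZED  # e.g. the user was talking to others
    if not action_correct:
        return VuiError.WRONG_ACTION
    return None
```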
Flowcharting Conversational interfaces
most commonly used method
model and prototype conversational interaction
flows: how the interaction flows depending on
system state
user behavior
external influence
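A flowchart like this can be prototyped as a transition table keyed by state and event. The sketch below is an assumed example flow, not one from the notes: all state and event names are hypothetical, with events standing in for user behavior ("utterance", "silence", "yes", "no") and system state ("done").

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state
FLOW: Dict[Tuple[str, str], str] = {
    ("idle", "wake_word"): "listening",
    ("listening", "utterance"): "confirming",
    ("listening", "silence"): "reprompting",
    ("reprompting", "utterance"): "confirming",
    ("reprompting", "silence"): "idle",
    ("confirming", "yes"): "executing",
    ("confirming", "no"): "listening",
    ("executing", "done"): "idle",
}


def step(state: str, event: str) -> str:
    """Follow one arrow of the flowchart; unknown events keep the state."""
    return FLOW.get((state, event), state)


def run(events: Iterable[str], state: str = "idle") -> str:
    """Replay a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Replaying event sequences this way makes it easy to check that every path through the flow ends somewhere sensible.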
Usability Heuristics
Guiding, Teaching, and Offering Help
Guide users through conversations so they don't get lost
guide subtly using natural affordances, not explicit instructions
"so..." to cue
guide towards desired response, cue type of desired response
allow data to be given in response to single/multiple prompts
Use responses to help users discover what is possible
teach possible ways of asking for a result
use examples naturally, rather than teaching commands explicitly
Feedback and prompts
short feedback and prompts
users talk to get things done, not because they want to talk
clear, succinct
keep lists of items short (3-5 max)
let people ask if they want to know more
let experienced users have faster and shorter prompts
accelerator
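One way to keep spoken lists short while letting people ask for more is to read only the first few items and offer the rest. This is a minimal sketch; the function name, the default limit of 3, and the phrasing are assumptions.

```python
from typing import List


def read_out(items: List[str], limit: int = 3) -> str:
    """Speak at most `limit` items and offer the remainder on request."""
    head = items[:limit]
    if not head:
        return "I found nothing."
    spoken = ", ".join(head)
    remaining = len(items) - len(head)
    if remaining > 0:
        return f"I found {spoken}, and {remaining} more. Want to hear the rest?"
    return f"I found {spoken}."
```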
Confirm input intelligently
implicitly through results or next prompt
explicitly confirm irreversible or critical actions
even allow undo after confirmation
say "yes" to confirm
Use speech-recognition system confidence to drive feedback style
High
Do it and tell me
Moderate
Confirm input
Low
re-prompt - "say that again?"
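The confidence-driven feedback styles above can be sketched as a threshold function. The threshold values 0.8 and 0.5 are assumptions chosen for illustration; real recognizers report confidence on different scales and need tuning. Irreversible actions are forced to an explicit confirmation, following the rule about critical actions.

```python
HIGH_CONFIDENCE = 0.8  # assumed threshold, tune per recognizer
LOW_CONFIDENCE = 0.5   # assumed threshold, tune per recognizer


def feedback_style(confidence: float, irreversible: bool = False) -> str:
    """Pick 'act', 'confirm', or 'reprompt' from recognizer confidence."""
    if confidence < LOW_CONFIDENCE:
        return "reprompt"  # low: "say that again?"
    if confidence < HIGH_CONFIDENCE or irreversible:
        return "confirm"   # moderate, or too risky to act on unconfirmed
    return "act"           # high: do it and tell the user
```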
Use multimodal feedback when available
light
graphic display
sound
vibration
Errors
Avoid cascading correction errors
input is ambiguous / incorrect
escalate details in prompts
provide more
input results in many hypotheses
let user select from list with "yes"/"no"
error correction
use different modality / voice response style
e.g. select from a list
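Escalating detail in prompts can be sketched as a list of increasingly specific re-prompts indexed by the number of failed attempts. The alarm scenario and the prompt texts are hypothetical; the last prompt also switches modality, as suggested above.

```python
# Prompts grow more detailed with each failed attempt (hypothetical wording).
REPROMPTS = [
    "Sorry, what time?",
    "What time should I set the alarm for? For example, 7 am.",
    "I still didn't get a time. Say a time like 7 am, or pick one on the screen.",
]


def reprompt(failed_attempts: int) -> str:
    """Return the prompt for this attempt, staying on the most detailed one."""
    index = min(max(failed_attempts, 0), len(REPROMPTS) - 1)
    return REPROMPTS[index]
```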
Use normal language in communicating errors
Vary (error) prompt wording
not repetitive
don't blame user
don't: "that's not a valid response"
"I don't understand"
don't mock
don't: I did not understand the response I heard.
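Varying error prompts so they are not repetitive can be as simple as never choosing the same wording twice in a row. A minimal sketch, with hypothetical prompt texts that avoid blaming the user:

```python
import random
from typing import Optional

ERROR_PROMPTS = [
    "Sorry, I didn't catch that.",
    "I don't understand. Could you say that another way?",
    "Hmm, I missed that. Could you repeat it?",
]


def pick_error_prompt(last: Optional[str], rng: random.Random) -> str:
    """Choose an error prompt that differs from the previous one."""
    choices = [prompt for prompt in ERROR_PROMPTS if prompt != last]
    return rng.choice(choices)


rng = random.Random()
first = pick_error_prompt(None, rng)
second = pick_error_prompt(first, rng)  # never equal to `first`
```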
Allow users to exit from errors or a mistaken conversation
special escape word, available globally
"Stop"
non-speech methods when speech fails
push a physical button
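A global escape can be sketched as a check that runs before any state-specific handling, so the same exit works in every state. "stop" comes from the notes; "cancel", the state names, and the `button_pressed` flag are assumptions for illustration.

```python
ESCAPE_WORDS = {"stop", "cancel"}  # "cancel" is an assumed extra escape word


def is_escape(utterance: str) -> bool:
    """True if the whole utterance is an escape word, ignoring punctuation."""
    return utterance.lower().strip(" .,!?") in ESCAPE_WORDS


def handle(state: str, utterance: str, button_pressed: bool = False) -> str:
    """Return the next state; escapes are checked first in every state."""
    if button_pressed or is_escape(utterance):
        return "idle"  # leave the error or mistaken conversation
    return state       # otherwise defer to normal handling (omitted here)
```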
General
Give agent a persona through language, sound, ...
agent
VUI
Alexa
chatbot
young?
doctor?
delivery person?
create an illusion and be consistent
not distracting, keep minimum
Make system status clear
verbal, sound, multimodal feedback
Siri: show waves means listening
communicate delays immediately and give "busy" feedback
delays due to cloud-based processing
Speak user's language
words, phrases, concepts familiar to user
paper feeder error
no system-oriented / technical jargon
error 56
Start and stop conversations (for command-and-control)
wake word (2-3)
don't require it again in same conversation
gracefully end conversation when user is done
frequent: goodbye, got it
infrequent: speak more
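Not requiring the wake word again within the same conversation can be sketched as a session that stays open for a short window after each accepted turn. The class name, the wake word, and the 8-second timeout are all hypothetical.

```python
from typing import Optional


class Session:
    """Tracks whether a conversation is open, so follow-ups skip the wake word."""

    def __init__(self, wake_word: str = "hey agent", timeout: float = 8.0):
        self.wake_word = wake_word
        self.timeout = timeout  # assumed follow-up window, in seconds
        self.last_turn: Optional[float] = None

    def accepts(self, utterance: str, now: float) -> bool:
        """Accept a turn if the session is open or the wake word is used."""
        is_open = (
            self.last_turn is not None
            and now - self.last_turn <= self.timeout
        )
        if is_open or utterance.lower().startswith(self.wake_word):
            self.last_turn = now
            return True
        return False

    def end(self) -> None:
        """Close the conversation, e.g. after the user says goodbye."""
        self.last_turn = None
```

Passing the time in as `now` keeps the sketch easy to test; a real system would read a clock.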
pay attention to what user said and respect user's context
user already know context as parameter
confirm before use
use user input as parameter if possible
remember what user said in current conversation
grounding
use context to respond intelligently
location/environment
time constraints
number of users
user identity/age
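Remembering what the user said in the current conversation can be sketched as a small slot store: values already given are reused as parameters, and the agent only prompts for what is still missing. The class, slot names, and prompt wording are hypothetical.

```python
from typing import Dict, List, Optional


class ConversationContext:
    """Stores values the user has already given in this conversation."""

    def __init__(self) -> None:
        self.slots: Dict[str, str] = {}

    def remember(self, **values: str) -> None:
        self.slots.update(values)

    def next_prompt(self, required: List[str]) -> Optional[str]:
        """Ask only for the first missing value; None means nothing is missing."""
        for name in required:
            if name not in self.slots:
                return f"What {name} would you like?"
        return None


ctx = ConversationContext()
ctx.remember(city="Madison")                 # user already said the city
print(ctx.next_prompt(["city", "date"]))     # asks for the date only
```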
Conversational style
use spoken language characteristics
more elaborate (thank you, not "ty")
discourse markers as confirmations and prompts
more natural
next, and, so, actually, sure, ok, got it
prosody
rhythm, tone, pause, emphasis, fillers (uh, hmm, ah, like)
more natural
up - question; down - statement
Make conversation back-and-forth
don't prompt everything at once
take turns, don't let instructions get in way
give users a chance to respond before jumping in
Adapt agent style to who the users are and how they speak and feel
agents have similar conversational style
match user's emotion, gender, personality
fewer accidents
Overview
Effectiveness
less effective, error-prone
due to technology, ambiguities, and environment
Efficiency
less efficient
Satisfaction
awkward, socially inappropriate (bad in office), frustrating
Not replacing, but complementing graphical user interfaces
Value
streamline app in a conversational paradigm
resource constraints
cook
drive
more effective, efficient, satisfactory
address accessibility problems
vision (blindness)
motor (tremor)
cognitive deficiency (dyslexia)