Computing Reviews
Learning human multimodal dialogue strategies
Rieser V., Lemon O. Natural Language Engineering 16(1): 3-23, 2010. Type: Article
Date Reviewed: Oct 18, 2010

Although humans can usually maintain a dialog quite effortlessly, managing even domain-specific human-machine dialog is fraught with difficulties. Rieser and Lemon’s paper addresses this issue. It is well known that humans employ information from a variety of sources--such as gestures, eye contact, the rhythm and sequencing of speech turns, and voice features, in addition to the actual meaning of what is being said--in order to establish and maintain communication. In more formal settings, dialog is often aided explicitly by visual modalities, such as when a presenter exhibits a slide at a meeting. How do humans decide on the appropriateness of employing a visual modality to aid communication? Can dialog systems be implemented to emulate this ability? This paper attempts to answer these questions with respect to clarification requests.

The most original aspect of the work reported in the paper is the use of the Wizard of Oz technique to gather information about the behavior of a human being engaged in the task of simulating a spoken language dialog system. In the Wizard of Oz experiment, one person (the “wizard”) sits in another room, observes the user’s actions, and simulates the system’s responses in real time. The technique is usually employed as a prototyping tool at the early stages of system design to assess user performance. The originality in the application of the method, as described in the paper, is that the wizard, not the user, is the subject of the experiment. The data the authors are after is the wizard’s decision about whether to present visual information when asking for clarification or to use speech exclusively. The result of this data collection method is a dataset that describes features of the clarification request itself, features of the utterances that preceded and followed the clarification request, and information state features, such as number of matches, delay of user reply, dialog duration, number of click events, and mean values that describe the user’s average behavior in the course of the dialog. Based on this data, the authors then employ various machine learning techniques to predict whether the clarification request included visual output (a binary feature encoded in the dataset).
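
To make this prediction task concrete, a minimal sketch of the kind of binary classifier involved might look as follows. This is only an illustration: it uses scikit-learn, and the feature names, values, and learning method are invented here; the authors' actual dataset and learners are only summarized above.

    # Toy sketch of the binary prediction task described above: given features
    # of a clarification request and the surrounding dialog state, predict
    # whether the wizard added visual output. All values are fabricated for
    # illustration; the paper's data and methods are not reproduced here.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical feature vectors: [number of matches, user reply delay (s),
    # dialog duration (s), number of click events, mean reply delay so far (s)]
    X = np.array([
        [12, 1.5, 240, 3, 1.2],
        [ 1, 0.8,  60, 0, 0.9],
        [30, 2.0, 300, 5, 1.8],
        [ 4, 1.1, 120, 1, 1.0],
        [25, 2.4, 280, 6, 2.0],
        [ 2, 0.7,  90, 0, 0.8],
    ])
    # Label: 1 if the clarification request included visual output, 0 if speech only.
    y = np.array([1, 0, 1, 0, 1, 0])

    clf = LogisticRegression(max_iter=1000)
    print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=3).mean())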

The authors are explicit about certain limitations of their work, including the small size of the dataset (with respect to the size of its feature set) and the shortcomings of the technique they employ to simulate speech recognition results. However, a potentially more serious limitation is the assumption that the data gathered through the Wizard of Oz technique will offer insights into natural human behavior in dialogs. I find it implausible that natural (human-human) behavior could be elicited through an experiment in which one of the people involved is unaware that he or she is engaged in an interaction with another person, and where the mediation of the simulated interface interferes with phenomena such as establishment of common ground and the semantics and pragmatics of gesturing, gaze, and turn taking. In addition, I find the interpretation of the machine learning results odd in places. The authors conclude, for instance, that wizard behavior is suboptimal by comparing the strategies observed to other user studies and best-practice advice on multimodal generation. However, the failure of the wizards to conform to current theories and design practices seems rather more likely to point to a flaw in the Wizard of Oz setup itself--namely, that of allowing the wizard too much flexibility in his or her choice of modalities.

Despite these limitations, the paper does shed light on a specific aspect of multimodal dialogs, and it gives the reader a good idea of the difficulties involved in designing multimodal dialog systems. As a bonus, it also says something about the Wizard of Oz technique as a prototyping tool for such systems, even though the issue is not explicitly discussed.

Reviewer: Saturnino Luz. Review #: CR138495 (1104-0439)
Categories:
Learning (I.2.6)
Feature Evaluation And Selection (I.5.2 ...)
Natural Language (H.5.2 ...)
Speech Recognition And Synthesis (I.2.7 ...)
Natural Language Processing (I.2.7)