Article
DOI: 10.1145/1180995.1181049

Toward open-microphone engagement for multiparty interactions

Published: 02 November 2006

Abstract

There is currently considerable interest in developing new open-microphone engagement techniques for speech and multimodal interfaces that perform robustly in complex mobile and multiparty field environments. State-of-the-art audio-visual open-microphone engagement systems aim to eliminate the need for explicit user engagement by processing more implicit cues that a user is addressing the system, which lowers the user's cognitive load. This is an especially important consideration for mobile and educational interfaces, given the higher load imposed by explicit system engagement. In the present research, longitudinal data were collected from six triads of high-school students who engaged in peer tutoring on math problems with the aid of a simulated computer assistant. Results revealed that speech amplitude was 3.25 dB higher when users addressed the computer rather than a human peer when no lexical marker of the intended interlocutor was present, and 2.4 dB higher across all data. These basic results were replicated for both matched and adjacent utterances to computer versus human partners. With respect to dialogue style, speakers did not direct a higher ratio of commands to the computer, although such dialogue differences have been assumed in prior work. These results reveal that amplitude is a powerful cue marking a speaker's intended addressee, which should be leveraged to design more effective microphone engagement during computer-assisted multiparty interactions.
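As a rough illustration of how the amplitude finding could drive open-microphone engagement, the sketch below computes RMS amplitude in dB for 16-bit PCM samples and flags an utterance as computer-directed when it exceeds the speaker's human-directed baseline by roughly the 2.4 dB margin reported above. This is not the paper's implementation: the function names, the full-scale dB reference, and the exact threshold are assumptions made for illustration only.

```python
import math

def rms_db(samples, full_scale=32768.0):
    """RMS amplitude of 16-bit PCM samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Floor the argument to avoid log10(0) on silent input.
    return 20.0 * math.log10(max(rms, 1e-9) / full_scale)

def likely_computer_directed(utterance_db, human_baseline_db, threshold_db=2.4):
    """Hypothetical decision rule: treat an utterance as system-directed when
    it is at least `threshold_db` louder than the speaker's human-directed
    baseline (threshold inspired by, not prescribed by, the reported 2.4 dB)."""
    return utterance_db - human_baseline_db >= threshold_db
```

In practice such a rule would be one feature among several (lexical markers, gaze, head pose), with the per-speaker baseline estimated from recent human-directed turns.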



Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006, 404 pages
ISBN: 1-59593-541-X
DOI: 10.1145/1180995
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. collaborative peer tutoring
  2. computer-supported collaborative work
  3. dialogue style
  4. intended addressee
  5. multimodal interaction
  6. open-microphone engagement
  7. spoken amplitude
  8. user communication modeling

Conference

ICMI '06
Overall Acceptance Rate: 453 of 1,080 submissions, 42%


Cited By

  • (2024) The dynamics of human–robot trust attitude and behavior — Exploring the effects of anthropomorphism and type of failure. Computers in Human Behavior, 150 (108008). DOI: 10.1016/j.chb.2023.108008
  • (2021) Prosodic Differences in Human- and Alexa-Directed Speech, but Similar Local Intelligibility Adjustments. Frontiers in Communication, 6. DOI: 10.3389/fcomm.2021.675704
  • (2016) Optimal Modality Selection for Cooperative Human–Robot Task Completion. IEEE Transactions on Cybernetics, 46(12), 3388-3400. DOI: 10.1109/TCYB.2015.2506985
  • (2015) Spoken Interruptions Signal Productive Problem Solving and Domain Expertise in Mathematics. Proceedings of the 2015 ACM International Conference on Multimodal Interaction, 311-318. DOI: 10.1145/2818346.2820743
  • (2014) Written Activity, Representations and Fluency as Predictors of Domain Expertise in Mathematics. Proceedings of the 16th International Conference on Multimodal Interaction, 10-17. DOI: 10.1145/2663204.2663245
  • (2013) Identifying the Addressee using Head Orientation and Speech Information in Multiparty Human-Agent Conversations. Transactions of the Japanese Society for Artificial Intelligence, 28(2), 149-159. DOI: 10.1527/tjsai.28.149
  • (2013) Written and multimodal representations as predictors of expertise and problem-solving success in mathematics. Proceedings of the 15th ACM International Conference on Multimodal Interaction, 599-606. DOI: 10.1145/2522848.2533793
  • (2013) Problem solving, domain expertise and learning. Proceedings of the 15th ACM International Conference on Multimodal Interaction, 569-574. DOI: 10.1145/2522848.2533791
  • (2013) Multimodal learning analytics. Proceedings of the 15th ACM International Conference on Multimodal Interaction, 563-568. DOI: 10.1145/2522848.2533790
  • (2013) Implementation and evaluation of a multimodal addressee identification mechanism for multiparty conversation systems. Proceedings of the 15th ACM International Conference on Multimodal Interaction, 35-42. DOI: 10.1145/2522848.2522872
