research-article

An experimental speech to graphics system

Authors:
Andrew Golightly, Waikato University, Hamilton, New Zealand
Tony Smith, Waikato University, Hamilton, New Zealand

CHINZ '02: Proceedings of the SIGCHI-NZ Symposium on Computer-Human Interaction, July 2002, Pages 91–96. https://doi.org/10.1145/2181216.2181232

Published: 11 July 2002

ABSTRACT

Ever-improving speech technology continues to revolutionise the way we interact with computers. This paper describes a speech-driven graphics system that allows the user to construct and manipulate 3-dimensional (3D) graphical images using only their voice, removing the need to learn a graphics programming language or the point-and-click options of a conventional graphics software interface. The system combines an inexpensive Java-based speech-to-text package with open-source Java packages for constructive solid geometry and text-to-speech generation to create a completely hands-free graphics application. These components are integrated with context-free input/output grammars modelled on observations of the language used when a person unfamiliar with computer graphics software directs an experienced user in the creation of 3D images. The result is a natural, conversation-style interface that allows anyone to make effective use of 3D-graphics packages regardless of their technical expertise.
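
To make the idea of a context-free input grammar concrete, the following is a minimal Java sketch of how recognised text might be parsed into a graphics command. It is an illustration only and not the authors' grammar or code: the class name CommandGrammarSketch, the word lists, the two production rules, and the output strings are all hypothetical, and the speech-to-text, constructive solid geometry, and text-to-speech packages named in the abstract are left out entirely. The sketch assumes the recogniser has already delivered the utterance as plain text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of a tiny command grammar, parsed by recursive descent.
 * This is NOT the grammar from the paper; every word list and rule is assumed.
 *
 *   command ::= create | move
 *   create  ::= CREATE_VERB [ARTICLE] [COLOUR] SHAPE
 *   move    ::= "move" "it" DIRECTION
 */
public class CommandGrammarSketch {
    // Assumed vocabularies (terminal word classes of the grammar).
    private static final List<String> CREATE_VERBS = Arrays.asList("create", "make", "draw");
    private static final List<String> ARTICLES = Arrays.asList("a", "an", "the");
    private static final List<String> COLOURS = Arrays.asList("red", "green", "blue");
    private static final List<String> SHAPES = Arrays.asList("sphere", "cube", "cone");
    private static final List<String> DIRECTIONS = Arrays.asList("left", "right", "up", "down");

    private final String[] tokens;
    private int pos = 0;

    private CommandGrammarSketch(String utterance) {
        String trimmed = utterance.trim().toLowerCase(Locale.ROOT);
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Consumes and returns the next token if it belongs to the word class, else returns null. */
    private String accept(List<String> wordClass) {
        if (pos < tokens.length && wordClass.contains(tokens[pos])) {
            return tokens[pos++];
        }
        return null;
    }

    /** Like accept, but a missing token is a parse error. */
    private String expect(List<String> wordClass, String what) {
        String token = accept(wordClass);
        if (token == null) {
            throw new IllegalArgumentException("expected " + what + " at word " + (pos + 1));
        }
        return token;
    }

    private String parseCommand() {
        String result;
        if (accept(CREATE_VERBS) != null) {
            accept(ARTICLES);                  // optional article
            String colour = accept(COLOURS);   // optional colour
            String shape = expect(SHAPES, "a shape");
            result = "CREATE shape=" + shape + " colour=" + (colour == null ? "default" : colour);
        } else if (accept(Arrays.asList("move")) != null) {
            expect(Arrays.asList("it"), "'it'");
            result = "MOVE direction=" + expect(DIRECTIONS, "a direction");
        } else {
            throw new IllegalArgumentException("unrecognised command");
        }
        if (pos != tokens.length) {
            throw new IllegalArgumentException("unexpected trailing words");
        }
        return result;
    }

    /** Parses one utterance into a flat command description. */
    public static String parse(String utterance) {
        return new CommandGrammarSketch(utterance).parseCommand();
    }

    public static void main(String[] args) {
        System.out.println(parse("create a red sphere"));
        System.out.println(parse("Draw cube"));
        System.out.println(parse("move it left"));
    }
}
```

In a full system, the parsed command would presumably be handed to a geometry layer rather than returned as a string, and failed parses would trigger a spoken clarification request. Those parts are outside this sketch.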




Published in
CHINZ '02: Proceedings of the SIGCHI-NZ Symposium on Computer-Human Interaction
July 2002, 111 pages
ISBN: 0473085003
DOI: 10.1145/2181216
Conference Chair: Matt Jones
Program Chairs: Steve Jones, Masood Masoodian
Copyright © 2002 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
3D graphics
computer human interaction
procedural semantics
speech
Acceptance Rates
Overall Acceptance Rate: 8 of 23 submissions, 35%