ABSTRACT
Ever-improving speech technology continues to revolutionise the way we interact with computers. This paper describes a speech-driven graphics system that allows the user to construct and manipulate 3-dimensional (3D) graphical images using only their voice, avoiding the need to learn a graphics programming language or the point-and-click options of a conventional graphics software interface. The system combines an inexpensive Java-based speech-to-text package with open-source Java packages for constructive solid geometry and text-to-speech generation to create a completely hands-free graphics application. These components are integrated through context-free input/output grammars modelled on observations of the language used when a person unfamiliar with computer graphics software directs an experienced user in the creation of 3D images. The result is a natural, conversation-style interface that allows anyone to make effective use of 3D graphics packages regardless of their technical expertise.
- An experimental speech to graphics system