| The effect of task conditions on the comprehensibility of synthetic speech |
| Source
|
Conference on Human Factors in Computing Systems
Proceedings of the SIGCHI conference on Human factors in computing systems
The Hague, The Netherlands
Pages: 321 - 328
Year of Publication: 2000
ISBN:1-58113-216-6
|
|
Authors
|
|
Jennifer Lai
|
IBM Corporation/T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, New York
|
|
David Wood
|
IBM Corporation/T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, New York
|
|
Michael Considine
|
Rice University, 6300 S. Main Street, Houston, Texas
|
|
ABSTRACT
A study was conducted with 78 subjects to evaluate the comprehensibility of synthetic speech for various tasks ranging from short, simple e-mail messages to longer news articles on mostly obscure topics. Comprehension accuracy for each subject was measured for synthetic speech and for recorded human speech. Half the subjects were allowed to take notes while listening; the other half were not. Findings show that there was no significant difference in comprehension of synthetic speech among the five different text-to-speech engines used. Subjects who did not take notes performed significantly worse on all synthetic voice tasks than on recorded speech tasks. Performance for synthetic speech in the non-note-taking condition degraded as the task got longer and more complex. When taking notes, subjects also did significantly worse within the synthetic voice condition averaged across all six tasks. However, average performance scores for the last three tasks in this condition show comparable results for human and synthetic speech, reflecting a training effect.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Francis, A.L., Nusbaum, H.C. (1999). Evaluating the quality of Synthetic Speech. In Gardner-Bonneau, D. (Ed.), Human Factors and Voice Interactive Systems (pp. 63 - 97)
|
| |
2
|
Greenspan, S.L., Nusbaum, H.C., and Pisoni, D.B. (1988). Perception of synthetic speech produced by rule: Intelligibility of eight text-to-speech systems. Behavioral Research Methods, Instruments, and Computers, 18, 100-107
|
| |
3
|
Klatt, D.H. (1987). Review of text-to-speech conversion for English. The Journal of the Acoustical Society of America. September 1987 (pp. 737-793)
|
| |
4
|
Pisoni, D.B. and Hunnicutt, S. (1980). Perceptual Evaluation of MITalk: The MIT unrestricted text-to-speech system. IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 572-575) New York
|
| |
5
|
Pisoni, D.B., Nusbaum, H.C. & Greene, B.G. (1985). Perception of synthetic speech generated by rule. Proceedings of the IEEE 73:1665-1676
|
| |
6
|
|
| |
7
|
Ralston, J.V., Pisoni, D.B., and Mullennix, J.W. (1995). Perception and comprehension of speech. In Syrdal, A.K., Bennett, R.W., Greenspan, S.L. (Eds.), Applied Speech Technology (pp. 233-288) Boca Raton: CRC Press
|
| |
8
|
Van Bezooijen, R. & van Heuven, V. (1998). Assessment of Synthesis System. In Gibbon, D., Moore, R. and Winski, R. (Eds.), Volume III: Spoken Language System Assessment (pp. 167{481} - 249{563}) Mouton de Gruyter
|
| |
9
|
www.toefl.org
|
| |
10
|
www.genmagic.com/portico/portico__home.shtml
|
| |
11
|
www.webley.com
|
CITED BY 7
|
|
|
Li Gong, Jennifer Lai, Shall we mix synthetic speech and human speech?: impact on users' performance, perception, and attitude, Proceedings of the SIGCHI conference on Human factors in computing systems, p.158-165, March 2001, Seattle, Washington, United States
|
|
Jennifer Lai, Karen Cheng, Paul Green, Omer Tsimhoni, On the road and on the Web?: comprehension of synthetic and human speech while driving, Proceedings of the SIGCHI conference on Human factors in computing systems, p.206-212, March 2001, Seattle, Washington, United States
|
|
|
|
|
Kristin Vadas, Nirmal Patel, Kent Lyons, Thad Starner, Julie Jacko, Reading on-the-go: a comparison of audio and hand-held displays, Proceedings of the 8th conference on Human-computer interaction with mobile devices and services, September 12-15, 2006, Helsinki, Finland
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
H.
Information Systems
H.1
MODELS AND PRINCIPLES
Additional Classification:
I.
Computing Methodologies
I.2
ARTIFICIAL INTELLIGENCE
I.2.7
Natural Language Processing
Subjects:
Speech recognition and synthesis
General Terms:
Design,
Experimentation,
Human Factors,
Languages,
Management,
Measurement,
Performance,
Theory
Keywords:
comprehension,
synthetic speech,
text-to-speech,
user study