ABSTRACT
Although speech is a potentially rich information source, a major barrier to exploiting speech archives is the lack of useful tools for efficiently accessing lengthy speech recordings. This paper develops and evaluates techniques for temporal compression, that is, reducing the time people take to listen to a recording while still extracting critical information. We first describe an exploratory study that identifies novel excision techniques, which remove unimportant words or utterances from the recording. We then develop a new method for evaluating how well temporal compression supports users in forming a general understanding of a recording. Applying this method, we demonstrate that excision techniques are generally more effective than standard compression techniques that simply speed up the entire recording.
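The contrast between the two families of techniques can be illustrated with a minimal sketch. This is not the method evaluated in the paper: the `Utterance` class, the `importance` scores, and the greedy selection rule are hypothetical assumptions chosen only to show how speed-up and excision can reach the same listening time in different ways.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    duration_s: float   # playback time at normal speed, in seconds
    importance: float   # hypothetical importance score in [0, 1]


def speed_up_duration(utterances, rate):
    """Listening time when the entire recording is played at `rate` times normal speed."""
    return sum(u.duration_s for u in utterances) / rate


def excise(utterances, target_ratio):
    """Keep the highest-scoring utterances whose total duration fits within
    `target_ratio` of the original length, preserving the original order.

    The greedy rule here is an assumption made for illustration only.
    """
    budget = sum(u.duration_s for u in utterances) * target_ratio
    kept, used = set(), 0.0
    by_importance = sorted(
        range(len(utterances)),
        key=lambda i: utterances[i].importance,
        reverse=True,
    )
    for i in by_importance:
        if used + utterances[i].duration_s <= budget:
            kept.add(i)
            used += utterances[i].duration_s
    return [u for i, u in enumerate(utterances) if i in kept]
```

Under these assumptions, playing a 20-second recording at twice normal speed and excising it to half its length both yield 10 seconds of listening. The difference is that speed-up keeps every utterance but distorts all of them, whereas excision drops the low-scoring utterances and plays the rest unaltered.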
INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.1 Multimedia Information Systems
Additional Classification:
H. Information Systems
H.1 MODELS AND PRINCIPLES
H.1.2 User/Machine Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Keywords: audio interfaces, evaluation methods, excision, meetings interfaces, speech manipulation, speech summary, speech-as-data, speed-up, summarization, temporal compression