Automatic summarization of voicemail messages using lexical and prosodic features
Source: ACM Transactions on Speech and Language Processing (TSLP)
Volume 2, Issue 1 (February 2005)
Article No. 1
Year of Publication: 2005
ISSN: 1550-4875
Authors
Konstantinos Koumpis  Vienna Telecommunications Research Center, Vienna, Austria
Steve Renals  University of Edinburgh, Edinburgh, UK
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1075389.1075390

ABSTRACT

This article presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word described by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems as well as human transcriptions of voicemail speech.
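
To make the idea of ROC-based feature assessment concrete, the following is a minimal Python sketch. It is not Parcel and not the authors' system: it scores each feature on its own by the area under its ROC curve, which is a much simpler stand-in for ROC-guided selection of features and classifiers. The feature names (lexical_weight, duration_s, pitch_range_hz), the toy values, and the labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points obtained by
    lowering a decision threshold over the scores, from highest to lowest."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Words with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def rank_features(rows: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  names: Sequence[str]) -> List[Tuple[str, float]]:
    """Score each feature column alone as a word classifier and sort by AUC."""
    results = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        results.append((name, auc(roc_points(column, labels))))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical per-word feature vectors: one lexical and two prosodic features.
FEATURE_NAMES = ["lexical_weight", "duration_s", "pitch_range_hz"]
WORD_FEATURES = [
    [0.9, 0.40, 60.0],
    [0.8, 0.35, 20.0],
    [0.1, 0.10, 55.0],
    [0.2, 0.30, 15.0],
    [0.7, 0.12, 30.0],
    [0.3, 0.08, 25.0],
]
# 1 marks a word that belongs in the summary, 0 a word that does not.
IN_SUMMARY = [1, 1, 0, 0, 1, 0]

RANKING = rank_features(WORD_FEATURES, IN_SUMMARY, FEATURE_NAMES)

if __name__ == "__main__":
    for name, score in RANKING:
        print(f"{name}: AUC = {score:.3f}")
```

In this toy data the lexical feature separates summary words from the rest perfectly, while the two prosodic features do so only partly. A real system would evaluate combinations of features and classifiers on held-out data rather than single columns.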


