DOI: 10.1145/1647314.1647385
Poster

Speaker change detection with privacy-preserving audio cues

Published: 02 November 2009

Abstract

In this paper we investigate a set of privacy-sensitive audio features for speaker change detection (SCD) in multiparty conversations. These features are based on three different principles: characterizing the excitation source information using the linear prediction residual, characterizing subband spectral information that has been shown to contain speaker information, and characterizing the general shape of the spectrum. Experiments show that the performance of the privacy-sensitive features is comparable to or better than that of the state-of-the-art full-band spectral features, namely mel-frequency cepstral coefficients (MFCCs), which suggests that socially acceptable ways of recording conversations in real life are feasible.
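The first feature family relies on the linear prediction (LP) residual. Each frame is inverse-filtered with its own LP coefficients, which removes much of the spectral envelope (formant structure) and leaves mainly excitation-source information. The sketch below is a minimal, stdlib-only illustration under stated assumptions: it uses autocorrelation-method LP solved with the Levinson-Durbin recursion and a default order of 12. The helper names (`autocorrelation`, `levinson_durbin`, `lp_residual`) are hypothetical and are not taken from the paper. It does not reproduce the paper's feature pipeline or its change detector.

```python
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of a frame.

    In practice the frame would be windowed (e.g. Hamming) first;
    that step is omitted here for brevity.
    """
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LP normal equations with the Levinson-Durbin recursion.

    Returns a = [1, a1, ..., ap] for the inverse filter
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, plus the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. silent) frame: stop the recursion
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lp_residual(frame: List[float], order: int = 12) -> List[float]:
    """Inverse-filter a frame with its own LP coefficients.

    e[n] = s[n] + sum_{j=1..p} a_j * s[n - j], with samples before the
    frame treated as zero. Assumption: order 12 is illustrative only,
    not the setting used in the paper.
    """
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    residual = []
    for n in range(len(frame)):
        e = frame[n]
        for j in range(1, order + 1):
            if n - j >= 0:
                e += a[j] * frame[n - j]
        residual.append(e)
    return residual
```

As a quick check, consider the synthetic frame `[0.9 ** n for n in range(200)]`, which is an impulse driving a one-pole filter. A first-order fit gives a1 close to -0.9, and the residual is essentially zero after the first sample. In other words, the inverse filter recovers the excitation and discards the envelope.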


Cited By

  • (2021) Making Sense of Subtitles: Sentence Boundary Detection and Speaker Change Detection in Unpunctuated Texts. Companion Proceedings of the Web Conference 2021, pp. 357-362. DOI: 10.1145/3442442.3451894. Online publication date: 19-Apr-2021.
  • (2020) Context and Uncertainty Modeling for Online Speaker Change Detection. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8379-8383. DOI: 10.1109/ICASSP40776.2020.9053280. Online publication date: May-2020.
  • (2013) Wordless Sounds. IEEE Transactions on Audio, Speech, and Language Processing, 21(1), pp. 85-98. DOI: 10.1109/TASL.2012.2215588. Online publication date: 1-Jan-2013.


    Published In

    ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
    November 2009
    374 pages
    ISBN:9781605587721
    DOI:10.1145/1647314
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. modeling social interactions
    2. multiparty conversations
    3. privacy-sensitive features
    4. speaker change detection

    Qualifiers

    • Poster

    Conference

    ICMI-MLMI '09

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%

    Article Metrics

    • Downloads (Last 12 months)2
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 09 Feb 2025

    Citations

    • (2012) The nonverbal structure of patient case discussions in multidisciplinary medical team meetings. ACM Transactions on Information Systems, 30(3), pp. 1-24. DOI: 10.1145/2328967.2328970. Online publication date: 6-Sep-2012.
    • (2011) Privacy-Sensitive Audio Features for Speech/Nonspeech Detection. IEEE Transactions on Audio, Speech, and Language Processing, 19(8), pp. 2538-2551. DOI: 10.1109/TASL.2011.2151857. Online publication date: 1-Nov-2011.
