DOI: 10.1145/3095713.3095721
short-paper

Speaker Clustering Based on Non-Negative Matrix Factorization Using Gaussian Mixture Model in Complementary Subspace

Published: 19 June 2017

ABSTRACT

Variation in speech features stems mainly from the phonetic and speaker information contained in speech data. If these two types of information can be separated, more robust speaker clustering becomes possible. A principal component analysis (PCA) transformation can separate speaker information from phonetic information, under the assumption that a space with large within-speaker variance is a "phonetic subspace" and the complementary space with small within-speaker variance is a "speaker subspace". We propose a speaker clustering method based on non-negative matrix factorization (NMF) using a Gaussian mixture model (GMM) trained in the speaker subspace. We compared the proposed method with conventional methods based on the Bayesian information criterion (BIC) and on a GMM in the observation space. The experimental results showed that the proposed method achieves higher clustering accuracy than the conventional methods.
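
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that each utterance contains a single speaker, that the speaker subspace is spanned by the PCA directions with the smallest pooled within-utterance variance, and that NMF is applied to a non-negative utterance-by-utterance similarity matrix. For brevity, a Gaussian kernel on utterance means in the speaker subspace stands in for the GMM-based similarity, so no GMM is trained here. All function names, the `bandwidth` parameter, and the toy data are hypothetical.

```python
import numpy as np


def speaker_subspace_basis(utterances, n_phonetic):
    """Basis of the low-variance complement of the top PCA directions.

    Assumption: each utterance holds one speaker, so centering every
    utterance on its own mean leaves only within-speaker variation.
    """
    centered = np.vstack([u - u.mean(axis=0) for u in utterances])
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)  # columns sorted by ascending eigenvalue
    n_keep = cov.shape[0] - n_phonetic
    return eigvecs[:, :n_keep]  # directions with the smallest variance


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Multiplicative-update NMF (Lee and Seung) for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.1, 1.0, size=(v.shape[0], rank))
    h = rng.uniform(0.1, 1.0, size=(rank, v.shape[1]))
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


def cluster_utterances(utterances, n_phonetic, n_speakers, bandwidth=1.0):
    """Assign each utterance to a cluster via NMF on a similarity matrix."""
    basis = speaker_subspace_basis(utterances, n_phonetic)
    # Project each utterance mean into the assumed speaker subspace.
    means = np.array([u.mean(axis=0) @ basis for u in utterances])
    sq_dist = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    # Stand-in similarity (assumption): Gaussian kernel, not GMM likelihoods.
    similarity = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    w, h = nmf(similarity, n_speakers)
    h = h * w.sum(axis=0)[:, None]  # fold the column scale of W into H
    return h.argmax(axis=0)  # strongest component per utterance


def toy_utterances(seed=0):
    """Synthetic data: dims 0-1 vary strongly, dims 2-3 carry the speaker."""
    rng = np.random.default_rng(seed)
    scale = np.array([5.0, 5.0, 0.3, 0.3])
    offsets = {0: np.array([0.0, 0.0, 1.0, 1.0]),
               1: np.array([0.0, 0.0, -1.0, -1.0])}
    speakers = [0, 0, 0, 1, 1, 1]
    utterances = [rng.normal(size=(200, 4)) * scale + offsets[s]
                  for s in speakers]
    return utterances, speakers


if __name__ == "__main__":
    utts, truth = toy_utterances()
    print(cluster_utterances(utts, n_phonetic=2, n_speakers=2), truth)
```

The cluster indices returned by NMF are arbitrary, so they should be compared with the true speakers only up to a relabeling. Replacing the kernel with a likelihood-based similarity from GMMs trained on the projected frames would bring the sketch closer to the method named in the title.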

            Published in

            CBMI '17: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing
            June 2017
            237 pages
            ISBN: 9781450353335
            DOI: 10.1145/3095713

            Copyright © 2017 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • short-paper
            • Research
            • Refereed limited
