ABSTRACT
In this paper, we present a robust speech recognition system that uses binaural speech enhancement as a preprocessing step. The enhancement stage applies an existing dereverberation technique, followed by a spatial masking-based noise removal algorithm that retains only signals arriving from the desired directions, as determined by a threshold angle. While state-of-the-art approaches fix the threshold angle heuristically over all time frames, we propose an adaptive computation in which the threshold angle is first learned from several noise-only frames and then updated frame by frame. Speech recognition results in a real environment show the effectiveness of the proposed speech enhancement approach.
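The abstract does not state how the threshold angle is learned or updated, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes a free-field two-microphone model in which the arrival angle of each frequency bin is estimated from the interaural phase difference. The function names, the microphone spacing, the `scale` factor, and the smoothing constant `alpha` are all hypothetical choices made for illustration.

```python
import cmath
import math


def bin_angles(left, right, freqs_hz, mic_distance_m=0.04, speed_of_sound=343.0):
    """Estimate an arrival angle (radians) per frequency bin for one frame.

    left, right: complex STFT values of the two channels for one frame.
    Assumes a free-field model and an illustrative microphone spacing.
    """
    angles = []
    for l, r, f in zip(left, right, freqs_hz):
        if f <= 0:
            angles.append(0.0)  # no phase information at DC
            continue
        phase_diff = cmath.phase(l * r.conjugate())
        delay = phase_diff / (2.0 * math.pi * f)
        # Clamp before asin so rounding or spatial aliasing cannot raise an error.
        s = max(-1.0, min(1.0, delay * speed_of_sound / mic_distance_m))
        angles.append(math.asin(s))
    return angles


def spatial_mask(angles, threshold):
    """Binary mask: keep bins whose estimated angle lies within the threshold."""
    return [1.0 if abs(a) <= threshold else 0.0 for a in angles]


def learn_threshold(noise_angle_frames, scale=0.5):
    """Hypothetical initial rule: a fraction of the mean absolute angle
    observed in noise-only frames."""
    mags = [abs(a) for frame in noise_angle_frames for a in frame]
    return scale * sum(mags) / len(mags)


def update_threshold(prev, angles, scale=0.5, alpha=0.9):
    """Hypothetical frame-by-frame update: treat bins outside the current
    threshold as noise and smooth the threshold toward a new estimate."""
    rejected = [abs(a) for a in angles if abs(a) > prev]
    if not rejected:
        return prev  # nothing classified as noise in this frame
    estimate = scale * sum(rejected) / len(rejected)
    return alpha * prev + (1.0 - alpha) * estimate
```

In use, `learn_threshold` would be called once on the angles of a few noise-only frames, after which each incoming frame would be masked with `spatial_mask` and the threshold refreshed with `update_threshold`. A fixed-threshold baseline corresponds to skipping the update step.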
Index Terms
- Robust speech recognition based on binaural speech enhancement system as a preprocessing step