Chemistry report: "Research Article: Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization"

Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2008, Article ID 148967, 13 pages
doi:10.1155/2008/148967

Research Article
Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

Umit H. Yapanel and John H. L. Hansen

Center for Robust Speech Systems, Department of Electrical Engineering, University of Texas at Dallas, EC33, P.O. Box 830688, Richardson, TX 75083-0688, USA

Correspondence should be addressed to John H. L. Hansen

Received 27 December 2007; Accepted 29 May 2008

Recommended by Sen M. Kuo

A proven method for making automatic speech recognition (ASR) robust to speaker differences is to perform speaker normalization on the acoustic features. More effective speaker normalization methods are needed that require only limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN needs two separate warping phases, while the proposed BISN method uses a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) for a diverse noisy speech ...
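The idea of replacing two warping phases with one speaker-dependent warp can be illustrated with a minimal sketch. It assumes that both the perceptual warp and the speaker warp are first-order all-pass (bilinear) frequency warps, which the abstract does not state. It is not the authors' implementation, and conventional VTLN often uses other warping functions. The function names and both warp factors below are hypothetical, chosen only for illustration. Under that assumption, cascading a warp with factor a and a warp with factor b is equivalent to a single warp with factor (a + b) / (1 + a*b), so a single pass can reproduce the two-pass result.

```python
import math


def allpass_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, 0..pi) with a
    first-order all-pass (bilinear) warp of factor alpha, |alpha| < 1."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def combine_warps(alpha_a, alpha_b):
    """Factor of the single all-pass warp equivalent to applying
    alpha_a first and alpha_b second (composition of bilinear maps)."""
    return (alpha_a + alpha_b) / (1.0 + alpha_a * alpha_b)


# Hypothetical values for illustration only; not taken from the paper.
PERCEPTUAL_ALPHA = 0.42   # assumed perceptual (Mel/Bark-like) warp factor
SPEAKER_ALPHA = 0.03      # assumed speaker-specific adjustment

single_alpha = combine_warps(PERCEPTUAL_ALPHA, SPEAKER_ALPHA)

# Compare two warping passes against one combined pass on a frequency grid.
grid = [k * math.pi / 16 for k in range(17)]
two_pass = [allpass_warp(allpass_warp(w, PERCEPTUAL_ALPHA), SPEAKER_ALPHA)
            for w in grid]
one_pass = [allpass_warp(w, single_alpha) for w in grid]
max_gap = max(abs(a - b) for a, b in zip(two_pass, one_pass))

print(f"combined warp factor: {single_alpha:.4f}")
print(f"largest difference between two-pass and one-pass warps: {max_gap:.2e}")
```

In this sketch the speaker adjustment is folded into the warp factor before any spectrum is touched, so the per-frame cost is that of one warp rather than two. How the speaker-dependent factor is actually estimated is outside the scope of this example.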
