Research Article: Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

Hindawi Publishing Corporation
EURASIP Journal on Audio, Speech, and Music Processing
Volume 2007, Article ID 64506, 9 pages
doi 2007 64506

Koji Iwano, Tomoaki Yoshinaga, Satoshi Tamura, and Sadaoki Furui

Department of Computer Science, Tokyo Institute of Technology, 2-12-1-W8-77 Ookayama, Meguro-ku, Tokyo 152-8552, Japan

Received 12 July 2006; Revised 24 January 2007; Accepted 25 January 2007

Recommended by Deliang Wang

This paper proposes an audio-visual speech recognition method that uses lip information extracted from side-face images, as an attempt to increase noise robustness in mobile environments. The proposed method assumes that lip images can be captured using a small camera installed in a handset. Two kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise at various SNR conditions show the effectiveness of the proposed method: using the visual information improves recognition accuracy in all SNR conditions. The visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

Copyright 2007 Koji Iwano et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

In the current environment of mobile technology, the demand for noise-robust speech recognition is growing rapidly. Audio-visual (bimodal) speech recognition techniques, which use face information in addition to acoustic information, are promising directions for increasing the robustness of speech ...
