Probabilistic Aspects in Spoken Document Retrieval

EURASIP Journal on Applied Signal Processing 2003:2, 115-127
© 2003 Hindawi Publishing Corporation

Wolfgang Macherey
Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University of Technology, D-52056 Aachen, Germany

Hans Jörg Viechtbauer
Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University of Technology, D-52056 Aachen, Germany
Email: viechtbauer@

Hermann Ney
Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University of Technology, D-52056 Aachen, Germany
Email: ney@

Received 8 April 2002 and in revised form 30 October 2002

Accessing information in multimedia databases encompasses a wide range of applications in which spoken document retrieval (SDR) plays an important role. In SDR, a set of automatically transcribed speech documents constitutes the files for retrieval, to which a user may address a request in natural language. This paper deals with two probabilistic aspects in SDR. The first part investigates the effect of recognition errors on retrieval performance and asks why recognition errors have only a small effect on it. In the second part, we present a new probabilistic approach to SDR that is based on interpolations between document representations. Experiments performed on the TREC-7 and TREC-8 SDR tasks show comparable or even better results for the newly proposed method than for other advanced heuristic and probabilistic retrieval metrics.

Keywords and phrases: spoken document retrieval, error analysis, probabilistic retrieval metrics.

1. INTRODUCTION

Retrieving information in large, unstructured databases is one of the most important tasks computers are used for today.
While in the past information retrieval focused on searching written texts only, the field of applications has since extended to multimedia data such as audio and .