It also includes a header containing information about the recognizer and the identity of the speaker. How the SpokenContent description is extracted and used is not part of the standard. However, this chapter begins with a short introduction to ASR systems.

APPLICATION EXAMPLE QUERY-BY-HUMMING

The extraction of symbolic information such as a melody contour from music is strongly related to the music transcription problem and is an extremely difficult task. This is because most music files contain polyphonic sounds, meaning that there are two or more concurrent sounds: harmonies accompanying a melody, or melodies with several voices. Technically speaking, this task can be seen as the multiple fundamental frequency estimation (MFFE) problem, also known as multi-pitch estimation. An overview of this research field can be found in Klapuri (2004). The work of Goto (2000, 2001) is especially interesting for QBH applications because Goto uses real-world CD recordings in his evaluations.

The methods used for MFFE can be divided into the following categories (see Klapuri, 2004). Note that a clear division is not possible because these methods are complex and combine several processing principles.

Perceptual grouping of frequency partials. MFFE and sound separation are closely linked, as the human auditory system is very effective in separating and recognizing individual sound sources in mixture signals (see also Section ). This cognitive function is called auditory scene analysis (ASA). Computational ASA (CASA) is usually viewed as a two-stage process in which an incoming signal is first decomposed into its elementary time-frequency components, and these are then organized into their respective sound sources. Provided that this is successful, a conventional F0 estimation can be performed on each of the separated component sounds; in practice, the F0 estimation often takes place as a part of the grouping process.

Auditory model-based approach.
Models of the human auditory periphery are also useful for MFFE, especially for preprocessing the signals. The most popular unitary pitch model, described in Meddis and Hewitt (1991), is used in the algorithms of Klapuri (2004) or Shandilya and Rao (2003). An efficient calculation method for this auditory model