Scientific report: "A Morphologically Sensitive Clustering Algorithm for Identifying Arabic Roots"

We present a clustering algorithm that groups Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for information retrieval (IR). Modifying the method of Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients, using techniques sensitive to Arabic morphology. Tests show successful treatment of infixes and accurate clustering of unedited Arabic text samples, without the use of dictionaries.
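The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the bigram-based Dice coefficient commonly associated with Adamson and Boreham (1974), uses small hypothetical affix lists (`PREFIXES`, `SUFFIXES`), an arbitrary `threshold`, and simple single-link grouping. It does not reproduce the paper's morphology-sensitive handling of infixes.

```python
from itertools import combinations

# Hypothetical affix lists for illustration only; not taken from the paper.
PREFIXES = ("ال", "و")
SUFFIXES = ("ات", "ون", "ة")


def light_stem(word, min_len=3):
    """Stage 1: strip at most one prefix and one suffix, keeping >= min_len letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


def bigrams(word):
    """Set of unique adjacent character pairs in the word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient over unique bigrams: 2 * shared / (size_a + size_b)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def cluster(words, threshold=0.6):
    """Stage 2: link words whose stems are similar enough (single-link, union-find)."""
    stems = {w: light_stem(w) for w in words}
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if dice(stems[a], stems[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())
```

Calling `cluster(["الكتاب", "كتاب", "مدرسة"])` groups the first two words together, because both reduce to the same stem after the definite article is stripped, and leaves the third in a cluster of its own. Plain bigram overlap penalises words that differ by an infix, which is the weakness the morphology-sensitive techniques in the abstract are meant to address.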
