Scientific report: "A Morphologically Sensitive Clustering Algorithm for Identifying Arabic Roots"

We present an algorithm for clustering Arabic words that share the same root. Root-based clusters can substitute for dictionaries in indexing for information retrieval (IR). Modifying Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients, using techniques sensitive to Arabic morphology. Tests show successful treatment of infixes and accurate clustering of unedited Arabic text samples, without the use of dictionaries.
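As a rough illustration of the two stages named above, the sketch below strips a few affixes and then scores a word pair with a Dice coefficient over shared unique bigrams, the measure used by Adamson and Boreham (1974). It is not the authors' algorithm: the affix lists, the minimum stem length of three letters, and the function names `light_stem`, `bigrams`, and `dice` are all assumptions made for this example, and the morphology-sensitive adjustments the abstract mentions are not reproduced.

```python
# Hypothetical affix lists for illustration only; a real light stemmer
# would use a larger, carefully chosen inventory.
PREFIXES = ("ال", "و")
SUFFIXES = ("ات", "ون", "ة")
MIN_STEM = 3  # assumed minimum number of letters left after stripping


def light_stem(word):
    """Strip at most one listed prefix and one listed suffix."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[:-len(suffix)]
            break
    return word


def bigrams(word):
    """Return the set of unique adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient 2C / (A + B) over unique bigrams of a and b."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def similarity(a, b):
    """Stage 1: light stemming. Stage 2: bigram Dice coefficient."""
    return dice(light_stem(a), light_stem(b))
```

With this plain bigram measure, `similarity("كتاب", "كاتب")` is 0 even though both words derive from the root ك ت ب, because the infixed long vowel breaks every shared bigram. This is the kind of case that a morphology-sensitive similarity step would need to handle differently.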
