Applying the attributed prefix tree for mining closed sequential patterns

This paper applies the characteristics of closed sequential patterns and sequence extensions into the prefix tree structure to mine closed sequential patterns from the sequence database. The paper uses the parent–child relationship on prefix tree structure and each node on prefix tree is also added fields to determine whether that is a closed sequential pattern or not. Experimental results show that the number of sequential patterns is reduced significantly. | Journal of Science and Technology 54 (3A) (2016) 106-114 APPLYING THE ATTRIBUTED PREFIX TREE FOR MINING CLOSED SEQUENTIAL PATTERNS Pham Thi Thiet*, Vo Thi Thanh Van Faculty of Information Technology, Industrial University of Ho Chi Minh City, 12 Nguyen Van Bao Street, Ward 4, Go Vap District, HoChiMinh City Email: phamthithiet@ Received: 4 May 2016; Accepted for publication: 28 July 2016 ABSTRACT Mining closed sequential patterns is one of important tasks in data mining. It is proposed to resolve difficult problems in mining sequential pattern such as mining long frequent sequences that contain a combinatorial number of frequent subsequences or using very low support thresholds to mine sequential patterns is usually both time- and memory-consuming. This paper applies the characteristics of closed sequential patterns and sequence extensions into the prefix tree structure to mine closed sequential patterns from the sequence database. The paper uses the parent–child relationship on prefix tree structure and each node on prefix tree is also added fields to determine whether that is a closed sequential pattern or not. Experimental results show that the number of sequential patterns is reduced significantly. Keywords: sequential pattern, closed sequential pattern, prefix tree, sequence database. 1. INTRODUCTION Sequential pattern mining, since it was first introduced by Agrawal [1], has played an important role in data mining tasks with broad applications including market and customer analysis, web log analysis, pattern discovery in protein sequences, and mining XML query access patterns for caching and so on. The sequential pattern mining algorithms proposed so far have a good performance in databases with short frequent sequences [2, 3, 9 - 10, 16]. However, when mining long frequent sequences that contain a combinatorial number of frequent subsequences, such a mining will generate an explosive number of frequent subsequences for long patterns, or when using

Không thể tạo bản xem trước, hãy bấm tải xuống
TỪ KHÓA LIÊN QUAN
TÀI LIỆU MỚI ĐĂNG
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.