Toward optimal feature selection using ranking methods and classification algorithms

Yugoslav Journal of Operations Research 21 (2011), Number 1, 119-135
DOI:

Jasmina NOVAKOVIĆ, Perica STRBAC, Dusan BULATOVIĆ
Faculty of Computer Science, Megatrend University, Serbia
jnovakovic@
Received: April 2009 / Accepted: March 2011

Abstract: We present a comparison of several feature ranking methods applied to two real datasets. We consider six ranking methods that fall into two broad categories: statistical and entropy-based. Four supervised learning algorithms are used to build models, namely IB1, Naive Bayes, a decision tree and the RBF network. We show that the choice of ranking method can be important for classification accuracy. In our experiments, combining the ranking methods with different supervised learning algorithms gives quite different balanced accuracy results. Our cases confirm that, to be sure that the feature subset giving the highest accuracy has been selected, the use of many different indices is recommended.

Keywords: Feature selection, feature ranking methods, classification algorithms, classification accuracy.
MSC: 90B50, 62C99

1. INTRODUCTION

Feature selection can be defined as a process that chooses a minimum subset of M features from the original set of N features, so that the feature space is optimally reduced according to a certain evaluation criterion.
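The abstract names two families of ranking methods and balanced accuracy as the evaluation measure but does not spell out the formulas here. As a minimal sketch, assuming discrete feature values, the code below shows one common entropy-based criterion (information gain) used to rank N features and keep the top M, plus balanced accuracy under its usual definition (mean per-class recall). Information gain is only an example of the entropy-based family, not necessarily one of the six methods in the study; all function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(C; F) = H(C) - H(C | F) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Class labels of the rows where the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(X, y):
    """Feature indices sorted by decreasing information gain."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_m(X, y, m):
    """Keep the M highest-ranked of the N features (returns their indices)."""
    return rank_features(X, y)[:m]


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes perfectly,
    # feature 1 only partly, feature 2 not at all.
    X = [[1, 0, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
    y = ["a", "a", "b", "b"]
    print(rank_features(X, y))    # [0, 1, 2]
    print(select_top_m(X, y, 2))  # [0, 1]
    # Always predicting the majority class: plain accuracy 0.75, balanced 0.5.
    print(balanced_accuracy(["a", "a", "a", "b"], ["a", "a", "a", "a"]))  # 0.5
```

Ranking scores each feature once, so it costs N evaluations plus a sort; the trade-off is that each feature is judged in isolation, so redundant or only jointly informative features are not detected. Balanced accuracy is used instead of plain accuracy because it is not inflated by a classifier that favours the majority class.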
As the dimensionality of a domain expands, the number of features N increases. Finding the best feature subset is usually intractable [1], and many problems related to feature selection have been shown to be NP-hard [2].

Feature selection is an active field in computer science. It has been a fertile area of research and development since the 1970s in statistical pattern recognition [3, 4, 5], machine learning and data mining [6, 7, 8, 9, 10, 11]. Feature …
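To make the intractability remark concrete: with N features there are 2^N - 1 non-empty candidate subsets, and C(N, M) subsets of a fixed size M. The sketch below counts them and runs a brute-force search over size-M subsets with a hypothetical, user-supplied score function. It only illustrates the combinatorial growth; it is not a proof of the NP-hardness results cited in [2], and the function names are illustrative.

```python
from itertools import combinations
from math import comb


def count_nonempty_subsets(n):
    """Number of non-empty subsets of an n-feature set: 2**n - 1."""
    return 2 ** n - 1


def count_subsets_of_size(n, m):
    """Number of subsets containing exactly m of the n features: C(n, m)."""
    return comb(n, m)


def exhaustive_best_subset(n_features, m, score):
    """Brute-force search over all C(n_features, m) subsets of size m.

    `score` is a hypothetical evaluation criterion (higher is better),
    e.g. a classifier's cross-validated accuracy on those features.
    The number of calls to `score` grows combinatorially with n_features.
    """
    return max(combinations(range(n_features), m), key=score)


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, count_nonempty_subsets(n), count_subsets_of_size(n, n // 4))

    # Toy additive score, purely for illustration.
    weights = [0.1, 0.9, 0.5, 0.3]
    best = exhaustive_best_subset(4, 2, lambda s: sum(weights[j] for j in s))
    print(best)  # (1, 2)
```

With an additive score like the toy one, ranking features and taking the top M finds the same subset; exhaustive search only pays off when the criterion depends on interactions between features, which is also when it becomes too expensive for large N.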
