Feature selection for indexing protein structures

Journal of Science of HNUE, Natural Sci., 2011, Vol. 56, No. 7, pp. 32-43

Luong Van Hieu, Hanoi Vocational College for Electro-Mechanics
Pham Tho Hoan, Hanoi National University of Education
E-mail: hoanpt@

Abstract. A protein is composed of amino acids, which are in turn made up mostly of carbon, hydrogen, oxygen, and nitrogen. A protein structure consists of thousands of atomic coordinates. Building structure index tables of proteins (often organized as suffix trees or suffix arrays) is an important phase for quickly searching or classifying protein structures. Most previous studies use only structural features to build the index tables; as a result, the searching and classifying performance based on these tables is not good enough. In this paper, we propose two methods of feature selection to create index tables that contain not only structural features but also sequential features. Experiments on SCOP, a protein classification dataset, showed that our proposed feature selection methods considerably improve searching and classifying performance compared with previous feature selection methods.

Keywords: Protein structure indexing, feature selection.

1. Introduction

The goal of the life sciences is to understand the function of biological molecules such as proteins, DNA, and RNA. While biological technologies can now easily determine the sequence or 3D structure of biological molecules, it is difficult to discover the functions of these molecules.
However, the structures of sequenced proteins, especially their 3D structures, may provide important information for predicting their functions, based on the previously known functions of other proteins with similar structures. In general, the problem of structural search and comparison can be solved in two main phases: first, extracting informative feature vectors from the 3D structures; second, representing and organizing these feature vectors with appropriate data structures.
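
The two phases can be illustrated with a minimal sketch. It is not the feature selection method proposed in this paper: the structural feature (a binned distance between C-alpha atoms i and i+gap), the pairing with the residue letter as a "sequential" feature, and all function names and parameters below are assumptions chosen only for illustration. The index is a naive suffix array, which is adequate for toy inputs.

```python
import math
from bisect import bisect_left


def structural_symbols(ca_coords, gap=2, bin_width=1.0):
    """Phase 1 (hypothetical feature): bin the distance between
    C-alpha atoms i and i+gap into an integer symbol."""
    symbols = []
    for i in range(len(ca_coords) - gap):
        d = math.dist(ca_coords[i], ca_coords[i + gap])
        symbols.append(int(d // bin_width))
    return symbols


def combined_features(sequence, ca_coords, gap=2, bin_width=1.0):
    """Pair each structural symbol with the residue letter at the same
    position, so the feature string carries sequence information too."""
    struct = structural_symbols(ca_coords, gap, bin_width)
    return [(sequence[i], s) for i, s in enumerate(struct)]


def build_suffix_array(features):
    """Phase 2: naive suffix array (sorts whole suffixes; a sketch only,
    not an efficient construction)."""
    return sorted(range(len(features)), key=lambda i: features[i:])


def find_occurrences(features, suffix_array, pattern):
    """Return the sorted start positions where pattern occurs, using
    binary search over the length-limited prefixes of sorted suffixes."""
    m = len(pattern)
    prefixes = [features[i:i + m] for i in suffix_array]
    pos = bisect_left(prefixes, pattern)
    hits = []
    while pos < len(prefixes) and prefixes[pos] == pattern:
        hits.append(suffix_array[pos])
        pos += 1
    return sorted(hits)
```

To use the sketch, pass a residue string and a list of (x, y, z) C-alpha coordinates to `combined_features`, build the index once with `build_suffix_array`, and then query it with a short feature pattern via `find_occurrences`. A real index would cover many proteins and use a more efficient suffix array or suffix tree construction.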
