Hindawi Publishing Corporation
EURASIP Journal on Bioinformatics and Systems Biology
Volume 2007, Article ID 60723, 8 pages
doi:10.1155/2007/60723

Research Article
Compressing Proteomes: The Relevance of Medium Range Correlations

Dario Benedetto,1 Emanuele Caglioti,1 and Claudia Chica2

1 Dipartimento di Matematica, Università di Roma "La Sapienza", Piazzale Aldo Moro 5, 00185 Roma, Italy
2 Structural and Computational Biology Unit, EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany

Received 14 January 2007; Revised 28 May 2007; Accepted 10 September 2007

Recommended by Teemu Roos

We study the nonrandomness of proteome sequences by analysing the correlations that arise between amino acids at a short and medium range, more specifically, between amino acids located 10 or 100 residues apart, respectively. We show that statistical models that consider these two types of correlation are more likely to seize the information contained in protein sequences and thus achieve good compression rates. Finally, we propose that the cause for this redundancy is related to the evolutionary origin of proteomes and protein sequences.

Copyright © 2007 Dario Benedetto et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Protein sequences have been considered for a long time as nearly random or highly complex sequences from the informational content point of view. The main reason for this is the local complexity of amino acid composition, that is, the type and number of amino acids found in a sequence segment, especially inside the globular domains [1].
This complexity could be related to the so-called randomness of coding sequences in DNA, already pointed out in a pioneering work [2] and explained by evolutionary models [3]. Studies on protein sequence compression show that proteins behave as sequences of independent .