Serin and Vingron, Algorithms for Molecular Biology 2011, 6:18

RESEARCH (Open Access)

DeBi: Discovering Differentially Expressed Biclusters using a Frequent Itemset Approach

Akdes Serin and Martin Vingron

Abstract

Background: The analysis of massive high-throughput data via clustering algorithms is very important for elucidating gene functions in biological systems. However, traditional clustering methods have several drawbacks. Biclustering overcomes these limitations by grouping genes and samples simultaneously. It discovers subsets of genes that are co-expressed in certain samples. Recent studies showed that biclustering has great potential in detecting marker genes that are associated with certain tissues or diseases. Several biclustering algorithms have been proposed. However, it is still a challenge to find biclusters that are significant based on biological validation measures. In addition, there is a need for a biclustering algorithm that is capable of analyzing very large datasets in reasonable time.

Results: Here we present a fast biclustering algorithm called DeBi (Differentially Expressed BIclusters). The algorithm is based on a well-known data mining approach called frequent itemset. It discovers maximum-size homogeneous biclusters in which each gene is strongly associated with a subset of samples. We evaluate the performance of DeBi on a yeast dataset, on synthetic datasets, and on human datasets.

Conclusions: We demonstrate that the DeBi algorithm provides functionally more coherent gene sets compared to standard clustering or biclustering algorithms, using biological validation measures such as Gene Ontology term and Transcription Factor Binding Site enrichment.
We show that DeBi is a computationally efficient and powerful tool for analyzing large datasets. The method is also applicable to multiple gene expression datasets coming from different labs or platforms.

Background

In recent years .