Open Access
Software
Genome assembly forensics: finding the elusive mis-assembly
Adam M Phillippy, Michael C Schatz and Mihai Pop

Address: Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
Correspondence: Mihai Pop. Email: mpop@

Published: 14 March 2008
Genome Biology 2008, 9:R55 (doi 186 gb-2008-9-3-r55)
The electronic version of this article is the complete one and can be found online at http 2008 9 3 R55
Received: 16 October 2007
Revised: 10 January 2008
Accepted: 14 March 2008

© 2008 Phillippy et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http licenses by), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at http .

Rationale

Sequence assembly errors exist in both draft and finished genomes. Since the initial draft sequence of the human genome was released in 2001 [1,2], great effort has been spent validating and finishing the official sequence. During this process, it became clear that the original draft sequences were not entirely accurate reconstructions of the genome [3-6].
It was also reported in 2004 that finished human bacterial artificial chromosome (BAC) sequences contained one single-base-pair error per 73 Kbp of sequence, and more significant mis-assemblies every .