Husemann and Stoye, Algorithms for Molecular Biology 2010, 5:3

RESEARCH, Open Access

Phylogenetic comparative assembly

Peter Husemann1,2, Jens Stoye1,3

Abstract

Background: Recent high-throughput sequencing technologies are capable of generating a huge amount of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain that cannot be assembled any further. It is still costly and time-consuming to close all the gaps in order to acquire the whole genomic sequence.

Results: Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a graph that contains the likelihood for each pair of contigs to be adjacent. Subsequently, this graph can be used to compute a layout graph that shows the most promising contig adjacencies in order to aid biologists in finishing the complete genomic sequence. The layout graph shows unique contig orderings where possible and the best alternatives where necessary.

Conclusions: Our new algorithm for contig ordering uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. An evaluation of our implementation shows that it performs better than recent approaches while also being much faster.

Background

Today the nucleotide sequences of many genomes are known. In the first genome projects, obtaining the DNA sequence by multi-step clone-by-clone sequencing approaches was costly and tedious. Nowadays the most common approach for de novo genome sequencing is whole genome shotgun sequencing [1,2]. Here, the genome is fragmented randomly into small parts.
Each of these fragments is sequenced, for example with recent high-throughput methods [3,4]. In the next step, overlapping reads are merged by assembler software into a contiguous string. However instead of the .