Santuari et al. Genome Biology 2010, 11:R4

METHOD Open Access

Substantial deletion overlap among divergent Arabidopsis genomes revealed by intersection of short reads and tiling arrays

Luca Santuari1, Sylvain Pradervand2,3, Amelia-Maria Amiguet-Vercher1, Jerome Thomas3, Eavan Dorcey1, Keith Harshman3, Ioannis Xenarios2, Thomas E Juenger4 and Christian S Hardtke1

Abstract

Identification of small polymorphisms from next generation sequencing short read data is relatively easy, but detection of larger deletions is less straightforward. Here, we analyzed four divergent Arabidopsis accessions and found that intersection of absent short read coverage with weak tiling array hybridization signal reliably flags deletions. Interestingly, individual deletions were frequently observed in two or more of the accessions examined, suggesting that variation in gene content partly reflects a common history of deletion events.

Background

Ultra-high throughput sequencing (UHTS) has made it affordable to re-sequence the genomes of model organisms such as Arabidopsis thaliana [1-5]. While identification of single nucleotide polymorphisms (SNPs) and small indels from UHTS short reads is relatively easy, detection of structural variation, such as larger deletions, is less straightforward [2,3,6,7]. This is particularly true for the analysis of divergent genomes, such as those of Arabidopsis strains that are not closely related to the reference accession Columbia-0 (Col-0). For instance, the accuracy of short read mapping depends on the number of polymorphic sites permitted per read [8]. If this number is set too high, reads can map to false locations; if it is set too low, reads can fail to map to their correct location.
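The trade-off described above can be illustrated with a minimal sketch. It is a toy substitution-only mapper, not the mapping software or settings used in this study; the function names `hamming` and `map_read` and the short sequences are hypothetical and chosen only for illustration.

```python
# Toy illustration (assumption: substitutions only, no indels, exhaustive scan).
# This is not the authors' pipeline and not a real short-read aligner.

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int) -> list[int]:
    """Return every 0-based reference position where the read aligns
    with at most `max_mismatches` substitutions."""
    n = len(read)
    return [
        i
        for i in range(len(reference) - n + 1)
        if hamming(read, reference[i:i + n]) <= max_mismatches
    ]


# Hypothetical reference: the read's true locus starts at position 4,
# and a merely similar sequence starts at position 16.
REFERENCE = "CCCCGATTACAGTTTTGCTCAGTAGGGG"
# Hypothetical read from a divergent accession: 2 SNPs relative to the true locus.
READ = "GATCACTG"

for limit in (1, 2, 3):
    print(limit, map_read(READ, REFERENCE, limit))
```

In this toy case, a limit of 1 leaves the read unmapped, a limit of 2 recovers the true locus, and a limit of 3 additionally admits the similar sequence as a candidate location.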
Moreover, polymorphisms can accumulate locally with respect to the reference genome, and reads from such regions could only be mapped correctly with unrealistically relaxed settings that would interfere with overall correct annotation. Consequently the