Anders and Huber Genome Biology 2010, 11:R106

METHOD Open Access

Differential expression analysis for sequence count data

Simon Anders, Wolfgang Huber

Abstract

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, DESeq, as an R/Bioconductor package.

Background

High-throughput sequencing of DNA fragments is used in a range of quantitative assays. A common feature of these assays is that they sequence large numbers of DNA fragments that reflect, for example, a biological system's repertoire of RNA molecules (RNA-Seq [1,2]) or the DNA or RNA interaction regions of nucleotide binding molecules (ChIP-Seq [3], HITS-CLIP [4]). Typically, these reads are assigned to a class based on their mapping to a common region of the target genome, where each class represents a target transcript, in the case of RNA-Seq, or a binding region, in the case of ChIP-Seq. An important summary statistic is the number of reads in a class; for RNA-Seq, this read count has been found to be, to good approximation, linearly related to the abundance of the target transcript [2]. Interest lies in comparing read counts between different biological conditions. In the simplest case, the comparison is done separately, class by class. We will use the term gene synonymously with class, even though a class may also refer to, for example, a transcription factor binding site, or even a barcode [5].
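As an illustration of the read counts described above, the following minimal Python sketch tallies reads per class from mapped positions under two conditions. The annotation GENES, the function count_reads and the toy read positions are hypothetical and chosen only for illustration; they are not part of the method or of DESeq, and a real pipeline would work from alignment files and handle overlapping or ambiguous regions.

```python
from collections import Counter

# Hypothetical annotation: class (gene) name -> (chromosome, start, end),
# with half-open intervals [start, end).
GENES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 300, 450),
}


def count_reads(read_positions, genes):
    """Tally reads per class.

    A read is assigned to a class if its mapped position falls inside the
    class region. Assumes regions do not overlap (an illustrative
    simplification); reads outside every region are ignored.
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in read_positions:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos < end:
                counts[name] += 1
                break  # non-overlapping regions: at most one match per read
    return dict(counts)


# Made-up mapped reads (chromosome, position) for two conditions.
reads_condition_1 = [("chr1", 120), ("chr1", 150), ("chr1", 310), ("chr2", 120)]
reads_condition_2 = [("chr1", 130), ("chr1", 320), ("chr1", 400), ("chr1", 449)]

counts_1 = count_reads(reads_condition_1, GENES)
counts_2 = count_reads(reads_condition_2, GENES)
print(counts_1)
print(counts_2)
```

The two resulting count tables, one per condition, are the kind of per-gene input that a class-by-class comparison would start from.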
We would like to use statistical testing to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be