Question

Question about quality of biological replicates from RNAseq data

0

Entering edit mode

sheng zhao ▴ 90

@sheng-zhao-5316

Last seen 9.9 years ago

Germany

Dear all, I am using edgeR to analyse one of my RNAseq data. In this experiment, we have three tumour samples from three different patients. Named: Patient_1, Patient_2, Patient_3 We did RNASeq on Tumour-cells under two different conditions: treated and untreated, and wanted to find differential expressed genes after treatment. In the end, we got following 6 RNASeq data (100bp, paired-end and HiSeq2500): Patient_1_treated, Patient_1_untreated, Patient_2_treated, Patient_2_untreated, Patient_3_treated, Patient_3_untreated, So for each condition (tread V.S untreated), we have three biological replicates. I followed <4.4 RNA-Seq of oral carcinomas vs matched normal tissue> in edgeR Users Guide to analysis these data, for this case study is very similar to what we did. However, as you can see, from the MDS-plot, Patient_1_treated is very close to Patient_1_untreated on first two dimensions, which is same for Patient_2 or Patient_3. plotMDS: http://i.imgur.com/6AQVNhi.png plotBCV : http://i.imgur.com/srqwuJC.png So, without surprise, I end up with finding 0 differential expressed genes (FDR<0.1) My questions are: 1.Could I say that, especially basing on the result of MDS-plot, the biological replicates are not consistent in my case, or the quality (fitness for purpose) is very low. 2. Should we add more replicates to increased statistic power or trying other statistic models, or you have another suggestion to deal with this kind of data ? Thank you for your suggesting in advances. Regards, Sheng ======================= R code: library( "edgeR" ) files <- dir( pattern=".read_cnt", full.names = FALSE) RG <- readDGE( files ) colnames( RG$samples ) d <- DGEList( counts = RG ) y <- d colnames(y)<-c( "Patient_1_treated", "Patient_1_untreated", "Patient_2_treated", "Patient_2_untreated","Patient_3_treated", "Patient_3_untreated") ##==Filtering keep <- rowSums(cpm(y) > 10 ) >= 3 y <- y[keep,] dim(y) ##Re-compute the library sizes: y$samples$lib.size <- colSums(y$counts) y$samples ##Normalizing y <- calcNormFactors(y) y$samples plotMDS(y, top=500, main ="Multi-Dimensional Scaling Plot for Count Data") ###The design matrix Patient <- factor( c( "Patient_1", "Patient_1","Patient_2", "Patient_2","Patient_3", "Patient_3" ) ) Treat <- factor( c( "Treated", "Untread", "Treated", "Untread","Treated", "Untread") ) data.frame( Sample = colnames(y), Patient,Treat) design <- model.matrix( ~Patient + Treat ) rownames( design ) <- colnames( y ) y <- estimateGLMCommonDisp( y, design, verbose = TRUE ) y <- estimateGLMTrendedDisp( y, design ) y <- estimateGLMTagwiseDisp( y, design ) fit <- glmFit(y, design) lrt <- glmLRT( fit ) top <- topTags( lrt, n = 50 ) q_value = 0.05 summary( de <- decideTestsDGE( lrt, p.value = q_value ) ) q_value = 0.1 summary( de <- decideTestsDGE( lrt, p.value = q_value ) ) plotBCV( y, cex = 0.8) #sessionInfo( ) #R version 3.0.1 (2013-05-16) #Platform: x86_64-apple-darwin10.8.0 (64-bit) # #locale: #[1] C # #attached base packages: #[1] splines stats graphics grDevices utils datasets methods base # #other attached packages: #[1] ggplot2_0.9.3.1 biomaRt_2.16.0 edgeR_3.2.3 limma_3.16.5 # #loaded via a namespace (and not attached): # [1] MASS_7.3-27 RColorBrewer_1.0-5 RCurl_1.95-4.1 XML_3.95-0.2 colorspace_1.2-2 # [6] dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2 labeling_0.2 #[11] munsell_0.4 plyr_1.8 proto_0.3-10 reshape2_1.2.2 scales_0.2.3 #[16] stringr_0.6.2 tools_3.0.1 [[alternative HTML version deleted]]

RNASeq edgeR RNASeq edgeR • 1.3k views

ADD COMMENT • link 11.7 years ago sheng zhao ▴ 90