RNAseq expression analysis using DESeq: technical replicates, paired samples

Simon Anders ★ 3.8k

@simon-anders-3855

Zentrum für Molekularbiologie, Universi…

Dear Jinfeng

On Wed, 14 Apr 2010 22:28:17 +0000 (UTC), Jinfeng Liu <jinfengl at gene.com> wrote:

> I'm trying to use DESeq for RNAseq expression analysis. I haven't been
> able to find information about how to deal with the following issues:
>
> 1) technical replicates
> We have two biological samples, two libraries (of different insert size)
> were prepared for each of them. so I have four lanes of data in total and I
> want to do differential expression between the two samples. It doesn't look quite
> right to me to set up the condition vector as
>
> conds <- c( "Sample1", "Sample1","Sample2","Sample2") since they are only
> technical replicates, not biological. But I'm not sure what to do.

If you set up your test this way, DESeq will assume that the variance between the replicates is all there is. Hence, roughly speaking, it will call a difference significant if it is larger than the fluctuations observed between the technical replicates. This then only tells you that the gene might be typically different between different samples, but you won't know whether the difference is really due to the difference in treatment or whether you would have observed the same magnitude of difference between two samples that have been treated the same way.

Of course, without biological replicates, there is no way to settle this question properly. The best thing you can do is to add up the counts from each sample, and compare just one data column with summed data from Sample 1 with one data column for Sample 2. Call DESeq's 'estimateVarianceFunctions' function with the argument 'pool=TRUE', and it will ignore the sample labels and estimate the variance between the conditions. Hence, it will only call those genes differentially expressed that have a much stronger difference between conditions than the other genes of similar expression strength.
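As a rough illustration of the summing step described above, here is a minimal base-R sketch. The count matrix, gene names, and lane names are made up for illustration, and the DESeq calls themselves are not shown; only the collapsing of technical replicates into one column per biological sample is.

```r
# Toy count matrix: 4 lanes, two libraries per biological sample.
# Gene names, lane names and counts are hypothetical.
counts <- matrix(
  c( 10,  12, 30,  28,
      0,   1,  0,   2,
    200, 180, 90, 110),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("S1_lib1", "S1_lib2", "S2_lib1", "S2_lib2"))
)

# Which biological sample each lane (column) belongs to.
sample_of_lane <- c("Sample1", "Sample1", "Sample2", "Sample2")

# rowsum() sums rows by group, so transpose to sum lanes per sample,
# then transpose back to the usual genes-by-samples layout.
summed <- t(rowsum(t(counts), group = sample_of_lane))
summed

# 'summed' now has one column per sample and would be analysed with
# conds <- c("Sample1", "Sample2") and pooled variance estimation.
```

The resulting two-column matrix is what would be handed to DESeq in place of the four-lane table.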
You might find only a few differentially expressed genes, but these are the only ones for which you can be somewhat sure that they are proper hits.

> 2) Paired samples
> We have samples from three patients. For each patient, we have matched
> tumor and adjacent normal samples. How should we set up the analysis to capture the
> pair information?

Sorry, but DESeq does not support paired tests (yet). I have some ideas on how to add this, but this might take a while.

For now, your best option is to use DESeq's 'getVarianceStabilizedData' to transform your data to a scale on which it is approximately homoskedastic. Then, you can use a paired t-test or a paired z-test. (Don't do this with the raw data; use DESeq's variance-stabilizing transformation to make them homoskedastic first.)

The paired t-test should work out of the box with R's standard 't.test' function (with 'paired=TRUE'). A paired z-test should have more power in this setting, as, after the variance-stabilizing transformation, you may assume that all data have the same variance. Estimate this variance from your genes in a pooled fashion (ask again if you don't know how to do that) and take the median. Divide the pair differences by the square root of this to get z scores, then use 'pnorm' to get a p value.

In my experience, this should work reasonably well even though it may not have as much power as a proper NB test would have.

Cheers
  Simon

+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: sanders at fs.tum.de
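The paired z-test outlined in the reply could look roughly like the base-R sketch below. Several things here are assumptions rather than part of the original advice: the data are simulated stand-ins for the output of 'getVarianceStabilizedData', the per-gene variance is taken over the paired differences, and the z score is formed from the mean difference across the pairs with standard error sqrt(pooled variance / number of pairs). All variable names are hypothetical.

```r
set.seed(1)
n_genes <- 1000
n_pairs <- 3   # three patients, each with tumor and matched normal

# Simulated stand-in for variance-stabilized data (genes x patients).
# In a real analysis these would come from the transformed count data.
normal <- matrix(rnorm(n_genes * n_pairs, mean = 8, sd = 0.3), nrow = n_genes)
tumor  <- normal + matrix(rnorm(n_genes * n_pairs, sd = 0.3), nrow = n_genes)
tumor[1:20, ] <- tumor[1:20, ] + 2   # 20 genes with a true shift

# Paired differences: one column per patient.
d <- tumor - normal

# Per-gene variance of the differences, pooled by taking the median.
pooled_var <- median(apply(d, 1, var))

# z score of the mean paired difference, then a two-sided p value.
z   <- rowMeans(d) / sqrt(pooled_var / n_pairs)
p_z <- 2 * pnorm(-abs(z))

# Paired t-test for comparison: a one-sample t-test on the differences
# is equivalent to t.test(tumor, normal, paired = TRUE) per gene.
p_t <- apply(d, 1, function(x) t.test(x)$p.value)

head(cbind(z = z, p_z = p_z, p_t = p_t))
```

With only three pairs, the median of the per-gene sample variances tends to sit below their mean, so z scores computed this way may be somewhat optimistic; a multiple-testing correction such as 'p.adjust' would normally follow.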

RNASeq DESeq

14.5 years ago • Simon Anders ★ 3.8k