Entering edit mode
Simon Anders
★
3.8k
@simon-anders-3855
Last seen 4.2 years ago
Zentrum für Molekularbiologie, Universi…
Dear Jinfeng
On Wed, 14 Apr 2010 22:28:17 +0000 (UTC), Jinfeng Liu <jinfengl at="" gene.com="">
wrote:
> I'm trying to use DESeq for RNAseq expression analysis. I haven't
been
> able to find information about how to deal with the following
issues:
>
> 1) technical replicates
> We have two biological samples, two libraries (of different insert
size)
> were prepared for each of them. so I have four lanes of data in
total
and I
> want to do differential expression between the two samples. It
doesn't
look quite
> right to me to set up the condition vector as
>
> conds <- c( "Sample1", "Sample1","Sample2","Sample2") since they are
only
> technical replicates, not biological. But I'm not sure what to do.
If you set up your test this way, DESeq will assume that the variance
between the replicates is all there is. Hence, roughly speaking, it
will
call a difference significant if it is larger than the fluctuations
observed between the technical replicates. This then only tells you
that
the gene might be typically different between different samples, but
you
won't know whether the difference is really due to the difference in
treatment or whether you would have observed the same magnitude of
difference between two samples that have been treated the same way.
Of course, without biological replicates, there is no way to settle
this
question properly.
The best thing you can do is to add up the counts from each sample,
and
compare just one data column with summed data from Sample 1 with one
data
column for Sample 2. Call DESeq's 'estimateVarianceFunctions' function
with
the argument 'pool=TRUE', and it will ignore the sample labels and
estimate
the variance between the conditions. Hence, it will only call those
genes
differentially expressed that have a much stronger difference between
conditions than the other genes of similar expression strength. You
might
find only few differentially expressed genes, but these are the only
ones
for which you can be somewhat sure that they are proper hits.
> 2) Paired samples
> We have samples from three patients. For each patient, we have
matched
> tumor and adjacent normal samples. How should we set up the analysis
to
capture the
> pair information?
Sorry, but DESeq does not support paired tests (yet). I have some
ideas on
how to add this but this might take a while.
For now, your best option is to use DESeq's
'getVarianceStabilizedData' to
transform your data to a scale on which it is approximately
homoskedastic.
Then, you can use a pair-wise t-test or a pair-wise z-test. (Don't do
this
with the raw data, use DESeq's variance-stabilizing transcformation to
make
them homoskedastic first.)
The pairwise t-test should work out of the box with R's standard
't.test'
function. A pair-wise z-test should have more power in this setting,
as,
after the variance-stabilizing transformation, you may assume that all
data
has the same variance. Estimate this variance from your genes in a
pooled
fashion (ask again if you don't know how to do that) and take the
median.
Divide the pair differences by the square root of this to get z
scores,
then use 'pnorm' to get a p value. In my experience, this should work
reasonably well even though it may not have as much power as a proper
NB
test would have.
Cheers
Simon
+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: sanders at fs.tum.de