Hi,
I am experimenting with different normalization techniques to account for batch effects on an RNAseq dataset. I am using DESeq2 for DE analysis, and RUVseq for normalization. For plotting purposes, I want to be able to extract the post-normalization counts (pseudocounts?) from the DESeqDataSet.
#countsTable is a gene x count matix
samples<-colnames(countsTable)
condition<-c(rep("ctrl",2),rep("A",2),rep("B",2),rep("C",2),rep("D",2), rep("E",2))
pData0 = data.frame(cbind(samples, condition), row.names = samples)#
pData<-pData(set4) ## from RUV analysis
# x conditions W_1
#ctrl_1 ctrl ctrl -0.2864384
#ctrl_2 ctrl ctrl 0.2952550
#A_1 A A -0.1725057
#A_2 A A 0.1767493
#B_1 B B -0.3269523
#B_2 B B 0.3252628
#C_1 C C -0.4057045
#C_2 C C 0.3960620
#D_1 D D -0.2868378
#D_2 D D 0.2795934
#E_1 E E -0.1910980
#E_2 E E 0.1966142
ddsfm <- DESeqDataSetFromMatrix(countData = filtered, colData=pData0, design=~conditions)
ddsfm1 <- DESeqDataSetFromMatrix(countData = filtered, colData=pData, design= ~W_1+conditions)
dds<-DESeq(ddsfm1)
dds0<-DESeq(ddsfm)
head(counts(dds, normalized=T))
head(counts(dds0, normalized=T))
When I do those last calls, the normalized counts look the same for both, even though the log2FoldChanges reflect the normalization. It seems like the counts(dds, normalized=TRUE) call only extracts the counts based on the internal normalization that DESeq2 uses. Is there a way to extract (or compute) post-normalization counts from scaling factors provided in the "design=" section from the creation of the DESeqDataSet?
Thanks in advance!
Hi Michael,
I would like to follow up on this issue. Is there any progress with this feature. Is it possible to provide a small demo how to visualize the normalized+regressed counts?
Thanks in advance.
Guy
I haven't worked this in, because I spent all my time on apeglm and shrinkage estimators. Here's some code you can try though:
Hi Michael,
Thank you very much for the quick response. I learn a lot from your posts.
Guy
Hi Michael, I think that
ignores the differences that are due to experimental conditions
If I correctly understand, would't the following plot the ruv corrected values?
```
Thanks again,
Guy
No my code is correct, because it keeps the differences due to condition in the data.
Yours removes it which means you will see nothing in the plot.Sorry I just saw the “fitted”. Yours is not correct for different reasons then: it keeps in the technical batch effect and it removes the “biological variation”.
I understand it now, thanks again for the help.
Hi @Michael,
Thanks for your reply; I look forward to the release of this feature.
(previous content removed due to calculation error)