Question

DESEQ Inquiry

1

Entering edit mode

sap275 ▴ 10

@1a70038b

Last seen 5 months ago

United States

Hello! I have a general inquiry about the conceptual process and analysis within DESEQ packages. I am running some bulk RNA data through DESEQ2. Some ideas around the code and files are confusing my troubleshooting. Below are my questions:

How does creating a deseq object using dds <- DESeqDataSetFromMatrix(countData = countdata, colData = colData, design = ~ condition + batch, tidy = TRUE) differ from an object after dds<-deseq(dds)? How is the count matrix being changed by the deseq() code?
Specifying my batch as a covariate in ~ condition + batch did not seem to correct my batch effect, so I used Limma for batch correction. Is it appropriate to use Limma batch corrected output to perform further downstream analysis and plotting (result(), PCA, MA plots, heatmaps, etc.)?
Are there any tips/advice regarding effective plotting, statistical analysis, alpha values, etc.?

I would truly appreciate any insight I can receive. It is pretty amazing to start learning bioinformatics, and I hope to continue more after this project. Thank you!

DESeq2 rnaseqGene RNASeqData limma RNASeq • 1.4k views

ADD COMMENT • link updated 14 months ago by Michael Love 43k • written 14 months ago by sap275 ▴ 10

score 1 · Answer 1 · 2024-02-26

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 8 days ago

United States

Take a look at the help function for DESeq and the workflow:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#differential-expression-analysis

Note that the counts are never modified during DESeq2. This is also described in the 2014 paper.

Is it appropriate to use Limma batch corrected output to perform further downstream analysis

Answered your other recent post:

Limma Question

Are there any tips/advice regarding effective plotting, statistical analysis, alpha values, etc.?

In our workflow we go through an entire analysis with description for beginners:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

ADD COMMENT • link 14 months ago Michael Love 43k

0

Entering edit mode

Thank you for your reply! I have performed deseq() with the multi-factor design of ~batch + condition. However, when I plot it in PCA with the interest group of vsd <- vst(dds,blind= FALSE) plotPCA(vsd,intgroup=c("condition")) I am still observing a batch effect within my samples. Should I be performing batch correction using packages such as SUV?

ADD REPLY • link 14 months ago sap275 ▴ 10

1

Entering edit mode

I am still observing a batch effect within my samples.

You've read the FAQ that I linked to in my reply to your other post? The FAQ directly answers this.

ADD REPLY • link 14 months ago Michael Love 43k

0

Entering edit mode

Thank you for your reply on this and my other post. I did follow your instructions for multi-factor design using the RUV package. However, even if I include the other factors I still encounter same issue where my PCA plot shows clustering based on the batches instead of condition. design = ~ batch + condition

Only after Limma removeBatchEffect function will it cluster by condition; however, I am informed that Limma corrected values should not be used for downstream analysis. Should I look for alternative methods?

ADD REPLY • link 14 months ago sap275 ▴ 10

0

Entering edit mode

Hmm, I think we are just having trouble communicating, we are kind of talking past each other here.

The FAQ says that what you describe is exactly what is expected.

Limma removeBatchEffect does something to the data you use for the PCA plot which you can think of as analogous to what happens when you use a multi-factor design with DESeq2 but with the original counts. You don't need to modify the original counts for differential analysis.

So you don't need to have concern here.

ADD REPLY • link 14 months ago Michael Love 43k