Question

DESeq2 batch design

1

Hope ▴ 10

@993cf259

United States

Hi, I'm using DESeq2 to analyze 3 different cell types undergoing the same treatment conditions, each with 2 replicates (see metadata below). However, one of the cell types (HCT116) was processed in a different lab and shows a stark batch effect between its two replicates, while the other two cell types show a more muted batch effect between theirs (PCA below).

[Images: metadata table; PCA plot]

Downstream, we are most interested in identifying the DE genes that are shared and unique between cell types.
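To make the downstream goal concrete, here is a minimal sketch of the overlap step, assuming each cell type's significant genes have already been exported as a set. The gene IDs and the `cell2`/`cell3` labels are placeholders (only HCT116 is named in this post), and `shared_and_unique` is a hypothetical helper, not part of DESeq2.

```python
# Toy illustration only: gene IDs and the cell2/cell3 labels are placeholders,
# not results from this experiment.

def shared_and_unique(de_sets):
    """Split per-cell-type DE gene sets into genes shared by all cell types
    and genes found in exactly one cell type."""
    names = list(de_sets)
    # Genes called DE in every cell type.
    shared = set.intersection(*(de_sets[n] for n in names))
    unique = {}
    for n in names:
        # Union of DE genes from all *other* cell types.
        others = set().union(*(de_sets[m] for m in names if m != n))
        unique[n] = de_sets[n] - others
    return shared, unique


de_sets = {
    "HCT116": {"geneA", "geneB", "geneC"},
    "cell2": {"geneA", "geneC", "geneD"},
    "cell3": {"geneA", "geneE"},
}

shared, unique = shared_and_unique(de_sets)
print(sorted(shared))
print({k: sorted(v) for k, v in unique.items()})
```

Note that a gene called in two of the three cell types (here `geneC`) lands in neither category, so a threshold-based overlap like this is sensitive to how each per-cell-type list was cut.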

I've analyzed each cell type in its own separate DESeq2 object using both the ~replicate + treatment and ~treatment designs. Overall, I see the best precision against a known set of gene targets in HCT116 when replicate is included in the design as a batch term, but the other two cell types show a small loss of true positives (TPs) with the ~replicate + treatment design. To me, this suggests that the batch term is fitting noise in these two cell types, which is causing the loss.

Right now, I'm considering integrating all of the datasets into a single DESeq2 object with the metadata table above, and then identifying the shared and cell-type-specific response genes from this run using the design recommended in the DESeq2 vignette (~cell + cell:rep + cell:treatment, section "Group-specific condition effects, individuals nested within groups") with the appropriate group contrasts. Is simply finding the overlap of the individual DESeq2 runs (either applying batch correction selectively or using it for all cell types) my best option, or is grouping them together the more robust method? The output statistics from the combined model could also be useful for the planned downstream analysis, but I'm worried about losing cell-type-specific signal with this design, as well as fitting noise due to the batch issues described above. Which is the better approach?
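As a rough check on what the combined design would cost, the sketch below hand-builds the indicator columns for ~cell + cell:rep + cell:treatment and counts coefficients against samples. It assumes two treatment levels (control, treated) and 12 samples (3 cell types x 2 replicates x 2 treatments). The `cell2`/`cell3` labels are placeholders, and the matrix is built by hand to mirror the usual treatment-contrast encoding rather than taken from R's `model.matrix`.

```python
from fractions import Fraction
from itertools import product

# Assumed layout: 3 cell types x 2 replicates x 2 treatment levels.
cells = ["HCT116", "cell2", "cell3"]   # only HCT116 is named in the post
reps = ["rep1", "rep2"]
treatments = ["control", "treated"]    # assumption: two treatment levels

samples = list(product(cells, reps, treatments))


def design_row(cell, rep, trt):
    """Indicator columns for ~cell + cell:rep + cell:treatment."""
    row = [1]                                                   # intercept
    row += [int(cell == c) for c in cells[1:]]                  # cell (first level is reference)
    row += [int(cell == c and rep == "rep2") for c in cells]    # cell:rep
    row += [int(cell == c and trt == "treated") for c in cells] # cell:treatment
    return row


def matrix_rank(rows):
    """Rank via exact Gauss-Jordan elimination on fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                f = m[i][col] / m[rank][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


X = [design_row(*s) for s in samples]
n_samples = len(X)
n_coef = len(X[0])
rank = matrix_rank(X)
residual_df = n_samples - rank
print(n_samples, n_coef, rank, residual_df)
```

Under these assumptions the design is full rank but leaves few residual degrees of freedom, because every cell type gets its own replicate coefficient whether or not its batch effect is real.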

Thank you and let me know if you have any follow up questions!

DESeq2 • 1.1k views

Updated 17 months ago by swbarnes2 ★ 1.4k • written 17 months ago by Hope ▴ 10

score 0 · Answer 1 · 2023-08-16

0

Michael Love 43k

@mikelove

United States

Sorry, these days I only have time to handle software-related questions. For questions about approaching the statistical analysis, I recommend finding a local statistician or bioinformatician.

score 0 · Answer 2 · 2023-08-21

I am not sure there is much to be gained by putting samples from two labs and three different cell types in one DESeq2 object.

I think you will have to just do the control vs treated for each cell type. With so few samples, you don't have the power to do anything fancier than that.

I'm not sure it makes any sense at all to include replicates as a factor. Your rep1 and rep2 of each cell type are all the same cells, aren't they? You don't have individuals nested in groups.