Question

Why did PCA plots look exactly the same after adjusting for batch covariate in DESeq2 design?

Quang ▴ 10

@873e0bdb

United Kingdom

Hi there,

I have a total of 6 donors, 2 cell types and 3 conditions (ex vivo, unstim, and stim). I ran the experiments over 2 days. The donors on day 1 are not the same as the donors on day 2. On each day, I had all 3 conditions and both cell types, so the design is balanced.

Experimental layout

When I ran vsd <- vst(dds, blind=TRUE) and plotPCA(vsd, returnData = TRUE, intgroup = c("Date_lysed", "Conditions")) on an object created with DESeq2::DESeqDataSetFromHTSeqCount(design = ~ Subset * Conditions), the Stim and Unstim cells of both cell types grouped away from ex vivo, but within the Unstim and Stim groups, the day of the experiment was the main factor separating the samples into 2 groups.

Pre-adjustment: X and circles indicate the 2 days of running

I then created ddsHTSeq <- DESeq2::DESeqDataSetFromHTSeqCount(design = ~ Date_lysed + Subset * Conditions) and reran vst(blind=FALSE) and plotPCA on the new object, but the plot is exactly the same.

Post-adjustment

I plan to try SVA and RUV next, but does anyone have any suggestions or insights?

Thanks for your help!

DESeq2 BatchEffect • 195 views

ADD COMMENT • link updated 12 hours ago by James W. MacDonald 67k • written 20 hours ago by Quang ▴ 10

score 0 · Answer 1 · 2025-01-08

James W. MacDonald 67k

@james-w-macdonald-5106

United States

All you have done is set up a DESeqDataSet object and then run vst on it. That won't affect the PCA plot at all - you are simply telling DESeq2 the model you want to fit, which won't happen until you fit the model by running DESeq on the object. You don't have any batch effects anyway. It's normal for different conditions to be different, so the PCA plot makes sense.
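The point that the design formula does not change the transformed values can be checked on simulated data. The following is a minimal sketch, not the poster's actual pipeline: it uses DESeq2's makeExampleDESeqDataSet as a stand-in for the real counts, and the alternating Date_lysed labels are made up for illustration. With blind=TRUE, vst replaces the design with ~ 1 internally, so two objects that differ only in their design should give the same matrix. With blind=FALSE the design is used only when estimating the dispersion trend; it is not used to remove any variation from the data.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real counts: 2000 genes, 12 samples,
# with a built-in two-level `condition` factor.
dds_a <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Hypothetical batch labels playing the role of Date_lysed.
dds_a$Date_lysed <- factor(rep(c("day1", "day2"), times = 6))

# Same counts, but with the batch term added to the design.
dds_b <- dds_a
design(dds_b) <- ~ Date_lysed + condition

# blind = TRUE ignores the design entirely; nsub is lowered only
# because the simulated data set is small.
vsd_a <- vst(dds_a, blind = TRUE, nsub = 500)
vsd_b <- vst(dds_b, blind = TRUE, nsub = 500)

# Compare the two transformed matrices.
same_values <- isTRUE(all.equal(assay(vsd_a), assay(vsd_b)))
print(same_values)
```

Since plotPCA works on the transformed matrix, identical matrices give identical plots.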

You also don't say what 'Subset' is, but I cannot imagine why you want an interaction between that and the 'Conditions'.

ADD COMMENT • link 16 hours ago James W. MacDonald 67k

Hi, James. I forgot to mention, but yes, I did run DESeq2::DESeq(). So, I ran DESeq2::DESeqDataSetFromHTSeqCount, then DESeq2::DESeq(), then vst(dds, blind=FALSE), then plotPCA.

From the PCA plots, I thought there was actually a batch effect by Date_lysed within the coloured dots (brown is unstim and red is stim). That is because the brown and red symbols fall into 2 groups, which I have marked with 2 boxes below (circle = day 1 of the experiment and cross = day 2). Or am I missing something about this PCA plot?

[Figure: PCA plot with the two Date_lysed groups boxed]

The Subset variable refers to the two cell types within each condition. For example, the ex vivo sample from a donor was sorted into Subset 1 and Subset 2. Subsets are not visualised in this plot. Donors are biological replicates. There are no technical replicates.

My aim is to compare the 2 subsets within the ex vivo condition. In addition, I want to compare Unstim and Stim within Subset 1, and Unstim and Stim within Subset 2. That is why I used design = ~ Subset * Conditions.

Given that the ex vivo samples (black crosses and black dots) do not seem to be affected by the batch variable Date_lysed, do you think it is best to use only the ex vivo count files to compare the Subsets within the ex vivo condition, and then do a separate analysis of the stim and unstim samples?

Thank you so much!

ADD REPLY • link 14 hours ago • updated 13 hours ago Quang ▴ 10

How you analyze your data is up to you. But do note that Subset * Conditions implies that you want to find genes that respond to stimulus differently, depending on the cell type, which is different from what you describe. If you do care about interactions, it is IMO much easier to reparameterize to condition_subset and fit a cell means model, then make whatever comparisons you want. There is a small paragraph in the DESeq2 vignette that talks about this, but a clearer explanation IMO is in the limma User's Guide in 9.5.2-9.5.3, starting on page 45.
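The cell means reparameterization mentioned above can be sketched as follows. This is an illustration on simulated counts, not the poster's data: the sample labels (S1/S2, ExVivo/Unstim/Stim) and the combined column name group are assumptions made for the example. Each Subset and Conditions combination becomes one level of a single factor, and each comparison of interest is then a contrast between two levels.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real counts: 500 genes, 12 samples.
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical sample labels: 2 subsets x 3 conditions, 2 samples per group.
dds$Subset <- factor(rep(c("S1", "S2"), each = 6))
dds$Conditions <- factor(rep(c("ExVivo", "Unstim", "Stim"), times = 4))

# One combined factor with a level per Subset/Conditions combination.
dds$group <- factor(paste(dds$Subset, dds$Conditions, sep = "_"))
design(dds) <- ~ group

dds <- DESeq(dds, quiet = TRUE)

# Subset 2 vs Subset 1 within the ex vivo condition.
res_exvivo <- results(dds, contrast = c("group", "S2_ExVivo", "S1_ExVivo"))
# Stim vs Unstim within each subset.
res_s1 <- results(dds, contrast = c("group", "S1_Stim", "S1_Unstim"))
res_s2 <- results(dds, contrast = c("group", "S2_Stim", "S2_Unstim"))
```

In a contrast vector, the second element is the numerator level and the third is the denominator, so res_s1 reports log2 fold changes of Stim over Unstim in Subset 1.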

Also, you probably want to include a Donor factor in your model to account for Donor-specific differences. But if you do so, you will not be able to include Date_lysed, because that's nested in Donor.
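The nesting can be seen from the model matrix alone, using only base R. The sample sheet below is hypothetical: it assumes 3 donors per day and that every donor contributed all 6 Subset and Conditions combinations, which the thread does not state explicitly. Because each donor was processed on exactly one day, the Date_lysed column is a linear combination of the Donor columns, and the model matrix loses rank.

```r
# Hypothetical sample sheet: 6 donors x 2 subsets x 3 conditions.
samples <- expand.grid(
  Donor = paste0("D", 1:6),
  Subset = c("S1", "S2"),
  Conditions = c("ExVivo", "Unstim", "Stim"),
  stringsAsFactors = TRUE
)

# Assumption: donors D1-D3 were processed on day 1, D4-D6 on day 2.
samples$Date_lysed <- factor(
  ifelse(samples$Donor %in% c("D1", "D2", "D3"), "day1", "day2")
)

# Design with both Donor and Date_lysed, and design with Donor only.
mm_both <- model.matrix(~ Donor + Date_lysed + Subset * Conditions, data = samples)
mm_donor <- model.matrix(~ Donor + Subset * Conditions, data = samples)

# Compare the number of columns with the matrix rank.
rank_both <- qr(mm_both)$rank
rank_donor <- qr(mm_donor)$rank
cat("Donor + Date_lysed: columns =", ncol(mm_both), "rank =", rank_both, "\n")
cat("Donor only:         columns =", ncol(mm_donor), "rank =", rank_donor, "\n")
```

A design whose rank is smaller than its number of columns is what DESeq2 rejects with its "model matrix is not full rank" error. The Donor term already absorbs any day-to-day difference, since each donor belongs to a single day.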

ADD REPLY • link 12 hours ago James W. MacDonald 67k

score 0 · Answer 2 · 2025-01-08

ATpoint ★ 4.6k

@atpoint-13662

Germany

It's an FAQ in the vignette: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot
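The approach described in that FAQ is to remove the batch term from the transformed matrix for visualization only, using limma::removeBatchEffect, while keeping the batch term in the design for the actual testing. Below is a minimal sketch on simulated data. The Date_lysed labels are made up, and the simulated counts contain no real batch effect, so this only shows the mechanics rather than a before/after difference.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated stand-in for the real counts: 2000 genes, 12 samples.
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Hypothetical batch labels playing the role of Date_lysed.
dds$Date_lysed <- factor(rep(c("day1", "day2"), times = 6))
design(dds) <- ~ Date_lysed + condition

# nsub is lowered only because the simulated data set is small.
vsd <- vst(dds, blind = FALSE, nsub = 500)

# Design matrix of the variables to preserve (here: condition only).
mm <- model.matrix(~ condition, colData(vsd))

# Remove the batch term from the transformed values, for plotting only.
vsd_adj <- vsd
assay(vsd_adj) <- limma::removeBatchEffect(
  assay(vsd), batch = vsd$Date_lysed, design = mm
)

# PCA coordinates from the batch-adjusted matrix.
pca_adj <- plotPCA(vsd_adj, intgroup = c("Date_lysed", "condition"),
                   returnData = TRUE)
```

The adjusted matrix is meant for plots such as PCA and heatmaps. The counts passed to DESeq should stay unmodified, with the batch variable handled through the design formula.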

ADD COMMENT • link 15 hours ago ATpoint ★ 4.6k