Question

Adding a group of samples reduces DEGs in other groups

Jonathan ▴ 10

@c31cf0e5

Last seen 3 months ago

United States

I have a simple bulk RNA-seq experiment in which I compare relatively matched lesional and non-lesional samples using the voom-duplicateCorrelation-limma pipeline. I recently added a group of controls and, consequently, two more contrasts (lesional vs. controls, non-lesional vs. controls). To my surprise, this reduced the number of DEGs in the lesional vs. non-lesional comparison.

Two questions:

1. If I understood correctly, is this because there is a high variance within the control group, and this variance affects the voom results of the entire experiment?
2. Could I mitigate this by running the three pipelines separately, each with two compared groups and one contrast comparison?

Thanks!

RNASeq limma • 1.3k views

ADD COMMENT • link updated 21 months ago by Gordon Smyth 52k • written 21 months ago by Jonathan ▴ 10

Are libraries prepared in the same batch as the samples that were already present?

ADD REPLY • link 21 months ago ATpoint ★ 4.8k

Yes, all libraries were prepared in the same batch.

ADD REPLY • link 21 months ago Jonathan ▴ 10

score 4 · Accepted Answer · 2023-07-31

Gordon Smyth 52k

@gordon-smyth

Last seen 4 hours ago

WEHI, Melbourne, Australia

is this because there is a high variance within the control group, and this variance affects the voom results of the entire experiment?

Presumably yes, but you should directly examine whether the controls are variable by plotting the data (plotMDS) rather than making indirect conclusions from the number of DE genes.
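As a rough illustration of that check, here is a minimal sketch on simulated counts. The data, the group sizes, and the object names (`counts`, `group`, `logCPM`) are all made up for the example; with real data you would start from your own count matrix and sample annotation.

```r
library(edgeR)  # also attaches limma, which provides plotMDS()

set.seed(1)
# Hypothetical stand-in data: 1000 genes, 6 samples per group
group <- factor(rep(c("Lesional", "NonLesional", "Control"), each = 6))
counts <- matrix(rnbinom(1000 * 18, mu = 100, size = 10), nrow = 1000)
rownames(counts) <- paste0("gene", seq_len(1000))
colnames(counts) <- paste0("s", seq_len(18))

# Normalize library sizes and convert to log2 counts per million
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
logCPM <- cpm(y, log = TRUE)

# Label each sample by its group; a widely scattered group suggests higher variability
mds <- plotMDS(logCPM, labels = group, col = as.integer(group))
```

If the control samples are spread much more widely than the lesional and non-lesional samples, that supports the high-variance explanation; a few isolated samples point to outliers instead.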

Could I mitigate this by running the three pipelines separately, each with two compared groups and one contrast comparison?

No, we definitely do not recommend that. Far better is to estimate quality weights either for individual samples or for groups (using voomLmFit or voomWithQualityWeights). To estimate sample weights with voomLmFit set sample.weights=TRUE. To estimate group weights, set var.group=Group.
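The two options could be sketched roughly as follows. This uses simulated counts, and the names `Group`, `SubjectID`, `counts`, and `design` are assumptions standing in for your own objects; the controls are deliberately simulated with extra variability so that group weights have something to pick up. It is only a sketch of the calls, not a full analysis.

```r
library(edgeR)  # provides voomLmFit(); also attaches limma

set.seed(1)
ngenes <- 500
# Hypothetical layout: 6 subjects with paired lesional/non-lesional samples, plus 6 controls
Group <- factor(rep(c("Lesional", "NonLesional", "Control"), each = 6),
                levels = c("Lesional", "NonLesional", "Control"))
SubjectID <- factor(c(1:6, 1:6, 7:12))

# Simulated counts; controls get a smaller 'size' (more dispersion) on purpose
patients <- matrix(rnbinom(ngenes * 12, mu = 100, size = 10), nrow = ngenes)
controls <- matrix(rnbinom(ngenes * 6, mu = 100, size = 2), nrow = ngenes)
counts <- cbind(patients, controls)
rownames(counts) <- paste0("gene", seq_len(ngenes))
colnames(counts) <- paste0("s", seq_len(18))

design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)

# Option 1: one quality weight per sample (suited to outlier samples)
fit_sw <- voomLmFit(counts, design, block = SubjectID, sample.weights = TRUE)

# Option 2: one quality weight per group (suited to a noisier group)
fit_gw <- voomLmFit(counts, design, block = SubjectID, var.group = Group)

# All three groups stay in one model; the contrast of interest is extracted afterwards
contr <- makeContrasts(LvsNL = Lesional - NonLesional, levels = design)
fit2 <- eBayes(contrasts.fit(fit_gw, contr))
tt <- topTable(fit2, number = Inf)
```

Use one of the two `voomLmFit` calls, not both, depending on what the QC plots show.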

ADD COMMENT • link 21 months ago Gordon Smyth 52k

Thank you. What are the pros/cons of using sample weights vs. using group weights? If I understand correctly, it seems that calculating individual sample weights is more computation-heavy, but more accurate.

ADD REPLY • link 21 months ago Jonathan ▴ 10

No, individual sample weights are neither more computationally heavy nor necessarily more accurate.

If (after exploring your data with QC plots) your data seem to have outlier samples, then you should use sample weights. If the issue isn't outliers but rather systematically higher variability in one group than in another, then you should use group weights. It is not a matter of pros and cons but rather a matter of matching the analysis to the nature of the data.

ADD REPLY • link 21 months ago Gordon Smyth 52k

The voomLmFit documentation says: "var.group - optional vector or factor indicating groups to have different array weights". I also tried running it with var.group=TRUE, but it failed because var.group had the wrong length. It seems that I should specify the groups, e.g. voomLmFit(..., var.group = phenoData$Group). Is that right?

Additionally, if both outlier samples and increased variability are issues, can I use both options? e.g., voomLmFit(counts = DGE.cpm, design = design, block = phenoData$SubjectID, sample.weights = T, var.group = phenoData$Group)?

Thank you, I really appreciate it.

ADD REPLY • link 21 months ago Jonathan ▴ 10

Yes, var.group should be the group factor. Sorry, I typed the wrong thing in my answer above, now corrected.

No, you cannot specify both options. The function can actually handle very general possibilities via the var.design argument, but I recommend that you stick to one of the two options I mentioned.

ADD REPLY • link 21 months ago Gordon Smyth 52k