Question

Highly DE genes but small count difference within conditions

Asked by bioshot.com

Hello! I am looking for DE genes comparing condition1 vs. control. I have 3 biological replicates for each condition, so 6 samples in total. I am using this design:

design=~ sex + condition

and I am filtering low expressed genes in this way:

smallestGroupSize <- 3 
keep <- rowSums(counts(dds0) >= 10) >= smallestGroupSize
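To make the filter concrete, here is a minimal base-R sketch on a small, made-up count matrix. The `cts` matrix and its values are hypothetical stand-ins for `counts(dds0)`. A gene is kept only if at least `smallestGroupSize` samples have 10 or more reads.

```r
# Hypothetical toy count matrix standing in for counts(dds0):
# 4 genes x 6 samples (3 control, 3 condition1)
cts <- matrix(
  c(12, 15, 11, 30, 28, 35,   # geneA: >= 10 in all 6 samples
     0,  2,  1, 14, 18, 11,   # geneB: >= 10 in 3 samples (condition1 only)
     5,  9, 12,  3, 10,  2,   # geneC: >= 10 in only 2 samples
     0,  0,  0,  1,  0,  0),  # geneD: essentially unexpressed
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]),
                  c(paste0("ctrl", 1:3), paste0("cond", 1:3)))
)

smallestGroupSize <- 3
# For each gene, count the samples with at least 10 reads,
# then keep genes where that count is >= smallestGroupSize
keep <- rowSums(cts >= 10) >= smallestGroupSize
keep
cts[keep, , drop = FALSE]
```

Note that geneB passes even though it is expressed in only one group: the filter does not look at which samples reach the cutoff. In the real workflow the logical vector is then used to subset the dataset, e.g. `dds0 <- dds0[keep, ]`.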

Then I get DE results that look like this, using |LFC| > 1.5 and padj < 0.005: [volcano plot not shown]
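The cutoffs described here (|LFC| > 1.5 and padj < 0.005) amount to a simple filter on the results table. A base-R sketch on a made-up data frame standing in for `as.data.frame(results(dds))`; the gene names and values are hypothetical.

```r
# Hypothetical stand-in for as.data.frame(results(dds)); values are made up
res_df <- data.frame(
  log2FoldChange = c(2.1, -1.8, 0.4, 3.0, -2.5),
  padj           = c(1e-4, 0.003, 1e-6, 0.02, NA),
  row.names      = c("g1", "g2", "g3", "g4", "g5")
)

lfc_cut  <- 1.5
padj_cut <- 0.005
# !is.na() guards against genes whose padj is NA; DESeq2 reports NA
# for genes removed by independent filtering or flagged as count outliers
sig <- res_df[!is.na(res_df$padj) &
              res_df$padj < padj_cut &
              abs(res_df$log2FoldChange) > lfc_cut, ]
rownames(sig)
```

Here only g1 and g2 pass: g3 has a small LFC, g4 misses the padj cutoff, and g5 has no adjusted p-value.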

But when I then look at the raw counts and TPM of Slfn5, they look like this: [plot of Slfn5 counts and TPM not shown]

Why do I get such a significant adjusted p-value (very small padj) for this gene?

Tags: RNASeq, DESeq2, RNASeqData


Accepted Answer · 2024-10-07 · Michael Love

This could be due to controlling for sex. Do you have only three samples?

Dropping sex would likely remove genes like this, since marginally (without adjusting for sex) you don't see a trend.
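To see which genes depend on the sex adjustment, one can fit both designs and compare the calls. This is a minimal sketch, assuming DESeq2 is installed, using simulated data from `makeExampleDESeqDataSet()` in place of the poster's counts. The `sex` labels and the 0.1 padj cutoff are hypothetical choices. Note that with 6 samples, `~ sex + condition` estimates 3 coefficients, which leaves only 3 residual degrees of freedom.

```r
library(DESeq2)

set.seed(1)
# Simulated data: 'condition' (A, B) comes from makeExampleDESeqDataSet();
# the 'sex' column is a hypothetical addition for illustration only
dds <- makeExampleDESeqDataSet(n = 500, m = 6, betaSD = 1)
dds$sex <- factor(c("F", "M", "M", "F", "M", "F"))

# Full model: condition effect estimated after adjusting for sex
design(dds) <- ~ sex + condition
dds_full <- DESeq(dds, quiet = TRUE)
res_full <- results(dds_full, name = "condition_B_vs_A")

# Reduced model: condition only (the marginal comparison)
design(dds) <- ~ condition
dds_cond <- DESeq(dds, quiet = TRUE)
res_cond <- results(dds_cond, name = "condition_B_vs_A")

# which() drops NA padj values
sig_full <- rownames(dds)[which(res_full$padj < 0.1)]
sig_cond <- rownames(dds)[which(res_cond$padj < 0.1)]

setdiff(sig_full, sig_cond)  # called only when adjusting for sex
setdiff(sig_cond, sig_full)  # called only in the condition-only model
```

Genes in the first set are the ones worth inspecting with count plots split by both sex and condition.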


Hello Michael, thanks for your answer. Yes, I have only 3 samples :(. Dropping sex would remove those genes, but at the same time it loses other DE genes. For example, Eif3j1 [TPM plot not shown] is not included. The same happens the other way around with design=~genotype. I was wondering whether I could use more stringent filtering, based on condition and variance, for the multivariate model. What do you think?

Posted by bioshot.com

TPM is not a robust scaling method; it can be influenced by changes in the global distribution of expression.
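To make the TPM point concrete: TPM rescales each sample so its values sum to one million, so a few genes with very large changes shift the scaled values of every other gene. Below is a minimal base-R sketch with made-up counts. The gene-length step of TPM is left out because it does not affect this particular issue. It contrasts total-count scaling with the median-of-ratios idea that DESeq2's default size factors are based on.

```r
# Toy counts: 5 genes x 2 samples (hypothetical values).
# Genes 1-4 are unchanged; gene 5 is hugely up in s2.
cts <- cbind(s1 = c(100, 200, 300, 400, 100),
             s2 = c(100, 200, 300, 400, 5000))

# Total-count scaling (what TPM does, apart from the gene-length step):
# every sample is rescaled to the same total
total_scaled <- sweep(cts, 2, colSums(cts), "/") * 1e6

# Median-of-ratios size factors. This simplified sketch assumes no zero
# counts; DESeq2 itself excludes genes with a zero in any sample.
log_geo_means <- rowMeans(log(cts))
size_factors <- apply(cts, 2, function(x) exp(median(log(x) - log_geo_means)))
mor_scaled <- sweep(cts, 2, size_factors, "/")

# s2 / s1 ratio for the four unchanged genes under each scaling
round(total_scaled[1:4, "s2"] / total_scaled[1:4, "s1"], 3)
round(mor_scaled[1:4, "s2"] / mor_scaled[1:4, "s1"], 3)
```

With total-count scaling the four unchanged genes appear to drop to about 0.18 of their level in s1 (1100 / 6000), purely because gene 5 inflated the s2 total. The median of the ratios ignores that single outlying gene, so both size factors are 1 and the unchanged genes keep a ratio of 1.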

What do you get with an MA plot? Can you highlight these genes in an MA plot?

Also, how about a box plot of log raw counts, e.g.:

boxplot(log10(counts(dds)+1))
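To highlight specific genes on an MA plot, one common pattern is to draw `plotMA()` and then overlay the genes of interest with base graphics. Below is a sketch assuming DESeq2, on simulated data from `makeExampleDESeqDataSet()`. `gene1` and `gene2` are placeholders for Slfn5 and Eif3j1, which do not exist in the simulated data.

```r
library(DESeq2)

set.seed(2)
# Simulated stand-in for the poster's data
dds <- makeExampleDESeqDataSet(n = 500, m = 6, betaSD = 1)
dds <- DESeq(dds, quiet = TRUE)
res <- results(dds)

# Hypothetical genes of interest (in the thread: Slfn5, Eif3j1)
genes <- c("gene1", "gene2")

# MA plot with the genes of interest circled and labelled
plotMA(res, ylim = c(-4, 4))
sel <- res[genes, ]
points(sel$baseMean, sel$log2FoldChange, col = "dodgerblue", cex = 2, lwd = 2)
text(sel$baseMean, sel$log2FoldChange, labels = genes, pos = 2, col = "dodgerblue")

# Normalized counts per sample for one gene, grouped by condition
plotCounts(dds, gene = genes[1], intgroup = "condition")

# Per-sample distribution of log10(raw counts + 1)
boxplot(log10(counts(dds) + 1))
```

`plotCounts()` uses normalized counts by default. That makes it a more direct check of separation between groups than TPM.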

Posted by Michael Love

Here are the MA plot for design=~sex+genotype and the per-sample box plots from boxplot(log10(counts(dds)+1)): [MA plot and sample box plot not shown]

Posted by bioshot.com

Eif3j1 is a hard case: although there is separation here, there is also substantial spread. Perhaps the posterior LFC is not that large? Have you tried lfcShrink()?
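To follow the lfcShrink() suggestion, the sketch below puts the unshrunken (MLE) and shrunken (posterior) LFCs side by side. It assumes DESeq2 and uses simulated data, not the poster's. `type = "normal"` is used only because it needs no extra package; DESeq2's default estimator, `apeglm`, requires the apeglm package to be installed.

```r
library(DESeq2)

set.seed(3)
dds <- makeExampleDESeqDataSet(n = 500, m = 6, betaSD = 1)
dds <- DESeq(dds, quiet = TRUE)

# Unshrunken maximum-likelihood LFCs
res_mle <- results(dds, name = "condition_B_vs_A")
# Shrunken LFCs; "normal" chosen here only to avoid extra dependencies
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "normal",
                        quiet = TRUE)

# Side-by-side comparison, largest unshrunken effects first
cmp <- data.frame(
  baseMean   = res_mle$baseMean,
  lfc_mle    = res_mle$log2FoldChange,
  lfc_shrunk = res_shrunk$log2FoldChange,
  row.names  = rownames(res_mle)
)
head(cmp[order(-abs(cmp$lfc_mle)), ])
```

Genes whose LFC drops sharply after shrinkage are the ones where the data (few samples, large spread) carry little information about the size of the effect.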

Posted by Michael Love

The LFC of Eif3j1 is around -2.5. I tried with and without shrinkage, but it didn't change for those genes. What about Slfn5?

Posted by bioshot.com

Looking at the plots, these are both hard calls for me. It ends up depending on the information sharing from the rest of the genes, and on the design. I'm not sure 6 samples are enough to really control for both sex and genotype, so you end up overfitting a bit with that design. I would tend to use either lfcShrink or lfcThreshold to prioritize genes.
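For the lfcThreshold option, `results()` can test directly against a minimum effect size, instead of testing LFC = 0 and filtering on the estimate afterwards. The sketch below assumes DESeq2 with simulated data. The threshold of 0.5 mirrors the value mentioned later in the thread, and `alpha = 0.1` is an illustrative choice.

```r
library(DESeq2)

set.seed(4)
dds <- makeExampleDESeqDataSet(n = 500, m = 6, betaSD = 1)
dds <- DESeq(dds, quiet = TRUE)

# Standard Wald test: null hypothesis LFC = 0
res0 <- results(dds, name = "condition_B_vs_A", alpha = 0.1)

# Thresholded test: null hypothesis |LFC| <= 0.5, so a gene is only called
# if there is evidence that its effect exceeds 0.5 on the log2 scale
res_thr <- results(dds, name = "condition_B_vs_A",
                   lfcThreshold = 0.5, altHypothesis = "greaterAbs",
                   alpha = 0.1)

summary(res0)
summary(res_thr)

# Number of genes with padj < 0.1 under each null hypothesis
c(lfc0   = sum(res0$padj < 0.1, na.rm = TRUE),
  lfc0.5 = sum(res_thr$padj < 0.1, na.rm = TRUE))
```

Because the cutoff is built into the null hypothesis, the thresholded p-values are larger than the standard ones. With high variability and few residual degrees of freedom, it is therefore possible for no gene to pass.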

Posted by Michael Love

Thanks for the suggestions. I simplified the model to design=~genotype. I tried it both with and without LFC shrinkage, and I no longer find genes with very significant padj but no clear separation :) The problem is that I only find significant genes when I relax the cutoffs to padj < 0.5 and |LFC| > 1, and I don't know whether this is acceptable :| With lfcThreshold I don't find any DE genes (using a threshold of 0.5).

Posted by bioshot.com

I don't know how to help with this last problem. I think there's high variability here and not many degrees of freedom to overcome it.

Posted by Michael Love