DESeq2 Dispersion Per Condition
@dsg16-10138

Dear Michael,

I am hoping to use DESeq2 to analyze the sequencing results of a Massively Parallel Reporter Assay that I performed. I was wondering if you might be able to help me answer a question that's been stumping me.

In this assay, we transfect cells with a diverse library of plasmids and then perform RNA-seq to assess the expression of each plasmid. My ultimate goal is to determine if a given sequence takes up a greater (or lesser) fraction in the RNA-seq library than it takes up in the plasmid library (and thus was differentially expressed).

I performed 6 biological replicates of RNA-seq (independent transfections), and 6 replicates of sequencing library preparation from the plasmid library that was used for transfection. The issue I'm running into now is that the dispersion in the RNA-seq replicates will be much higher than the dispersion in the plasmid library prep replicates. I know that DESeq originally calculated dispersion for each condition separately. My question is:

Does DESeq2 still calculate dispersion for each condition separately, such that the low dispersion of my plasmid reps will not artificially lower the overall dispersion?

I would greatly appreciate any help you might be able to provide. Thanks a bunch!

-Dustin

@mikelove

DESeq2 calculates a single dispersion value for each gene. This means that if one group has a higher dispersion than the other, the gene-wise estimate will be somewhere in the middle. Remember, too, that in the last step information is shared across all genes to moderate the dispersion estimates toward the trend for genes with a similar mean (see the DESeq2 paper).

I would guess that having the dispersion value in the middle is not a big issue for sensitivity and specificity. There may be a small gain for modeling each condition with its own dispersion, but the big gain in performance for DE methods comes from sharing information about dispersion across genes.
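For anyone who wants to look at these stages directly, here is a minimal sketch. It assumes simulated counts from makeExampleDESeqDataSet as a stand-in for real data (not the poster's MPRA counts), and it only pulls out the columns that estimateDispersions stores in mcols(dds): the gene-wise estimate, the fitted trend, and the final moderated value.

```r
library(DESeq2)

# Simulated stand-in data (an assumption): 1000 genes, 12 samples, 2 groups
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# One row per gene: mean, gene-wise estimate, trend value, final dispersion
disp <- as.data.frame(
  mcols(dds)[, c("baseMean", "dispGeneEst", "dispFit", "dispersion")]
)
head(disp)

# Standard diagnostic plot: gene-wise estimates, fitted trend, final values
plotDispEsts(dds)
```

Each gene gets exactly one final value in the dispersion column, regardless of how many conditions are in the design.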

 


You can estimate dispersions for each group separately: build a DESeqDataSet for each group with design ~1 and run estimateDispersions on it (dds.sub below is one such single-group object). You can then compare the gene-wise estimates, mcols(dds.sub)$dispGeneEst, across groups to see how different they really are. You could also compare the results obtained with the overall dispersion estimate against those obtained with the dispersion estimate from the group that you suspect has higher dispersion. Note that you can do:

dispersions(dds) <- dispersions(dds.sub)
dds <- nbinomWaldTest(dds)
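
A fuller sketch of that comparison, under stated assumptions: the data are simulated with makeExampleDESeqDataSet (groups "A" and "B" in a condition column), groupDispersions is a hypothetical helper name, size factors are reused from the full data set, and genes whose within-group estimate is NA (for example, all-zero counts in that group) fall back to the shared estimate. Treat it as an illustration of the idea rather than a recommended pipeline.

```r
library(DESeq2)

# Simulated stand-in data (an assumption): 1000 genes, 6 samples per group
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)          # shared: one value per gene

# Hypothetical helper: re-estimate dispersions using one group's samples only
groupDispersions <- function(dds, group) {
  sub <- dds[, dds$condition == group]   # keeps the full-data size factors
  sub$condition <- droplevels(sub$condition)
  design(sub) <- ~ 1                     # intercept only within a group
  estimateDispersions(sub)
}

dds.A <- groupDispersions(dds, "A")
dds.B <- groupDispersions(dds, "B")

# How different are the gene-wise estimates between the two groups?
plot(mcols(dds.A)$dispGeneEst, mcols(dds.B)$dispGeneEst, log = "xy",
     xlab = "dispGeneEst, group A", ylab = "dispGeneEst, group B")
abline(0, 1, col = "red")

# Use group B's final dispersions, falling back to the shared value if NA
disp.B <- dispersions(dds.B)
disp.use <- ifelse(is.na(disp.B), dispersions(dds), disp.B)

dds.alt <- dds
dispersions(dds.alt) <- disp.use

# Wald test with the shared dispersions and with group B's dispersions
dds <- nbinomWaldTest(dds)
dds.alt <- nbinomWaldTest(dds.alt)
res.shared <- results(dds)
res.alt <- results(dds.alt)

# Cross-tabulate the calls at 10% FDR
table(shared = res.shared$padj < 0.1, groupB = res.alt$padj < 0.1,
      useNA = "ifany")
```

Here dds.sub from the lines above corresponds to dds.A or dds.B: a copy of dds containing only one group's samples.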

Hi! I'm facing the same problem. I managed to get the dispersions for each condition, and indeed they are different for many genes. I would like to run a DEG test with the two dispersions. How do I do that? I couldn't understand what you mean by dds.sub; I do not have "sub" in dds (I'm new to R).

Thanks!


"There may be a small gain for modeling each condition with its own dispersion"

--> If you expect broad differences in gene expression between samples (e.g., healthy embryo vs sick liver?), don't you think the gain in power would be substantial?


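Do you mean broad differences in dispersion (CV)? Broad differences in expression are not an issue here. Again, a shared dispersion is not assuming constant variance; the variance still grows with the mean: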

boxplot(rnbinom(100, mu=rep(c(5,100),each=50), size=1/.1) ~ factor(rep(1:2,each=50)))
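
To put numbers on the boxplot above, here is a small base-R sketch. It uses the standard negative binomial relationship Var = mu + alpha * mu^2 (with size = 1/alpha in rnbinom) and the same illustrative values as the one-liner, mu = 5 and 100 with alpha = 0.1; the sample size of 1e5 is my own choice.

```r
alpha <- 0.1          # shared dispersion
mu <- c(5, 100)       # two very different means

# Theoretical NB variance and coefficient of variation for each mean
nb_var <- mu + alpha * mu^2      # 7.5 and 1100
nb_cv  <- sqrt(nb_var) / mu      # equivalently sqrt(1 / mu + alpha)

# Check against simulation with the same dispersion for both means
set.seed(1)
sim_var <- sapply(mu, function(m) var(rnbinom(1e5, mu = m, size = 1 / alpha)))

rbind(theoretical = nb_var, simulated = sim_var)
```

The same dispersion gives variances that differ by more than a hundredfold, so a single per-gene dispersion still accommodates large expression differences between groups.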
@dsg16-10138

Thanks so much for confirming this, and for responding so quickly! I will do comparisons between the different methods you suggested to see what I am working with.
