Dear Michael,
I am hoping to use DESeq2 to analyze the sequencing results of a Massively Parallel Reporter Assay (MPRA) that I performed. I was wondering if you might be able to help me answer a question that's been stumping me.
In this assay, we transfect cells with a diverse library of plasmids and then perform RNA-seq to assess the expression of each plasmid. My ultimate goal is to determine if a given sequence takes up a greater (or lesser) fraction in the RNA-seq library than it takes up in the plasmid library (and thus was differentially expressed).
I performed 6 biological replicates of RNA-seq (independent transfections), and 6 replicates of sequencing library preparation from the plasmid library that was used for transfection. The issue I'm running into now is that the dispersion in the RNA-seq replicates will be much higher than the dispersion in the plasmid library prep replicates. I know that DESeq originally calculated dispersion for each condition separately. My question is:
Does DESeq2 still calculate dispersion for each condition separately, such that the low dispersion of my plasmid reps will not artificially lower the overall dispersion?
I would greatly appreciate any help you might be able to provide. Thanks a bunch!
-Dustin
You can estimate dispersions for each group separately (build a DESeqDataSet for each group with design ~1 and run estimateDispersions). You could then compare these gene-wise estimates (mcols(dds.sub)$dispGeneEst) to see how different they really are. You could also compare using the overall dispersion estimate with using the dispersion estimate from the group that you suspect has higher dispersion. Note you can do:
Hi! I'm facing the same problem. I managed to get the dispersions for each condition, and I can see that they are indeed different for many genes. I would like to run a DEG test with the two dispersions. How do I do that? I couldn't understand what you mean by dds.sub; I do not have "sub" in dds (I'm new to R).
Thanks!
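A note on dds.sub: it is not something stored inside dds, just a name for a copy of dds restricted to the samples of one group. Below is a minimal sketch of the per-group comparison described above. It assumes a DESeqDataSet whose colData has a factor column named condition; the data here are simulated with makeExampleDESeqDataSet, and the helper groupDisp is a hypothetical name, not part of DESeq2.

```r
library(DESeq2)

# Simulated example data (an assumption; substitute your own DESeqDataSet).
# The colData has a factor 'condition' with levels "A" and "B".
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Estimate size factors once on the full data set so that both
# subsets share the same normalization.
dds <- estimateSizeFactors(dds)

# Hypothetical helper: gene-wise dispersion estimates within one group.
groupDisp <- function(dds, group) {
  dds.sub <- dds[, dds$condition == group]  # keep only this group's samples
  dds.sub$condition <- droplevels(dds.sub$condition)
  design(dds.sub) <- ~ 1                     # one group left: intercept only
  dds.sub <- estimateDispersions(dds.sub)
  mcols(dds.sub)$dispGeneEst                 # NA for genes with all-zero counts
}

dispA <- groupDisp(dds, "A")
dispB <- groupDisp(dds, "B")

# Log2 ratio of the gene-wise estimates; values far from 0 indicate
# genes whose dispersion differs between the two groups.
summary(log2(dispB / dispA))
```

This only compares the gene-wise estimates between groups. DESeq2 itself uses a single dispersion value per gene when testing, so the sketch is a diagnostic and not a test with two dispersions.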
"There may be a small gain for modeling each condition with its own dispersion"
--> If you expect broad differences in gene expression between samples (e.g., healthy embryo vs sick liver?), don't you think the gain in power would be substantial?
Broad differences in dispersion (CV) you mean? Again, it's not assuming constant variance:
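For context, here is the variance of the negative binomial model that DESeq2 fits to a count K_ij with mean μ_ij and gene-wise dispersion α_i (the notation follows the DESeq2 paper and is not taken from this thread):

$$\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i \mu_{ij}^2, \qquad \mathrm{CV}^2 = \frac{\operatorname{Var}(K_{ij})}{\mu_{ij}^2} = \frac{1}{\mu_{ij}} + \alpha_i$$

The variance therefore changes with the mean. What is shared across groups is the dispersion α_i, which is approximately the squared CV at high counts.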