Hi,
Wondering how I can modify the following DESeq2 design to extract subcategories, for instance binding events differentially enriched in only Male Vehicle over others, as currently it is set to give me sex differential peaks from the ChIP-seq dataset
In addition, would this be possible with the current input matrix design that I have outlined below?
sampleNames<-c("MBV1","FBV1", "MBE7, "FBE3") sampleSex<-c("male", "female", "male", "female") sampleTreatment<-c("vehicle", "vehicle", "BB", "BB") colData<-data.frame(sampleName=sampleNames, sex=sampleSex, treatment=sampleTreatment) dds<-DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ treatment + sex)
Thank you!
Actually I have considerably more samples and did not include them in the code given an error from Bioconductor about inappropriate language usage, from long string of characters. Hopefully it works in this reply.
So it looks something more like what I have above, can you give some advice about the design I was interested in? Thank you very much!
Then you can use ~sex + sex:treatment to extract sex-specific treatment effects. You can pull them out and build a results table using the 'name' argument of results().
Exactly what I was after, thank you!
The one thing I am a little confused about is here I can tell how to extract the "sexmale.treatmentBB" and "sexfemale.treatmentBB" for this particular design as I want the peaks specific to all four individual conditions (e.g. Male vehicle, male BB, female Vehicle, female BB)
Thanks!
I think a point of confusion here is that DESeq2 performs comparisons on counts, it doesn’t do what you describe. You have four conditions so you can compare B vs A and D vs C and then also the contrast of those two. Regarding which level is chosen as the reference level, see the vignette section about releveling factors.
Right so all I am trying to do is perform analysis on counts where I can get differential calls on for instance all samples that are both male and BB (treatment), so you're indicating that DESeq2 can't take the counts from those specific samples relative to all others to get a differential signature?
Not really. It’s not clear to me how best to compare A vs others. You can compare A vs the average of others, but I’m not big fan of this approach because the average of multiple groups is an artificial point. You can still get a LFC of 0 when the level of A is clearly distinct from all other groups (if the average LFCs of A vs each other group is 0). I’d prefer to do three pairwise comparisons and merge the FDR sets.
Ok I understand what you're saying in this regard. So one question is, I did the following:
What this did in terms of identifying specific binding counts is show me instances where, for either male vehicle or male BB, there are very low counts compared with the rest of the binding matrix which includes female vehicle and BB. I guess what I wanted was the opposite where it shows me the higher counts for one triplet of replicates (e.g. Male BB) vs the 3 other conditions. So it seems like it worked in the way you describe however doing the opposite. Does that make sense? Thanks again for your input!
Then use coefficient averaging:
Use a design of ~0 + group
Then results(dds, contrast=c(1, -1/3, -1/3, -1/3)), etc.
Ok will give it a try, thanks! I guess I was not representing to you that there is not necessarily a covariate here with separate counts but rather combinations of conditions/sex such as Male BB which are actually representative of a single count column, or rather of 3 replicate count columns. So the group makes sense.
A bit of a tangential question, but do you have any suggestions on how to effectively visualize results of something like this comparison, basically some kind of 4-way (dimension) graph that shows differential peaks unique to each of the 4 categories? Thanks!
Humans are best at seeing two dimensions at a time. I'd say the best way to look at counts across multiple groups and multiple significant genes is a heatmap, and we have some code for that in the vignette, and there are even more complex packages for building multi-heatmap diagrams, e.g. ComplexHeatmap.
Thanks for the tip! The code in the vignette refers to doing this with raw counts and I was more interested in doing this with output of results() such as from:
results(dds, contrast=c(1, -1/3, -1/3, -1/3))
With all four possible coefficients plotted. Thanks again!
Also see the vignette section on LFC thresholds, on how to select only positive LFC.
In your experience is there a good approach to picking this positive LFC threshold? I know its experiment specific, but just curious about your thoughts. Thanks.
I only recommended it here because you explicitly asked for it above:
"what I wanted was the opposite where it shows me the higher counts for one triplet of replicates (e.g. Male BB) vs the 3 other conditions"