Hi all,
I have a bit of a theoretical question here. I'm using DESeq2 for DGE analysis between controls and a disease group. Within both groups, I have males and females. I've done the DGE analysis controlling for sex as a covariate (so as to omit sex-specific drivers), and now I want to repeat the whole analysis while including a contrast for each sex. The code itself isn't a problem; I'd be setting up the contrasts like so:
contrast = c("sex_group", "FemaleSA", "FemaleCTRL")
contrast = c("sex_group", "MaleSA", "MaleCTRL")
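For context, here is a minimal sketch of how a combined factor like `sex_group` could be built so that these contrasts are available. The column names `sex` and `condition` and the toy sample table are assumptions for illustration; only `sex_group` and its level names come from the post. The DESeq2 calls are left as comments because they need a real `dds` object.

```r
# Hypothetical sample table; the columns `sex` and `condition` are assumed names.
coldata <- data.frame(
  sex       = c("Female", "Female", "Male", "Male"),
  condition = c("SA", "CTRL", "SA", "CTRL")
)

# Paste the two variables into one factor so each sex/condition cell is its own level.
coldata$sex_group <- factor(paste0(coldata$sex, coldata$condition))
levels(coldata$sex_group)

# With a DESeqDataSet `dds` whose colData contains `sex_group`, the contrasts would be:
# design(dds) <- ~ sex_group
# dds <- DESeq(dds)
# res_female <- results(dds, contrast = c("sex_group", "FemaleSA", "FemaleCTRL"))
# res_male   <- results(dds, contrast = c("sex_group", "MaleSA", "MaleCTRL"))
```

Each `results()` call then compares disease against control within one sex, using dispersion estimates from all samples.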
For the analysis that omitted sex differences, I opted to filter out genes with fewer than 20 counts in 90% of subjects. My results are informative. Am I obligated to use the exact same filtering step when redoing the analysis (with sex differences considered)? It seems like 20 counts in 90% of subjects might be too stringent. Can I find a better filtering step (and use the same step for the male analysis and the female analysis), while keeping the steps done in my first analysis as they are?
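To make the filter concrete, here is a rough sketch on a toy matrix. It assumes the rule means "keep a gene only if it has at least 20 counts in at least 90% of samples"; the post could also be read the other way around, so treat that as an assumption. The matrix values and the names `counts_mat`, `min_count`, and `min_fraction` are made up for illustration. With a real DESeqDataSet `dds`, `counts(dds)` would take the place of `counts_mat`.

```r
# Toy count matrix: 4 genes x 10 samples (made-up numbers).
counts_mat <- rbind(
  geneA = c(50, 42, 61, 38, 45, 70, 55, 49, 33, 58),
  geneB = c(25, 30, 22,  5, 28, 31, 24, 27, 26, 29),
  geneC = c(25, 30,  3,  5, 28, 31, 24, 27, 26, 29),
  geneD = rep(0, 10)
)

min_count    <- 20   # per-sample count threshold (assumed reading of the post)
min_fraction <- 0.9  # fraction of samples that must reach the threshold

# Number of samples a gene must pass in, rounded up.
min_samples <- ceiling(min_fraction * ncol(counts_mat))

# For each gene, count the samples at or above the threshold and compare.
keep <- rowSums(counts_mat >= min_count) >= min_samples
filtered <- counts_mat[keep, , drop = FALSE]
rownames(filtered)
```

Loosening the filter is then a matter of changing `min_count` or `min_fraction` and applying the same values to both the male and female contrasts.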
Thanks!
Thank you so much for your feedback!
Since I'm mildly surprised it's not an issue, I might as well ask you about a second concern that I think/thought I had the correct answer to.
The disease I'm looking at affects both sexes but affects a greater number of males, so there are more control males and diseased males in my cohort than females; we're quite literally limited in female samples. Because of this, I opted for the contrasts I stated above.
However, I know from your DESeq2 vignette that a more sophisticated approach is to use an interaction term, like so (only showing relevant code steps):
Should I avoid this interaction approach when raising the question of sex differences, given the unequal numbers of males and females? I have 17 diseased males, 14 control males, 7 diseased females and 8 control females.
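As an illustration only (not the code from the post or the vignette), an interaction design for these group sizes could be sketched as follows. The column names `sex` and `condition` are assumed; the group sizes are the ones stated above. The sketch uses base R's `model.matrix()` to show which coefficients such a design would contain.

```r
# Sample table using the group sizes given in the post:
# 17 diseased males, 14 control males, 7 diseased females, 8 control females.
n <- c(17, 14, 7, 8)
coldata <- data.frame(
  sex       = factor(rep(c("Male", "Male", "Female", "Female"), times = n),
                     levels = c("Female", "Male")),
  condition = factor(rep(c("SA", "CTRL", "SA", "CTRL"), times = n),
                     levels = c("CTRL", "SA"))
)

# Cell sizes: unbalanced, but every sex/condition combination is populated.
cell_sizes <- table(coldata$sex, coldata$condition)

# Main effects plus the sex-by-condition interaction.
design_mat <- model.matrix(~ sex + condition + sex:condition, data = coldata)
colnames(design_mat)

# The design is full rank despite the unequal group sizes.
design_rank <- qr(design_mat)$rank
```

In this parameterization the interaction coefficient asks whether the disease effect differs between males and females, which is a different question from the two within-sex contrasts. In DESeq2 the same formula would be passed as the `design`.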
Thank you in advance!
It's hard to say whether you're sufficiently powered to detect the sex-specific disease effects. Maybe consult with a biostatistician.
Fair point, thanks again!