Hello,
I am fairly new to DESeq2; I have used it before, but only for experiments with more replicates. I am hoping to get advice on how to handle DE contrasts for an experiment with 15 groups: 11 groups have 3 biological replicates (samples) each, but 4 groups are rarer and we could only obtain 2 samples for each.
I am doing 2 types of contrasts: (i) one group vs. the rest, and (ii) pairwise group vs. group. Afterwards, I would like to compare the set differences and set intersections of the DE genes across certain groups.
Because some groups have only 2 samples, I would like advice on how to handle the inconsistent flagging of genes based on Cook's distance: it works when comparing groups with 3 samples, but does not work in contrasts involving groups with 2 samples. (The DESeq2 user guide states: "The results function automatically flags genes which contain a Cook’s distance above a cutoff for samples which have 3 or more replicates. The p values and adjusted p values for these genes are set to NA. At least 3 replicates are required for flagging, as it is difficult to judge which sample might be an outlier with only 2 replicates. This filtering can be turned off with results(dds,cooksCutoff=FALSE).")
For the contrasts between groups with 3 samples, genes in which count outliers are detected get an adjusted p-value of NA and are ignored, which is ideal; but for the contrasts involving groups with only 2 samples, no flagging occurs. I do see genes called DE that have large variance within a group with only 2 samples, so this is an issue (I would like to exclude these genes). Because I am also interested in comparing genes that are commonly DE across certain groups, this seems to be a further problem, as the DE selection differs between contrasts.
Should I turn off this Cook's filtering (results(dds,cooksCutoff=FALSE)) for all contrasts and apply my own filter afterwards to maintain consistency? Could you advise how to apply such a filter and how to find a threshold? I don't have experience with this. I had thought of leaving Cook's filtering on for the contrasts between groups with 3 samples and looking at those outliers as a reference, but that is limited to genes with outliers in those groups.
I have searched through several previous questions, but have not been able to find an answer that fits my situation. Please excuse me if there is a suitable response that I missed; if so, could you kindly direct me to it?
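To make the flagging behaviour in question concrete, here is a minimal base-R sketch. It is an illustration under stated assumptions, not DESeq2's actual implementation: the group labels and the Cook's distance matrix are made up (with a real object the distances would come from assays(dds)[["cooks"]]), and the design is assumed to be ~ group. The cutoff is the default described in the DESeq2 vignette, the 0.99 quantile of the F(p, m - p) distribution, where p is the number of coefficients and m the number of samples.

```r
# Design from the post: 15 groups, 11 with 3 replicates and 4 with 2.
reps  <- c(rep(3, 11), rep(2, 4))
group <- factor(rep(paste0("g", seq_along(reps)), times = reps))
m <- length(group)    # number of samples
p <- nlevels(group)   # number of coefficients, assuming design ~ group

# Default cutoff described in the vignette: 0.99 quantile of F(p, m - p).
cooks_cutoff <- qf(0.99, p, m - p)

# Samples eligible for flagging: those in groups with 3 or more replicates.
eligible <- ave(rep(1, m), group, FUN = sum) >= 3

# Hypothetical Cook's distances (genes x samples), all small except two
# planted outliers.
set.seed(1)
cooks <- matrix(runif(5 * m), nrow = 5,
                dimnames = list(paste0("gene", 1:5), NULL))
cooks["gene2", which(!eligible)[1]] <- 100  # outlier in a 2-replicate group
cooks["gene4", which(eligible)[1]]  <- 100  # outlier in a 3-replicate group

# Flagging restricted to eligible samples (simplified version of the default)
# versus flagging over every sample.
flag_default <- apply(cooks[, eligible, drop = FALSE], 1, max) > cooks_cutoff
flag_all     <- apply(cooks, 1, max) > cooks_cutoff
```

In this toy example gene4 is flagged either way, while gene2, whose outlier sits in a 2-replicate group, is only caught when every sample is considered. This is the inconsistency across contrasts described above. Whether thresholding the 2-replicate samples in this way is statistically sensible is exactly the open question.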
Roez
Thank you, Michael. I am wondering if I could run all the contrasts as is (ignoring the lack of flagging in the 2-sample cases) and set my adjusted p-value and LFC cutoffs, but after this DE analysis set a criterion that filters the DE genes further, possibly using the mean vs. difference pairwise plots of all biological replicates for each gene to find some threshold, and removing a gene from all contrasts if it exceeds this threshold. Mostly I just want to make sure to exclude genes if the within sample variance is higher than between sample variance in those groups with only 2 samples.
Hmm, I don't really follow your proposed filtering rule, and it sounds like it could result in loss of control of false positives. In particular, you can't do this: "exclude genes if the within [group] variance is higher than between [group] variance". This will certainly result in loss of control of false positives.
Sorry for my confusion, and thank you for pointing that out. So should I apply any filtering criteria before controlling the FDR, and not after?
Roez