Hello,
I am running DESeq2_1.10.1 on a data set with a continuous predictor term (either with or without a categorical blocking term).
My understanding is that in such cases, outlier detection and replacement are not applied automatically, and that it's instead necessary to inspect the Cook's distances manually. I base this on section 3.6 of the November 30, 2016 version of the DESeq2 vignette (www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf).
However, when I run the DESeq function, I see the following message:
fitting model and testing
-- replacing outliers and refitting for 361 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
My question, then, is what is actually happening here? Am I looking at an out-of-date copy of the vignette? Is my analysis carrying out the outlier replacement procedure even though it's not optimal for continuous predictors? Or am I misinterpreting the message entirely?
Thank you in advance! I can provide more details about my DESeqDataSet object if helpful.
Cameron
The covariate is a disease phenotype: the proportion of afflicted individuals per inbred strain, with each RNA-seq sample corresponding to a strain. The covariate is zero for many samples; in light of your comment, I suppose that this may be the source of the problem. I've pasted the covariate below (after a log(x+c) transformation).
I've also carried out the analysis with the data downgraded to binary (zero vs nonzero). Should I stick with the downgraded data and avoid using the continuous data given its unusual distribution?
Thanks!
There's not necessarily a problem then. The outlier replacement procedure can run on this dataset, because there is repetition in the continuous values.
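To make the point about repetition concrete, here is a rough base-R sketch. It is not DESeq2's internal code, and the covariate values and the constant c = 0.01 are made up for illustration. It assumes, as I understand the procedure, that samples count as replicates of each other when they have identical rows in the design matrix, and it compares that count with the default threshold of 7 reported in the message above.

```r
# Hypothetical covariate: proportion afflicted per strain, with many exact zeros
prop <- c(0, 0, 0, 0, 0, 0, 0, 0, 0.05, 0.10, 0.20, 0.35)
x <- log(prop + 0.01)            # log(x + c), with an assumed c = 0.01
design <- model.matrix(~ x)

# For each sample, count how many samples share an identical design row
row_key <- apply(design, 1, paste, collapse = "_")
n_identical <- as.vector(table(row_key)[row_key])

# Samples whose design row is repeated at least 'min_reps' times would be
# treated as having enough replicates for replacement (assumed behaviour)
min_reps <- 7
eligible <- n_identical >= min_reps
print(data.frame(x = round(x, 3), n_identical, eligible))
```

In this made-up example the eight zero-valued samples share one design row and so clear the threshold, while the four samples with distinct nonzero values do not.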
You may choose to turn it off if you feel it's not helpful, by setting minReplicatesForReplace=Inf in the call to DESeq.
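A minimal sketch of that call, using simulated data from makeExampleDESeqDataSet as a stand-in for your own DESeqDataSet (the object name dds, the seed, and the dataset size are arbitrary choices for the example, not taken from your analysis):

```r
library(DESeq2)

# Simulated stand-in data: 200 genes, 14 samples in two conditions
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 14)

# Turn off outlier replacement: no group can ever reach an infinite
# number of replicates, so no counts are replaced and nothing is refit
dds <- DESeq(dds, minReplicatesForReplace = Inf)

# Cook's distances are still stored (genes x samples) for manual inspection
cooks <- assays(dds)[["cooks"]]
dim(cooks)
```

With replacement switched off, the "replacing outliers and refitting" lines should no longer appear in the messages, and the Cook's distances can be examined by hand as the vignette describes.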
I wouldn't make modeling choices (continuous vs binary) based on this outlier procedure. It is usually just picking up a number of genes whose counts are 0 in all samples except one or two, where those few samples have technical artifacts.
Okay, great—thanks kindly for your help!