Hi! I am performing differential expression analysis and when I created the dds model and I got the following message:
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
txi.rsem$length[txi.rsem$length == 0] <- 1
########Creation of dds model
ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples_selected, design = ~ condition)
using counts and average transcript lengths from tximport
ddsTxi$condition <- relevel(ddsTxi$condition, ref = cond2)
dds <- DESeq(ddsTxi)
estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 7581 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
summary(res)
out of 47011 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 1, 0.0021%
LFC < 0 (down) : 0, 0%
outliers [1] : 2008, 4.3%
low counts [2] : 191, 0.41%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
I was wondering if I can keep the results that I obtained considering only the p-value, or it is better to get a more reliable result to perform the analysis modifying the minReplicatesForReplace option?
Thanks,
Concetta
Thank you! I have removed from the dds sub-optimal genes and I re-created the dds model and I did not get the message
replacing outliers and refitting for 7581 genes
. However when I ransummary(res)
I got this results:Considering that I did not get any message when I ran DESeq(dds), I did not understand if now I can trust my model and the results.
Thank you.
Concetta
So there are still many outliers, which I would encourage you to look at with
plotCounts
.I looked the plotCounts but I noticed that looking at different genes I have every time different samples that behave as outliers samples. What do you suggest to do?
If you think that, from those plots, the genes do contain outliers and are likely not exhibiting DE signal, then you can go with the results table above (the one with 1,795 outliers). If you think that there is DE but it is not being found, I would look at the PCA plot and see if you think that the biological variability within groups is smaller than across groups.