Hello,
I'm performing differential gene expression analysis on my bacterial RNA samples and I'm having some troubles exporting the results. I'm using DESeq2 version 1.30.0 and R 4.0.3.
I've generated my results following the vignette's workflow. Briefly put, this is my last chunk of code:
> dds <- DESeq(dds)
> res <- results(dds, alpha = 0.05, lfcThreshold = 1)
> summary(res)
out of 5546 with nonzero total read count
adjusted p-value < 0.05
LFC > 1.00 (up) : 29, 0.52%
LFC < -1.00 (down) : 7, 0.13%
outliers [1] : 0, 0%
low counts [2] : 2258, 41%
(mean count < 38)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
So with the filters I've set (padj < 0.05 and lfcThreshold = 1) 36 genes in total are differentially expressed. I want to export in a CSV file the results data frame of only these 36 DE genes, but after running:
> resFilt <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1), ]
> write.csv(resFilt, file="DE_results_filtered.csv")
the resulting csv has 202 rows. Is this because creating a subset of the results doesn't account for the independent filtering? How can I export the results of only those 36 DE genes?
Thank you in advance!
Thanks! Although I don't have NAs in the output CSV. Please, could you specify in which step should I remove NAs?
I'd recommend to work in R and figuring things out there before the CSV.
You should have concordance between the summary table and these sums, is that the case?
Sorry for the late reply. The results of the summary table and the sums after removing NAs are indeed the same:
To simplify things I'm trying to first filter by padj to see if I can subset only those genes without the NAs. So with a padj < 0.05 there are 200 DE genes in total:
If I subset without removing NAs I obtain 203 DE genes. If I try to remove NAs with
res = na.omit(res)
and then subsetting withresFilt <- subset(res, padj < 0.05)
I also obtain 203 genes, so NAs are still in my output.I have tried other ways of removing NAs inside the subset function, such as
resFilt <- subset(res, !is.na(padj < 0.05))
, but I always end up with this output in the summary table:With the same number of DE genes as in the res unfiltered object, but with less rows.
I'm sure there is a correct way of removing the NAs for subsetting the filtered results, but I can't seem to find it. Any help will be greatly appreciated :)
I'm a little lost, so from the top, you have the right numbers by
summary
andsum
. What about:That gives me 203 rows, but as I showed in the last post, there should be 200 DEGs using a padj < 0.05, so I guess that the result of
resFilt <- res[ which(res$padj < 0.05), ]
still includes NAs (although I don't see any in the CSV).What about
length(which(res$padj < 0.05))
andsum(res$padj < 0.05, na.rm=TRUE)
? I'm confused why you're getting extra rows.I'm also very confused. I'll leave you here the same code as above so you can see again the results easier:
And also including the log2FC cutoff:
The last output is explained because you didn't use
resLFC
.I suppose 3 of the LFC could be set to 0, depending on the design and the test (I can't see this information above). Can you show the design, how you ran
DESeq()
and how you ranresults()
?Finally got it! Turns out I was trying to subset using the res unfiltered object like
subset(res, padj < 0.05)
instead of the filtered object (such as resLFC in the last chunk of code) likesubset(resLFC, padj < 0.05)
. Sorry for the confusion and thank you so much for the help, your last comment was what clicked for me.