Hello everyone,
I was comparing the graph generated by the function plotMA() from DESeq2, with the graph that I generated supposing
I extracted the data with this function:
ds2data <- as.data.frame(results(dds,cooksCutoff = FALSE))
and plotted the result with ggplot2 using similar margins and color code. The points seem to be at the same positions, however, the colors are sometimes not matching (there are for example two red and three grey triangles on the lower boundary, while there are three red and two grey triangles on my plot).
Does this come to some additional parameters passed to results() ? Or some tweaking in plotMA()?
Best,
Yvan
The images are:
plotMA with DESeq2: https://ibb.co/c2v0tk
My plot based on the results() function above: https://ibb.co/bJw6Yk
R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.5 LTS Matrix products: default BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0 LAPACK: /usr/lib/lapack/liblapack.so.3.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] bindrcpp_0.2 tidyr_0.7.1 DESeq2_1.16.1 SummarizedExperiment_1.6.3 DelayedArray_0.2.7 [6] matrixStats_0.52.2 Biobase_2.36.2 GenomicRanges_1.28.5 GenomeInfoDb_1.12.2 IRanges_2.10.3 [11] S4Vectors_0.14.4 BiocGenerics_0.22.0 gplots_3.0.1 dplyr_0.7.3 ggplot2_2.2.1 [16] BiocInstaller_1.26.1 loaded via a namespace (and not attached): [1] bit64_0.9-7 splines_3.4.1 gtools_3.5.0 Formula_1.2-2 assertthat_0.2.0 [6] latticeExtra_0.6-28 blob_1.1.0 GenomeInfoDbData_0.99.0 RSQLite_2.0 backports_1.1.0 [11] lattice_0.20-35 glue_1.1.1 digest_0.6.12 RColorBrewer_1.1-2 XVector_0.16.0 [16] checkmate_1.8.3 colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-11 plyr_1.8.4 [21] XML_3.98-1.9 pkgconfig_2.0.1 genefilter_1.58.1 zlibbioc_1.22.0 purrr_0.2.3 [26] xtable_1.8-2 scales_0.5.0 gdata_2.18.0 BiocParallel_1.10.1 htmlTable_1.9 [31] tibble_1.3.4 annotate_1.54.0 nnet_7.3-12 lazyeval_0.2.0 survival_2.41-3 [36] magrittr_1.5 memoise_1.1.0 foreign_0.8-69 tools_3.4.1 data.table_1.10.4 [41] stringr_1.2.0 locfit_1.5-9.1 munsell_0.4.3 cluster_2.0.6 AnnotationDbi_1.38.2 [46] compiler_3.4.1 caTools_1.17.1 rlang_0.1.2 grid_3.4.1 RCurl_1.95-4.8 [51] htmlwidgets_0.9 labeling_0.3 bitops_1.0-6 base64enc_0.1-3 gtable_0.2.0 [56] DBI_0.7 R6_2.2.2 gridExtra_2.3 knitr_1.17 bit_1.1-12 [61] bindr_0.1 Hmisc_4.0-3 KernSmooth_2.23-15 stringi_1.1.5 Rcpp_0.12.12 [66] geneplotter_1.54.0 rpart_4.1-11 acepack_1.4.1 tidyselect_0.2.0
Hi,
Right. In the graphical examples linked above I used ylim on plotMA to control this.
But actually the points seems to be at the same positions. What I am not sure about, is why the significances are sometimes different.
Here is what I used for ggplot2 with the data from
I only slightly modified ds2data so that abs(log2FoldChange values) does not go beyond ±7.5 (log2ChangeMAX).
ggplot(data = ds2data) + geom_point( aes(x = baseMean, y = log2FoldChangeMAX), size = 1.25, color = ifelse(dge_FH$padj < 0.1, "red", "grey20"), alpha = 0.5, shape = ifelse(abs(dge_FH$log2FoldChange) >= 7.5, 17, 16) ) + scale_x_log10()
I played a bit with the p-adj threshold (Padj < 0.1, Padj <= 0.1, the vignette says "...colored red if the adjusted p value is less than 0.1" so I assume Padj<0.1 is correct). Anyway, this does not seem to be the cause of the difference because I had to significantly lower the Pajd threshold on my script so that this grey triangle lying against the -7.5 boundary that has a different color between the two plots became unsignificant (grey), by this time, there were many more differences between my plot and the DESeq2::plotMA result.
Yvan
I would avoid putting another data.frame in the geom_point() call. You could have a few rows shifted/missing which could give wrong coloring. Why not just put padj and log2FoldChange as columns of ds2data?
Ooops my mistake... I changed the names in my code meanwhile and wanted to be consistent with the previous post but missed this one... all my data come from the same data.frame... Here is the full code.
as compared with
That's not the recommended way of using ggplot() and I don't have much time to help debug here.
You shouldn't be using "$" in the ggplot() call or geoms, all the information should be in the data that is provided to ggplot()
Dear Dr. Love, many thanks for the advice on how to use ggplot... however, the discrepancy does not come from it as it clearly reflects the dataset generated by DESeq2::results() using the cooksCutoff = F option that seems to be used also in plotMA()
I was in fact more asking about the details of the call of results() that is used to produce the MA-plot, but how to explore S4 objects is obscure to me, if anyone else from the community has a suggestion, it would be great.
Oh I see now that you didn't call results() when you use plotMA().
I recommend using:
and then
so you can control exactly how the results table is generated. This is the way we show plotMA in the user guides. Sorry I didn't catch if this was the intent of your original question.
Many thanks! I get now perfectly identical plots if I use the result of results() into plotMA or into my ggplot2 plot.
Best.