How to exactly replicate the results from plotMA in DESeq2 ?
1
0
Entering edit mode
Yvan Wenger ▴ 50
@yvan-wenger-5608
Last seen 6.8 years ago

Hello everyone,

I was comparing the graph generated by the function plotMA() from DESeq2, with the graph that I generated supposing

I extracted the data with this function:

ds2data <- as.data.frame(results(dds,cooksCutoff = FALSE))

and plotted the result with ggplot2 using similar margins and color code. The points seem to be at the same positions, however, the colors are sometimes not matching (there are for example two red and three grey triangles on the lower boundary, while there are three red and two grey triangles on my plot).

Does this come to some additional parameters passed to results() ? Or some tweaking in plotMA()?

Best,

Yvan

The images are:

plotMA with DESeq2: https://ibb.co/c2v0tk

My plot based on the results() function above: https://ibb.co/bJw6Yk

 

R version 3.4.1 (2017-06-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 14.04.5 LTS

Matrix products: default

BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0

LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   

 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:

 [1] bindrcpp_0.2               tidyr_0.7.1                DESeq2_1.16.1              SummarizedExperiment_1.6.3 DelayedArray_0.2.7        

 [6] matrixStats_0.52.2         Biobase_2.36.2             GenomicRanges_1.28.5       GenomeInfoDb_1.12.2        IRanges_2.10.3            

[11] S4Vectors_0.14.4           BiocGenerics_0.22.0        gplots_3.0.1               dplyr_0.7.3                ggplot2_2.2.1             

[16] BiocInstaller_1.26.1      

loaded via a namespace (and not attached):

 [1] bit64_0.9-7             splines_3.4.1           gtools_3.5.0            Formula_1.2-2           assertthat_0.2.0       

 [6] latticeExtra_0.6-28     blob_1.1.0              GenomeInfoDbData_0.99.0 RSQLite_2.0             backports_1.1.0        

[11] lattice_0.20-35         glue_1.1.1              digest_0.6.12           RColorBrewer_1.1-2      XVector_0.16.0         

[16] checkmate_1.8.3         colorspace_1.3-2        htmltools_0.3.6         Matrix_1.2-11           plyr_1.8.4             

[21] XML_3.98-1.9            pkgconfig_2.0.1         genefilter_1.58.1       zlibbioc_1.22.0         purrr_0.2.3            

[26] xtable_1.8-2            scales_0.5.0            gdata_2.18.0            BiocParallel_1.10.1     htmlTable_1.9          

[31] tibble_1.3.4            annotate_1.54.0         nnet_7.3-12             lazyeval_0.2.0          survival_2.41-3        

[36] magrittr_1.5            memoise_1.1.0           foreign_0.8-69          tools_3.4.1             data.table_1.10.4      

[41] stringr_1.2.0           locfit_1.5-9.1          munsell_0.4.3           cluster_2.0.6           AnnotationDbi_1.38.2   

[46] compiler_3.4.1          caTools_1.17.1          rlang_0.1.2             grid_3.4.1              RCurl_1.95-4.8         

[51] htmlwidgets_0.9         labeling_0.3            bitops_1.0-6            base64enc_0.1-3         gtable_0.2.0           

[56] DBI_0.7                 R6_2.2.2                gridExtra_2.3           knitr_1.17              bit_1.1-12             

[61] bindr_0.1               Hmisc_4.0-3             KernSmooth_2.23-15      stringi_1.1.5           Rcpp_0.12.12           

[66] geneplotter_1.54.0      rpart_4.1-11            acepack_1.4.1           tidyselect_0.2.0  
deseq2 plotma • 3.2k views
ADD COMMENT
0
Entering edit mode

Hi,

Right. In the graphical examples linked above I used ylim on plotMA to control this.

plotMA(dds, ylim = c(-7.5, 7.5))

But actually the points seems to be at the same positions. What I am not sure about, is why the significances are sometimes different.

Here is what I used for ggplot2 with the data from

ds2data <- as.data.frame(results(dds,cooksCutoff = FALSE))

I only slightly modified ds2data so that abs(log2FoldChange values) does not go beyond ±7.5 (log2ChangeMAX).

ggplot(data = ds2data) +
  geom_point(
    aes(x = baseMean, y = log2FoldChangeMAX),
    size = 1.25,
    color = ifelse(dge_FH$padj < 0.1, "red", "grey20"),
    alpha = 0.5,
    shape = ifelse(abs(dge_FH$log2FoldChange) >= 7.5, 17, 16)
  ) +
  scale_x_log10()

I played a bit with the p-adj threshold (Padj < 0.1, Padj <= 0.1, the vignette says "...colored red if the adjusted p value is less than 0.1" so I assume Padj<0.1 is correct). Anyway, this does not seem to be the cause of the difference because I had to significantly lower the Pajd threshold on my script so that this grey triangle lying against the -7.5 boundary that has a different color between the two plots became unsignificant (grey), by this time, there were many more differences between my plot and the DESeq2::plotMA result.

Yvan

ADD REPLY
0
Entering edit mode

I would avoid putting another data.frame in the geom_point() call. You could have a few rows shifted/missing which could give wrong coloring. Why not just put padj and log2FoldChange as columns of ds2data?

ADD REPLY
0
Entering edit mode

Ooops my mistake... I changed the names in my code meanwhile and wanted to be consistent with the previous post but missed this one... all my data come from the same data.frame... Here is the full code.

ds2data <- as.data.frame(results(dds,cooksCutoff = FALSE))

ds2data <-
  ds2data %>% dplyr::mutate(
      log2FoldChangeMAX = ifelse(
      abs(log2FoldChange) > 7.5,
      sign(log2FoldChange) * 7.5,
      log2FoldChange
    )
  )

ggplot(data = ds2data) +
  geom_point(
    aes(x = baseMean, y = log2FoldChangeMAX),
    size = 1.25,
    color = ifelse(ds2data$padj < 0.1, "red", "grey20"),
    alpha = 0.5,
    shape = ifelse(abs(ds2data$log2FoldChange) >= 7.5, 17, 16)
  ) +
  scale_x_log10()

as compared with

plotMA(dds, ylim = c(-7.5, 7.5))
ADD REPLY
1
Entering edit mode

That's not the recommended way of using ggplot() and I don't have much time to help debug here.

You shouldn't be using "$" in the ggplot() call or geoms, all the information should be in the data that is provided to ggplot()

ADD REPLY
0
Entering edit mode

Dear Dr. Love, many thanks for the advice on how to use ggplot... however, the discrepancy does not come from it as it clearly reflects the dataset generated by DESeq2::results() using the cooksCutoff = F option that seems to be used also in plotMA()

I was in fact more asking about the details of the call of results() that is used to produce the MA-plot, but how to explore S4 objects is obscure to me, if anyone else from the community has a suggestion, it would be great.

ADD REPLY
0
Entering edit mode

Oh I see now that you didn't call results() when you use plotMA(). 

I recommend using:

res <- results(dds, ...)

and then

plotMA(res)

so you can control exactly how the results table is generated. This is the way we show plotMA in the user guides. Sorry I didn't catch if this was the intent of your original question.

ADD REPLY
0
Entering edit mode

Many thanks! I get now perfectly identical plots if I use the result of results() into plotMA or into my ggplot2 plot.

Best.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Code?

If you want to compare the points on the boundary, you'll need to set the ylim to be exactly equal, right?

ADD COMMENT

Login before adding your answer.

Traffic: 567 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6