GSVA package scores are opposite on mac and windows
2
1
Entering edit mode
@jasonl_weirather-14107
Last seen 7.2 years ago

Just wanted to describe a bug I'm getting for the GSVA package.

This little code bit should calculate a little TLR4 signaling pathway on the dataset.

rm(list=ls())
library(GSVA)
library(leukemiasEset)
data(leukemiasEset)
dat <- exprs(leukemiasEset)
TLR4 = 'ENSG00000136869'
MYD88 = 'ENSG00000172936'
plot(x=dat[TLR4,],y=dat[MYD88,])
gset = list(LPS=c('ENSG00000136869','ENSG00000172936'))
pathway <- gsva(dat, gset)$es.obs
plot(x=dat[TLR4,],y=pathway['LPS',])

But this bit of code produces opposite GSVA scores when computed on mac and windows.

Mac was done on R 3.4.2 with Bioconductor 3.6, and Windows was R 3.4.1 with Bioconductor 3.6.

You can see the Windows correlation looks right, but Mac is the opposite.

Windows:

Windows is correct

And on Mac OS the same bit of code is this:

Mac OS is wrong

I'm not sure how to page the package owner on this forums or if there is a more appropriate place for this post.  But it should be of general interest to anyone who sees that their GSVA results on Mac are opposite of what they expect.. positive and negative switch.

Thanks!

Jason

 

Edit: I've sent an email to the package maintainer to let them know of the problem.

 

GSVA bug gsva bugs • 2.8k views
ADD COMMENT
0
Entering edit mode

The Bioconductor version (3.6) for R 3.4.2 is not released yet. You seem to be aware that it is on development but the package might change before the release to be consistent between OS.

ADD REPLY
1
Entering edit mode
@peter-langfelder-4469
Last seen 5 weeks ago
United States

I would guess that SVA uses singular value decomposition (SVD); in SVD, the sign of the singular vectors is arbitrary and may change between implementations. Here's a quote from help(svd):

     Recall that the singular vectors are only defined up to sign (a
     constant of modulus one in the complex case).  If a left singular
     vector has its sign changed, changing the sign of the
     corresponding right vector gives an equivalent decomposition.

 

 

 

 

ADD COMMENT
0
Entering edit mode

hi, the function 'gsva()' uses by default the GSVA method introduced in the GSVA paper, but it does provide the user the possibility to calculate right-singular vectors as summaries of pathway expression by setting the argument 'method="plage". I've tried that and indeed the resulting values are inversely correlated to the x-axis but in both, linux and macOS sierra, platforms, so there's no discrepancy between the platforms.

ADD REPLY
0
Entering edit mode
Robert Castelo ★ 3.4k
@rcastelo
Last seen 4 days ago
Barcelona/Universitat Pompeu Fabra

Dear Jason,

thanks for your message. I don't have access to a windows machine but I do have an apple laptop with macOS Sierra (10.12.6) and i cannot reproduce the problem in the release nor in the development version. i get exactly the same numbers in macOS and in linux. I'm providing here below my session information where I have omitted the linux session information to meet the maximum allowed characters per answer (5000 characters). Please check that you are using the latest release version of GSVA in both platforms and verify whether there are different versions of the software between my session information and yours.

cheers,

robert.

** macOS sierra release sessionInfo():

R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] leukemiasEset_1.12.0 Biobase_2.36.2       BiocGenerics_0.22.0
[4] GSVA_1.24.2          BiocInstaller_1.26.1 nvimcom_0.9-28
[7] colorout_1.1-2

loaded via a namespace (and not attached):
 [1] graph_1.54.0         Rcpp_0.12.13         AnnotationDbi_1.38.2
 [4] IRanges_2.10.4       bit_1.1-12           xtable_1.8-2
 [7] rlang_0.1.2          blob_1.1.0           tools_3.4.0
[10] DBI_0.7              bit64_0.9-7          digest_0.6.12
[13] tibble_1.3.4         GSEABase_1.38.2      S4Vectors_0.14.6
[16] bitops_1.0-6         RCurl_1.95-4.8       memoise_1.1.0
[19] RSQLite_2.0          compiler_3.4.0       stats4_3.4.0
[22] XML_3.98-1.9         annotate_1.54.0

** macOS sierra devel sessionInfo():

R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /opt/R/R-3.4.0_BioCdevel/lib/R/lib/libRblas.dylib
LAPACK: /opt/R/R-3.4.0_BioCdevel/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] leukemiasEset_1.13.0 Biobase_2.37.2       BiocGenerics_0.23.2
[4] GSVA_1.25.6          nvimcom_0.9-28       colorout_1.1-2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13                lattice_0.20-35
 [3] prettyunits_1.0.2           Rsamtools_1.29.1
 [5] Biostrings_2.45.4           assertthat_0.2.0
 [7] digest_0.6.12               mime_0.5
 [9] BiocFileCache_1.1.16        R6_2.2.2
[11] GenomeInfoDb_1.13.4         stats4_3.4.0
[13] RSQLite_2.0                 httr_1.3.1
[15] zlibbioc_1.23.0             rlang_0.1.2
[17] GenomicFeatures_1.29.11     progress_1.1.2
[19] lazyeval_0.2.0              annotate_1.55.0
[21] blob_1.1.0                  S4Vectors_0.15.10
[23] Matrix_1.2-11               shinythemes_1.1.1
[25] BiocParallel_1.11.9         geneplotter_1.55.0
[27] stringr_1.2.0               RCurl_1.95-4.8
[29] bit_1.1-12                  biomaRt_2.33.4
[31] shiny_1.0.5                 DelayedArray_0.3.21
[33] compiler_3.4.0              httpuv_1.3.5
[35] rtracklayer_1.37.3          Organism.dplyr_1.3.2
[37] pkgconfig_2.0.1             htmltools_0.3.6
[39] SummarizedExperiment_1.7.10 tibble_1.3.4
[41] GenomeInfoDbData_0.99.1     IRanges_2.11.17
[43] matrixStats_0.52.2          XML_3.98-1.9
[45] dplyr_0.7.4                 dbplyr_1.1.0
[47] GenomicAlignments_1.13.6    bitops_1.0-6
[49] rappdirs_0.3.1              grid_3.4.0
[51] xtable_1.8-2                GSEABase_1.39.4
[53] DBI_0.7                     AnnotationFilter_1.1.9
[55] magrittr_1.5                graph_1.55.0
[57] stringi_1.1.5               XVector_0.17.1
[59] bindrcpp_0.2                RColorBrewer_1.1-2
[61] tools_3.4.0                 bit64_0.9-7
[63] glue_1.1.1                  AnnotationDbi_1.39.3
[65] GenomicRanges_1.29.14       memoise_1.1.0
[67] bindr_0.1
ADD COMMENT
0
Entering edit mode

hi again, sorry i didn't realize you were talking about the development version. I've edited my answer to address the question with respect to the development version. In any case, it is always recommended to use the release version.

ADD REPLY
0
Entering edit mode

Hi Robert.  Thanks for the reply, I'll post up some Session Info.  I got the same bug on both release and development versions of Bioconductor, so that doesn't seem to be the issue.  Sierra 10.12.6 is the OS version. Sorry you couldn't reproduce it. Based on a comment above implicating the SVD function, my colleague suggested that it may be a difference in the underlying Fortran libraries between yours and my system and perhaps a bug in the assumptions the GSVA package is making about how the functions its calling are behaving.  But yeah, the problem is there regardless of version, and updating all versions to the most current did not fix it. This is a days old system, and R was installed through homebrew if that helps you track down anything.  If you need to know any other info about my libraries or anything please let me know.  Its a great package and I appreciate the help!

ADD REPLY
0
Entering edit mode

Session info three:

[57] stringi_1.1.5               XVector_0.17.1             

[59] bindrcpp_0.2                RColorBrewer_1.1-2         

[61] tools_3.4.2                 bit64_0.9-7                

[63] glue_1.1.1                  AnnotationDbi_1.39.3       

[65] GenomicRanges_1.29.14       memoise_1.1.0              

[67] bindr_0.1                  

 

ADD REPLY
0
Entering edit mode

Hi Jason, thanks for posting the session information. According to it, you are using the very latest GSVA development version 1.25.6. However, under that version, the code posted above would not work because in the current devel version we have deprecated the way in which scores calculated by the default method="gsva" are returned. Until now, and this is how the current release version works, scores were returned within a list, in an element called 'es.obs', but in the current devel version with default parameters this list is not returned anymore, and the function returns a matrix, except when the input expression data is an 'ExpressionSet' object because then it returns an 'ExpressionSet' object with the GSVA scores of gene sets by samples. This means, that the session information you have pasted does not correspond to the code that in principle reproduces the problem, which I have not been able to actually reproduce. I strongly recommend you to work with the release version of GSVA, which in about three weeks will be replaced by the current devel version. Regarding your comment about SVD calculations and underlying Fortran libraries. You code in principle is using the default GSVA method and this method does not do any SVD calculations nor uses any underlying Fortran libraries. One way forward for you would be to make a fresh installation of the current release version of R and a fresh installation of the current release version of GSVA and see whether the problem disappears.

ADD REPLY
0
Entering edit mode

Hi Robert, I saw there has been some changes to the way outputs are delivered. As I mentioned before I was working with the release version of Bioconductor and the eSet as well as the devel version.  The problem exists on both.  I took your advice and did a complete wipe of all packages and I did a fresh install of Bioconductor the release version 3.5, and the problem persists. 

R version 3.4.2 (2017-09-28)

Platform: x86_64-apple-darwin16.7.0 (64-bit)

Running under: macOS Sierra 10.12.6

 

Matrix products: default

BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib

LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

 

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 

attached base packages:

[1] parallel  methods   stats     graphics  grDevices utils     datasets

[8] base     

 

other attached packages:

[1] leukemiasEset_1.12.0 Biobase_2.36.2       BiocGenerics_0.22.0

[4] GSVA_1.24.2         

 

loaded via a namespace (and not attached):

[1] Rcpp_0.12.13         IRanges_2.10.4       XML_3.98-1.9        

[4] digest_0.6.12        bitops_1.0-6         xtable_1.8-2        

[7] DBI_0.7              GSEABase_1.38.2      stats4_3.4.2        

[10] RSQLite_2.0          graph_1.54.0         rlang_0.1.2         

[13] annotate_1.54.0      blob_1.1.0           S4Vectors_0.14.6    

[16] tools_3.4.2          bit64_0.9-7          RCurl_1.95-4.8      

[19] bit_1.1-12           compiler_3.4.2       AnnotationDbi_1.38.2

[22] memoise_1.1.0        tibble_1.3.4

 

ADD REPLY
0
Entering edit mode

hi, the only difference i see is that you use R 3.4.2 and i was using 3.4.0. i've updated R in my macOS sierra to 3.4.2 and run again the code. unfortunately i cannot reproduce the problem, i get exactly the same numbers as with linux. i don't know what can i do about it if i cannot reproduce the error, i hope somebody else has a hint. please find below my new session information, just in case.

cheers,

robert.

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] leukemiasEset_1.12.0 Biobase_2.36.2       BiocGenerics_0.22.0
[4] GSVA_1.24.2          colorout_1.1-2

loaded via a namespace (and not attached):
 [1] graph_1.54.0         Rcpp_0.12.13         AnnotationDbi_1.38.2
 [4] IRanges_2.10.4       bit_1.1-12           xtable_1.8-2
 [7] rlang_0.1.2          blob_1.1.0           tools_3.4.2
[10] DBI_0.7              bit64_0.9-7          digest_0.6.12
[13] tibble_1.3.4         GSEABase_1.38.2      S4Vectors_0.14.6
[16] bitops_1.0-6         RCurl_1.95-4.8       memoise_1.1.0
[19] RSQLite_2.0          compiler_3.4.2       stats4_3.4.2
[22] XML_3.98-1.9         annotate_1.54.0
ADD REPLY
2
Entering edit mode

There is an unfortunate regression in R-3.4.2 (BiocGenerics built with R 3.4.2 changes behaviour) that might contribute; my hunch is that the regression appears when packages are installed under R-3.4.2, so just updating might not be enough to tickle this. The culprit is typically *apply-like calls to functions that dispatch on ... that have been made generics, this includes in BiocGenerics rbind, cbind, pmax, pmin, pmax.int, pmin.int, Map, mapply, order, paste, table, but could be other functions in dependent packages.

ADD REPLY
0
Entering edit mode

Thanks Martin, then @Jason one possible way forward for you would be to install the former release version of R 3.4.1 and install the BioC packages on that version.

ADD REPLY
0
Entering edit mode

Thats so much for taking a deeper look into it. I can work around through executing the GSVA snipits in dockers for now.  GSVA is a favorite in our lab, I hope either a more clear way to reproduce the bug comes around or perhaps this will just be confined to this particular build of R.  If people aren't sanity checking their data this could cause some problems.  I wouldn't have found it so quickly if I hadn't run half of it on a linux cluster and half on mac os and seen there were a bunch of unexpected opposite correlations when I did the rbind on the two.

ADD REPLY
1
Entering edit mode

I looked a little deeper, and think the culprit is gsva.R:548

 sort.sgn.idxs <- apply(gene.density, 2, order, decreasing=TRUE) # n.genes * n.samples

where the bug is manifest by ignoring the 'decreasing=TRUE' argument. I believe the fix will use BiocGenerics >= 0.22.1 for release, or >= 0.23.3 for devel as described A: BiocGenerics built with R 3.4.2 changes behaviour. The build system is still recovering from this, and BiocGenerics of the appropriate version is not yet (7 October 2017) available via biocLite() across all platforms. I'm not sure whether GSVA needs to be re-installed after BiocGenerics.

ADD REPLY
0
Entering edit mode

The session info one:

R version 3.4.2 (2017-09-28)

Platform: x86_64-apple-darwin16.7.0 (64-bit)

Running under: macOS Sierra 10.12.6

 

Matrix products: default

BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib

LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

 

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 

attached base packages:

[1] parallel  methods   stats     graphics  grDevices utils     datasets

[8] base     

 

other attached packages:

[1] leukemiasEset_1.13.0 Biobase_2.37.2       BiocGenerics_0.23.2

[4] GSVA_1.25.6         

 

 

ADD REPLY
0
Entering edit mode

The session info two (sorry this forum wouldn't let me put it in one message):

loaded via a namespace (and not attached):

[1] Rcpp_0.12.13                lattice_0.20-35            

[3] prettyunits_1.0.2           Rsamtools_1.29.1           

[5] Biostrings_2.45.4           assertthat_0.2.0           

[7] digest_0.6.12               mime_0.5                   

[9] BiocFileCache_1.1.16        R6_2.2.2                   

[11] GenomeInfoDb_1.13.4         stats4_3.4.2               

[13] RSQLite_2.0                 httr_1.3.1                 

[15] zlibbioc_1.23.0             rlang_0.1.2                

[17] GenomicFeatures_1.29.11     progress_1.1.2             

[19] lazyeval_0.2.0              annotate_1.55.0            

[21] blob_1.1.0                  S4Vectors_0.15.10          

[23] Matrix_1.2-11               shinythemes_1.1.1          

[25] BiocParallel_1.11.9         geneplotter_1.55.0         

[27] stringr_1.2.0               RCurl_1.95-4.8             

[29] bit_1.1-12                  biomaRt_2.33.4             

[31] shiny_1.0.5                 DelayedArray_0.3.21        

[33] compiler_3.4.2              httpuv_1.3.5               

[35] rtracklayer_1.37.3          Organism.dplyr_1.3.2       

[37] pkgconfig_2.0.1             htmltools_0.3.6            

[39] SummarizedExperiment_1.7.10 tibble_1.3.4               

[41] GenomeInfoDbData_0.99.1     IRanges_2.11.17            

[43] matrixStats_0.52.2          XML_3.98-1.9               

[45] dplyr_0.7.4                 dbplyr_1.1.0               

[47] GenomicAlignments_1.13.6    bitops_1.0-6               

[49] rappdirs_0.3.1              grid_3.4.2                 

[51] xtable_1.8-2                GSEABase_1.39.4            

[53] DBI_0.7                     AnnotationFilter_1.1.9     

[55] magrittr_1.5                graph_1.55.0               

ADD REPLY

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6