Hello all,
This post is to report an issue I have discovered; I am not asking for help.
Today I found an issue with GEOquery that appears after upgrading the readr package to 1.2.1. I have reported it on the readr GitHub issue tracker (https://github.com/tidyverse/readr/issues/925).
Some of my code stopped working after I upgraded readr from 1.1.1 to 1.2.1, and it took me a good couple of hours to identify and test the cause. With readr 1.2.1, GEOquery::getGEO produces a parsing error: column names are no longer parsed correctly.
I have confirmed this issue on Windows and Linux with different R versions (see the GitHub post).
Example of this error on my Ubuntu 16.04 machine:
> test <- GEOquery::getGEO('GSE76885', GSEMatrix = T, AnnotGPL=TRUE)
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
Found 1 file(s)
GSE76885_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76885/matrix/GSE76885_series_matrix.txt.gz'
Content type 'application/x-gzip' length 10006900 bytes (9.5 MB)
==================================================
downloaded 9.5 MB
Parsed with column specification:
cols(
.default = col_double(),
A_23_P100001 = col_character()
)
See spec(...) for full column specifications.
File stored at:
/tmp/RtmpJ5L0WL/GPL6480.annot.gz
Warning message:
Duplicated column names deduplicated: '-0.615' => '-0.615_1' [65], '-0.267' => '-0.267_1' [95], '-0.303' => '-0.303_1' [96], '-0.105' => '-0.105_1' [101], '0.089' => '0.089_1' [107], '0.146' => '0.146_1' [110], '-0.184' => '-0.184_1' [124], '-0.45' => '-0.45_1' [149], '-0.16' => '-0.16_1' [154], '-0.047' => '-0.047_1' [155], '0.019' => '0.019_1' [157], '-0.074' => '-0.074_1' [158], '-0.113' => '-0.113_1' [159], '0.009' => '0.009_1' [168], '-0.149' => '-0.149_1' [170], '-0.085' => '-0.085_1' [175], '0.096' => '0.096_1' [176], '-0.281' => '-0.281_1' [177], '-0.096' => '-0.096_1' [178], '0.248' => '0.248_1' [179], '-0.308' => '-0.308_1' [181], '-0.22' => '-0.22_1' [190], '-0.306' => '-0.306_1' [195]
> Biobase::sampleNames(test[[1]])[1:10]
[1] "0.04" "-0.173" "-0.288" "0.089" "-0.227" "-0.254" "-0.184" "0.453"
[9] "0.264" "-0.179"
>
The last command should instead return GSM sample accessions: [1] "GSM2039774" "GSM2039775" "GSM2039776" "GSM2039777" "GSM2039778"
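Until this is fixed, a quick sanity check right after getGEO would catch the problem before the bad sample names propagate into downstream analysis. This is a minimal sketch of my own, not part of GEOquery or Biobase: the helper name has_gsm_sample_names is hypothetical, and it assumes every sample in a GSE series matrix is identified by a GSM accession, as in the expected output above.

```r
# Hypothetical helper (not part of GEOquery): TRUE only if there is at least one
# sample name and every name looks like a GEO sample accession ("GSM" + digits).
has_gsm_sample_names <- function(sample_names) {
  length(sample_names) > 0 && all(grepl("^GSM[0-9]+$", sample_names))
}

# Sample names as returned with readr 1.2.1 (taken from the output above)
broken <- c("0.04", "-0.173", "-0.288", "0.089", "-0.227")
# Sample names as they should be
expected <- c("GSM2039774", "GSM2039775", "GSM2039776", "GSM2039777", "GSM2039778")

has_gsm_sample_names(broken)    # FALSE
has_gsm_sample_names(expected)  # TRUE

# In a real script, fail fast right after the download:
# test <- GEOquery::getGEO("GSE76885", GSEMatrix = TRUE, AnnotGPL = TRUE)
# stopifnot(has_gsm_sample_names(Biobase::sampleNames(test[[1]])))
```

The check is deliberately blunt: it does not diagnose the cause, it only flags that the sample names are not accessions.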
My session info output:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 tidyr_0.8.2 crayon_1.3.4
[4] dplyr_0.7.8 assertthat_0.2.0 R6_2.3.0
[7] magrittr_1.5 pillar_1.3.0 stringi_1.2.4
[10] rlang_0.3.0.1 curl_3.2 limma_3.34.9
[13] xml2_1.2.0 tools_3.4.4 readr_1.2.1
[16] Biobase_2.38.0 glue_1.3.0 purrr_0.2.5
[19] hms_0.4.2 parallel_3.4.4 compiler_3.4.4
[22] BiocGenerics_0.24.0 pkgconfig_2.0.2 bindr_0.1.1
[25] tidyselect_0.2.5 tibble_1.4.2 GEOquery_2.46.15
Please note that I have also confirmed this behavior with GEOquery 2.50.0 on Windows. Reverting to readr 1.1.1 resolves the issue.