ArrayExpress fails on experiments without array data but could still return parsed .idf and .sdrf files
@andrew_mcdavid-11488
United States

Some experiments on ArrayExpress contain only phenotypic information because the processed data live elsewhere. In particular, only the .idf and .sdrf files might be present. These files are useful in their own right even when no array data files are posted, and because ArrayExpress strictly subsumes GEO, it is the more canonical source. Currently, the package assumes that array data files (the `Array.Data.File` column of the SDRF) are present, e.g., line 3 of `readPhenoData`.
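Since the SDRF is a plain tab-delimited MAGE-TAB file, one stopgap is to read it directly with base R instead of going through `readPhenoData`. This is a minimal sketch: it writes a small made-up SDRF to a temporary file so that it runs on its own. With real data the path would presumably be `file.path(habib_ae$path, habib_ae$sdrf)`, assuming `getAE()` returns the file name and directory in those elements, as the reprex below suggests.

```r
# Hypothetical two-sample SDRF written to a temporary file so this runs on
# its own; for real data use file.path(habib_ae$path, habib_ae$sdrf).
sdrf_file <- tempfile(fileext = ".sdrf.txt")
writeLines(c(
  "Source Name\tCharacteristics[organism part]\tAssay Name\tArray Data File",
  "sample 1\tpart_a\tassay_1\t",
  "sample 2\tpart_b\tassay_2\t"
), sdrf_file)

# Read every column as character so blank fields stay "" instead of NA.
# check.names = TRUE (the default) converts MAGE-TAB headers into syntactic
# names, e.g. "Array Data File" becomes Array.Data.File.
sdrf <- read.delim(sdrf_file, check.names = TRUE, colClasses = "character")

print(dim(sdrf))    # 2 rows, 4 columns
print(names(sdrf))
# No row lists an array data file, which is the situation in the reprex.
print(all(trimws(sdrf$Array.Data.File) == ""))
```

This returns only a data frame, not an `AnnotatedDataFrame` with `varMetadata`, but it keeps the sample annotation usable when no array data are attached.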

Reprex:

> library(ArrayExpress)
> habib_ae <- getAE('GSE85721')
> pd <- ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
ArrayExpress: Reading pheno data from SDRF
Error in which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i,  : 
  argument to 'which' is not logical
> #Because line 3 of readPhenoData results in an empty AnnotatedDataFrame
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] ArrayExpress_1.34.0        GEOquery_2.40.0           
 [3] MultiAssayExperiment_1.0.0 SummarizedExperiment_1.4.0
 [5] GenomicRanges_1.26.2       GenomeInfoDb_1.10.2       
 [7] Zeisel2015Data_0.9         rmarkdown_1.3.9002        
 [9] RColorBrewer_1.1-2         Biobase_2.34.0            
[11] stringr_1.1.0              Biostrings_2.42.1         
[13] XVector_0.14.0             IRanges_2.8.1             
[15] S4Vectors_0.12.1           BiocGenerics_0.20.0       
[17] data.table_1.10.4          preprocessData_0.10.3     
[19] knitr_1.15.1               devtools_1.12.0           

loaded via a namespace (and not attached):
 [1] httr_1.2.1            splines_3.3.2         foreach_1.4.3        
 [4] shiny_1.0.0           assertthat_0.1        yaml_2.1.14          
 [7] RSQLite_1.1-2         backports_1.0.5       lattice_0.20-34      
[10] limma_3.30.10         digest_0.6.12         oligoClasses_1.36.0  
[13] colorspace_1.3-2      preprocessCore_1.36.0 htmltools_0.3.5      
[16] httpuv_1.3.3          Matrix_1.2-8          plyr_1.8.4           
[19] XML_3.98-1.5          affxparser_1.46.0     zlibbioc_1.20.0      
[22] xtable_1.8-2          scales_0.4.1          whisker_0.3-2        
[25] affyio_1.44.0         getopt_1.20.0         ff_2.2-13            
[28] optparse_1.3.2        tibble_1.2            pkgmaker_0.22        
[31] ggplot2_2.2.1         withr_1.0.2           oligo_1.38.0         
[34] lazyeval_0.2.0        magrittr_1.5          crayon_1.3.2         
[37] mime_0.5              memoise_1.0.0         evaluate_0.10        
[40] doParallel_1.0.10     NMF_0.20.6            xml2_1.1.1           
[43] shinydashboard_0.5.3  BiocInstaller_1.24.0  tools_3.3.2          
[46] registry_0.3          gridBase_0.4-7        munsell_0.4.3        
[49] cluster_2.0.5         rngtools_1.2.4        compiler_3.3.2       
[52] grid_3.3.2            RCurl_1.95-4.8        iterators_1.0.8      
[55] rstudioapi_0.6        bitops_1.0-6          gtable_0.2.0         
[58] codetools_0.2-15      DBI_0.5-1             roxygen2_6.0.0       
[61] reshape2_1.4.2        R6_2.2.0              bit_1.1-12           
[64] commonmark_1.1        rprojroot_1.2         desc_1.1.0           
[67] stringi_1.1.2         Rcpp_0.12.9          
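The last step of the error can be reproduced in base R. Once the phenotype table has no rows, `seq_len()` returns `integer(0)`, and `sapply()` over a zero-length input returns `list()` rather than a logical vector, which `which()` rejects. In this sketch a zero-row data frame stands in for the empty `AnnotatedDataFrame`; `vapply()` is shown only as one way to get a logical result on empty input, not as the package's fix.

```r
# A zero-row table standing in for the empty phenotype data.
pheno <- data.frame(Source.Name = character(0), stringsAsFactors = FALSE)

# Same pattern as the failing line: sapply() over integer(0) gives list().
flags <- sapply(seq_len(nrow(pheno)), function(i) all(pheno[i, ] == "", na.rm = TRUE))
print(class(flags))  # "list"

# which() requires a logical vector, so it raises the same error as the reprex.
res <- tryCatch(which(flags), error = function(e) conditionMessage(e))
print(res)

# vapply() with a declared FUN.VALUE returns logical(0) on empty input instead.
safe_flags <- vapply(seq_len(nrow(pheno)),
                     function(i) all(pheno[i, ] == "", na.rm = TRUE),
                     logical(1))
print(which(safe_flags))  # integer(0)
```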

 

 


Hi Andrew, the error occurs in `readPhenoData` when it filters on the `Array.Data.File` column, which is empty because no raw data files were submitted. The filter therefore drops every row and leaves an empty `ph` object. It is possible, however, to modify `readPhenoData` to use `Assay.Name` instead, which gives a working function and a fully populated object. I'm not sure, though, what downstream code would break if this were implemented in the package.

Hope this helps.

Regards,

Andrew

 

> pd <- ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
debugging in: ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
debug: {
    message("ArrayExpress: Reading pheno data from SDRF")
    ph = try(read.AnnotatedDataFrame(sdrf, path = path, row.names = NULL,
        blank.lines.skip = TRUE, fill = TRUE, varMetadata.char = "$",
        quote = "\""))
    # Array.Data.File is blank (no raw data was submitted), so this index
    # selects no rows and ph becomes an empty AnnotatedDataFrame.
    ph = ph[gsub(" ", "", ph$Array.Data.File) != ""]
    sampleNames(ph) = ph$Array.Data.File
    ph@varMetadata["Array.Data.File", "labelDescription"] = "Index"
    ph@varMetadata["Array.Data.File", "channel"] = as.factor("_ALL_")
    # With zero rows left in ph, this is where the "argument to 'which' is
    # not logical" error is raised.
    emptylines = which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i,
        ] == "", na.rm = TRUE)))
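The `Assay.Name` fallback suggested above can be sketched on a plain data frame so that it runs with base R alone, without needing an `AnnotatedDataFrame`. `select_assay_rows()` is a hypothetical helper, not ArrayExpress code. It only covers the row-selection and sample-naming step, and whether the rest of the package accepts `Assay.Name`-based sample names is untested.

```r
# Hypothetical helper, not part of the ArrayExpress package. It mirrors the
# row-selection step of readPhenoData() on a plain data frame: keep rows with
# a non-blank Array.Data.File, but fall back to Assay.Name when that column is
# missing or blank in every row.
select_assay_rows <- function(pheno) {
  nonblank <- function(x) !is.na(x) & trimws(x) != ""
  key <- "Array.Data.File"
  if (!key %in% names(pheno) || !any(nonblank(pheno[[key]]))) {
    key <- "Assay.Name"
  }
  if (!key %in% names(pheno)) {
    stop("SDRF has no usable Array.Data.File or Assay.Name column")
  }
  keep <- pheno[nonblank(pheno[[key]]), , drop = FALSE]
  # Name rows by the chosen key, as sampleNames(ph) = ph$Array.Data.File does;
  # make.unique() is an extra guard in case the key repeats across rows.
  rownames(keep) <- make.unique(as.character(keep[[key]]))
  keep
}

pheno <- data.frame(Source.Name = c("sample 1", "sample 2"),
                    Assay.Name = c("assay_1", "assay_2"),
                    Array.Data.File = c("", ""),
                    stringsAsFactors = FALSE)
selected <- select_assay_rows(pheno)
print(rownames(selected))  # "assay_1" "assay_2"
```

Preferring `Array.Data.File` whenever any row has one keeps the behaviour unchanged for experiments that do have array data; the fallback only applies when the original filter would have selected nothing.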

 

 

 
