Dear Bioconductor members,
I finally got my hands on the results of some HTA 2.0 microarray experiments, and I started processing them using the standard methods.
Reading the CEL files and running RMA posed no problems. For RMA I used oligo this way:
> data.rma = oligo::rma(data, background=TRUE, normalize=TRUE, subset=NULL, target="core")
I then tried to annotate my dataset with two different methods, but neither works.
First, using the PDInfo package:
> data.ann <- annotateEset(data.rma, pd.hta.2.0, type = "core")
Error: There appears to be a mismatch between the ExpressionSet and the annotation data.
Please ensure that the summarization level for the ExpressionSet and the 'type' argument are the same.
See ?annotateEset for more information on the type argument.
Then, using the ChipDb package:
> data.ann <- annotateEset(data.rma, hta20sttranscriptcluster.db, columns = c("PROBEID", "ENTREZID", "SYMBOL", "ENSEMBL", "GENENAME"))
Error: cannot allocate vector of size 37.1 Gb
In addition: Warning messages:
1: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)
2: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)
3: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)
4: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)
I don't understand the problem with the PDInfo package, since I used the same summarization level (i.e. "core") in both commands. The second error is simply my computer being unable to process that much data. For the moment I don't have access to a bioinformatics server; I will see whether that is possible, but is there really no way to annotate HTA arrays with 8 GB of RAM?
Details:
Computer: Windows 10 64-bit, i5-2410M CPU (dual core, 2.3 GHz), 8 GB RAM, using R with RStudio
Session info:
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252  LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C  LC_TIME=French_France.1252

attached base packages:
[1] stats4  parallel  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
 [1] hta20sttranscriptcluster.db_8.3.1  org.Hs.eg.db_3.3.0
 [3] AnnotationDbi_1.34.4  NMF_0.23.3
 [5] cluster_2.0.4  rngtools_1.2.4
 [7] pkgmaker_0.25.10  registry_0.3
 [9] limma_3.28.14  pd.hta.2.0_3.12.1
[11] RSQLite_1.0.0  DBI_0.4-1
[13] oligo_1.36.1  Biostrings_2.40.2
[15] XVector_0.12.0  IRanges_2.6.1
[17] S4Vectors_0.10.2  genefilter_1.54.2
[19] affycoretools_1.44.2  BiocInstaller_1.22.3
[21] Biobase_2.32.0  BiocGenerics_0.18.0
[23] ggplot2_2.1.0  rpart_4.1-10
[25] Matrix_1.2-6  lattice_0.20-33
[27] oligoClasses_1.34.0

loaded via a namespace (and not attached):
  [1] colorspace_1.2-6  hwriter_1.3.2  class_7.3-14
  [4] modeltools_0.2-21  mclust_5.2  biovizBase_1.20.0
  [7] GenomicRanges_1.24.2  dichromat_2.0-0  affyio_1.42.0
 [10] flexmix_2.3-13  mvtnorm_1.0-5  interactiveDisplayBase_1.10.3
 [13] codetools_0.2-14  splines_3.3.0  R.methodsS3_1.7.1
 [16] ggbio_1.20.1  doParallel_1.0.10  robustbase_0.92-6
 [19] geneplotter_1.50.0  knitr_1.13  Formula_1.2-1
 [22] Rsamtools_1.24.0  gridBase_0.4-7  annotate_1.50.0
 [25] kernlab_0.9-24  GO.db_3.3.0  R.oo_1.20.0
 [28] graph_1.50.0  shiny_0.13.2  httr_1.2.1
 [31] GOstats_2.38.1  acepack_1.3-3.3  htmltools_0.3.5
 [34] tools_3.3.0  gtable_0.2.0  affy_1.50.0
 [37] Category_2.38.0  reshape2_1.4.1  affxparser_1.44.0
 [40] Rcpp_0.12.5  trimcluster_0.1-2  gdata_2.17.0
 [43] preprocessCore_1.34.0  rtracklayer_1.32.1  fpc_2.1-10
 [46] iterators_1.0.8  stringr_1.0.0  mime_0.5
 [49] ensembldb_1.4.7  gtools_3.5.0  XML_3.98-1.4
 [52] dendextend_1.2.0  DEoptimR_1.0-6  AnnotationHub_2.4.2
 [55] edgeR_3.14.0  MASS_7.3-45  zlibbioc_1.18.0
 [58] scales_0.4.0  BSgenome_1.40.1  VariantAnnotation_1.18.5
 [61] SummarizedExperiment_1.2.3  RBGL_1.48.1  RColorBrewer_1.1-2
 [64] gridExtra_2.2.1  biomaRt_2.28.0  reshape_0.8.5
 [67] latticeExtra_0.6-28  stringi_1.1.1  gcrma_2.44.0
 [70] foreach_1.4.3  GenomicFeatures_1.24.4  caTools_1.17.1
 [73] BiocParallel_1.6.2  chron_2.3-47  GenomeInfoDb_1.8.3
 [76] prabclus_2.2-6  ReportingTools_2.12.2  bitops_1.0-6
 [79] GenomicAlignments_1.8.4  bit_1.1-12  GSEABase_1.34.0
 [82] AnnotationForge_1.14.2  GGally_1.2.0  plyr_1.8.4
 [85] magrittr_1.5  DESeq2_1.12.3  R6_2.1.2
 [88] gplots_3.0.1  Hmisc_3.17-4  whisker_0.3-2
 [91] foreign_0.8-66  survival_2.39-5  RCurl_1.95-4.8
 [94] nnet_7.3-12  KernSmooth_2.23-15  OrganismDbi_1.14.1
 [97] PFAM.db_3.3.0  locfit_1.5-9.1  grid_3.3.0
[100] data.table_1.9.6  diptest_0.75-7  digest_0.6.9
[103] xtable_1.8-2  ff_2.2-13  httpuv_1.3.3
[106] R.utils_2.3.0  munsell_0.4.3
OK, I have fixed the bugs:
The fix should make its way through the build machines within a day or two. You are looking for affycoretools version 1.44.3.
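If it helps, here is a minimal sketch for checking whether an installed copy is still older than the fixed release. It assumes the version number above refers to affycoretools (the session info lists affycoretools_1.44.2); the helper name needs_update is made up for illustration and is not part of any package.

```r
# Sketch: is an installed package version older than the fixed release?
# Assumption: the fix lands in affycoretools 1.44.3 (session info shows 1.44.2).
# needs_update() is a hypothetical helper written for this example.
needs_update <- function(installed, fixed = "1.44.3") {
  # compareVersion() returns -1 when the first version is the older one
  utils::compareVersion(as.character(installed), fixed) < 0
}

needs_update("1.44.2")  # TRUE: the version from the session info predates the fix
needs_update("1.44.3")  # FALSE: the fixed release is already installed

# On a machine with the package installed, the same check would be:
# needs_update(packageVersion("affycoretools"))
```

Comparing with compareVersion() rather than as plain strings avoids mis-ordering versions such as 1.44.10 and 1.44.3.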
Thanks, it's nice to see such rapid support and such a quick fix! I will look out for it.