Subscript out of bounds while normalising using affy::rma()
geoffrey ▴ 10
Germany

Hello all,

I was constructing an AffyBatch object and then normalising it. However, whenever I used my custom hugene10stv1cdf, rma() reported a "subscript out of bounds" error. When I used the default hthgu133a CDF, rma() ran without any problem. I'm using affy 1.64.0.

library(GEOquery)
library(limma)
library(splines)
library(affy)
# Download and unpack the supplementary CEL files for GSE19392
getGEOSuppFiles("GSE19392")
untar("./GSE19392/GSE19392_RAW.tar", exdir = "~/")
# Decompress each .CEL.gz file in place
cels <- list.files("~/", pattern = "CEL")
sapply(file.path("~", cels), gunzip)
# Collect the paths of the decompressed CEL files
cels <- list.files("~/", pattern = "CEL")
cels <- file.path("~", cels)
library(hugene10stv1cdf)
# This call forces the HuGene 1.0 ST CDF; rma() then fails
rawdata <- ReadAffy(filenames = cels, cdfname = "hugene10stv1cdf")
# rawdata <- ReadAffy(filenames = cels)
normdata <- affy::rma(rawdata, destructive = TRUE)

Error in exprs(object)[index, , drop = FALSE] : subscript out of bounds

sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] splines   parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GEOquery_2.54.1        hthgu133acdf_2.18.0    affy_1.64.0            Biobase_2.46.0         BiocGenerics_0.32.0   
[6] limma_3.42.2           hugene10stv1cdf_2.18.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                  lattice_0.20-38             tidyr_1.1.3                 assertthat_0.2.1           
 [5] utf8_1.2.1                  R6_2.5.0                    GenomeInfoDb_1.22.1         stats4_3.6.2               
 [9] RSQLite_2.2.6               pillar_1.6.0                zlibbioc_1.32.0             rlang_0.4.10               
[13] curl_4.3.1                  blob_1.2.1                  S4Vectors_0.24.4            Matrix_1.2-18              
[17] preprocessCore_1.48.0       BiocParallel_1.20.1         readr_1.4.0                 RCurl_1.98-1.3             
[21] bit_4.0.4                   DelayedArray_0.12.3         compiler_3.6.2              pkgconfig_2.0.3            
[25] tidyselect_1.1.1            SummarizedExperiment_1.16.1 tibble_3.1.0                GenomeInfoDbData_1.2.2     
[29] ff_4.0.4                    IRanges_2.20.2              matrixStats_0.58.0          fansi_0.4.2                
[33] crayon_1.4.1                dplyr_1.0.5                 withr_2.4.2                 bitops_1.0-6               
[37] lifecycle_1.0.0             DBI_1.1.1                   magrittr_2.0.1              cli_2.5.0                  
[41] cachem_1.0.4                XVector_0.26.0              affyio_1.56.0               xml2_1.3.2                 
[45] ellipsis_0.3.1              generics_0.1.0              vctrs_0.3.7                 tools_3.6.2                
[49] bit64_4.0.5                 glue_1.4.2                  purrr_0.3.4                 hms_1.0.0                  
[53] fastmap_1.1.0               AnnotationDbi_1.48.0        BiocManager_1.30.15         GenomicRanges_1.38.0       
[57] sessioninfo_1.1.1           memoise_2.0.0

To reproduce the working normalisation with the default CDF, comment out the ReadAffy() call that sets cdfname = "hugene10stv1cdf" and uncomment the ReadAffy() line directly below it.

Thanks a lot.

Kevin Blighe ★ 4.0k
Republic of Ireland

Hi,

GSE19392 used a U133A array (HT HG-U133A), so should you not be using the CDF for that array? The affy package handles this array type, and it should automatically detect the CDF and then download and install it for you if it is not already there.
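
If it helps, here is a rough sketch of how one might check the array type before choosing a CDF. It assumes the affyio package is installed; `cel_chip_type()` and `suggest_reader()` are hypothetical helper names, and the "-st" pattern match is only a heuristic based on typical ST chip names, not an official rule.

```r
# Hypothetical helper: read the chip type stored in a CEL file header.
# Assumes the affyio package is installed.
cel_chip_type <- function(cel_file) {
  affyio::read.celfile.header(cel_file)$cdfName
}

# Hypothetical helper: guess which package suits a chip type.
# Heuristic only: ST chip names such as "HuGene-1_0-st-v1" contain "-st".
suggest_reader <- function(chip_type) {
  ifelse(grepl("-st", chip_type, ignore.case = TRUE), "oligo", "affy")
}

# Example with two chip-type strings (no CEL files needed)
suggest_reader(c("HT_HG-U133A", "HuGene-1_0-st-v1"))
```

With real data you would call something like `suggest_reader(cel_chip_type(cels[1]))` before deciding whether to pass the files to affy or to oligo.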

You imply that you then created a custom CDF for an Affymetrix ST array. affy [the package] cannot be used for ST arrays, and one should instead use oligo for these arrays.

Affymetrix arrays come in 3 main groups:

  • U133
  • HuGene
  • HuEx

*others exist but these are the main groups

HuGene and HuEx are ‘ST’ arrays, which have fundamental differences from the other arrays. The main differences with ST arrays:

  1. mismatch (MM) probes are mostly absent
  2. to accommodate more targets, feature size was reduced from 121 µm² to 25 µm²
  3. probes target the entire length of the gene, whereas U133 arrays mainly target the 3′ end only
  4. manufacturer-supplied annotations differ

The absence of MM probes affects downstream processing:

  • the affy Bioconductor package cannot process ST arrays; use oligo instead
  • some normalisation (e.g. mas5) and QC methods that use perfect match (PM) and MM probes cannot be used
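
For the ST arrays you are processing elsewhere, a minimal sketch of the oligo route could look like the following. It assumes oligo and the matching pd.* annotation package (for example pd.hugene.1.0.st.v1) are installed; `normalise_st()` is a hypothetical wrapper name, and `target = "core"` is one possible summarisation level for gene-level ST arrays.

```r
# Sketch: RMA normalisation of ST arrays with oligo instead of affy.
# Assumes oligo and the matching pd.* annotation package are installed.
normalise_st <- function(cel_files, target = "core") {
  # Fail early if no files were supplied or any path is wrong
  stopifnot(length(cel_files) > 0, all(file.exists(cel_files)))
  # read.celfiles() picks the platform design package from the CEL headers
  raw <- oligo::read.celfiles(cel_files)
  # Background-correct, quantile-normalise and summarise
  oligo::rma(raw, target = target)
}
```

Usage would be along the lines of `eset <- normalise_st(list.files("path/to/cels", pattern = "\\.CEL$", full.names = TRUE))`, where the directory is a placeholder.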

Kevin

Thanks a lot, Kevin. Sorry, I mistook this GSE series for another set of ST arrays I'm processing, and thought this one was ST as well. I should have checked the platform first.
