Question

Strange problem with BeadArrayUseCases vignette

0

Entering edit mode

smt8n • 0

@smt8n-9982

Last seen 8.4 years ago

Hi all,

I encountered a strange problem running the "Analysis of public data using GEO" part of the code from the BeadArrayUseCases vignette:

    library(GEOquery)
    library(limma)
    library(illuminaHumanv1.db)
    
    url <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/"
    filenm <- "GSE5350-GPL2507_series_matrix.txt.gz"
    download.file(paste(url, filenm, sep=""), destfile=filenm)
    gse <- getGEO(filename=filenm)
    
    dim(gse)
    exprs(gse)[1:5,1:2]
    
    samples <- as.character(pData(gse)[,"title"])
    sites <- as.numeric(substr(samples,10,10))
    shortlabels <- substr(samples,12,13)
    
    rnasource <- pData(gse)[,"source_name_ch1"]
    levels(rnasource) <- c("UHRR", "Brain", "UHRR75", "UHRR25")
    
    boxplot(log2(exprs(gse)), col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10), main="Before batch correction")


###################################################
### code chunk number 50: deAnalysisFromGEO (eval = FALSE)
###################################################
    gse.batchcorrect <- removeBatchEffect(log2(exprs(gse)), batch=sites)

    par(mfrow=c(1,2), oma=c(1,0.5,0.2,0.1))
    boxplot(gse.batchcorrect, col=sites+1, names=shortlabels, las=2, cex.names=0.5, ylab=expression(log[2](intensity)), outline=FALSE, ylim=c(3,10),  main="After batch correction")
    plotMDS(gse.batchcorrect, labels=shortlabels, col=sites+1, main="MDS plot")
    
    ids3 <- featureNames(gse)
    
    qual2 <-  unlist(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA))
    
    table(qual2)
    
    rem2 <- qual2 == "No match" | qual2 == "Bad" | is.na(qual2)
    
    gse.batchcorrect.filt <- gse.batchcorrect[!rem2,]

The last line from above gives me an error:

> gse.batchcorrect.filt <- gse.batchcorrect[!rem2,]
Error in gse.batchcorrect[!rem2, ] : 
  (subscript) logical subscript too long

When I check dimensions:

> length(rem2)
[1] 47300

> nrow(gse.batchcorrect)
[1] 47293

I see that there is indeed a problem. An interesting thing is that before "unlisting" I get correct dimensions:

> length(mget(ids3, illuminaHumanv1PROBEQUALITY, ifnotfound=NA))
[1] 47293

Upon further checks, it seemed, there might be a duplication of entries: some entries had 2 "perfect" or "perfect" and "good".

Could anybody, please, advise me on what the problem might be and how to fix it. Thank you in advance.

This is the sessionInfo() output:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
Running under: OS X 10.8.5 (Mountain Lion)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] illuminaHumanv1.db_1.26.0 org.Hs.eg.db_3.2.3        RSQLite_1.0.0             DBI_0.3.1                
 [5] AnnotationDbi_1.32.3      IRanges_2.4.8             S4Vectors_0.8.11          limma_3.26.9             
 [9] GEOquery_2.36.0           Biobase_2.30.0            BiocGenerics_0.16.1      

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.8 bitops_1.0-6   XML_3.98-1.4

BeadArrayUseCases • 953 views

ADD COMMENT • link 9.1 years ago smt8n • 0