Dear Community,
based on the code presented on a previous post, regarding the import and analysis of Affymetrix Human ST 2.0 microarrays (Preprocessing of Human Gene 2.0 ST microarrays with oligo R package and annotation options), i tried to create some exploratory diagnostic plots regarding the success of normalization, for a needed report:
library(oligo)
library(affycoretools)
library(hugene20sttranscriptcluster.db)
library(limma)
librarypd.hugene.2.0.st)
setwd(mydir)
pdat <- read.table("pdat.project.txt",header=TRUE,stringsAsFactors = FALSE) # phenotype info
celfiles = list.celfiles()
affyRaw <- read.celfiles(celfiles)
identical(colnames(affyRaw),rownames(pdat)) # need to be identical for incorporate phenotype info
[1] TRUE
pd <- AnnotatedDataFrame(data= pdat)
phenoData(affyRaw) <- pd
celfiles.rma <- rma(affy.cels, target="probeset")
But then when i tried:
par(mfrow=c(1,2))
boxplot(affyRaw, col = cols, target="core",las = 2, main = "Pre-Normalization");
Error in `$<-.data.frame`(`*tmp*`, "channel", value = integer(0)) :
replacement has 0 rows, data has 2
boxplot(celfiles.rma, col = cols, las = 2, main = "Post-Normalization")
Any ideas or suggestions about this weird error ?
affyRaw
GeneFeatureSet (storageMode: lockedEnvironment)
assayData: 2598544 features, 6 samples
element names: exprs
protocolData
rowNames: Eogh1_DN41.CEL Eogh1_DN42.CEL ... Eogh1_MN243.CEL (6 total)
varLabels: exprs dates
varMetadata: labelDescription channel
phenoData
rowNames: Eogh1_DN41.CEL Eogh1_DN42.CEL ... Eogh1_MN243.CEL (6 total)
varLabels: Condition Batch
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.2.0.st
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Greek_Greece.1253 LC_CTYPE=Greek_Greece.1253
[3] LC_MONETARY=Greek_Greece.1253 LC_NUMERIC=C
[5] LC_TIME=Greek_Greece.1253
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] RColorBrewer_1.1-2 pd.hugene.2.0.st_3.14.1
[3] DBI_0.5-1 RSQLite_1.1-2
[5] limma_3.28.21 hugene20sttranscriptcluster.db_8.4.0
[7] org.Hs.eg.db_3.3.0 AnnotationDbi_1.34.4
[9] affycoretools_1.44.3 oligo_1.36.1
[11] Biostrings_2.40.2 XVector_0.12.1
[13] IRanges_2.6.1 S4Vectors_0.10.3
[15] Biobase_2.32.0 oligoClasses_1.34.0
[17] BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 hwriter_1.3.2
[3] biovizBase_1.20.0 htmlTable_1.9
[5] GenomicRanges_1.24.3 base64enc_0.1-3
[7] dichromat_2.0-0 affyio_1.42.0
[9] interactiveDisplayBase_1.10.3 codetools_0.2-15
[11] splines_3.3.1 R.methodsS3_1.7.1
[13] ggbio_1.20.2 geneplotter_1.50.0
[15] knitr_1.15.1 Formula_1.2-1
[17] Rsamtools_1.24.0 annotate_1.50.1
[19] cluster_2.0.5 GO.db_3.3.0
[21] R.oo_1.21.0 graph_1.50.0
[23] shiny_1.0.0 httr_1.2.1
[25] GOstats_2.38.1 backports_1.0.5
[27] assertthat_0.1 Matrix_1.2-8
[29] lazyeval_0.2.0 acepack_1.4.1
[31] htmltools_0.3.5 tools_3.3.1
[33] gtable_0.2.0 affy_1.50.0
[35] Category_2.38.0 reshape2_1.4.2
[37] affxparser_1.44.0 Rcpp_0.12.9
[39] gdata_2.17.0 preprocessCore_1.34.0
[41] rtracklayer_1.32.2 iterators_1.0.8
[43] stringr_1.1.0 mime_0.5
[45] ensembldb_1.4.7 gtools_3.5.0
[47] XML_3.98-1.5 AnnotationHub_2.4.2
[49] edgeR_3.14.0 zlibbioc_1.18.0
[51] scales_0.4.1 BSgenome_1.40.1
[53] VariantAnnotation_1.18.7 BiocInstaller_1.22.3
[55] SummarizedExperiment_1.2.3 RBGL_1.48.1
[57] memoise_1.0.0 gridExtra_2.2.1
[59] ggplot2_2.2.1 biomaRt_2.28.0
[61] rpart_4.1-10 reshape_0.8.6
[63] latticeExtra_0.6-28 stringi_1.1.2
[65] gcrma_2.44.0 genefilter_1.54.2
[67] foreach_1.4.3 checkmate_1.8.2
[69] caTools_1.17.1 GenomicFeatures_1.24.5
[71] BiocParallel_1.6.6 GenomeInfoDb_1.8.7
[73] ReportingTools_2.12.2 bitops_1.0-6
[75] lattice_0.20-34 GenomicAlignments_1.8.4
[77] htmlwidgets_0.8 bit_1.1-12
[79] GSEABase_1.34.1 AnnotationForge_1.14.2
[81] GGally_1.3.0 plyr_1.8.4
[83] magrittr_1.5 DESeq2_1.12.4
[85] R6_2.2.0 gplots_3.0.1
[87] Hmisc_4.0-2 foreign_0.8-67
[89] survival_2.40-1 RCurl_1.95-4.8
[91] nnet_7.3-12 tibble_1.2
[93] KernSmooth_2.23-15 OrganismDbi_1.14.1
[95] PFAM.db_3.3.0 locfit_1.5-9.1
[97] grid_3.3.1 data.table_1.10.4
[99] digest_0.6.12 xtable_1.8-2
[101] ff_2.2-13 httpuv_1.3.3
[103] R.utils_2.5.0 munsell_0.4.3
Dear James,
thank you for your answer and suggestions-regarding your comment about phenoData object, i procedeed as above, because of other errors initially appeared--but i have checked consistently that the rownames of the pdat object are identical and in the same order with the row names of the affy data object by the function identical above. In detail, when i initially used:
affyRaw <- read.celfiles(celfiles, phenoData = pdat)
Platform design info loaded.
Reading in : Eogh1_DN41.CEL
Reading in : Eogh1_DN42.CEL
Reading in : Eogh1_DN43.CEL
Reading in : Eogh1_MN241.CEL
Reading in : Eogh1_MN242.CEL
Reading in : Eogh1_MN243.CEL
Error in environmentIsLocked(object) : not an environment
Then, when i moved as above, adding after the phenodata info:
validObject(affyRaw)
Error in validObject(affyRaw) : invalid class "GeneFeatureSet" object:
'NChannelSet' varMetadata must have a 'channel' column
And finally, why the error appears in the raw data and not after normalization with boxplot ?
Thus, what is your opinion based on the above errors ?
(*Just to mention, this is not a public dataset and is based on a collaboration, that's why the phenodata data frame is added externally (if you meant so above))
That's exactly what I was talking about. It's non-trivial to add a phenoData object to any of the eSet-derived objects, because there are expectations for the format that you might not know about. In this example, you are trying to put a phenoData object into something that derives from an NChannelSet. And as you see in your error, an NChannelSet's phenoData has to contain a varMetadata that contains a 'channel' column.
So there are two ways you can approach this issue. You can educate yourself as to the ins and outs of eSet derived classes, which sounds pretty fun now that I mention it, or you can just let oligo generate the correct phenoData object, and then modify what it made to suit. As an example, using some random data I happen to have floating around here:
So now I have a GeneFeatureSet that is officially up to snuff, and we can now modify the information in the phenoData slot. Let's see what's in the pData slot of the phenoData object:
That's pretty boring. Let's say we want to add extra stuff like the treatment and type of animal.
And now I have a GeneFeatureSet that has the phenotypic information I care about, while remaining a valid object!
Dear James,
thank you again for your answer and comprehensive example !! Although i have read extensively about eSet containers, especially with the Affy package, but it seems that i have encoutered this problem with the oligo R package for the first time-also regarding the possible choises you suggest, because i have only 6 samples, i will try your second approach from scratch for a quick way-and then perhaps check more extensively the above problem and the specific format-NChannelSet