I am trying to reanalyze some old microarray data from my lab, using this tutorial: https://bioconductor.org/packages/release/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html
The original data was collected on an Affymetrix human genome 2.0 microarray. I have six .CEL files: three controls and three patients. I am able to load the files and run most of the analyses in the tutorial. However, the .CEL files lack meaningful metadata, so I cannot identify specific samples or groups in the results, and I cannot get some of the data to plot properly.
Everything goes fine until step 5 of the tutorial. Here, because my files lack metadata, the command:
head(Biobase::pData(raw_data))
gives the following output, instead of a list of metadata columns I can select for further analysis:
             index
ctrlMac1.CEL     1
ctrlMac2.CEL     2
ctrMac3.CEL      3
PtMac1.CEL       4
PtMac2.CEL       5
PtMac3.CEL       6
I have searched the documentation, but I cannot figure out how to add metadata that can then be extracted and used with the pData command. How do I go about annotating my data?
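A first step, sketched here under the assumption that group membership can be read off the file names (a "ctrl"/"ctr" prefix means control, "Pt" means patient, and the trailing digit is the replicate), is to build a plain data.frame with one row per array. Its row names must be the sample names exactly as pData(raw_data) reports them, so the rows can be matched to the arrays later. The names sample_info, group and replicate are illustrative, not required by any package. In practice you would take the names from sampleNames(raw_data) instead of typing them, or keep the annotation in a spreadsheet with a column of file names.

```r
# Sample names as reported by pData(raw_data) in the question.
# (With real data: cel_files <- Biobase::sampleNames(raw_data))
cel_files <- c("ctrlMac1.CEL", "ctrlMac2.CEL", "ctrMac3.CEL",
               "PtMac1.CEL", "PtMac2.CEL", "PtMac3.CEL")

# One row per array; row names equal the sample names so the table
# can later be matched to the arrays.
sample_info <- data.frame(
  # Assumption: a "ctr"/"ctrl" prefix marks a control, anything else a patient.
  group = factor(ifelse(grepl("^ctr", cel_files), "control", "patient"),
                 levels = c("control", "patient")),
  # Assumption: the digits right before ".CEL" are the replicate number.
  replicate = as.integer(sub("^.*Mac([0-9]+)\\.CEL$", "\\1", cel_files)),
  row.names = cel_files
)

print(sample_info)
```

Matching on the "ctr" prefix rather than "ctrl" also catches the ctrMac3.CEL file, whose name lacks the "l".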
Thank you,
Bryan
Output of sessionInfo():
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Canada.utf8
time zone: America/Toronto
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] openxlsx_4.2.7.1 genefilter_1.88.0
[3] matrixStats_1.4.1 stringr_1.5.1
[5] tidyr_1.3.1 dplyr_1.1.4
[7] enrichplot_1.26.5 pheatmap_1.0.12
[9] RColorBrewer_1.1-3 geneplotter_1.84.0
[11] annotate_1.84.0 XML_3.99-0.18
[13] lattice_0.22-6 clusterProfiler_4.14.4
[15] ReactomePA_1.50.0 topGO_2.58.0
[17] SparseM_1.84-2 GO.db_3.20.0
[19] graph_1.84.0 arrayQualityMetrics_3.62.0
[21] hugene10sttranscriptcluster.db_8.8.0 pd.hugene.1.0.st.v1_3.14.1
[23] ArrayExpress_1.66.0 ggplot2_3.5.1
[25] gplots_3.2.0 hugene20sttranscriptcluster.db_8.8.0
[27] org.Hs.eg.db_3.20.0 AnnotationDbi_1.68.0
[29] BiocManager_1.30.25 pd.hugene.2.0.st_3.14.1
[31] DBI_1.2.3 RSQLite_2.3.9
[33] limma_3.62.1 affy_1.84.0
[35] oligo_1.70.0 Biostrings_2.74.1
[37] GenomeInfoDb_1.42.1 XVector_0.46.0
[39] IRanges_2.40.1 S4Vectors_0.44.0
[41] Biobase_2.66.0 oligoClasses_1.68.0
[43] BiocGenerics_0.52.0
loaded via a namespace (and not attached):
[1] fs_1.6.5 bitops_1.0-9
[3] httr_1.4.7 tools_4.4.2
[5] gcrma_2.78.0 backports_1.5.0
[7] R6_2.5.1 lazyeval_0.2.2
[9] withr_3.0.2 graphite_1.52.0
[11] gridExtra_2.3 base64_2.0.2
[13] preprocessCore_1.68.0 cli_3.6.3
[15] labeling_0.4.3 askpass_1.2.1
[17] systemfonts_1.1.0 yulab.utils_0.1.8
[19] gson_0.1.0 foreign_0.8-87
[21] illuminaio_0.48.0 DOSE_4.0.0
[23] svglite_2.1.3 R.utils_2.12.3
[25] affyPLM_1.82.0 BeadDataPackR_1.58.0
[27] rstudioapi_0.17.1 generics_0.1.3
[29] gridGraphics_0.5-1 hwriter_1.3.2.1
[31] gtools_3.9.5 zip_2.3.1
[33] Matrix_1.7-1 interp_1.1-6
[35] abind_1.4-8 R.methodsS3_1.8.2
[37] lifecycle_1.0.4 SummarizedExperiment_1.36.0
[39] beadarray_2.56.0 qvalue_2.38.0
[41] SparseArray_1.6.0 grid_4.4.2
[43] blob_1.2.4 affxparser_1.78.0
[45] crayon_1.5.3 ggtangle_0.0.6
[47] cowplot_1.1.3 KEGGREST_1.46.0
[49] pillar_1.10.0 knitr_1.49
[51] fgsea_1.32.2 GenomicRanges_1.58.0
[53] codetools_0.2-20 fastmatch_1.1-6
[55] glue_1.8.0 ggfun_0.1.8
[57] data.table_1.16.4 vctrs_0.6.5
[59] png_0.1-8 treeio_1.30.0
[61] gtable_0.3.6 cachem_1.1.0
[63] xfun_0.49 S4Arrays_1.6.0
[65] tidygraph_1.3.1 survival_3.8-3
[67] iterators_1.0.14 statmod_1.5.0
[69] nlme_3.1-166 ggtree_3.14.0
[71] bit64_4.5.2 affyio_1.76.0
[73] KernSmooth_2.23-26 rpart_4.1.23
[75] colorspace_2.1-1 Hmisc_5.2-1
[77] nnet_7.3-20 tidyselect_1.2.1
[79] bit_4.5.0.1 compiler_4.4.2
[81] htmlTable_2.4.3 DelayedArray_0.32.0
[83] checkmate_2.3.2 scales_1.3.0
[85] caTools_1.18.3 hexbin_1.28.5
[87] rappdirs_0.3.3 digest_0.6.37
[89] rmarkdown_2.29 htmltools_0.5.8.1
[91] pkgconfig_2.0.3 jpeg_0.1-10
[93] base64enc_0.1-3 MatrixGenerics_1.18.0
[95] fastmap_1.2.0 rlang_1.1.4
[97] htmlwidgets_1.6.4 UCSC.utils_1.2.0
[99] farver_2.1.2 jsonlite_1.8.9
[101] BiocParallel_1.40.0 GOSemSim_2.32.0
[103] R.oo_1.27.0 magrittr_2.0.3
[105] Formula_1.2-5 GenomeInfoDbData_1.2.13
[107] ggplotify_0.1.2 patchwork_1.3.0
[109] munsell_0.5.1 Rcpp_1.0.13-1
[111] ape_5.8-1 viridis_0.6.5
[113] vsn_3.74.0 stringi_1.8.4
[115] ggraph_2.2.1 zlibbioc_1.52.0
[117] MASS_7.3-63 plyr_1.8.9
[119] parallel_4.4.2 ggrepel_0.9.6
[121] deldir_2.0-4 graphlayouts_1.2.1
[123] splines_4.4.2 igraph_2.1.2
[125] reshape2_1.4.4 evaluate_1.0.1
[127] latticeExtra_0.6-30 foreach_1.5.2
[129] tweenr_2.0.3 openssl_2.3.0
[131] purrr_1.0.2 polyclip_1.10-7
[133] ggforce_0.4.2 xtable_1.8-4
[135] ff_4.5.0 reactome.db_1.89.0
[137] tidytree_0.4.6 viridisLite_0.4.2
[139] tibble_3.2.1 aplot_0.2.4
[141] memoise_2.0.1 setRNG_2024.2-1
[143] cluster_2.1.8 gridSVG_1.7-5
Thanks for the documentation. I have most of it working, except for the most critical step: incorporating the new metadata into the dataset.
At this point I am successfully:
However, when I try to merge raw_data (i.e. the microarray data) with this data using the command:
I get the error:
You don't need to generate an ExpressionSet, as you already have a GeneFeatureSet, which extends eSet and is a better fit for what you are doing. I pointed you to the help for ExpressionSet because it shows how to generate a phenoData object. But maybe that is a bit too pedantic. Since you already have a GeneFeatureSet, it's easier to just pull out the existing 'phenoData' object, add your stuff, and put it back in using pData<-.
Makes sense now, thanks for the help!