I'm trying to use the "intgroup" parameter of the arrayQualityMetrics function to specify the column in my dataset where sample groupings are stored. This is the aptly named "group" column.
When I execute the following:
arrayQualityMetrics(expressionset = sym.eset, outdir = paste(studyName, "Quality1",sep="_" ), intgroup = "group", force = T)
I get an error message:
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) : all elements of 'intgroup' should match column names of 'pData(expressionset)'.
I have verified that the column named "group" is present in my original dataset:
colnames(pData(sym.eset))
 [1] "geo_accession" "Patient_ID" "geo_accn_hg.u133b" "geo_accn_hg.u133plus2" "series"
 [6] "age" "grade" "size" "ER_STATUS" "pgr"
[11] "node" "DFS_TIME" "EVENT_DFS" "DMFS_TIME" "EVENT_DMFS"
[16] "treatment" "group" "supplementary_file"
There is an internal function, "prepdata", that is called by "arrayQualityMetrics" and performs some preprocessing steps. Using trace(prepdata, edit = T), I added a print statement at the point where prepdata checks "intgroup" against the column names of the expression set's pData:
function (expressionset, intgroup, do.logtransform)
{
    conversions = c(RGList = "NChannelSet")
    for (i in seq_along(conversions)) {
        if (is(expressionset, names(conversions)[i])) {
            expressionset = try(as(expressionset, conversions[i]))
            if (is(expressionset, "try-error")) {
                stop(sprintf("The argument 'expressionset' is of class '%s', and its automatic conversion into '%s' failed. Please try to convert it manually, or contact the creator of that object.\n", names(conversions)[i], conversions[i]))
            } else {
                break
            }
        }
    }
    x = platformspecific(expressionset, intgroup, do.logtransform)
    if (!all(intgroup %in% colnames(x$pData))) {
        # Debugging line added via trace(): show the columns that survived
        print(colnames(x$pData))
        stop("all elements of 'intgroup' should match column names of 'pData(expressionset)'.")
    }
    x = append(x, list(numArrays = ncol(x$M), intgroup = intgroup, do.logtransform = do.logtransform))
    x = append(x, intgroupColors(x))
    return(x)
}
It looks like only 10 of the 18 pData columns are preserved during preprocessing, and "group" is not among them:
 [1] "geo_accession" "geo_accn_hg.u133plus2" "series" "age" "grade"
 [6] "size" "node" "DFS_TIME" "EVENT_DFS" "DMFS_TIME"
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
  all elements of 'intgroup' should match column names of 'pData(expressionset)'.
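To see exactly which columns go missing, the two printed lists can be compared with setdiff(). This is only a base-R sketch: the vectors are typed in from the output shown in this post, and original, kept, and dropped are illustrative names, not anything defined by arrayQualityMetrics.

```r
# Column names of pData(sym.eset), copied from the output earlier in this post
original <- c("geo_accession", "Patient_ID", "geo_accn_hg.u133b",
              "geo_accn_hg.u133plus2", "series", "age", "grade", "size",
              "ER_STATUS", "pgr", "node", "DFS_TIME", "EVENT_DFS",
              "DMFS_TIME", "EVENT_DMFS", "treatment", "group",
              "supplementary_file")

# Column names printed from inside prepdata (x$pData)
kept <- c("geo_accession", "geo_accn_hg.u133plus2", "series", "age", "grade",
          "size", "node", "DFS_TIME", "EVENT_DFS", "DMFS_TIME")

# Columns present in the original table but absent after preprocessing
dropped <- setdiff(original, kept)
print(dropped)

# Is the grouping column still available to intgroup?
print("group" %in% kept)
```

The dropped columns are not only the trailing ones: "Patient_ID" and "geo_accn_hg.u133b" disappear as well, so the cut does not correspond exactly to the first 10 columns of the original table.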
Re-ordering the columns such that my grouping variable comes first fixes this error,
as was previously suggested by others:
https://stat.ethz.ch/pipermail/bioconductor/2012-June/046295.html
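For reference, the reordering workaround can be sketched as follows. The helper move_columns_first is a hypothetical name introduced here for illustration, and the small data frame is a toy stand-in for pData(sym.eset); the commented last line assumes Biobase is loaded and that sym.eset is an ExpressionSet.

```r
# Hypothetical helper: move the named columns to the front of a data frame,
# keeping the remaining columns in their existing order.
move_columns_first <- function(df, cols) {
  stopifnot(all(cols %in% colnames(df)))
  df[, c(cols, setdiff(colnames(df), cols)), drop = FALSE]
}

# Toy stand-in for pData(sym.eset)
pd <- data.frame(geo_accession = c("GSM1", "GSM2"),
                 age = c(50, 61),
                 group = c("A", "B"))

pd <- move_columns_first(pd, "group")
print(colnames(pd))

# With the real object (assumes library(Biobase) and an ExpressionSet):
# pData(sym.eset) <- move_columns_first(pData(sym.eset), "group")
```

Putting the grouping column first means it should be among the columns that prepdata keeps, so the intgroup check can find it.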