Question

arrayQualityMetrics: "intgroup" error, groups dropped from x$pData.

abf ▴ 30

I'm trying to use the "intgroup" parameter of the arrayQualityMetrics function to specify the column in my dataset where sample groupings are stored. This is the aptly named "group" column.

When I execute the following:

arrayQualityMetrics(expressionset = sym.eset,
                    outdir = paste(studyName, "Quality1", sep = "_"),
                    intgroup = "group",
                    force = TRUE)

I get an error message:

 Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
  all elements of 'intgroup' should match column names of 'pData(expressionset)'.

I have verified that the column named "group" is present in my original dataset:

colnames(pData(sym.eset))
 [1] "geo_accession"  "Patient_ID" "geo_accn_hg.u133b" "geo_accn_hg.u133plus2" "series"
 [6] "age"            "grade"      "size"              "ER_STATUS"             "pgr"
[11] "node"           "DFS_TIME"   "EVENT_DFS"         "DMFS_TIME"             "EVENT_DMFS"
[16] "treatment"      "group"      "supplementary_file"

There is an internal function, "prepdata", that is called by "arrayQualityMetrics" and performs some preprocessing steps. I attached a trace with trace(prepdata, edit = T) and added a print line at the point where prepdata checks "intgroup" against the column names of the expressionset's pData object.

function (expressionset, intgroup, do.logtransform)
{
    conversions = c(RGList = "NChannelSet")
    for (i in seq_along(conversions)) {
        if (is(expressionset, names(conversions)[i])) {
            expressionset = try(as(expressionset, conversions[i]))
            if (is(expressionset, "try-error")) {
                stop(sprintf("The argument 'expressionset' is of class '%s', and its automatic conversion into '%s' failed. Please try to convert it manually, or contact the creator of that object.\n",
                  names(conversions)[i], conversions[i]))
            }
            else {
                break
            }
        }
    }
    x = platformspecific(expressionset, intgroup, do.logtransform)
    if (!all(intgroup %in% colnames(x$pData))) {
        # Debugging line added via trace(): show which columns survived preprocessing
        print(colnames(x$pData))
        stop("all elements of 'intgroup' should match column names of 'pData(expressionset)'.")
    }
    x = append(x, list(numArrays = ncol(x$M), intgroup = intgroup,
        do.logtransform = do.logtransform))
    x = append(x, intgroupColors(x))
    return(x)
}

It looks like only 10 of the 18 pData columns are preserved during preprocessing, and "group" is not among them:

[1] "geo_accession"  "geo_accn_hg.u133plus2" "series"   "age"           "grade"
[6] "size"           "node"                  "DFS_TIME" "EVENT_DFS"     "DMFS_TIME"
 Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
  all elements of 'intgroup' should match column names of 'pData(expressionset)'.
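
To see exactly which columns went missing, the two printed name vectors can be compared directly. This is only a sketch in base R: the names are copied by hand from the two outputs above, and the variable names (original, kept, dropped) are made up for illustration.

```r
# Column names copied from the colnames(pData(sym.eset)) output above.
original <- c("geo_accession", "Patient_ID", "geo_accn_hg.u133b",
              "geo_accn_hg.u133plus2", "series", "age", "grade", "size",
              "ER_STATUS", "pgr", "node", "DFS_TIME", "EVENT_DFS",
              "DMFS_TIME", "EVENT_DMFS", "treatment", "group",
              "supplementary_file")

# Column names printed from x$pData inside the traced prepdata.
kept <- c("geo_accession", "geo_accn_hg.u133plus2", "series", "age", "grade",
          "size", "node", "DFS_TIME", "EVENT_DFS", "DMFS_TIME")

# Columns present in the original pData but absent after preprocessing.
dropped <- setdiff(original, kept)
print(dropped)

# Is the grouping column still available to 'intgroup'?
print("group" %in% kept)
```

Note that the surviving columns are not simply the first ten in their original order: some earlier columns such as "Patient_ID" are missing as well, so some filtering appears to happen before the cut-off is applied.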

Re-ordering the columns so that my grouping variable comes first fixes this error, as was previously suggested by others:

https://stat.ethz.ch/pipermail/bioconductor/2012-June/046295.html
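
For anyone hitting the same error, the re-ordering workaround might look roughly like the sketch below. The helper move_to_front is hypothetical (not part of arrayQualityMetrics or Biobase), and the small data frame is a made-up stand-in for pData(sym.eset) so that the snippet runs on its own.

```r
# Hypothetical helper: move the named columns to the front of a data frame,
# keeping all remaining columns in their existing order.
move_to_front <- function(df, cols) {
  stopifnot(all(cols %in% colnames(df)))
  df[, c(cols, setdiff(colnames(df), cols)), drop = FALSE]
}

# Toy stand-in for pData(sym.eset); the values are invented for illustration.
pd <- data.frame(geo_accession = c("s1", "s2"),
                 age = c(50, 61),
                 group = c("A", "B"))

pd <- move_to_front(pd, "group")
print(colnames(pd))

# With Biobase loaded, the same idea applied to the real object would be
# something like (untested here, assuming sym.eset is an ExpressionSet):
# pData(sym.eset) <- move_to_front(pData(sym.eset), "group")
```

Putting "group" first means it is inside whatever subset of columns the preprocessing keeps, which is why the error goes away.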

arrayqualitymetrics R • 1.7k views

Updated 7.4 years ago by Mike Smith ★ 6.6k • written 7.4 years ago by abf ▴ 30

score 2 · Accepted Answer · 2017-12-20

I can't really find a justification for the restriction to 10 columns in the code. For now I've increased it to 50, which is arbitrary but sufficient for your data. It'll appear in the devel branch of Bioconductor in a couple of days, but you can install the updated version immediately using:

BiocInstaller::biocLite('grimbough/arrayQualityMetrics')

Please let me know if anything doesn't work as expected.