arrayQualityMetrics: "intgroup" error, groups dropped from x$pData.
abf ▴ 30
@abf-14661
United States

I'm trying to use the "intgroup" parameter of the arrayQualityMetrics function to specify the column in my dataset where sample groupings are stored.  This is the aptly named "group" column.

When I execute the following:

arrayQualityMetrics(expressionset = sym.eset,
                    outdir = paste(studyName, "Quality1", sep = "_"),
                    intgroup = "group",
                    force = TRUE)

I get an error message:

 Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
  all elements of 'intgroup' should match column names of 'pData(expressionset)'.

I have verified that the column named "group" is present in my original dataset:

colnames(pData(sym.eset))
 [1] "geo_accession"         "Patient_ID"            "geo_accn_hg.u133b"     "geo_accn_hg.u133plus2" "series"
 [6] "age"                   "grade"                 "size"                  "ER_STATUS"             "pgr"
[11] "node"                  "DFS_TIME"              "EVENT_DFS"             "DMFS_TIME"             "EVENT_DMFS"
[16] "treatment"             "group"                 "supplementary_file"

There is an internal function, "prepdata", that is called by "arrayQualityMetrics" and performs some preprocessing steps. Using trace(prepdata, edit = TRUE), I added a print statement at the point where prepdata checks "intgroup" against the column names of the expressionset's pData object:

function (expressionset, intgroup, do.logtransform)
{
    conversions = c(RGList = "NChannelSet")
    for (i in seq_along(conversions)) {
        if (is(expressionset, names(conversions)[i])) {
            expressionset = try(as(expressionset, conversions[i]))
            if (is(expressionset, "try-error")) {
                stop(sprintf("The argument 'expressionset' is of class '%s', and its automatic conversion into '%s' failed. Please try to convert it manually, or contact the creator of that object.\n",
                  names(conversions)[i], conversions[i]))
            }
            else {
                break
            }
        }
    }
    x = platformspecific(expressionset, intgroup, do.logtransform)
    if (!all(intgroup %in% colnames(x$pData))) {
        # Debugging aid added via trace(): show which columns survived preprocessing
        print(colnames(x$pData))
        stop("all elements of 'intgroup' should match column names of 'pData(expressionset)'.")
    }
    x = append(x, list(numArrays = ncol(x$M), intgroup = intgroup,
        do.logtransform = do.logtransform))
    x = append(x, intgroupColors(x))
    return(x)
}

It looks like only 10 of the 18 pData columns are preserved during preprocessing, and "group" is not among them:

 [1] "geo_accession"         "geo_accn_hg.u133plus2" "series"                "age"                   "grade"
 [6] "size"                  "node"                  "DFS_TIME"              "EVENT_DFS"             "DMFS_TIME"
 Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
  all elements of 'intgroup' should match column names of 'pData(expressionset)'.
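
To see exactly which columns went missing, the two column listings above can be compared with setdiff(). This is only a sketch: the column names are copied by hand from the output shown above, and the variable names (original, kept, dropped) are introduced here for illustration.

```r
# Column names of pData(sym.eset), copied from the first listing above
original <- c("geo_accession", "Patient_ID", "geo_accn_hg.u133b",
              "geo_accn_hg.u133plus2", "series", "age", "grade", "size",
              "ER_STATUS", "pgr", "node", "DFS_TIME", "EVENT_DFS",
              "DMFS_TIME", "EVENT_DMFS", "treatment", "group",
              "supplementary_file")

# Column names of x$pData printed from inside prepdata
kept <- c("geo_accession", "geo_accn_hg.u133plus2", "series", "age", "grade",
          "size", "node", "DFS_TIME", "EVENT_DFS", "DMFS_TIME")

# Columns present in the original pData but absent after preprocessing
dropped <- setdiff(original, kept)
print(dropped)

# The same test that prepdata applies to 'intgroup'
intgroup <- "group"
all(intgroup %in% kept)  # FALSE, which is what triggers the error
```

With the live objects, the first vector would simply be colnames(pData(sym.eset)).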

Re-ordering the columns so that my grouping variable comes first fixes this error, as was previously suggested by others:

https://stat.ethz.ch/pipermail/bioconductor/2012-June/046295.html
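
For anyone who wants to try the same workaround, here is a minimal sketch of the re-ordering step. The helper move_columns_first is hypothetical (it is not part of arrayQualityMetrics or Biobase), and the data frame is a toy stand-in for pData(sym.eset) so that the snippet runs with base R alone.

```r
# Hypothetical helper: move the named columns to the front of a data frame,
# keeping the remaining columns in their existing order.
move_columns_first <- function(df, cols) {
  stopifnot(all(cols %in% colnames(df)))
  df[, c(cols, setdiff(colnames(df), cols)), drop = FALSE]
}

# Toy stand-in for pData(sym.eset)
pd <- data.frame(geo_accession = c("GSM1", "GSM2"),
                 age = c(50, 61),
                 group = c("A", "B"),
                 stringsAsFactors = FALSE)

pd <- move_columns_first(pd, "group")
colnames(pd)  # "group" "geo_accession" "age"

# On the real object the same idea would look roughly like this
# (assumes Biobase is attached; not run here):
# pData(sym.eset) <- move_columns_first(pData(sym.eset), "group")
```

After the re-ordering, the original arrayQualityMetrics call with intgroup = "group" can be run unchanged.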

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

I can't really find a justification for the restriction to 10 columns in the code.  For now I've increased it to 50, which is arbitrary but sufficient for your data.  It'll appear in the devel branch of Bioconductor in a couple of days, but you can install the updated version immediately using:

BiocInstaller::biocLite('grimbough/arrayQualityMetrics')

Please let me know if anything doesn't work as expected.
