making ExpressionSet from RGList via NChannelSet for use with 'sva'
2
0
Entering edit mode
chris.rock • 0
@chrisrock-9883
Last seen 8.7 years ago

Dear Bioconductor Supporters,

I am new to R. The solution might be proper understanding of the nature of object classes, data structures and 'wrangling' the same.

I have a custom Nimblegen microarray dataset with clear batch effects that I want to address with the package sva (surrogate variables analysis).  I have successfully found the data quality result by using 'arrayQualityMetrics' run on the RGList object, so it seems that there is a solution, since arrayQualityMetrics also needs to convert the RGList object to other classes.

The specific problem is with an NChannelSet ‘convert’ed from an RGList (pair files read in by Ringo 'readNimblegen' with a targets file that has covariates of interest). 

To run 'sva' going forward I need to get the NChannelSet, made from an RGList object using 'convert' function 'as(RGListobj, "NChannel")] into an ExpressionSet class object. On p. 57 of the Biobase manual there is an example for NChannelSet-class:

## G channel as ExpressionSet

channel(obj, "G")

I did this and it worked:

> testG <- channel(NChan_object, “G”) where ‘NChan_object’ has 15 columns of data-dimension green “G” and red “R” channel elements of interest.

So I got half the data.  But when I repeat the function with “R” character in the 'channel' function of Biobase, instead of getting the other half in another object 'testR' (that I plan I would next bind with the G object [will this strategy work?]), I get the error:

> testR <- channel(Sb750raw_Nchan,"R")

Error in validObject(.Object) :

  invalid class “NChannelSet” object: sampleNames differ between phenoData and protocolData.

Here is a link to load the RGList class data at issue [can easily be made into NChannelSet object using 'as(RGListobj, "NChannel") described above]:

http://apps.biol.ttu.edu/shared/Sb750rawnewRG.zip

I have omitted the last part of the 'traceback()' output to fit the post, and below that the sessionInfo().

Thank you for your kind consideration and support! 

Chris Rock, Assoc Prof, Biological Sciences, Texas Tech University

________________________________

Error in validObject(.Object) :
  invalid class “NChannelSet” object: sampleNames differ between phenoData and protocolData
> traceback()
24: stop(msg, ": ", errors, domain = NA)
23: validObject(.Object)
22: .nextMethod(.Object, ...)
21: eval(expr, envir, enclos)
20: eval(call, callEnv)
19: callNextMethod(.Object, ...)
18: .local(.Object, ...)
17: .nextMethod(.Object, assayData = assayData, phenoData = phenoData,
        featureData = featureData, experimentData = experimentData,
        annotation = annotation, protocolData = protocolData)
16: eval(expr, envir, enclos)
15: eval(call, callEnv)
14: callNextMethod(.Object, assayData = assayData, phenoData = phenoData,
        featureData = featureData, experimentData = experimentData,
        annotation = annotation, protocolData = protocolData)
13: .local(.Object, ...)
12: .nextMethod(<S4 object of class "NChannelSet">, assayData = <environment>,
        phenoData = <S4 object of class "AnnotatedDataFrame">, featureData = <S4 object of class "AnnotatedDataFrame">,
        experimentData = <S4 object of class "MIAME">, annotation = character(0),
        protocolData = <S4 object of class "AnnotatedDataFrame">)
11: eval(expr, envir, enclos)
10: eval(call, callEnv)
9: (function (...)

{
       method <- nextMethod <- NULL
       dotNextMethod <- as.name(".nextMethod")
       parent <- sys.parent(1)
       maybeMethod <- sys.function(parent)

if
[many lines deleted]

8: do.call(callNextMethod, c(.Object, assayData = assayData, phenoData = phenoData,
       dotArgs[isSlot]))
7: .local(.Object, ...)
6: initialize(object, assayData = assayData, phenoData = phenoData,
       featureData = featureData(object), experimentData = experimentData(object),
       annotation = annotation(object), protocolData = protocolData(object),
       ...)
5: initialize(object, assayData = assayData, phenoData = phenoData,
       featureData = featureData(object), experimentData = experimentData(object),
       annotation = annotation(object), protocolData = protocolData(object),
       ...)
4: selectChannels(object, name)
3: selectChannels(object, name)
2: channel(Sb750raw_Nchan, "R")
1: channel(Sb750raw_Nchan, "R")

__________________________________

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] convert_1.46.0      marray_1.48.0       limma_3.26.8       
[4] Biobase_2.30.0      BiocGenerics_0.16.1

biobase convert sva ringo arrayqualitymetrics • 2.7k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 12 minutes ago
WEHI, Melbourne, Australia

Dear Chris,

It is essential that you complete the within-array normalization steps in the limma package before you attempt to pass the data to sva. You will then have an MAList object, and you can simply pass the matrix of normalized log-ratios MA$M to sva. No need to do any class conversion.

If fact you can't properly judge whether there is a batch effect until the within-array normalization has been done.

Trying to represent an RGList as a ExpressionSet will never give you good results because it will lose the two-color structure. That's why there is no easy pipeline for doing it.

BTW, your arrays are clearly not standard expression arrays. Normalization may need some custom treatment of the different types of control probes. I suspect that this consideration will be more important than anything that sva might do. I don't know what your arrays are measuring, but I find it worrying that the probes labeled "RANDOMbkg" have intensities just as high as the other probes.

ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…

Gordon is right. The most common approach is to 'normalise' the data with the facilities in limma, transform the data from (R,G) to (M,A) coordinates (this is essentially a 45-degree rotation in log2-space, M = ld R - ld G, 2A = ld R + ld G), throw away the A values, and do the further analyses (e.g. sva, linear model, ANOVA,...) on the M-values (log2-ratios) only. This is what the MAList to ExpressionSet conversion in the convert package does. There is no RGList to ExpressionSet conversion in the package for the reasons Gordon mentions.

For some more uncommon points:

There may be reasons to use single-channel normalisation (e.g. Yang and Thorne in Section 5 of [1]) and do downstream analyses on the normalised (R,G), with color as a covariate. An extreme example is if the data from one of the two dyes have poor quality, but the data from the other dye can still be rescued. Gordon mentions that Chris' data do not look like standard expression arrays; and that may also be a reason for looking into such custom treatments in the normalised (R,G) representation. In those cases, the NChannelSet or ExpressionSet are both reasonable containers for the data.

Another use case is independent filtering [1] or independent hypothesis weighting [2] with A as the covariate.

Statistically speaking, only keeping M and not using A amounts to assuming M is a sufficient statistic of the data. However, we know that at least for quality control and noise modelling, A contains additional information.


[1] Normalization for Two-color cDNA Microarray Data, Yee Hwa Yang and Natalie P. Thorne - Lecture Notes-Monograph Series, 2003
[2] http://www.pnas.org/content/107/21/9546.full
[3] http://biorxiv.org/content/early/2015/12/13/034330

PS It is  pretty straightforward to build an ExpressionSet from scratch, from its three main components: a numeric matrix, a data.frame of row annotations (e.g. gene identifiers etc.) and a data.frame of column (sample) annotations. See the Biobase vignette "An introduction to Biobase and ExpressionSets".

 

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 699 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6