Question

ExpressionSet or MAList



Daniel Brewer ★ 1.9k

@daniel-brewer-1791

Last seen 10.7 years ago

Martin Morgan wrote:
> Hi Daniel --
>
> Daniel Brewer <daniel.brewer at icr.ac.uk> writes:
>
>> Hi,
>>
>> I am starting to think about grouping a series of microarray datasets
>> into Bioconductor objects so that I can quickly look to see how a gene
>> behaves in each dataset. The two main options seem to be to use
>> ExpressionSet or limma's MAList. Has anyone got an opinion on which
>> would be best to use, or on the advantages and disadvantages of both?
>
> Some biases on my part, but...
>
> I guess either ExpressionSet or MAList is really meant to represent a
> single 'experiment'. Sounds like you're going to create a collection
> of experiments, so a collection of ExpressionSet or MAList objects (it
> would be a mistake, I think, to jam all your experiments into a single
> object of either of these classes).
>
>> To my mind MAList stores the annotation with the dataset which I feel is
>
> Storing annotations with the object can be a bad thing if the
> annotations are the same, because then there are effectively different
> variants of the same annotation, one for each object. These will
> inevitably drift apart, leading to confusion. There is also a memory
> use issue.
>
> That said, annotations can be added to ExpressionSet, specifically
> using featureData to store an AnnotatedDataFrame (data.frame +
> annotation on column labels).
>
>> an advantage whereas ExpressionSet is the base implementation for many
>> libraries.
>
> ExpressionSet is a little more tightly designed than MAList (MAList is
> essentially a list and so can contain (or not contain) any data;
> ExpressionSet is an S4 class that has to contain certain data). While
> you lose on freedom with ExpressionSet, the constriction probably
> comes with a benefit in terms of greater certainty about what the
> object actually contains. This imposed uniformity likely has benefits
> when the number of experiments you're managing increases. Many users
> probably view their MAList / ExpressionSet as 'read-only', so for
> these users the fact that you could do something to mess up an MAList
> really is only an academic possibility (you can also do things to mess
> up an ExpressionSet, again maybe just a bit harder to do that).
>
> ExpressionSet also contains an experimentData slot, which would be an
> ideal location to document which experiment the ExpressionSet
> represents.
>
>> Dan
>
> hope that helps,
>
> Martin

Hi,

I think you have sold me on the idea of ExpressionSet (mainly because of the MIAME stuff in experimentData), but I have one question about it. Is there any way to store associated detection p-values/weights with it? This would be useful information to retain for later analysis.

Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Email: daniel.brewer at icr.ac.uk
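The "collection of ExpressionSet objects" idea from the quoted reply could look roughly like the sketch below. It assumes the Biobase package is installed; the probe names, sample names, study titles, and the helpers `makeToySet()` and `geneAcrossSets()` are all made up for illustration and are not part of any package.

```r
library(Biobase)

# Hypothetical probe identifiers shared by both toy experiments.
probes <- c("probe_1", "probe_2", "probe_3")

# Build a small ExpressionSet with made-up values and a MIAME title
# recording which experiment it represents.
makeToySet <- function(samples, title) {
  x <- matrix(seq_len(length(probes) * length(samples)) / 10,
              nrow = length(probes),
              dimnames = list(probes, samples))
  ExpressionSet(assayData = x,
                experimentData = new("MIAME", title = title))
}

# One ExpressionSet per experiment, kept together in a named list.
experiments <- list(
  studyA = makeToySet(c("a1", "a2"), "Study A"),
  studyB = makeToySet(c("b1", "b2", "b3"), "Study B")
)

# Look up one probe in every experiment. Each element of the result is a
# named vector with one value per sample in that experiment.
geneAcrossSets <- function(sets, probe) {
  lapply(sets, function(es) exprs(es)[probe, ])
}

res <- geneAcrossSets(experiments, "probe_2")
print(res)
```

Keeping one object per experiment means each can have its own samples and its own experimentData, while a plain list is enough to loop over them.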

Annotation ExperimentData Cancer • 2.0k views

updated 17.0 years ago by Gordon Smyth 52k • written 17.0 years ago by Daniel Brewer ★ 1.9k

score 0 · Answer 1 · 2008-05-01

Hi Daniel --

Daniel Brewer <daniel.brewer at icr.ac.uk> writes:

> Hi,
>
> I think you have sold me on the idea of ExpressionSet (mainly because of
> the MIAME stuff in experimentData), but I have one question about it.
> Is there any way to store associated detection p-values/weights with it?

Well, maybe this will un-sell you ;)

ExpressionSet is guaranteed to have an 'exprs' matrix in its assayData. What you want to do is add another, identically dimensioned, matrix, e.g., 'weights'. ExpressionSet allows you to do that, though then you're sort of back in the MAList realm of not being sure exactly what you have.

If you were creating an ExpressionSet from scratch and had an 'exprs' matrix and a 'weights' matrix you could do something like

    new("ExpressionSet", exprs=exprs, weights=weights)

and weights would end up in assayData.

Things are a little more complicated if you have an existing ExpressionSet that you want to add a matrix to. The basic steps are

    storageMode(obj)                        # [1] "lockedEnvironment"
    storageMode(obj) <- "environment"       # make assayData modifiable
    assayData(obj)[["weights"]] <- weights  # add the new matrix
    storageMode(obj) <- "lockedEnvironment" # lock it again
    validObject(obj, complete=TRUE)         # check the result

First, by default ExpressionSet stores its 'big' data in a special container called a 'lockedEnvironment'. This container can't normally be modified, and so we change its storage mode to a modifiable form (this actually makes a copy of the underlying environment; we could also have changed the storage mode to 'list', and then assayData would behave like a list). We then add our data, and lock the environment again (locking is important). Finally we check that the object we've just created conforms to ExpressionSet expectations (e.g., that the matrix we've added has the right dimensions and dimnames).

Once 'weights' is in assayData, subsetting the expression set, accessing the assayData elements (e.g., assayData(obj)[["weights"]]), etc. should all work as expected.

The 'convert' package has this coercion method defined

    setAs("MAList", "ExpressionSet", function(from) {
        nM <- new("MIAME")
        notes(nM) <- list("Converted from MAList object, exprs are M-values")
        new("ExpressionSet",
            exprs = as.matrix(from$M),
            phenoData = new("AnnotatedDataFrame", data=from$targets),
            experimentData = nM)
    })

and you might consider writing your own version that customizes which information is moved from MAList to ExpressionSet.

Hope that helps,

Martin

> This would be useful information to retain for later analysis.
>
> Dan
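A customized version of the coercion above, one that also carries the weights and the probe annotation across, might look like the following sketch. It assumes Biobase and limma are installed. The function name `maListToExpressionSet()` and all toy values are hypothetical. It also assumes that the row names of `genes` match the row names of `M`, and that the row names of `targets` match the column names of `M`, which real MAList objects do not guarantee.

```r
library(Biobase)
library(limma)

# Hypothetical helper (not part of 'convert'): copy M-values, weights,
# probe annotation and targets from an MAList into an ExpressionSet.
maListToExpressionSet <- function(from) {
  nM <- new("MIAME")
  notes(nM) <- list("Converted from MAList object, exprs are M-values")
  new("ExpressionSet",
      exprs = as.matrix(from$M),
      weights = as.matrix(from$weights),   # extra assayData element
      featureData = new("AnnotatedDataFrame", data = from$genes),
      phenoData = new("AnnotatedDataFrame", data = from$targets),
      experimentData = nM)
}

# Toy MAList with made-up values; dimnames are set so they line up.
probes <- c("p1", "p2", "p3")
arrays <- c("array1", "array2")
M <- matrix(c(0.5, -1.2, 0.1, 0.7, -0.9, 0.0), nrow = 3,
            dimnames = list(probes, arrays))
W <- matrix(c(1, 1, 0, 1, 0.5, 1), nrow = 3,
            dimnames = list(probes, arrays))
ma <- new("MAList", list(
  M = M,
  A = M + 8,
  weights = W,
  genes = data.frame(ID = probes,
                     Symbol = c("geneA", "geneB", "geneC"),
                     row.names = probes),
  targets = data.frame(Cy3 = c("ref", "ref"),
                       Cy5 = c("tumour", "normal"),
                       row.names = arrays)
))

eset <- maListToExpressionSet(ma)
validObject(eset, complete = TRUE)

# Subsetting the ExpressionSet subsets exprs, weights and featureData together.
sub <- eset[c("p1", "p3"), ]
print(assayData(sub)[["weights"]])
```

Writing the conversion as an ordinary function rather than a second `setAs()` method avoids clashing with the method that 'convert' already registers.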

score 0 · Answer 2 · 2008-05-06

Although somewhat tangential to the discussion, because ExpressionSet and MAList both can store annotation, I thought it might be interesting to explain why I view the ability to store more than one column of probe annotation in a microarray data object as essential. There are many reasons, including

1. If the data object is subsetted frequently, the annotation should subset appropriately.

2. I want to be able to come back to an analysis years afterwards and be able to repeat it exactly, including the annotation, not be completely dependent on a constantly changing annotation package. This is part of reproducible research as I see it. Of course I also want to be able to update the annotation, but in a controlled way.

3. Applications requiring annotation such as the limma controlStatus() function.

4. I am frequently presented with academic arrays for which no single annotation column of unique probe identifiers is provided. Instead several columns may be needed to identify the probe. People who haven't had this experience are fortunate.

In general, the need to work with annotation as an associated data.frame is greater with "messier" microarray platforms such as academic two-colour cDNA arrays and with once-off custom platforms.

Gordon

> Date: Wed, 30 Apr 2008 10:10:21 -0700
> From: Martin Morgan <mtmorgan at fhcrc.org>
> Subject: Re: [BioC] ExpressionSet or MAList
> To: Daniel Brewer <daniel.brewer at icr.ac.uk>
> Cc: bioconductor at stat.math.ethz.ch
>
> Daniel Brewer <daniel.brewer at icr.ac.uk> writes:
>
>> To my mind MAList stores the annotation with the dataset which I feel is
>
> Storing annotations with the object can be a bad thing if the
> annotations are the same, because then there are effectively different
> variants of the same annotation, one for each object. These will
> inevitably drift apart, leading to confusion. There is also a memory
> use issue.
>
> That said, annotations can be added to ExpressionSet, specifically
> using featureData to store an AnnotatedDataFrame (data.frame +
> annotation on column labels).
>
>> an advantage whereas ExpressionSet is the base implementation for many
>> libraries.
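Points 1 and 4 of Gordon's list can be illustrated with a small MAList, as a sketch only. It assumes limma is installed; the spot layout, gene names, and values are invented, and the Block/Row/Column columns simply stand in for "several columns needed to identify the probe".

```r
library(limma)

# Toy two-colour data: four spots on two arrays. All values are made up.
M <- matrix(c(0.4, -0.3, 1.1, 0.0, 0.2, -0.5, 0.9, 0.1), nrow = 4,
            dimnames = list(NULL, c("array1", "array2")))
A <- matrix(9, nrow = 4, ncol = 2, dimnames = dimnames(M))

# No single unique probe ID: a spot is identified by Block + Row + Column,
# and the same gene name can appear on more than one spot.
genes <- data.frame(Block  = c(1, 1, 2, 2),
                    Row    = c(1, 2, 1, 2),
                    Column = c(1, 1, 1, 1),
                    Name   = c("geneA", "control", "geneA", "geneB"))

MA <- new("MAList", list(M = M, A = A, genes = genes))

# Drop the control spot. The MAList subsetting method applies the same
# row index to M, A and the genes data.frame, so they stay aligned.
keep <- which(MA$genes$Name != "control")
sub <- MA[keep, ]

print(sub$genes)
print(sub$M)
```

Because the annotation travels inside the object, the rows of `sub$genes` still describe the rows of `sub$M` after any number of subsetting steps.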