Hi Vanessa -- Sounds like you want to 1) subset using character,
numeric, or logical vectors to select or reorder; 2) have some way to
access features as 'groups', e.g., because of duplicate probe set
names. I'd encourage you to think carefully about part 2, as
ExpressionSets are designed the way they are (unique featureNames)
because this is what makes most biological and statistical sense for
the type of data they are designed to represent.
Some details:
I think your second question is easier:
> In another stage, when combining from different platforms with
> different genes, I would like to extract just the information for a
> specific gene probe list. Is this possible?
Do you want
> library(Biobase)
> data(sample.ExpressionSet)
> sample.ExpressionSet
ExpressionSet (storageMode: lockedEnvironment)
assayData: 500 features, 26 samples
element names: exprs, se.exprs
phenoData
sampleNames: A, B, ..., Z (26 total)
varLabels and varMetadata description:
sex: Female/Male
type: Case/Control
score: Testing Score
featureData
featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at (500 total)
fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
> sample.ExpressionSet[c("AFFX-MurIL2_at", "31739_at"),]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 2 features, 26 samples
element names: exprs, se.exprs
phenoData
sampleNames: A, B, ..., Z (26 total)
varLabels and varMetadata description:
sex: Female/Male
type: Case/Control
score: Testing Score
featureData
featureNames: AFFX-MurIL2_at, 31739_at
fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
i.e., provide the vector of featureNames as the first argument when
subsetting with '['?
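The same row-by-name indexing works on a plain matrix, which may make
the behaviour easier to see. This is only a sketch in base R; the
matrix 'm' and the probe names are invented for illustration and are
not taken from your data:

```r
# Toy expression matrix: rows are features, columns are samples.
# All names and values here are invented for illustration.
m <- matrix(1:12, nrow = 4,
            dimnames = list(c("probe_a", "probe_b", "probe_c", "probe_d"),
                            c("s1", "s2", "s3")))

# A character vector selects rows by name, in the order requested.
wanted <- c("probe_d", "probe_b")
sub <- m[wanted, , drop = FALSE]
rownames(sub)
```

Note that the result follows the order of 'wanted', not the order of
the rows in 'm', so the same idiom selects and reorders in one step.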
The first sounds more complicated, the following might get you
going, but proceed with some thought!
> I have created eSets from RGLists for cDNA microarrays. I would like
> to combine in the end data from several different platforms. As a
> special case, I would like to combine 2 eSets with the same gene
> probes, but in a different order on the array (so 2 different array
> platforms).
'combine' *might* help (see ?combine and class?eSet or ?"eSet-class")
You could subset one of the sets using indices (i.e., featureNames)
of the other (this will reorder expression values to match the order
in the subset), and then manipulate.
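To illustrate the reordering idea, here is a rough sketch with two
plain matrices standing in for the expression values of the two
platforms. The objects 'a' and 'b' and the probe names are made up,
and the sketch assumes the names are unique and that every name in
'a' is also present in 'b':

```r
# Two toy matrices with the same (invented) probes in different row order.
a <- matrix(1:6, nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("a1", "a2")))
b <- matrix(101:106, nrow = 3,
            dimnames = list(c("p3", "p1", "p2"), c("b1", "b2")))

# Index b by the row names of a so both share one row order.
b_ordered <- b[rownames(a), , drop = FALSE]
stopifnot(identical(rownames(a), rownames(b_ordered)))

# With rows aligned, the samples can sit side by side.
both <- cbind(a, b_ordered)
```

Checking that the row names agree before binding is cheap and catches
a mismatched or missing probe early.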
> The IDs of my probes are not unique, so I cannot use them as
> FeatureNames...some have a duplicate in there (extension #2 after
> its name) and the control probes are not uniquely named
> e.g. luciferase (10 x). Is there a way to delete the duplicates or
> integrate their information in the original (taking the average)?
I think first you want to clarify what you're doing here, and whether
it has statistical & biological meaning.
You can leave featureNames unspecified, and they will then be
provided for you. You might then add a column to featureData to keep
track of which probes map to which (non-unique) identifiers (though
how are you going to interpret multiple expression values for the same
identifier?). Subsetting by these features then becomes more awkward,
e.g.,
> obj <- sample.ExpressionSet
> featureData(obj)[["my_ids"]] <- paste("id", seq(1, nrow(obj)))
> qids <- c("id 10", "id 100")
> idx <- featureData(obj)[["my_ids"]] %in% qids
> obj[idx,]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 2 features, 26 samples
element names: exprs, se.exprs
phenoData
sampleNames: A, B, ..., Z (26 total)
varLabels and varMetadata description:
sex: Female/Male
type: Case/Control
score: Testing Score
featureData
featureNames: AFFX-BioDn-5_at, 31339_at
fvarLabels and fvarMetadata description:
my_ids: NA
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
These types of operations would allow you to average, or otherwise
summarize, the features that share an identifier.
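As one possible way to do the averaging, here is a sketch on a plain
matrix using base R's rowsum(). The matrix 'm' and the identifiers in
'ids' are invented, and whether a simple mean is the right summary for
your duplicates is a separate question:

```r
# Toy matrix: four rows, but only three distinct (invented) identifiers.
m <- matrix(c(1, 3, 10, 20,
              2, 4, 30, 40), nrow = 4,
            dimnames = list(NULL, c("s1", "s2")))
ids <- c("geneA", "geneA", "geneB", "luciferase")

# rowsum() adds up rows that share an id; dividing by the group sizes
# turns the sums into per-id means.
sums <- rowsum(m, ids)
counts <- table(ids)
means <- sums / as.vector(counts[rownames(sums)])
```

The result has one row per distinct identifier, so its row names are
unique and could serve as feature names.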
> How to delete the control probes? This would enable me to end up
> with unique IDs, so I could use them as feature names and then it is
> fairly easy to combine the two expression sets.
This is subsetting again, probably most easily done using a logical
index along the lines of
> ctrl_ids <- c("id 10", "id 100")  # illustrative stand-in for your control ids
> not_ctrls <- !(featureData(obj)[["my_ids"]] %in% ctrl_ids)
> obj[not_ctrls,]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 498 features, 26 samples
element names: exprs, se.exprs
phenoData
sampleNames: A, B, ..., Z (26 total)
varLabels and varMetadata description:
sex: Female/Male
type: Case/Control
score: Testing Score
featureData
featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at (498 total)
fvarLabels and fvarMetadata description:
my_ids: NA
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
You can use similar ideas with other R objects, including the RGList
of limma, and with basic structures like a matrix or data frame.
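For instance, dropping control rows from a data frame might look like
the sketch below. The data frame 'df', its column names, and the
contents of 'ctrl_ids' are all hypothetical:

```r
# Toy data frame: one row per probe; ids and values are invented.
df <- data.frame(id = c("geneA", "luciferase", "geneB", "luciferase"),
                 s1 = c(1.0, 9.0, 2.0, 9.5),
                 s2 = c(1.5, 8.0, 2.5, 8.5),
                 stringsAsFactors = FALSE)
ctrl_ids <- c("luciferase")   # hypothetical list of control ids

# TRUE for rows to keep, FALSE for controls.
not_ctrls <- !(df$id %in% ctrl_ids)
df_no_ctrl <- df[not_ctrls, ]

# The remaining ids are unique here, so they can serve as row names.
stopifnot(!anyDuplicated(df_no_ctrl$id))
rownames(df_no_ctrl) <- df_no_ctrl$id
```

The anyDuplicated() check is there because removing controls does not
by itself guarantee that the remaining identifiers are unique.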
Hope that helps,
Martin
Vanessa Vermeirssen <vanessa.vermeirssen at psb.ugent.be> writes:
> Hi,
>
> How to delete the control probes? This would enable me to end up
> with unique IDs, so I could use them as feature names and then it is
> fairly easy to combine the two expression sets.
>
> In another stage, when combining from different platforms with
> different genes, I would like to extract just the information for a
> specific gene probe list. Is this possible?
>
> I am new to Bioconductor, but learning a lot every day... I hope that
> somebody can help me.
>
> Thanks so much already,
> Vanessa Vermeirssen
>
> --
> ==================================================================
> Vanessa Vermeirssen, PhD
>
> Tel:+32 (0)9 331 38 23 fax:+32 (0)9 3313809
> VIB Department of Plant Systems Biology, Ghent University
> Technologiepark 927, 9052 Gent, BELGIUM
> vamei at psb.ugent.be  http://www.psb.ugent.be
--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org