Hello everybody,
I'm writing because I need some explanation of the "estrogen" dataset
analyzed in the "factDesign" package.
I have to perform the same analysis on 8 samples (Affymetrix chips):
> pData(data.rma)
                ES TYPE
SHR-PUFA5.CEL PUFA  SHR
SHR-PUFA6.CEL PUFA  SHR
SHR-st7.CEL     ST  SHR
SHR-st8.CEL     ST  SHR
WK-PUFA3.CEL  PUFA   WK
WK-PUFA4.CEL  PUFA   WK
WK-st1.CEL      ST   WK
WK-st2.CEL      ST   WK

> data.rma
ExpressionSet (storageMode: lockedEnvironment)
assayData: 31099 features, 8 samples
  element names: exprs
phenoData
  sampleNames: SHR-PUFA5.CEL, SHR-PUFA6.CEL, ..., WK-st2.CEL (8 total)
  varLabels and varMetadata description:
    sample: arbitrary numbering
featureData
  featureNames: 1367452_at, 1367453_at, ..., AFFX-TrpnX-M_at (31099 total)
  fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: rat2302
What I need to know is whether I should analyze all the samples together
(normalization with RMA, filtering by IQR, and then the factDesign
analysis), or whether I should treat the two groups (1:4) and (5:8)
separately and then rebuild an exprSet at the end.
So I would like to understand how the "estrogen" dataset was analyzed in
order to obtain the 500 genes listed in pData(estrogen).
That's all.
Best regards
--
-----------------------------------------------------
Dr. Alberto Goldoni
Bologna, Italy
-----------------------------------------------------
Hi Alberto,
Alberto Goldoni wrote:
> What I need to know is whether I should analyze all the samples together
> (normalization with RMA, filtering by IQR, and then the factDesign
> analysis), or whether I should treat the two groups (1:4) and (5:8)
> separately and then rebuild an exprSet at the end.
You *must* jointly normalize, and that is what we did.
There is no such thing as an exprSet anymore (the class was deprecated a
long time ago in favor of ExpressionSet).
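
To illustrate why joint normalization matters, here is a minimal sketch in
base R. It is not the RMA implementation from the affy package: the helper
quantile_normalize() is a hypothetical, simplified stand-in for the quantile
normalization step only, and the data are simulated, with an assumed
technical shift on arrays 5:8.

```r
# Simplified quantile normalization: force every column (array) to share
# the same empirical distribution (the mean of the sorted columns).
# Hypothetical helper for illustration, not the affy/preprocessCore code.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Simulated log2 intensities: 1000 probes x 8 arrays. Arrays 5:8 carry an
# assumed, purely technical shift of +1.
x <- matrix(rnorm(8000, mean = 8), nrow = 1000,
            dimnames = list(NULL, paste0("array", 1:8)))
x[, 5:8] <- x[, 5:8] + 1

joint    <- quantile_normalize(x)
separate <- cbind(quantile_normalize(x[, 1:4]),
                  quantile_normalize(x[, 5:8]))

# Jointly normalized arrays all share one distribution; separately
# normalized groups keep the offset between arrays 1:4 and 5:8.
round(colMeans(joint), 3)
round(colMeans(separate), 3)
```

In this sketch the two separately normalized groups end up on different
scales, so any later comparison between them mixes the technical offset
with the biological effect.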
>
> So my curiosity is to understand how the "estrogen" dataset has been
> analyzed in order to obtain the 500 genes listed in pData(estrogen).
You seem very confused. pData accesses the phenotypic data. I have no idea
where you are getting 500 genes from. Perhaps you have a script or
something? Perhaps you are reading the vignette? If it is the vignette,
then you have access to all the code and can easily answer these questions.
I think you will need to be more explicit about where you are getting 500
genes from (but I don't see how it has anything to do with
pData(estrogen)).
best wishes
Robert
>
> that all
> best regards
>
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Dear Dr. Gentleman,
I'm not confused about my datasets, and I have read all the documentation
and the vignette for factDesign, but I found nothing about this.
In the factDesign vignette there is only one example, and the description
says that the dataset called "estrogen" contains gene expression levels for
500 genes from Affymetrix HGU95av2 chips for eight samples from a breast
cancer cell line.
In my experiment I have 8 samples in a similar design with 2 factors: ES
(ST and PUFA) and TYPE (WK and SHR). So I think (but you can correct me)
that factDesign is the right package for analyzing a factorial-design
microarray experiment.
WK = normal rats
SHR = rats that were frankly hypertensive at the beginning of the study
PUFA = n-3 polyunsaturated fatty acids (PUFAs)
ST = standard rats with no dietary treatment
So there are 8 samples in total. I know this is a very small number of
arrays, but I'm not the experiment designer! I only have to analyze this
dataset, if that is possible from a statistical point of view.
This is my dataset after normalizing all the samples with RMA:
> pData(data.rma)
                ES TYPE
SHR-PUFA5.CEL PUFA  SHR
SHR-PUFA6.CEL PUFA  SHR
SHR-st7.CEL     ST  SHR
SHR-st8.CEL     ST  SHR
WK-PUFA3.CEL  PUFA   WK
WK-PUFA4.CEL  PUFA   WK
WK-st1.CEL      ST   WK
WK-st2.CEL      ST   WK
So my question is whether I have to filter these samples together (see
data.rma above, then apply the IQR filter), or, for example, run RMA on all
the samples together and then filter by IQR separately for WK-st vs WK-PUFA
and for SHR-PUFA vs SHR-st. In a second step I could combine what I obtain
from the first group with the second in order to get a single list of genes.
I have read the results of the analysis of the full data set (12,625
probes, 32 samples) as discussed in Scholtens et al., "Analyzing Factorial
Designed Microarray Experiments", Journal of Multivariate Analysis, where
the expression estimates were calculated using the rma method after
quantile normalization from the affy package, but the paper doesn't explain
how the 500 genes were obtained.
Was the "estrogen" dataset (500 genes, 8 samples) obtained from the 12,625
probes and 32 samples by filtering all the samples together, or by
combining several datasets (with the function "combine" or something else)
from different subgroups?
If I know the right procedure, perhaps I can analyze my dataset in the
right way.
That's all.
I hope this is clearer now, and sorry for the inconvenience.
--
-----------------------------------------------------
Dr. Alberto Goldoni
Bologna, Italy
-----------------------------------------------------
On Wed, Jun 10, 2009 at 2:27 AM, Alberto Goldoni
<alberto.goldoni1975 at gmail.com> wrote:
> Dear Dr. Gentleman,
>
> I'm not confused about my datasets, and I have read all the documentation
> and the vignette for factDesign, but I found nothing about this.
>
> In the factDesign vignette there is only one example, and the description
> says that the dataset called "estrogen" contains gene expression levels
> for 500 genes from Affymetrix HGU95av2 chips for eight samples from a
> breast cancer cell line.
>
It is true that the factDesign package includes a dataset called 'estrogen'
that has 500 genes. I will go out on a limb and guess that the selection of
500 genes was made for illustrative purposes, to avoid having a package that
is too big and to avoid conflict with an impending publication at the time
of package release.
The details of selecting 500 genes for the illustrative dataset are not
provided in the documentation I have seen. However, in the experimental data
archive from Bioconductor, there is a _package_ called 'estrogen' that
provides the CEL files underlying this dataset. From this you can get all
12625 probe sets, or can work at the probe level if you like.
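
As a small sketch of how one might locate those CEL files, assuming the
'estrogen' experiment-data package is installed and keeps its raw files
under an extdata directory (if the package is missing, the result is simply
an empty vector):

```r
# system.file() returns "" when the package or directory is not found, so
# this runs safely whether or not 'estrogen' is installed.
datadir <- system.file("extdata", package = "estrogen")

cel_files <- if (nzchar(datadir)) {
  list.files(datadir, pattern = "\\.cel$", ignore.case = TRUE,
             full.names = TRUE)
} else {
  character(0)
}

# File names only, without the installation path.
basename(cel_files)
```

The resulting paths could then be passed to whichever preprocessing function
you prefer, with all arrays handled in a single call.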
> In my experiment I have 8 samples in a similar design with 2 factors: ES
> (ST and PUFA) and TYPE (WK and SHR). So I think (but you can correct me)
> that factDesign is the right package for analyzing a factorial-design
> microarray experiment.
>
factDesign gives lots of relevant information in the vignette and has some
software that will help do an effective analysis. It is not _the_ 'right'
package for this, though, because other linear modeling packages could be
used in the same way.
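
As a rough illustration of that point, the 2 x 2 design from this thread can
be fitted for a single probe set with base R's lm(), which stands in here for
any linear-modeling tool. The expression values are simulated and the effect
size of 1.5 is an arbitrary assumption. Only the ES/TYPE layout comes from
the pData shown in the thread.

```r
# Design taken from pData(data.rma) as posted in this thread.
pd <- data.frame(
  ES   = factor(c("PUFA", "PUFA", "ST", "ST", "PUFA", "PUFA", "ST", "ST"),
                levels = c("ST", "PUFA")),
  TYPE = factor(rep(c("SHR", "WK"), each = 4), levels = c("WK", "SHR")),
  row.names = c("SHR-PUFA5.CEL", "SHR-PUFA6.CEL", "SHR-st7.CEL",
                "SHR-st8.CEL", "WK-PUFA3.CEL", "WK-PUFA4.CEL",
                "WK-st1.CEL", "WK-st2.CEL")
)

set.seed(2)
# Simulated log2 expression for one hypothetical probe set, with an assumed
# diet (ES) effect and no strain (TYPE) or interaction effect.
y <- rnorm(8, mean = 8, sd = 0.2) + 1.5 * (pd$ES == "PUFA")

# Main effects of diet and strain plus their interaction.
fit <- lm(y ~ ES * TYPE, data = pd)
coef(summary(fit))
```

With two replicates per cell, the model has four coefficients and only four
residual degrees of freedom, so per-gene tests from a design this small have
limited power.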
> WK = normal rats
> SHR = rats that were frankly hypertensive at the beginning of the study
> PUFA = n-3 polyunsaturated fatty acids (PUFAs)
> ST = standard rats with no dietary treatment
>
> So there are 8 samples in total. I know this is a very small number of
> arrays, but I'm not the experiment designer! I only have to analyze this
> dataset, if that is possible from a statistical point of view.
>
> This is my dataset after normalizing all the samples with RMA:
> > pData(data.rma)
>                 ES TYPE
> SHR-PUFA5.CEL PUFA  SHR
> SHR-PUFA6.CEL PUFA  SHR
> SHR-st7.CEL     ST  SHR
> SHR-st8.CEL     ST  SHR
> WK-PUFA3.CEL  PUFA   WK
> WK-PUFA4.CEL  PUFA   WK
> WK-st1.CEL      ST   WK
> WK-st2.CEL      ST   WK
>
> So my question is whether I have to filter these samples together (see
> data.rma above, then apply the IQR filter), or, for example, run RMA on
> all the samples together and then filter by IQR separately for WK-st vs
> WK-PUFA and for SHR-PUFA vs SHR-st. In a second step I could combine what
> I obtain from the first group with the second in order to get a single
> list of genes.
In the paragraph above it is hard to understand what you are talking about.
In the first phrase you talk about filtering samples (possibly using IQR),
but IQR is used in some cases to filter _genes_ nonspecifically. Then you
mention rma -- so perhaps you are talking about preprocessing.
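
For concreteness, a nonspecific IQR filter on genes might look like the
following base R sketch. The matrix is simulated, and the cutoff (the median
IQR) is an arbitrary choice for illustration, not a recommendation.

```r
set.seed(3)
# Simulated expression matrix: 200 hypothetical probe sets x 8 arrays.
x <- matrix(rnorm(1600, mean = 8, sd = 0.3), nrow = 200,
            dimnames = list(paste0("probe", 1:200), paste0("array", 1:8)))
# Make the first 20 probe sets vary more across arrays.
x[1:20, ] <- x[1:20, ] + rnorm(160, sd = 2)

# Nonspecific filter: the IQR is computed per probe set over ALL arrays,
# without looking at the ES/TYPE labels.
iqr  <- apply(x, 1, IQR)
keep <- iqr > median(iqr)   # arbitrary cutoff for this sketch
x_filtered <- x[keep, , drop = FALSE]
dim(x_filtered)
```

Because the filter ignores the sample labels, it is applied once to the
jointly preprocessed matrix rather than separately within each subgroup.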
>
> I have read the results of the analysis of the full data set (12,625
> probes, 32 samples) as discussed in Scholtens et al., "Analyzing
> Factorial Designed Microarray Experiments", Journal of Multivariate
> Analysis, where the expression estimates were calculated using the rma
> method after quantile normalization from the affy package, but the paper
> doesn't explain how the 500 genes were obtained.
> Was the "estrogen" dataset (500 genes, 8 samples) obtained from the
> 12,625 probes and 32 samples by filtering all the samples together, or
> by combining several datasets (with the function "combine" or something
> else) from different subgroups?
Whatever was done with the estrogen CEL files doesn't have much connection
to the detailed conduct of the factorial analysis. Preprocessing steps are
undertaken in an attempt to remove nonbiological sources of variation from
our expression data.
If you read the vignette from the estrogen package in the experimental data
archive, you will see that expresso with vsn was employed to preprocess. I
don't know if anyone has looked at the impact of preprocessing method on
inference for this dataset, but the vignette proposes some investigation of
this question.
>
> If I know the right procedure, perhaps I can analyze my dataset in the
> right way.
There is no _right_ way -- the best you can do is make informed choices
that are defensible in scientific arguments. The documentation of the
packages mentioned can help you to make an informed choice -- but there are
evidently some gaps. Your questions about filtering have some basis because
you are curious about the selection of the 500 genes that are in the
factDesign estrogen data object, but I believe the selection of 500 is
immaterial to the statistical analysis -- it was probably mostly for
convenience. Although the choice of 500 may have had some other motivation,
it has nothing to do with how you should analyze your data.
>
> That's all.
>
> I hope this is clearer now, and sorry for the inconvenience.
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Vincent Carey, PhD
Biostatistics, Channing Lab
617 525 2265
Having successfully loaded the MAQC data and so forth required for
this example:
> sessionInfo()
R version 2.8.1 (2008-12-22)
powerpc-apple-darwin8.11.1

locale:
C

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] RColorBrewer_1.0-2      maqcExpression4plex_1.2 oligo_1.6.0
 [4] oligoClasses_1.4.0      affxparser_1.14.2       AnnotationDbi_1.4.3
 [7] preprocessCore_1.4.0    RSQLite_0.7-1           DBI_0.2-4
[10] Biobase_2.2.2

loaded via a namespace (and not attached):
[1] tcltk_2.8.1
I am unable to see the MAQC data required to proceed:
> list.xysfiles(full.names=TRUE)
character(0)
Any insight appreciated.
Tom