Help with the estrogen dataset in the factDesign package


Alberto Goldoni ▴ 420

@alberto-goldoni-3477

Last seen 10.6 years ago

Hello everybody,

I am writing because I need some explanation about the "estrogen" dataset analyzed in the "factDesign" package. I have to perform the same analysis on 8 samples (Affymetrix chips):

    > pData(data.rma)
                     ES     TYPE
    SHR-PUFA5.CEL    PUFA   SHR
    SHR-PUFA6.CEL    PUFA   SHR
    SHR-st7.CEL      ST     SHR
    SHR-st8.CEL      ST     SHR
    WK-PUFA3.CEL     PUFA   WK
    WK-PUFA4.CEL     PUFA   WK
    WK-st1.CEL       ST     WK
    WK-st2.CEL       ST     WK

    > data.rma
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 31099 features, 8 samples
      element names: exprs
    phenoData
      sampleNames: SHR-PUFA5.CEL, SHR-PUFA6.CEL, ..., WK-st2.CEL (8 total)
      varLabels and varMetadata description:
        sample: arbitrary numbering
    featureData
      featureNames: 1367452_at, 1367453_at, ..., AFFX-TrpnX-M_at (31099 total)
      fvarLabels and fvarMetadata description: none
    experimentData: use 'experimentData(object)'
    Annotation: rat2302

What I need to know is whether I have to analyze everything together (normalization with rma, filtering by IQR, and then the factDesign technique), or whether I have to treat the two groups (1:4) and (5:8) separately and then rebuild an exprset at the end.

So I would like to understand how the "estrogen" dataset was analyzed in order to obtain the 500 genes listed in pData(estrogen).

That's all. Best regards,

--
Dr. Alberto Goldoni
Bologna, Italy

factDesign factDesign • 1.5k views

ADD COMMENT • link updated 15.8 years ago by rgentleman ★ 5.5k • written 15.8 years ago by Alberto Goldoni ▴ 420


rgentleman ★ 5.5k

@rgentleman-7725

Last seen 9.9 years ago

United States

Hi Alberto,

Alberto Goldoni wrote:
> What I need to know is whether I have to analyze everything together
> (normalization with rma, filtering by IQR, and then the factDesign
> technique), or whether I have to treat the two groups (1:4) and (5:8)
> separately and then rebuild an exprset at the end.

You *must* jointly normalize, and that is what we did. There is no such thing as an exprset anymore (they were deprecated a long time ago).

> So I would like to understand how the "estrogen" dataset was analyzed
> in order to obtain the 500 genes listed in pData(estrogen).

You seem very confused. pData accesses the phenotypic data. I have no idea where you are getting 500 genes from. Perhaps you have a script or something? Perhaps you are reading the vignette? If it is the vignette, then you have access to all the code and can easily answer these questions. I think you will need to be more explicit about where you are getting 500 genes from (but I don't see how it has anything to do with pData(estrogen)).

best wishes
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
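One way to see why joint normalization matters is a small base-R sketch of quantile normalization, which is one of the steps inside RMA. This is a simplified illustration on simulated values, not the affy implementation; the function name `quantile_normalize` and the simulated intensities are assumptions made for the example.

```r
# Minimal quantile normalization in base R (illustrative only; not the affy code).
quantile_normalize <- function(x) {
  # x: numeric matrix, rows = probes, columns = arrays
  ranks  <- apply(x, 2, rank, ties.method = "first")  # per-array ranks
  sorted <- apply(x, 2, sort)                         # per-array sorted values
  target <- rowMeans(sorted)                          # shared reference distribution
  out    <- apply(ranks, 2, function(r) target[r])    # map ranks back to the reference
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
sample_names <- c("SHR-PUFA5", "SHR-PUFA6", "SHR-st7", "SHR-st8",
                  "WK-PUFA3", "WK-PUFA4", "WK-st1", "WK-st2")
# Simulated log-intensities with an array-specific shift (hypothetical data)
raw <- sapply(seq_along(sample_names),
              function(j) rnorm(1000, mean = 7 + 0.2 * j))
colnames(raw) <- sample_names

joint <- quantile_normalize(raw)                  # all 8 arrays together
split <- cbind(quantile_normalize(raw[, 1:4]),    # arrays 1:4 and 5:8 separately
               quantile_normalize(raw[, 5:8]))

# Difference in medians between one array from each group
joint_gap <- abs(median(joint[, 1]) - median(joint[, 5]))
split_gap <- abs(median(split[, 1]) - median(split[, 5]))
```

In this simulation the jointly normalized arrays share one distribution, while normalizing the two groups separately leaves a technical offset between them that a later comparison would pick up as if it were biology.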

ADD COMMENT • link 15.8 years ago rgentleman ★ 5.5k


Dear Gentleman,

I am not confused about my datasets, and I have read all the documentation and the vignette for factDesign, but I have found nothing at all.

In the factDesign vignette there is only one example, and the explanation says that the dataset called "estrogen" contains gene expression levels for 500 genes from Affymetrix HGU95av2 chips for eight samples from a breast cancer cell line.

In my experiment I have 8 samples in more or less the same kind of two-factor design: ES (ST and PUFA) and TYPE (WK and SHR). So I think (but you can correct me) that factDesign is the right package for analyzing a factorial designed microarray experiment.

WK = normal rats
SHR = rats that appeared frankly hypertensive at the beginning of the study
PUFA = n-3 polyunsaturated fatty acids (PUFAs)
ST = standard, with no dietary treatment

So there are 8 samples in total. I know this is a very small number of arrays, but I am not the experiment designer! I only have to analyze this dataset, if that is possible from a statistical point of view.

This is my dataset after normalizing all the samples with RMA:

    > pData(data.rma)
                     ES     TYPE
    SHR-PUFA5.CEL    PUFA   SHR
    SHR-PUFA6.CEL    PUFA   SHR
    SHR-st7.CEL      ST     SHR
    SHR-st8.CEL      ST     SHR
    WK-PUFA3.CEL     PUFA   WK
    WK-PUFA4.CEL     PUFA   WK
    WK-st1.CEL       ST     WK
    WK-st2.CEL       ST     WK

So my question is whether I have to filter these samples together (run rma on data.rma as shown above and then filter by IQR), or whether I should, for example, run rma on all the samples together and then filter by IQR separately for WK-st vs WK-PUFA and for SHR-PUFA vs SHR-st. In the second case I would then merge what I obtain from the first group with the second in order to get a single list of genes.

I have read the results of the analysis of the full data set (12,625 probes, 32 samples) as discussed in Scholtens, et al., Analyzing Factorial Designed Microarray Experiments, Journal of Multivariate Analysis, where the expression estimates were calculated using the rma method after quantile normalization from the affy package, but the paper doesn't explain how the 500 genes were obtained.

Was the "estrogen" dataset (500 genes, 8 samples) obtained from the 12,625 probes and 32 samples by filtering all the samples together, or by adding together several datasets from different sub-groups (with the function "combine" or something else)?

If I knew the right procedure, perhaps I could analyze my dataset in the right way.

That's all. I hope I am clear now, and sorry for the inconvenience.

--
Dr. Alberto Goldoni
Bologna, Italy
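For reference, a nonspecific IQR filter of the kind mentioned above is normally computed per gene across all arrays at once, without looking at the sample labels. The sketch below uses simulated values in base R; the matrix `expr`, the cutoff of 0.5, and the gene names are hypothetical. With a real ExpressionSet the matrix would come from `exprs(data.rma)`.

```r
set.seed(2)
n_genes <- 200
samples <- c("SHR-PUFA5", "SHR-PUFA6", "SHR-st7", "SHR-st8",
             "WK-PUFA3", "WK-PUFA4", "WK-st1", "WK-st2")

# Simulated log-expression matrix: rows = genes, columns = arrays (hypothetical data)
expr <- matrix(rnorm(n_genes * 8, mean = 8, sd = 0.1), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), samples))
# Make the first 20 genes noticeably more variable across arrays
expr[1:20, ] <- expr[1:20, ] + matrix(rnorm(20 * 8, sd = 1), nrow = 20)

# IQR of each gene over all 8 arrays; group labels are not used
gene_iqr <- apply(expr, 1, IQR)

cutoff   <- 0.5                      # hypothetical threshold, chosen for illustration
keep     <- gene_iqr > cutoff
filtered <- expr[keep, , drop = FALSE]
```

Because the filter ignores the ES and TYPE labels, it produces a single gene list for the whole experiment, so there is nothing to merge afterwards.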

ADD REPLY • link 15.8 years ago Alberto Goldoni ▴ 420


On Wed, Jun 10, 2009 at 2:27 AM, Alberto Goldoni <alberto.goldoni1975 at gmail.com> wrote:
> Dear Gentleman,
>
> I am not confused about my datasets, and I have read all the documentation
> and the vignette for factDesign, but I have found nothing at all.
>
> In the factDesign vignette there is only one example, and the explanation
> says that the dataset called "estrogen" contains gene expression levels for
> 500 genes from Affymetrix HGU95av2 chips for eight samples from a breast
> cancer cell line.

It is true that the factDesign package includes a dataset called 'estrogen' that has 500 genes. I will go out on a limb and guess that the selection of 500 genes was made for illustrative purposes, to avoid having a package that is too big and to avoid conflict with an impending publication at the time of package release. The details of selecting 500 genes for the illustrative dataset are not provided in the documentation I have seen. However, in the experimental data archive from Bioconductor, there is a _package_ called 'estrogen' that provides the CEL files underlying this dataset. From this you can get all 12625 probe sets, or you can work at the probe level if you like.

> In my experiment I have 8 samples in more or less the same kind of
> two-factor design: ES (ST and PUFA) and TYPE (WK and SHR). So I think (but
> you can correct me) that factDesign is the right package for analyzing a
> factorial designed microarray experiment.

factDesign gives lots of relevant information in the vignette and has some software that will help do an effective analysis. It is not _the_ 'right' package for this, though, because other linear modeling packages could be used in the same way.

> So my question is whether I have to filter these samples together (run rma
> on data.rma as shown above and then filter by IQR), or whether I should,
> for example, run rma on all the samples together and then filter by IQR
> separately for WK-st vs WK-PUFA and for SHR-PUFA vs SHR-st. In the second
> case I would then merge what I obtain from the first group with the second
> in order to get a single list of genes.

In the paragraph above it is hard to understand what you are talking about. In the first phrase you talk about filtering samples (possibly using IQR), but IQR is used in some cases to filter _genes_ nonspecifically. Then you mention rma, so perhaps you are talking about preprocessing.

> I have read the results of the analysis of the full data set (12,625
> probes, 32 samples) as discussed in Scholtens, et al., Analyzing Factorial
> Designed Microarray Experiments, Journal of Multivariate Analysis, where
> the expression estimates were calculated using the rma method after
> quantile normalization from the affy package, but the paper doesn't explain
> how the 500 genes were obtained.
> Was the "estrogen" dataset (500 genes, 8 samples) obtained from the 12,625
> probes and 32 samples by filtering all the samples together, or by adding
> together several datasets from different sub-groups (with the function
> "combine" or something else)?

Whatever was done with the estrogen CEL files doesn't have much connection to the detailed conduct of the factorial analysis. Preprocessing steps are undertaken in an attempt to remove nonbiologic sources of variation from our expression data. If you read the vignette from the estrogen package in the experimental data archive, you will see that expresso with vsn was employed to preprocess. I don't know if anyone has looked at the impact of preprocessing method on inference for this dataset, but the vignette proposes some investigation of this question.

> If I knew the right procedure, perhaps I could analyze my dataset in the
> right way.

There is no _right_ way -- the best you can do is make informed choices that are defensible in scientific arguments. The documentation of the packages mentioned can help you to make an informed choice -- but there are evidently some gaps. Your questions about filtering have some basis because you are curious about the selection of the 500 genes that are in the factDesign estrogen data object, but I believe the selection of 500 is immaterial to the statistical analysis -- it was probably mostly for convenience. Although the choice of 500 may have had some other motivation, it has nothing to do with how you should analyze your data.

--
Vincent Carey, PhD
Biostatistics, Channing Lab
617 525 2265
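As a rough illustration of the linear-modeling approach to a 2 x 2 factorial design like the one in the question, the sketch below fits a model with main effects and an interaction for a single gene using base R's `lm`. It is not the factDesign workflow; the expression values are simulated and the effect sizes are assumptions made for the example.

```r
# Sample annotation mirroring the pData table in the question
pheno <- data.frame(
  ES   = factor(c("PUFA", "PUFA", "ST", "ST", "PUFA", "PUFA", "ST", "ST"),
                levels = c("ST", "PUFA")),     # ST is the reference diet
  TYPE = factor(c("SHR", "SHR", "SHR", "SHR", "WK", "WK", "WK", "WK"),
                levels = c("WK", "SHR")),      # WK is the reference strain
  row.names = c("SHR-PUFA5.CEL", "SHR-PUFA6.CEL", "SHR-st7.CEL", "SHR-st8.CEL",
                "WK-PUFA3.CEL", "WK-PUFA4.CEL", "WK-st1.CEL", "WK-st2.CEL")
)

set.seed(3)
# Simulated log-expression for one gene (hypothetical values):
# baseline 8, and a diet effect of +1 that appears only in SHR rats
mu <- 8 + 1 * (pheno$ES == "PUFA" & pheno$TYPE == "SHR")
y  <- mu + rnorm(8, sd = 0.05)

# Main effects plus interaction, fitted on all 8 arrays at once
fit   <- lm(y ~ ES * TYPE, data = pheno)
coefs <- coef(fit)   # (Intercept), ESPUFA, TYPESHR, ESPUFA:TYPESHR
```

Here the `ESPUFA:TYPESHR` coefficient estimates how much the diet effect differs between the two strains. With two replicates per cell the fit has only 4 residual degrees of freedom, which is one reason per-gene inference on so few arrays needs care.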

ADD REPLY • link 15.8 years ago Vincent J. Carey, Jr. 6.7k


Having successfully loaded the MAQC data and so forth required for this example:

    > sessionInfo()
    R version 2.8.1 (2008-12-22)
    powerpc-apple-darwin8.11.1

    locale:
    C

    attached base packages:
    [1] splines   tools     stats     graphics  grDevices utils     datasets
    [8] methods   base

    other attached packages:
     [1] RColorBrewer_1.0-2      maqcExpression4plex_1.2 oligo_1.6.0
     [4] oligoClasses_1.4.0      affxparser_1.14.2       AnnotationDbi_1.4.3
     [7] preprocessCore_1.4.0    RSQLite_0.7-1           DBI_0.2-4
    [10] Biobase_2.2.2

    loaded via a namespace (and not attached):
    [1] tcltk_2.8.1

I am unable to see the MAQC data required to proceed:

    > list.xysfiles(full.names=TRUE)
    character(0)

Any insight appreciated.

Tom

ADD REPLY • link 15.8 years ago Thomas Hampton ▴ 750
