Entrez Gene ID to Probe Set Name

Entering edit mode

McGee, Monnie ▴ 300

@mcgee-monnie-1108

Last seen 10.5 years ago

Here is the previous query with a more descriptive subject. -----Original Message----- From: McGee, Monnie Sent: Thu 10/23/2008 11:14 AM To: bioconductor at stat.math.ethz.ch Subject: RE: Bioconductor Digest, Vol 68, Issue 23 Dear List, Is there an elegant way to obtain the name of a probe set from an Affymetrix platform (doesn't matter which one) corresponding to a given ENTREZ gene ID? It seems that it is fairly simple to obtain the entrez ID if you have a probe set, but the reverse problem seems non- trival -at least it is to me. There's no particular reason I need to know. I just want to know if it's possible. Thanks! Monnie Monnie McGee, Ph.D. Associate Professor Department of Statistical Science Southern Methodist University Ph: 214-768-2462 Fax: 214-768-4035 -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch on behalf of bioconductor-request@stat.math.ethz.ch Sent: Thu 10/23/2008 5:00 AM To: bioconductor at stat.math.ethz.ch Subject: Bioconductor Digest, Vol 68, Issue 23 Send Bioconductor mailing list submissions to bioconductor at stat.math.ethz.ch To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/bioconductor or, via email, send a message with subject or body 'help' to bioconductor-request at stat.math.ethz.ch You can reach the person managing the list at bioconductor-owner at stat.math.ethz.ch When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..." Today's Topics: 1. GOstat: listing genes from hyperGTest (Tim Smith) 2. export toptables into Genespring (Pemmasani, Kalyani) 3. Re: Limma contrasts question (James W. MacDonald) 4. Re: GOstat: listing genes from hyperGTest (James W. MacDonald) 5. Re: Limma contrasts question (Daniel Brewer) 6. quality assessment and preprocessing for tiling array-based CGH data (Leon Yee) 7. GOstats and org.EcK12.eg.db (Robert Castelo) 8. Re: quality assessment and preprocessing for tiling array-based CGH data (Sean Davis) 9. Re: GOstat: listing genes from hyperGTest (Tim Smith) 10. Re: quality assessment and preprocessing for tiling array-based CGH data (Leon Yee) 11. Re: Beadarray and illumina methylation arrays (Mark Dunning) 12. Re: quality assessment and preprocessing for tiling array-based CGH data (Sean Davis) 13. Problem using Rgraphviz (edge weights going missing). (Dan Bolser) 14. Re: newbie problems with AnnBuilder (Mark Kimpel) 15. Re: newbie problems with AnnBuilder (Sean Davis) 16. Re: newbie problems with AnnBuilder (Mark Kimpel) 17. Re: GOstats and org.EcK12.eg.db (Robert Gentleman) 18. Re: quality assessment and preprocessing for tiling array-based CGH data (Leon Yee) 19. Bioconductor installation problem: unable to access repository (Shinichiro Wachi) 20. Re: quality assessment and preprocessing for tiling array-based CGH data (Sean Davis) 21. Re: GOstat: listing genes from hyperGTest (James W. MacDonald) 22. Re: Bioconductor installation problem: unable to access repository (Patrick Aboyoun) 23. Bioconductor 2.3 is released (Patrick Aboyoun) 24. Re: How to save result from limma (Jenny Drnevich) 25. scale questions (Hui-Yi Chu) 26. Re: [Fwd: batch info for cellHTS] (Florian Hahne) 27. problem with Category package and custom annotationDbi (Mark Kimpel) 28. Re: problem with Category package and custom annotationDbi (Marc Carlson) 29. Re: scale questions (Sean Davis) 30. Re: scale questions (Sean Davis) 31. Re: problem with Category package and custom annotationDbi (Mark Kimpel) 32. Re: How to save result from limma (Gordon K Smyth) 33. Package "xps" "import.expr.scheme" error (Wei,Caimiao) 34. Re: Lumi and Beadstudio 1.5.13 (Leon Peshkin) 35. Offre exceptionnelle suite au probl?me technique (Clara de Dessous Ch?ri) ---------------------------------------------------------------------- Message: 1 Date: Wed, 22 Oct 2008 03:43:33 -0700 (PDT) From: Tim Smith <tim_smith_666@yahoo.com> Subject: [BioC] GOstat: listing genes from hyperGTest To: bioc <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <257981.79114.qm at web58005.mail.re3.yahoo.com> Content-Type: text/plain Hi, I was performing a hyperGTest for genes in homo-sapiens. For a set of input genes, this function returns some 'significant' GO terms. What I wanted to now do was to co-relate each significant GO term (returned by this function) with genes (from my set of input genes) associated with that GO term. However, I think that I may be using the wrong package/function to get the releveant set of genes. Currently, what I'm doing is finding the significant GO terms by using the following code: ----------------------- ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs paramsGO <- new("GOHyperGParams", geneIds = genes1, universeGeneIds = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 1, conditional = FALSE, testDirection = "over") GO <- hyperGTest(paramsGO) -------------------------- This gives me a set of significant GO terms. Now, I would like to find which subset of genes in 'genes1' is associated with each of the significant GO term. To do this I map all GO terms to their Entrez IDs using the 'org.Hs.eg.db' package using the following: xx <- as.list(org.Hs.egGO2EG) to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't this number small?) that map to at least one Entrez ID. So, from here I look up which Entrez IDs are associated with my GO term of interest. My problem is that often, the GO term from hyperGTest is not associated with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described above ), i.e. the GO term/ID is not in the list obtained from 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by hyperGTest, but does not appear to be associated with any Entrez IDs in the org.Hs.eg.db package. Where could I be going wrong? I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]] Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... My sessionInfo() is: -------------------------------- R version 2.7.2 (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] grid splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 Category_2.6.0 [8] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 GO.db_2.2.0 KEGG.db_2.2.0 [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 loaded via a namespace (and not attached): [1] cluster_1.11.11 MASS_7.2-44 --------------------------------- [[alternative HTML version deleted]] ------------------------------ Message: 2 Date: Wed, 22 Oct 2008 12:34:38 +0100 From: "Pemmasani, Kalyani" <kalyani.pemmasani@nuigalway.ie> Subject: [BioC] export toptables into Genespring To: <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <6B017AD2AE2F6F489087FC986588136B88FA42 at EVS1.ac.nuigalway.ie> Content-Type: text/plain; charset="iso-8859-1" Hi all, Is there a way to export toptables from LIMMA into Genespring software program (from Agilent technologies) for clustering? Best regards, Kalyani ------------------------------------------- Kalyani Pemmasani Marie Curie research fellow National Diagnostics Centre National University of Ireland Galway, IRELAND e.mail: kalyani.pemmasani at nuigalway.ie Ph.no: +353(0)91492815 Fax: +353 (0) 91586570 ------------------------------ Message: 3 Date: Wed, 22 Oct 2008 09:07:16 -0400 From: "James W. MacDonald" <jmacdon@med.umich.edu> Subject: Re: [BioC] Limma contrasts question To: Daniel Brewer <daniel.brewer at="" icr.ac.uk=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <48FF2584.5010509 at med.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Daniel Brewer wrote: > Hi Jim, > > Could you go into the maths of the contrast formulas a bit? I would > like to get a really solid understanding of what it is doing for future > analyses. Once you understand what the coefficients are, the contrasts are just simple algebra. In your case, all of the coefficients are estimating the difference between the sample and PC3M (e.g., Knockdown - PC3M). So the algebra is something like this: 2(Knockdown - PC3M) - (Scramble - PC3M) = 2Knockdown - 2PC3M - Scramble + PC3M = 2Knockdown - Scramble - PC3M = Knockdown - (Scramble + PC3M)/2 Which is knockdown minus the mean of the controls. Note that this will be the numerator of the resulting t-statistic. The denominator will be sort of an average of the variability within each of the three groups being compared. So the question being answered is 'What genes are different in Knockdown as compared to the average of the controls?'. However, there is nothing here to test if the two controls are similar at all (and you might not care). So for instance, you might have a gene with average expression like this: Knockdown = 10 PC3M = 4 Scramble = 7 If the intra-group variability is small for each sample type, then you will likely get a significant t-statistic even though the two controls are probably significantly different as well. Which is why I mentioned earlier that you might want to test the Scramble - PC3M contrast as well. Best, Jim > > Many thanks > > Dan > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662 ------------------------------ Message: 4 Date: Wed, 22 Oct 2008 09:10:39 -0400 From: "James W. MacDonald" <jmacdon@med.umich.edu> Subject: Re: [BioC] GOstat: listing genes from hyperGTest Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <48FF264F.50404 at med.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Tim, Does probeSetSummary() do what you want? Best, Jim Tim Smith wrote: > > Hi, > > I > was performing a hyperGTest for genes in homo-sapiens. For a set of > input genes, this function returns some 'significant' GO terms. What I > wanted to now do was to co-relate each significant GO term (returned by > this function) with genes (from my set of input genes) associated with > that GO term. However, I think that I may be using the wrong > package/function to get the releveant set of genes. > > Currently, what I'm doing is finding the significant GO terms by using the following code: > > ----------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > -------------------------- > This > gives me a set of significant GO terms. Now, I would like to find which > subset of genes in 'genes1' is associated with each of the significant > GO term. To do this I map all GO terms to their Entrez IDs using the > 'org.Hs.eg.db' package using the following: > > xx <- as.list(org.Hs.egGO2EG) > > to > get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't > this number small?) that map to at least one Entrez ID. So, from here I > look up which Entrez IDs are associated with my GO term of interest. > > My > problem is that often, the GO term from hyperGTest is not associated > with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described > above ), i.e. the GO term/ID is not in the list obtained from > 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by > hyperGTest, but does not appear to be associated with any Entrez IDs in > the org.Hs.eg.db package. Where could I be going wrong? > > I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]] > > Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... > > My sessionInfo() is: > -------------------------------- > R version 2.7.2 (2008-08-25) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United > States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines tools stats graphics grDevices utils datasets methods base > > other attached packages: > [1] > gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 > gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 > Category_2.6.0 > [8] RBGL_1.16.0 annotate_1.18.0 > xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 > GO.db_2.2.0 KEGG.db_2.2.0 > [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 > [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 MASS_7.2-44 > > > --------------------------------- > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662 ------------------------------ Message: 5 Date: Wed, 22 Oct 2008 14:50:06 +0100 From: Daniel Brewer <daniel.brewer@icr.ac.uk> Subject: Re: [BioC] Limma contrasts question To: "James W. MacDonald" <jmacdon at="" med.umich.edu="">, bioconductor at stat.math.ethz.ch Message-ID: <48FF2F8E.4020208 at icr.ac.uk> Content-Type: text/plain; charset=ISO-8859-1 James W. MacDonald wrote: > Daniel Brewer wrote: > >> Hi Jim, >> >> Could you go into the maths of the contrast formulas a bit? I would >> like to get a really solid understanding of what it is doing for future >> analyses. > > Once you understand what the coefficients are, the contrasts are just > simple algebra. In your case, all of the coefficients are estimating the > difference between the sample and PC3M (e.g., Knockdown - PC3M). > > So the algebra is something like this: > > 2(Knockdown - PC3M) - (Scramble - PC3M) > = > 2Knockdown - 2PC3M - Scramble + PC3M > = > 2Knockdown - Scramble - PC3M > = > Knockdown - (Scramble + PC3M)/2 > > Which is knockdown minus the mean of the controls. > > Note that this will be the numerator of the resulting t-statistic. The > denominator will be sort of an average of the variability within each of > the three groups being compared. So the question being answered is 'What > genes are different in Knockdown as compared to the average of the > controls?'. However, there is nothing here to test if the two controls > are similar at all (and you might not care). > > So for instance, you might have a gene with average expression like this: > > Knockdown = 10 > PC3M = 4 > Scramble = 7 > > If the intra-group variability is small for each sample type, then you > will likely get a significant t-statistic even though the two controls > are probably significantly different as well. Which is why I mentioned > earlier that you might want to test the Scramble - PC3M contrast as well. > > Best, > > Jim > > >> >> Many thanks >> >> Dan >> > Thanks for that, that is brilliant and has clarified everything. I did a comparison of the controls and there were no significant genes found which is encouraging. Many thanks Dan -- ************************************************************** Daniel Brewer, Ph.D. Institute of Cancer Research Molecular Carcinogenesis Email: daniel.brewer at icr.ac.uk ************************************************************** The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the a...{{dropped:2}} ------------------------------ Message: 6 Date: Wed, 22 Oct 2008 21:51:24 +0800 From: Leon Yee <yee.leon@gmail.com> Subject: [BioC] quality assessment and preprocessing for tiling array-based CGH data To: bioconductor at stat.math.ethz.ch Message-ID: <48FF2FDC.6020203 at gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Dear all, Is there any well-established routine for quality assessment and preprocessing of array CGH data, especially tiling array-based CGH data? I found most of the quality assessment of array data are about expression array, while few are related to array CGH data. We are using agilent 244k CGH array of rat, and now we have the text files produced by Feature Extraction, don't know whether they are of good quality. Could anyone help provide some clues? Thanks in advance! After read.maimage(), we got the RGlist object, which contain several components including R, G, Rb, Gb, and so on. The probes are of 3 types: -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding probe sequence]; 1 means positive control, i guess, and the probe names are like DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no corresponding probe sequence]. The number of -1 is 1275, while the number of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these values for quality assessment and normalization? How? In addition, in the normal probes, we have 1000 probes repeating 3 times in the array. How could we use these data for quality assessment and normalization? Regards, Leon ------------------------------ Message: 7 Date: Wed, 22 Oct 2008 15:48:22 +0200 From: Robert Castelo <robert.castelo@upf.edu> Subject: [BioC] GOstats and org.EcK12.eg.db To: bioconductor at stat.math.ethz.ch Message-ID: <1224683302.5889.35.camel at llull.imim.es> Content-Type: text/plain dear list, I cannot get to work GOstats with the annotation for E. coli in org.EcK12.eg.db. Please find below the code that reproduces the problem including the error message, and my sessionInfo() at the end of this email. I have included the same exercise with the human annotation package org.Hs.eg.db which runs fine in my system. Any help with this will be very much appreciated. thanks!! robert. ==========CODE STARTS HERE=========== library(org.Hs.eg.db) library(org.EcK12.eg.db) library(GOstats) geneuniverse <- mappedkeys(org.EcK12.egSYMBOL) set.seed(12345) geneset <- sample(geneuniverse, size=100, replace=FALSE) goHypGparams <- new("GOHyperGParams", geneIds=geneset, universeGeneIds=geneuniverse, annotation="org.EcK12.eg.db", ontology="BP", pvalueCutoff=1.0, conditional=TRUE, testDirection="over") goHypGcond <- hyperGTest(goHypGparams) Error in get(mapName, envir = pkgEnv, inherits = FALSE) : variable "org.EcK12.egENTREZID" was not found Error in mget(probes, ID2EntrezID(datPkg)) : error in evaluating the argument 'envir' in selecting a method for function 'mget' geneuniverse <- mappedkeys(org.Hs.egSYMBOL) set.seed(12345) geneset <- sample(geneuniverse, size=100, replace=FALSE) goHypGparams <- new("GOHyperGParams", geneIds=geneset, universeGeneIds=geneuniverse, annotation="org.Hs.eg.db", ontology="BP", pvalueCutoff=1.0, conditional=TRUE, testDirection="over") goHypGcond <- hyperGTest(goHypGparams) sessionInfo() R version 2.8.0 beta (2008-10-05 r46601) x86_64-unknown-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] splines tools stats graphics grDevices utils datasets [8] methods base other attached packages: [1] org.Hs.eg.db_2.2.6 GOstats_2.7.0 Category_2.7.6 [4] genefilter_1.21.5 survival_2.34-1 RBGL_1.17.2 [7] annotate_1.19.2 xtable_1.5-4 GO.db_2.2.5 [10] graph_1.19.6 org.EcK12.eg.db_2.2.6 AnnotationDbi_1.3.12 [13] RSQLite_0.7-0 DBI_0.2-4 Biobase_2.1.7 loaded via a namespace (and not attached): [1] cluster_1.11.11 GSEABase_1.3.6 XML_1.98-1 ------------------------------ Message: 8 Date: Wed, 22 Oct 2008 10:02:35 -0400 From: "Sean Davis" <sdavis2@mail.nih.gov> Subject: Re: [BioC] quality assessment and preprocessing for tiling array-based CGH data To: "Leon Yee" <yee.leon at="" gmail.com=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <264855a00810220702m3ee17381y221a8e5ec18ee6ff at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at="" gmail.com=""> wrote: > Dear all, > > Is there any well-established routine for quality assessment and > preprocessing of array CGH data, especially tiling array-based CGH data? I > found most of the quality assessment of array data are about expression > array, while few are related to array CGH data. > We are using agilent 244k CGH array of rat, and now we have the text > files produced by Feature Extraction, don't know whether they are of good [[elided Yahoo spam]] > > After read.maimage(), we got the RGlist object, which contain several > components including R, G, Rb, Gb, and so on. The probes are of 3 types: > -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and > the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding probe > sequence]; 1 means positive control, i guess, and the probe names are like > DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no > corresponding probe sequence]. The number of -1 is 1275, while the number > of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these > values for quality assessment and normalization? How? > In addition, in the normal probes, we have 1000 probes repeating 3 times > in the array. How could we use these data for quality assessment and > normalization? You generally will not want to do any normalization besides a possible shift of the center. Any linear normalization that affects the slope of the M vs. A plot or nonlinear normalization will likely decrease signal. As for quality control, a good, general measure to track is the dlrs, a robust measure of the standard deviation. dlrs <- function(x) { nx <- length(x) if (nx<3) { stop("Vector length>2 needed for computation") } tmp <- embed(x,2) diffs <- tmp[,2]-tmp[,1] dlrs <- IQR(diffs)/(sqrt(2)*1.34) return(dlrs) } For agilent arrays, most of the dlrs should be around or under 0.2, generally. However, this might vary a bit based on lab-to-lab variation. In any case, if there is a significant outlier, that is suspect. The input to the above function is the log ratios for a single array arranged in chromosome and position order. Sean ------------------------------ Message: 9 Date: Wed, 22 Oct 2008 07:24:50 -0700 (PDT) Subject: Re: [BioC] GOstat: listing genes from hyperGTest To: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <418040.17653.qm at web58004.mail.re3.yahoo.com> Content-Type: text/plain Thanks James. If I can tweak that function, I'll get exactly what I want. I tried what you suggested and got the following error: --------------------------- ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs paramsGO <- new("GOHyperGParams", geneIds = genes1, universeGeneIds = allGenes, annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 1, conditional = FALSE, testDirection = "over") GO <- hyperGTest(paramsGO) ps <- probeSetSummary(GO) Error in get(mapName, envir = pkgEnv, inherits = FALSE) : variable "org.Hs.egENTREZID" was not found -------------------------------- I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input. Or am I doing something wrong? thanks! ----- Original Message ---- From: James W. MacDonald <jmacdon@med.umich.edu> Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> Sent: Wednesday, October 22, 2008 9:10:39 AM Subject: Re: [BioC] GOstat: listing genes from hyperGTest Hi Tim, Does probeSetSummary() do what you want? Best, Jim Tim Smith wrote: > > Hi, > > I > was performing a hyperGTest for genes in homo-sapiens. For a set of > input genes, this function returns some 'significant' GO terms. What I > wanted to now do was to co-relate each significant GO term (returned by > this function) with genes (from my set of input genes) associated with > that GO term. However, I think that I may be using the wrong > package/function to get the releveant set of genes. > > Currently, what I'm doing is finding the significant GO terms by using the following code: > > ----------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > -------------------------- > This > gives me a set of significant GO terms. Now, I would like to find which > subset of genes in 'genes1' is associated with each of the significant > GO term. To do this I map all GO terms to their Entrez IDs using the > 'org.Hs.eg.db' package using the following: > > xx <- as.list(org.Hs.egGO2EG) > > to > get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't > this number small?) that map to at least one Entrez ID. So, from here I > look up which Entrez IDs are associated with my GO term of interest. > > My > problem is that often, the GO term from hyperGTest is not associated > with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described > above ), i.e. the GO term/ID is not in the list obtained from > 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by > hyperGTest, but does not appear to be associated with any Entrez IDs in > the org.Hs.eg.db package. Where could I be going wrong? > [[elided Yahoo spam]] > > Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... > > My sessionInfo() is: > -------------------------------- > R version 2.7.2 (2008-08-25) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United > States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines tools stats graphics grDevices utils datasets methods base > > other attached packages: > [1] > gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 > gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 > Category_2.6.0 > [8] RBGL_1.16.0 annotate_1.18.0 > xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 > GO.db_2.2.0 KEGG.db_2.2.0 > [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 > [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 MASS_7.2-44 > > > --------------------------------- > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662 [[alternative HTML version deleted]] ------------------------------ Message: 10 Date: Wed, 22 Oct 2008 22:32:34 +0800 From: Leon Yee <yee.leon@gmail.com> Subject: Re: [BioC] quality assessment and preprocessing for tiling array-based CGH data To: Sean Davis <sdavis2 at="" mail.nih.gov=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <48FF3982.9020005 at gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Sean Davis wrote: > On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at="" gmail.com=""> wrote: >> Dear all, >> >> Is there any well-established routine for quality assessment and >> preprocessing of array CGH data, especially tiling array-based CGH data? I >> found most of the quality assessment of array data are about expression >> array, while few are related to array CGH data. >> We are using agilent 244k CGH array of rat, and now we have the text >> files produced by Feature Extraction, don't know whether they are of good [[elided Yahoo spam]] >> >> After read.maimage(), we got the RGlist object, which contain several >> components including R, G, Rb, Gb, and so on. The probes are of 3 types: >> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and >> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding probe >> sequence]; 1 means positive control, i guess, and the probe names are like >> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no >> corresponding probe sequence]. The number of -1 is 1275, while the number >> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these >> values for quality assessment and normalization? How? >> In addition, in the normal probes, we have 1000 probes repeating 3 times >> in the array. How could we use these data for quality assessment and >> normalization? > > You generally will not want to do any normalization besides a possible > shift of the center. Any linear normalization that affects the slope > of the M vs. A plot or nonlinear normalization will likely decrease > signal. As for quality control, a good, general measure to track is > the dlrs, a robust measure of the standard deviation. > > > dlrs <- > function(x) { > nx <- length(x) > if (nx<3) { > stop("Vector length>2 needed for computation") > } > tmp <- embed(x,2) > diffs <- tmp[,2]-tmp[,1] > dlrs <- IQR(diffs)/(sqrt(2)*1.34) > return(dlrs) > } > > For agilent arrays, most of the dlrs should be around or under 0.2, > generally. However, this might vary a bit based on lab-to-lab > variation. In any case, if there is a significant outlier, that is > suspect. The input to the above function is the log ratios for a > single array arranged in chromosome and position order. > > Sean > Hi, Sean Thanks for your advice. However, I have still several questions: 1. The input of dlrs is the log ratios, the log ration extracted from the text file produced by Feature Extraction? or calculated from RGlist --> MAlist ? I have searched the mailist and seen a post of you mentioned the difference of log ration from Feature Extraction and the default M value from read.maimages. 2. I can get the log ratios of all features including control type of -1 and 1, but these features don't have chromosome positions, does this mean I don't need all of them for quality assessment? 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get a proper mapping on the chromosome, so I should remove these values from the input of dlrs. Is it so? 4. How could I handle those 1000 probes repeating 3 times? They will be mapped on the same chromosome position by three per group. Regards, Leon ------------------------------ Message: 11 Date: Wed, 22 Oct 2008 15:42:11 +0100 From: "Mark Dunning" <md392@cam.ac.uk> Subject: Re: [BioC] Beadarray and illumina methylation arrays To: "'Katrina bell'" <katrina.bell at="" mcri.edu.au="">, <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <000c01c93454$5dc9e720$195db560$@ac.uk> Content-Type: text/plain; charset="us-ascii" Hi Katrina, I only have limited experience with methylation data, but hopefully I might be able to give you a few pointers. -The error with readIllumina is quite hard to diagnose without seeing the example. I haven't seen any data from this type of Methylation array. What do the first few lines of the bead-level text files (.csv in your case) look like? It could be that the x and y coordinates are in a slightly different format to that we have seen before. -Yes, 25% of outliers does seem a little high I'm afraid. Have you also looked at whereabouts the outliers are located on the arrays or made some imageplots? We have just developed a new function for automatic artefact detection called BASH that will be available in the forthcoming Bioconductor release. I could be interesting to run that on your data as Illumina do sometimes miss some beads in obvious artefacts and remove too many beads on the rest of the array. BASH should be a good compromise. -Yes, currently the only way of reading methylation data into beadarray is by using the bead-level data. -I'm not very familiar with the output of BeadStudio. Do you get separate detection values for the green and red channels? If so, then I don't think it would be problem if a bead-type was detected in one channel but not the other (since they are measure of either methylated or unmethylated respectively). Bead types that are not detected in either channel could be a problem though. -I haven't really seen many guidelines for normalization and this is something I would like to look into. There is an obvious dye-bias that needs to be corrected and the background normalisation used by Illumina might actually help in this regard (although I wouldn't usually recommend it for other Illumina data). Quantile normalisation has been used for other types of two-colour Illumina data (http://www.biomedcentral.com/1471-2105/9/409, http://genome.cshlp.org/cgi/content/abstract/17/3/368) so it could work here. Hope this helps, Mark -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Katrina bell Sent: 21 October 2008 03:04 To: bioconductor at stat.math.ethz.ch Subject: [BioC] Beadarray and illumina methylation arrays Hello, This is the first time I have used beadarray . I am using it for the analysis of an illumina 27 methylation array and I am having a few issues I hope that you could help me with. 1. The first time I tried to load the methylation data, I didn't write in singleChannel=FALSE. It happily read in my 12 arrays with no problems what so ever. I tried plotting a few things which worked fine. Seeing my mistake, I then went back to reload my data with the red channel (singleChannel=FALSE) and got the following error. > BLData = readIllumina(arrayNames = targets$ChipBarcode, textType=".csv", targets=targets, backgroundMethod="none", singleChannel=FALSE) Found 12 arrays Reading pixels of ./4408100017_A_Grn.tif Calculating background Sharpening Image Calculating foregound Background correcting: method = none Reading pixels of ./4408100017_A_Red.tif Calculating background *** caught segfault *** address (nil), cause 'memory not mapped' Traceback: 1: .C("readBeadImage", as.character(tifFiles2[i]), as.double(RedX[ord]), as.double(RedY[ord]), as.integer(numBeads), foreGround = double(length = numBeads), backGround = double(length = numBeads), as.integer(backgroundSize), as.integer(manip), as.integer(fground), PACKAGE = "beadarray") 2: readIllumina(arrayNames = targets$ChipBarcode, textType = ".csv", targets = targets, backgroundMethod = "none", singleChannel = FALSE) session info Below. So I ended up loading in the data with images=FALSE, which worked, but I would like to be able to look at the background. Is there a way around this issue? 2. When I plotted the outliers (bar chart) I got an astounding 25% for the majority of my 12 samples, both in the red and green channel (unlogged data). In addition, 3 of the samples had a peak of intensity at 5 in the green channel, leading me to believe that I have some real quality control issues with my samples. Any opinions/suggestions on these results would be most welcome. 3. Is it correct that readBeadSummaryData, is not set up for two colour arrays such as the methylation arrays? So the only way to look at methylation data is through reading in BLData? 4. Some of my samples seem to have a large number of targets which have a p value detection rate above 0.05 (beadstudio output). Illumina have indicated that they disregard these. If I can not read in the bead summary data from bead studio, I am assuming that these detection p values can not be taken into account in the analysis? Or are there other methods that remove/down grade these less than optimal probes (most removed as outliers?). 5. Has any one had any experience with normalisation of the methylation arrays? I know that many of the usual array methods are out of the question due to the assumption that most probes on the array will not be differentially expressed is invalid. I read in a bioconductor list someone suggesting quantile normalisation? I would really appreciate any feeback from people who have tried this or other methods, especially if they have verified their methylation results. Thanks for any help/advice you may be able to give. Cheers Katrina > sessionInfo() below R version 2.7.0 (2008-04-22) x86_64-redhat-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US .UTF-8 ;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C; LC_ADD RESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] beadarray_1.8.0 affy_1.18.2 preprocessCore_1.2.1 [4] affyio_1.8.0 geneplotter_1.18.0 annotate_1.18.0 [7] xtable_1.5-2 AnnotationDbi_1.2.2 RSQLite_0.6-9 [10] DBI_0.2-4 lattice_0.17-6 Biobase_2.0.1 [13] limma_2.14.5 loaded via a namespace (and not attached): [1] grid_2.7.0 KernSmooth_2.22-22 RColorBrewer_1.0-2 [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 12 Date: Wed, 22 Oct 2008 10:46:24 -0400 From: "Sean Davis" <sdavis2@mail.nih.gov> Subject: Re: [BioC] quality assessment and preprocessing for tiling array-based CGH data To: "Leon Yee" <yee.leon at="" gmail.com=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <264855a00810220746w6d5ecb11ica5542bf46042f20 at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Wed, Oct 22, 2008 at 10:32 AM, Leon Yee <yee.leon at="" gmail.com=""> wrote: > Sean Davis wrote: >> >> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at="" gmail.com=""> wrote: >>> >>> Dear all, >>> >>> Is there any well-established routine for quality assessment and >>> preprocessing of array CGH data, especially tiling array-based CGH data? >>> I >>> found most of the quality assessment of array data are about expression >>> array, while few are related to array CGH data. >>> We are using agilent 244k CGH array of rat, and now we have the text >>> files produced by Feature Extraction, don't know whether they are of good [[elided Yahoo spam]] >>> >>> After read.maimage(), we got the RGlist object, which contain several >>> components including R, G, Rb, Gb, and so on. The probes are of 3 types: >>> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and >>> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding >>> probe >>> sequence]; 1 means positive control, i guess, and the probe names are >>> like >>> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no >>> corresponding probe sequence]. The number of -1 is 1275, while the >>> number >>> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these >>> values for quality assessment and normalization? How? >>> In addition, in the normal probes, we have 1000 probes repeating 3 >>> times >>> in the array. How could we use these data for quality assessment and >>> normalization? >> >> You generally will not want to do any normalization besides a possible >> shift of the center. Any linear normalization that affects the slope >> of the M vs. A plot or nonlinear normalization will likely decrease >> signal. As for quality control, a good, general measure to track is >> the dlrs, a robust measure of the standard deviation. >> >> >> dlrs <- >> function(x) { >> nx <- length(x) >> if (nx<3) { >> stop("Vector length>2 needed for computation") >> } >> tmp <- embed(x,2) >> diffs <- tmp[,2]-tmp[,1] >> dlrs <- IQR(diffs)/(sqrt(2)*1.34) >> return(dlrs) >> } >> >> For agilent arrays, most of the dlrs should be around or under 0.2, >> generally. However, this might vary a bit based on lab-to-lab >> variation. In any case, if there is a significant outlier, that is >> suspect. The input to the above function is the log ratios for a >> single array arranged in chromosome and position order. >> >> Sean >> > > Hi, Sean > > Thanks for your advice. However, I have still several questions: > > 1. The input of dlrs is the log ratios, the log ration extracted from the > text file produced by Feature Extraction? or calculated from RGlist --> > MAlist ? I have searched the mailist and seen a post of you mentioned the > difference of log ration from Feature Extraction and the default M value > from read.maimages. You can read the Agilent FE manual for more details, but you can probably use either and come to very similar conclusions. If you use the MAlist version, make sure to use only median centering or none for normalization. > 2. I can get the log ratios of all features including control type of -1 > and 1, but these features don't have chromosome positions, does this mean I > don't need all of them for quality assessment? We have not routinely used these probes, no. If an array fails miserably, then these control probes might be useful for determining the reason for the failure, though. > 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get a > proper mapping on the chromosome, so I should remove these values from the > input of dlrs. Is it so? You can either remove them or treat chr2_random as a separate chromosome. > 4. How could I handle those 1000 probes repeating 3 times? They will be > mapped on the same chromosome position by three per group. You could choose one at random or use a mean or median of them. My guess is that they agree very closely with one another so the choice should not affect the results much. Sean ------------------------------ Message: 13 Date: Wed, 22 Oct 2008 15:54:25 +0100 From: "Dan Bolser" <dan.bolser@gmail.com> Subject: [BioC] Problem using Rgraphviz (edge weights going missing). To: bioconductor at stat.math.ethz.ch Message-ID: <2c8757af0810220754r3a16507l29dd77f43a5151d7 at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 Hi all, After successfully installing Rgraphviz on this Centos 5.2 (RHEL 4) box (basically with yum install *graphviz*** ..., then using the standard BioC install procedure) and then navigating the packaging error (that actually caused R to crash) described here: http://lists.rpmforge.net/pipermail/users/2007-August/000958.html (Just manually run dot -c as root) I finally got to produce some nice graphs of my data. However, I found that when I tried to write my graphs to file in dot format, the edge weights go missing. I load my graph like this: library(Rgraphviz) myNodes <- paste(nodes$MAPC) myG <- new("graphNEL", nodes=myNodes) for(i in 1:nrow(dat)){ paste(dat[i,"A"], dat[i,"B"]) myG <- addEdge(paste(dat[i,"A"]), paste(dat[i,"B"]), myG, dat[i,"N"]) } plot(myG, "neato") and all well and good (except that I couldn't get the edge weights to translate into line thickness - more on that below). Then I (foolishly?) tried: agwrite(agopen(myG,"test"), filename="test") This produced a file, but the weight attribute of the edges (in dot format?) were all clearly set to 1 (and not the range of values that I have stored in dat$N). I confirmed that my loop was putting the weights somewhere with the following one-liner: lapply(myG at edgeData@data, function(l){l$weight}) but I don't really know what this means. So I think its a bug. The graph should be written to file with the correct weight attributes. AFAICT. Poking around in the data structure I was confused by the apparent number of edges... ew <- unlist(edgeWeights(myG)) length(ew) Why is this not equal to the number of edges that I put in (and the number reported by just printing the graph object)? I tried to set the width of the edges plotted on the graph by using the penwidth attribute, but it had no effect, and was not written to file as an attribute of the graph. ## Try defining some EDGE attributes eAttrs <- list() ## Get the edge weights ew <- unlist(edgeWeights(myG)) length(ew) ## Some need removing for some reason somehow... ew <- ew[setdiff(seq(along = ew), removedEdges(myG))] length(ew) ## Label the attributes vector (so it can be used properly) names(ew) <- edgeNames(myG) eAttrs$penwidth <- ew plot(myG, "neato", edgeAttrs=eAttrs) toFile(agopen(myG, name="test"), filename="test", edgeAttrs=eAttrs) So I read in the vignette, " A list of all available attributes is accessible online at: http://www.graphviz.org/pub/scm/graphviz2/doc/info/attrs. html. Note that there are some di?erences between default values and also some attributes will not have an e?ect in Rgraphviz. Please see the man page for graphvizAttributes for more details." Where is the "man page for graphvizAttributes for more details" ?? It would be good to know what attributes Rgraphviz is ignoring and if penwidth is one of them in particular. Also the vignette didn't tell me who to contact in case of feedback, bugs, questions, etc. Who else should I be bugging with this email? Thanks for reading! Dan. -- R2spec --bioc -u http://www.bioconductor.org/packages/release/bioc/src/contrib/Rgraphvi z_1.18.1.tar.gz ------------------------------ Message: 14 Date: Wed, 22 Oct 2008 11:07:02 -0400 From: "Mark Kimpel" <mwkimpel@gmail.com> Subject: Re: [BioC] newbie problems with AnnBuilder To: "Marc Carlson" <mcarlson at="" fhcrc.org=""> Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <6b93d1830810220807l788e52f8wa4bafef9726892d0 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Build with AnnotationDbi was uneventful, but I have been unable to install the package or use it as is. If the package is just placed in my site-library, I get: 'ragene10stv1.db' is not a valid package -- installed < 2.0.0? If I tar the package and try R CMD INSTALL, I get: cannot extract package from 'ragene10stv1.db.tar.gz' What approach should I be using? Mark ------------------------------------------------------------ Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, & Mobile & VoiceMail (317) 399-1219 Home Skype: mkimpel "The real problem is not whether machines think but whether men do." -- B. F. Skinner ****************************************************************** On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at="" fhcrc.org=""> wrote: > > Hi Mark, > > AnnBuilder is on its way out. Please have a look at the SQLforge.pdf > vignette which can be found here: > > http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html > > If you have further questions after reading this, then please just ask, > and we should be able to help you out. > > > Marc > > > > Mark Kimpel wrote: > > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST > > array data supplied by Affy. Using the code chunk below, I am able to get > > AnnBuilder to create the annotation object, but it crashes, I believe, when > > it tries to save it. I should also mention that I had a previous crash when > > I had a madecdfenv package directory in place that used the same name. I got > > a "file lock" error, so I temporarily renamed the directory to see if this > > fixed the problem. As below, the error changed, but I still can't get the > > script to work. > > > > I suspect that there is a fundamental misunderstaning on my part related to > > how the annotation package should relate to the cdf package or some naming > > convention related to one or both. > > > > Mark > > > > > >> require(AnnBuilder); require(makecdfenv) > >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe- entrez_gene.csv" > >> myBaseType <- "ll" > >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus") > >> > >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType = > >> > > + myBaseType, pkgName = > > substring(cleancdfname("RaGene-1_0-st-v1"), > > + 1, (nchar(cleancdfname("RaGene- 1_0-st-v1")) - 3)), > > + pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus > > norvegicus", version = "1.0", > > + author = list(authors = "Mark W Kimpel", maintainer = > > + "mkimpel at iupui.edu"), fromWeb = TRUE) > > Read 1 item > > Read 1 item > > [1] "4450 2 2" > > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) : > > cannot open file > > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or > > directory > > > > Enter a frame number, or 0 to exit > > > > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType = > > myBaseTy > > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE) > > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress) > > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress, > > envhook) > > 5: function (e) > > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) > > > > Selection: 1 > > Browse[1]> annotation[1:2,] > > ENTREZID PROBE ACCNUM GO > > [1,] "100008565" "10766774" NA NA > > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622 at IEA" > > PMID SYMBOL > > [1,] "16804107;17292978" "Slc39a4l" > > [2,] "8889548" "Gnl3l" > > GENENAME CHR > > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA > > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X" > > MAP REFSEQ UNIGENE CHRLOC PATH > > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA" NA > > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA > > ENZYME PFAM PROSITE > > [1,] NA "NA at IPI00817057" "NA at IPI00817057" > > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844" > > ------------------------------------------------------------ > > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry > > Indiana University School of Medicine > > > > 15032 Hunter Court, Westfield, IN 46074 > > > > (317) 490-5129 Work, & Mobile & VoiceMail > > (317) 399-1219 Home > > Skype: mkimpel > > > > "The real problem is not whether machines think but whether men do." -- B. > > F. Skinner > > ****************************************************************** > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > ------------------------------ Message: 15 Date: Wed, 22 Oct 2008 11:53:59 -0400 From: "Sean Davis" <sdavis2@mail.nih.gov> Subject: Re: [BioC] newbie problems with AnnBuilder To: "Mark Kimpel" <mwkimpel at="" gmail.com=""> Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <264855a00810220853m5afad74apfb3df17499f90f49 at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at="" gmail.com=""> wrote: > Build with AnnotationDbi was uneventful, but I have been unable to > install the package or use it as is. > > If the package is just placed in my site-library, I get: > 'ragene10stv1.db' is not a valid package -- installed < 2.0.0? > > If I tar the package and try R CMD INSTALL, I get: > cannot extract package from 'ragene10stv1.db.tar.gz' > > What approach should I be using? You can just R CMD INSTALL ragene10stv1.db where ragene10stv1.db is the directory that contains the package (right above the DESCRIPTION file). Sean > On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at="" fhcrc.org=""> wrote: >> >> Hi Mark, >> >> AnnBuilder is on its way out. Please have a look at the SQLforge.pdf >> vignette which can be found here: >> >> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html >> >> If you have further questions after reading this, then please just ask, >> and we should be able to help you out. >> >> >> Marc >> >> >> >> Mark Kimpel wrote: >> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST >> > array data supplied by Affy. Using the code chunk below, I am able to get >> > AnnBuilder to create the annotation object, but it crashes, I believe, when >> > it tries to save it. I should also mention that I had a previous crash when >> > I had a madecdfenv package directory in place that used the same name. I got >> > a "file lock" error, so I temporarily renamed the directory to see if this >> > fixed the problem. As below, the error changed, but I still can't get the >> > script to work. >> > >> > I suspect that there is a fundamental misunderstaning on my part related to >> > how the annotation package should relate to the cdf package or some naming >> > convention related to one or both. >> > >> > Mark >> > >> > >> >> require(AnnBuilder); require(makecdfenv) >> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe- entrez_gene.csv" >> >> myBaseType <- "ll" >> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus") >> >> >> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType = >> >> >> > + myBaseType, pkgName = >> > substring(cleancdfname("RaGene-1_0-st-v1"), >> > + 1, (nchar(cleancdfname("RaGene- 1_0-st-v1")) - 3)), >> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus >> > norvegicus", version = "1.0", >> > + author = list(authors = "Mark W Kimpel", maintainer = >> > + "mkimpel at iupui.edu"), fromWeb = TRUE) >> > Read 1 item >> > Read 1 item >> > [1] "4450 2 2" >> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) : >> > cannot open file >> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or >> > directory >> > >> > Enter a frame number, or 0 to exit >> > >> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType = >> > myBaseTy >> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE) >> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress) >> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress, >> > envhook) >> > 5: function (e) >> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) >> > >> > Selection: 1 >> > Browse[1]> annotation[1:2,] >> > ENTREZID PROBE ACCNUM GO >> > [1,] "100008565" "10766774" NA NA >> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622 at IEA" >> > PMID SYMBOL >> > [1,] "16804107;17292978" "Slc39a4l" >> > [2,] "8889548" "Gnl3l" >> > GENENAME CHR >> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA >> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X" >> > MAP REFSEQ UNIGENE CHRLOC PATH >> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA" NA >> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA >> > ENZYME PFAM PROSITE >> > [1,] NA "NA at IPI00817057" "NA at IPI00817057" >> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844" >> > ------------------------------------------------------------ >> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry >> > Indiana University School of Medicine >> > >> > 15032 Hunter Court, Westfield, IN 46074 >> > >> > (317) 490-5129 Work, & Mobile & VoiceMail >> > (317) 399-1219 Home >> > Skype: mkimpel >> > >> > "The real problem is not whether machines think but whether men do." -- B. >> > F. Skinner >> > ****************************************************************** >> > >> > [[alternative HTML version deleted]] >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at stat.math.ethz.ch >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > ------------------------------ Message: 16 Date: Wed, 22 Oct 2008 12:05:25 -0400 From: "Mark Kimpel" <mwkimpel@gmail.com> Subject: Re: [BioC] newbie problems with AnnBuilder To: "Sean Davis" <sdavis2 at="" mail.nih.gov=""> Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <6b93d1830810220905m46a4e41kfe052fd768d458e5 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Interesting, it works from a command line outside of R, but when I had tried R> system('R CMD INSTALL ragene10stv1.db') I received an error message. I have used this approach successfully with other packages to avoid leaving R and starting a shell, but with this package I get: '* Installing to library '/home/mkimpel/R_HOME/site-library-2.8.0' ERROR: no packages specified' Well, thanks, it now is installed. Any comment on why my system() approach might not have worked? Mark ------------------------------------------------------------ Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, & Mobile & VoiceMail (317) 399-1219 Home Skype: mkimpel "The real problem is not whether machines think but whether men do." -- B. F. Skinner ****************************************************************** On Wed, Oct 22, 2008 at 11:53 AM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: > On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at="" gmail.com=""> wrote: >> Build with AnnotationDbi was uneventful, but I have been unable to >> install the package or use it as is. >> >> If the package is just placed in my site-library, I get: >> 'ragene10stv1.db' is not a valid package -- installed < 2.0.0? >> >> If I tar the package and try R CMD INSTALL, I get: >> cannot extract package from 'ragene10stv1.db.tar.gz' >> >> What approach should I be using? > > You can just > > R CMD INSTALL ragene10stv1.db > > where ragene10stv1.db is the directory that contains the package > (right above the DESCRIPTION file). > > Sean > > >> On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at="" fhcrc.org=""> wrote: >>> >>> Hi Mark, >>> >>> AnnBuilder is on its way out. Please have a look at the SQLforge.pdf >>> vignette which can be found here: >>> >>> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html >>> >>> If you have further questions after reading this, then please just ask, >>> and we should be able to help you out. >>> >>> >>> Marc >>> >>> >>> >>> Mark Kimpel wrote: >>> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST >>> > array data supplied by Affy. Using the code chunk below, I am able to get >>> > AnnBuilder to create the annotation object, but it crashes, I believe, when >>> > it tries to save it. I should also mention that I had a previous crash when >>> > I had a madecdfenv package directory in place that used the same name. I got >>> > a "file lock" error, so I temporarily renamed the directory to see if this >>> > fixed the problem. As below, the error changed, but I still can't get the >>> > script to work. >>> > >>> > I suspect that there is a fundamental misunderstaning on my part related to >>> > how the annotation package should relate to the cdf package or some naming >>> > convention related to one or both. >>> > >>> > Mark >>> > >>> > >>> >> require(AnnBuilder); require(makecdfenv) >>> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe- entrez_gene.csv" >>> >> myBaseType <- "ll" >>> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus") >>> >> >>> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType = >>> >> >>> > + myBaseType, pkgName = >>> > substring(cleancdfname("RaGene-1_0-st-v1"), >>> > + 1, (nchar(cleancdfname("RaGene- 1_0-st-v1")) - 3)), >>> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus >>> > norvegicus", version = "1.0", >>> > + author = list(authors = "Mark W Kimpel", maintainer = >>> > + "mkimpel at iupui.edu"), fromWeb = TRUE) >>> > Read 1 item >>> > Read 1 item >>> > [1] "4450 2 2" >>> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) : >>> > cannot open file >>> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or >>> > directory >>> > >>> > Enter a frame number, or 0 to exit >>> > >>> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType = >>> > myBaseTy >>> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE) >>> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress) >>> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress, >>> > envhook) >>> > 5: function (e) >>> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) >>> > >>> > Selection: 1 >>> > Browse[1]> annotation[1:2,] >>> > ENTREZID PROBE ACCNUM GO >>> > [1,] "100008565" "10766774" NA NA >>> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622 at IEA" >>> > PMID SYMBOL >>> > [1,] "16804107;17292978" "Slc39a4l" >>> > [2,] "8889548" "Gnl3l" >>> > GENENAME CHR >>> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA >>> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X" >>> > MAP REFSEQ UNIGENE CHRLOC PATH >>> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA" NA >>> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA >>> > ENZYME PFAM PROSITE >>> > [1,] NA "NA at IPI00817057" "NA at IPI00817057" >>> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844" >>> > ------------------------------------------------------------ >>> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry >>> > Indiana University School of Medicine >>> > >>> > 15032 Hunter Court, Westfield, IN 46074 >>> > >>> > (317) 490-5129 Work, & Mobile & VoiceMail >>> > (317) 399-1219 Home >>> > Skype: mkimpel >>> > >>> > "The real problem is not whether machines think but whether men do." -- B. >>> > F. Skinner >>> > ****************************************************************** >>> > >>> > [[alternative HTML version deleted]] >>> > >>> > _______________________________________________ >>> > Bioconductor mailing list >>> > Bioconductor at stat.math.ethz.ch >>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> > >>> > >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > ------------------------------ Message: 17 Date: Wed, 22 Oct 2008 09:49:20 -0700 From: Robert Gentleman <rgentlem@fhcrc.org> Subject: Re: [BioC] GOstats and org.EcK12.eg.db To: Robert Castelo <robert.castelo at="" upf.edu=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <48FF5990.20501 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Robert, Yet another one that I will need to write some specialized code for. I should get it done by the end of the week, and will push it to both release and devel. I will post an email when it is done, best wishes Robert Robert Castelo wrote: > dear list, > > I cannot get to work GOstats with the annotation for E. coli in > org.EcK12.eg.db. Please find below the code that reproduces the problem > including the error message, and my sessionInfo() at the end of this > email. I have included the same exercise with the human annotation > package org.Hs.eg.db which runs fine in my system. Any help with this > will be very much appreciated. > > thanks!! > robert. > ==========CODE STARTS HERE=========== > > library(org.Hs.eg.db) > library(org.EcK12.eg.db) > library(GOstats) > > geneuniverse <- mappedkeys(org.EcK12.egSYMBOL) > set.seed(12345) > geneset <- sample(geneuniverse, size=100, replace=FALSE) > > > goHypGparams <- new("GOHyperGParams", > geneIds=geneset, > universeGeneIds=geneuniverse, > annotation="org.EcK12.eg.db", ontology="BP", > pvalueCutoff=1.0, conditional=TRUE, > testDirection="over") > > goHypGcond <- hyperGTest(goHypGparams) > > Error in get(mapName, envir = pkgEnv, inherits = FALSE) : > variable "org.EcK12.egENTREZID" was not found > Error in mget(probes, ID2EntrezID(datPkg)) : > error in evaluating the argument 'envir' in selecting a method for > function 'mget' > > geneuniverse <- mappedkeys(org.Hs.egSYMBOL) > set.seed(12345) > geneset <- sample(geneuniverse, size=100, replace=FALSE) > > > goHypGparams <- new("GOHyperGParams", > geneIds=geneset, > universeGeneIds=geneuniverse, > annotation="org.Hs.eg.db", ontology="BP", > pvalueCutoff=1.0, conditional=TRUE, > testDirection="over") > > goHypGcond <- hyperGTest(goHypGparams) > > > sessionInfo() > R version 2.8.0 beta (2008-10-05 r46601) > x86_64-unknown-linux-gnu > > locale: > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_ US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDEN TIFICATION=C > > attached base packages: > [1] splines tools stats graphics grDevices utils > datasets > [8] methods base > > other attached packages: > [1] org.Hs.eg.db_2.2.6 GOstats_2.7.0 Category_2.7.6 > [4] genefilter_1.21.5 survival_2.34-1 RBGL_1.17.2 > [7] annotate_1.19.2 xtable_1.5-4 GO.db_2.2.5 > [10] graph_1.19.6 org.EcK12.eg.db_2.2.6 AnnotationDbi_1.3.12 > [13] RSQLite_0.7-0 DBI_0.2-4 Biobase_2.1.7 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 GSEABase_1.3.6 XML_1.98-1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 rgentlem at fhcrc.org ------------------------------ Message: 18 Date: Thu, 23 Oct 2008 01:14:09 +0800 From: Leon Yee <yee.leon@gmail.com> Subject: Re: [BioC] quality assessment and preprocessing for tiling array-based CGH data To: Sean Davis <sdavis2 at="" mail.nih.gov=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <48FF5F61.5060906 at gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Sean Davis wrote: >> Hi, Sean >> >> Thanks for your advice. However, I have still several questions: >> >> 1. The input of dlrs is the log ratios, the log ration extracted from the >> text file produced by Feature Extraction? or calculated from RGlist --> >> MAlist ? I have searched the mailist and seen a post of you mentioned the >> difference of log ration from Feature Extraction and the default M value >> from read.maimages. > > You can read the Agilent FE manual for more details, but you can > probably use either and come to very similar conclusions. If you use > the MAlist version, make sure to use only median centering or none for > normalization. > >> 2. I can get the log ratios of all features including control type of -1 >> and 1, but these features don't have chromosome positions, does this mean I >> don't need all of them for quality assessment? > > We have not routinely used these probes, no. If an array fails > miserably, then these control probes might be useful for determining > the reason for the failure, though. > >> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get a >> proper mapping on the chromosome, so I should remove these values from the >> input of dlrs. Is it so? > > You can either remove them or treat chr2_random as a separate chromosome. > >> 4. How could I handle those 1000 probes repeating 3 times? They will be >> mapped on the same chromosome position by three per group. > > You could choose one at random or use a mean or median of them. My > guess is that they agree very closely with one another so the choice > should not affect the results much. Hi, Sean Thank you very much for your detailed reply and help. Where can I get the references or official documentations about dlrs method? In addition, we have design our array with dye-swap [test-cy3 vs ref-cy5, and test-cy5 vs ref-cy3]. Is there any method for utilizing the information here for quality assessment? Best wishes! Leon ------------------------------ Message: 19 Date: Wed, 22 Oct 2008 17:13:10 +0000 (UTC) From: Shinichiro Wachi <swachi@ucdavis.edu> Subject: [BioC] Bioconductor installation problem: unable to access repository To: bioconductor at stat.math.ethz.ch Message-ID: <loom.20081022t170334-485 at="" post.gmane.org=""> Content-Type: text/plain; charset=us-ascii I am installing Bioconductor on a new machine (Intel Mac running OSX 10.5.5), R version is 2.8.0. sudo R is used for this session. These are the error messages I get: source("http://bioconductor.org/biocLite.R") biocinstall() Running biocinstall version 2.3.8 with R version 2.8.0 (under development) Your version of R requires version 2.3 of Bioconductor. Warning: unable to access index for repository http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8 Warning: unable to access index for repository http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//contr ib/2.8 Warning: unable to access index for repository http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//contr ib/2.8 Warning: unable to access index for repository http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8 Warning: unable to access index for repository http://cran.fhcrc.org/bin/macosx//contrib/2.8 Warning message: package 'Biobase' is not available http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES will produce a web page displaying packages. (I read a post that asked for this. I am assuming this was a quick server diagnostics). Is there a problem with the server, or is this a temporary glitch in the installer? Is there a workaround? Much thanks. Shin ------------------------------ Message: 20 Date: Wed, 22 Oct 2008 13:29:58 -0400 From: "Sean Davis" <sdavis2@mail.nih.gov> Subject: Re: [BioC] quality assessment and preprocessing for tiling array-based CGH data To: "Leon Yee" <yee.leon at="" gmail.com=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <264855a00810221029i18a08bc3nb1a34d786a6b30a6 at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Wed, Oct 22, 2008 at 1:14 PM, Leon Yee <yee.leon at="" gmail.com=""> wrote: > Sean Davis wrote: >>> >>> Hi, Sean >>> >>> Thanks for your advice. However, I have still several questions: >>> >>> 1. The input of dlrs is the log ratios, the log ration extracted from >>> the >>> text file produced by Feature Extraction? or calculated from RGlist --> >>> MAlist ? I have searched the mailist and seen a post of you mentioned >>> the >>> difference of log ration from Feature Extraction and the default M value >>> from read.maimages. >> >> You can read the Agilent FE manual for more details, but you can >> probably use either and come to very similar conclusions. If you use >> the MAlist version, make sure to use only median centering or none for >> normalization. >> >>> 2. I can get the log ratios of all features including control type of -1 >>> and 1, but these features don't have chromosome positions, does this mean >>> I >>> don't need all of them for quality assessment? >> >> We have not routinely used these probes, no. If an array fails >> miserably, then these control probes might be useful for determining >> the reason for the failure, though. >> >>> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get >>> a >>> proper mapping on the chromosome, so I should remove these values from >>> the >>> input of dlrs. Is it so? >> >> You can either remove them or treat chr2_random as a separate chromosome. >> >>> 4. How could I handle those 1000 probes repeating 3 times? They will be >>> mapped on the same chromosome position by three per group. >> >> You could choose one at random or use a mean or median of them. My >> guess is that they agree very closely with one another so the choice >> should not affect the results much. > > Hi, Sean > > Thank you very much for your detailed reply and help. > > Where can I get the references or official documentations about dlrs > method? It is a standard robust estimator of the variance and is not specific to microarrays. If you look at the code, it simply subtracts the difference between adjacent probes and then normalizes the result. If the array is "noisy", the dlrs will be high. This assumes that the contribution due to large copy number changes is negligible which is likely true since even the most abnormal cancer samples have fewer than 1000 breaks. > In addition, we have design our array with dye-swap [test-cy3 vs ref-cy5, > and test-cy5 vs ref-cy3]. Is there any method for utilizing the information > here for quality assessment? Not that I know of, but you could certainly look at correlations between replicates, etc. Our experience with Agilent CGH arrays is that the contribution due to dye bias is small compared to changes due to copy number. Sean Sean ------------------------------ Message: 21 Date: Wed, 22 Oct 2008 13:40:01 -0400 From: "James W. MacDonald" <jmacdon@med.umich.edu> Subject: Re: [BioC] GOstat: listing genes from hyperGTest Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <48FF6571.5090806 at med.umich.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Tim, Yeah, probeSetSummary() is probably not what you want, if you are not starting with an Affy chip. There are some gymnastics required to map things back to the original Affy chip that you won't need to do. In addition, if you are not using a conditional hypergeometric analysis, it should be pretty simple to get what you want without even needing to parse things out of the GOHyperGResult object. An example: ## fake up some data > geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)] > univ <- Lkeys(org.Hs.egGO) > param <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP") > hyp <- hyperGTest(param) > summary(hyp, categorySize=10) GOBPID Pvalue OddsRatio ExpCount Count Size Term 1 GO:0007338 0.002723500 29.25101 0.07808304 2 54 single fertilization 2 GO:0009566 0.002925855 28.16374 0.08097501 2 56 fertilization So we have two terms of interest. Getting the Entrez Gene IDs from the input set that map to these terms is easy: > geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))] [1] "100131137" "10007" Now you might also want to know which 54 Entrez Gene IDs map to that particular GO term. Since you are not conditioning, this includes that particular GO term and all its offspring. > offspring <- get("GO:0007338", GOBPOFFSPRING) > egids <- unique(unlist(mget(c("GO:0007338", offspring), revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE)) > egids[!is.na(egids)] [1] "1047" "4179" "4240" "4486" "4809" "5016" [7] "6674" "7783" "7784" "7802" "7993" "8747" [13] "8748" "8852" "9082" "10007" "10361" "22917" [19] "26476" "53340" "57055" "57829" "64100" "93185" [25] "158062" "442868" "100131137" "49" "410" "2683" [31] "3010" "4184" "6677" "7142" "7455" "8857" [37] "11055" "124626" "2054" "2741" "10343" "10566" [43] "27297" "152015" "3074" "167" "928" "2515" [49] "5104" "23553" "284359" "164684" "7141" "79400" Best, Jim Tim Smith wrote: > Thanks James. If I can tweak that function, I'll get exactly what I want. > > I tried what you suggested and got the following error: > > --------------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > ps <- probeSetSummary(GO) > > Error in get(mapName, envir = pkgEnv, inherits = FALSE) : > variable "org.Hs.egENTREZID" was not found > -------------------------------- > > I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input. > > Or am I doing something wrong? > > thanks! > > > > > > ----- Original Message ---- > From: James W. MacDonald <jmacdon at="" med.umich.edu=""> > > Cc: bioc <bioconductor at="" stat.math.ethz.ch=""> > Sent: Wednesday, October 22, 2008 9:10:39 AM > Subject: Re: [BioC] GOstat: listing genes from hyperGTest > > Hi Tim, > > Does probeSetSummary() do what you want? > > Best, > > Jim > > > > Tim Smith wrote: >> >> Hi, >> >> I >> was performing a hyperGTest for genes in homo-sapiens. For a set of >> input genes, this function returns some 'significant' GO terms. What I >> wanted to now do was to co-relate each significant GO term (returned by >> this function) with genes (from my set of input genes) associated with >> that GO term. However, I think that I may be using the wrong >> package/function to get the releveant set of genes. >> >> Currently, what I'm doing is finding the significant GO terms by using the following code: >> >> ----------------------- >> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs >> >> paramsGO <- new("GOHyperGParams", geneIds = genes1, >> universeGeneIds = allGenes, annotation = "org.Hs.eg.db", >> ontology = "BP", pvalueCutoff = 1, conditional = FALSE, >> testDirection = "over") >> >> GO <- hyperGTest(paramsGO) >> -------------------------- >> This >> gives me a set of significant GO terms. Now, I would like to find which >> subset of genes in 'genes1' is associated with each of the significant >> GO term. To do this I map all GO terms to their Entrez IDs using the >> 'org.Hs.eg.db' package using the following: >> >> xx <- as.list(org.Hs.egGO2EG) >> >> to >> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't >> this number small?) that map to at least one Entrez ID. So, from here I >> look up which Entrez IDs are associated with my GO term of interest. >> >> My >> problem is that often, the GO term from hyperGTest is not associated >> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described >> above ), i.e. the GO term/ID is not in the list obtained from >> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by >> hyperGTest, but does not appear to be associated with any Entrez IDs in >> the org.Hs.eg.db package. Where could I be going wrong? >> > [[elided Yahoo spam]] >> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... >> >> My sessionInfo() is: >> -------------------------------- >> R version 2.7.2 (2008-08-25) >> i386-pc-mingw32 >> >> locale: >> LC_COLLATE=English_United >> States.1252;LC_CTYPE=English_United >> States.1252;LC_MONETARY=English_United >> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] grid splines tools stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] >> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 >> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 >> Category_2.6.0 >> [8] RBGL_1.16.0 annotate_1.18.0 >> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 >> GO.db_2.2.0 KEGG.db_2.2.0 >> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 >> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 >> >> loaded via a namespace (and not attached): >> [1] cluster_1.11.11 MASS_7.2-44 >> >> >> --------------------------------- >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662 ------------------------------ Message: 22 Date: Wed, 22 Oct 2008 10:48:02 -0700 From: Patrick Aboyoun <paboyoun@fhcrc.org> Subject: Re: [BioC] Bioconductor installation problem: unable to access repository To: Shinichiro Wachi <swachi at="" ucdavis.edu=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <48FF6752.1020901 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Shin, I appears that R 2.8 has changed the way it regulates Mac OS X binary packages for users running Mac OS X 10.5 (Leopard). We have just become aware of this change and will be adjusting the Bioconductor Mac OS X repositories accordingly over the next few days to adjust to these changes. The good news is that R 2.8 supports binary packages for both Mac OS X 10.4 (Tiger) and Mac OS X 10.5 (Leopard). I'll send out an e-mail to this group when the Mac OS X 10.5 (Leopard) packages are available for BioC 2.3. Patrick Shinichiro Wachi wrote: > I am installing Bioconductor on a new machine (Intel Mac running OSX 10.5.5), R > version is 2.8.0. > > sudo R is used for this session. > > These are the error messages I get: > > source("http://bioconductor.org/biocLite.R") > biocinstall() > > Running biocinstall version 2.3.8 with R version 2.8.0 (under development) > Your version of R requires version 2.3 of Bioconductor. > Warning: unable to access index for repository > http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8 > Warning: unable to access index for repository > http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//con trib/2.8 > Warning: unable to access index for repository > http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//con trib/2.8 > Warning: unable to access index for repository > http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8 > Warning: unable to access index for repository > http://cran.fhcrc.org/bin/macosx//contrib/2.8 > Warning message: > package 'Biobase' is not available > > http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES > will produce a web page displaying packages. (I read a post that asked for this. > I am assuming this was a quick server diagnostics). > > Is there a problem with the server, or is this a temporary glitch in the > installer? Is there a workaround? > > Much thanks. > > Shin > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > ------------------------------ Message: 23 Date: Wed, 22 Oct 2008 13:36:30 -0700 From: Patrick Aboyoun <paboyoun@fhcrc.org> Subject: [BioC] Bioconductor 2.3 is released To: BioC list <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <48FF8ECE.4060201 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Bioconductors: We are pleased to announce the release of Bioconductor 2.3. This release includes 36 new software packages, and many changes to existing packages. Bioconductor 2.3 is comprised of 294 software packages and is compatible with the recently released R 2.8.0. Please visit http://bioconductor.org for details and downloads. IMPORTANT NOTE FOR MAC USERS: R 2.8.0 is using a new Mac OS X binary package distribution system and the CRAN and BioC repositories need to catch up with this change. If you are using Mac OS X, please refrain from migrating to R-2.8.0 until these new binary package repositories are put in place, or use 'type="source"' when installing packages using biocLite. Contents ======== o Release Highlight o Getting Started with Bioconductor 2.3 o New Software Packages o Software Packages in 2.2 that didn't make it to 2.3 Release Highlight ================= This release contains a septet of packages (BSgenome, Biostrings, ShortRead, IRanges, HilbertVis, HilbertVisGUI, and rtracklayer) that are suited to analyze 'next generation' high-throughput DNA sequence data. The BSgenome package provides the backbone for representing genome sequences from many different organisms including human, mouse, rat, dog, chimp, chicken, cow, fruit fly, honey bee, yeast, E. coli, C. elegans, and arabidopsis. The Biostrings package performs fast or flexible alignments between reads and genomes. The ShortRead package provides tools for importation/exportation and quality assurance of common data formats. The IRanges package offers an emerging infrastructure for managing very large data objects and for range- based data representation. The packages HilbertVis and HilbertVisGUI display data with space-filling (Hilbert) curves that maintain the spatial information implied by the linearity of chromosomes. The rtracklayer package provides an interface to genome browsers and their annotation tracks. Getting Started with Bioconductor 2.3 ===================================== IMPORTANT: MAC USERS: see the important note above. To install Bioconductor 2.3 1. Install R 2.8.0. Bioconductor 2.3 has been designed expressly for this version of R. 2. Follow the instructions here: http://bioconductor.org/docs/install Please visit http://bioconductor.org for details and downloads. New Packages ============ The following packages are new in this release of Bioconductor; visit http://bioconductor.org/packages/release/Software.html for links to all package descriptions. affyContam Structured corruption of cel file data to demonstrate QA effectiveness Agi4x44PreProcess Preprocesses Agilent 4x44 array data ArrayExpress Accesses the ArrayExpress microarray database at EBI arrayMvout Analyzes AffyBatch instances ArrayTools Quality assessment and differentially gene expression detection for Affymetrix GeneChips BicARE Biclustering Analysis and Results Exploration CGHbase Base functions and classes for arrayCGH data analysis CGHregions Dimension reduction for arrayCGH data with minimal information loss ChemmineR Compound Data Mining Framework CMA Synthesis of microarray-based classification DFP Supervised technique for identifying differentially expressed genes using Fuzzy Patterns (FPs). domainsignatures Finds significantly enriched gene classifications based on their InterPro domain structure dualKS Training and classifying gene expression data sets using a Kolmogorov-Smirnov rank-sum based algorithm edgeR Estimates and tests for differential expression in multiple digital gene expression libraries HELP Pipeline for analyzing HELP microarray data that includes graphical and mathematical tools HilbertVis Functions to visualize long vectors of integer data by means of Hilbert curves HilbertVisGUI An interactive tool to visualize long vectors of integer data by means of Hilbert curves IRanges Infrastructure for managing large data objects and range-based data representations ITALICS Normalizes Affymetrix GeneChip Human Mapping 100K and 500K set iterativeBMA Bayesian Model Averaging (BMA) of classification models of 2-class microarray samples iterativeBMAsurv Uses Bayesian Model Averaging (BMA) of survival analysis models of microarray data KCsmart Multi-sample aCGH analysis package using kernel convolution logitT Implements the Logit-t algorithm LPEadj Extends the LPE algorithm MEDME Determines absolute and relative DNA methylation scores from MeDIP enrichment measurements miRNApath Provides pathway enrichment techniques for miRNA expression data microRNA Accesses different data resources for microRNAs minet Implements methods for inferring mutual information networks from data. multiscan Estimates gene expressions from several laser scans of the same microarray parody Provides routines for univariate and multivariate outlier detection PLPE Performs tests for paired high-throughput data RNAither Analyzes cell-based RNAi screens RpsiXML Queries, data structure and interface to visualization of interaction datasets SIM Finds associations between DNA copy number and gene expression ShortRead Representation of high-throughput, short-read sequencing data xmapbridge Plots graphs in the X:Map genome browser Software Packages in 2.2 that didn't make it to 2.3 =================================================== 1. SemSim 2. widgetInvoke [[elided Yahoo spam]] The Biocore Team ------------------------------ Message: 24 Date: Wed, 22 Oct 2008 16:04:53 -0500 From: Jenny Drnevich <drnevich@illinois.edu> Subject: Re: [BioC] How to save result from limma To: Gordon K Smyth <smyth at="" wehi.edu.au=""> Cc: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <200810222104.m9ML4ss3012169 at expredir5.cites.uiuc.edu> Content-Type: text/plain; charset="us-ascii"; format=flowed Hi Gordon, I just downloaded the new R 2.8.0 release and limma 2.16.0. I was checking out the new sort="none" option in topTable and I found an error in the help page for ?topTable. It doesn't list sort.by="none" as a possibility for topTable or toptable, but does list it for topTableF. However, it's actually the reverse; sort.by="none" works when using topTable with only 1 coef but not with more than 1 coef. Just thought I'd let you and the archives know... Thanks again! Jenny At 07:15 PM 8/19/2008, Gordon K Smyth wrote: >On Tue, 19 Aug 2008, Jenny Drnevich wrote: > >>At 06:14 PM 8/13/2008, Gordon K Smyth wrote: >>>OK, I've added sort="none" to the possibilities. >>>Best wishes >>>Gordon >> >>Hi Gordon, >> >>Should this change to topTable be up on BioC by now? I just updated >>my packages on R 2.7.1 and the latest limma_2.14.5 does not have >>it. Neither does the developmental version limma_2.15.10 on R 2.8.0 >>dev. Usually your changes appear very quickly... >> [[elided Yahoo spam]] >>Jenny > >Not committed to BioC yet. I'm getting older and slower. Also, >there will be a number of code additions in my next commit to BioC, >which I'm still finalising. > >Best wishes >Gordon Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at illinois.edu ------------------------------ Message: 25 Date: Wed, 22 Oct 2008 17:39:24 -0400 From: "Hui-Yi Chu" <huiyi.chu@gmail.com> Subject: [BioC] scale questions To: bioconductor at stat.math.ethz.ch Message-ID: <aaeddd3f0810221439r19fe9f56rc06ffc812fa8405e at="" mail.gmail.com=""> Content-Type: text/plain Dear List, I think this may be a simple question for you but I wanna make it sure for further steps. I have already done some of data pre-processing procedures for my affymetrix yeast2 arrays. My next step is to get *ratios* from various conditions in wt and mutant following by fold-change comparison. So my question is which step I should scale my dataframe for comparison? Here are parts of my codes (codes with underline are the questions): wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17]) wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18]) wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2) wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f)) *wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21]) mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22]) mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2) mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f)) *mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure* gg <- cbind(wt.pt.f, mut.pt.f) *gg <- scale(gg)* *### not sure* pt.f1 <- gg[,3]-gg[,1] pt.f2 <- gg[,4]-gg[,2] gg1 <- cbind(gg, pt.f1, pt.f2) gg2 <- new("ExpressionSet", exprs=as.matrix(gg)) ....... followed by further steps As seen above, where should I scale the ratios? Scale wt and mut separately or together before getting pt.f1 and pt.f2 ratios?? [[elided Yahoo spam]] Hui-Yi [[alternative HTML version deleted]] ------------------------------ Message: 26 Date: Wed, 22 Oct 2008 15:09:08 -0700 From: Florian Hahne <fhahne@fhcrc.org> Subject: Re: [BioC] [Fwd: batch info for cellHTS] To: Yan Zhou <yan.zhou at="" fccc.edu=""> Cc: Bioconductor_help <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <48FFA484.7050405 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Yan, the idea was that the user constructs the batch array manually and assigns it to the batch slot using the accessor method, e.g. bt <- array(rep(1:2, each=5, dim=c(5,2,1)) batch(x) <- bt I agree that it would be useful to have that functionality directly in the import functions. The only natural place where the batch information could go is in the platelist file. In the plateconf file, we don't have the notion of plate replicates or samples any more. Attached you find a slightly modified readPlateList function which evaluates an (optional) "Batch" column in the platelist file. Hope that is the functionality you where looking for. Please let me know if you have further input. Thanks, Florian PS: I forwarded this conversation to the mailing list. Others might benefit from it... readPlateList <- function(filename, path=dirname(filename), name, importFun, verbose=interactive()) { file <- basename(filename) dfiles <- dir(path) if(!(is.character(path)&&length(path)==1)) stop("'path' must be character of length 1") pd <- read.table(file.path(path, file), sep="\t", header=TRUE, as.is=TRUE) checkColumns(pd, file, mandatory=c("Filename", "Plate", "Replicate"), numeric=c("Plate", "Replicate", "Channel", "Batch")) ## consistency check for "importFun" if (!missing(importFun)) { if (!is(importFun, "function")) stop("'importFun' should be a function to use to read the raw data files") } else { ## default function (compatible with the file format of the plate reader) importFun <- function(f) { txt <- readLines(f, warn=FALSE) sp <- strsplit(txt, "\t") well <- sapply(sp, "[", 2) val <- sapply(sp, "[", 3) out <- list(data.frame(well=I(well), val=as.numeric(val)), txt=I(txt)) return(out) } } ## check if the data files are in the given directory a <- unlist(sapply(pd$Filename, function(z) grep(z, dfiles, ignore.case=TRUE))) if (length(a)==0) stop(sprintf("None of the files were found in the given 'path': %s", path)) f <- file.path(path, dfiles[a]) ## check if 'importFun' gives the output in the desired form aux <- importFun(f[1]) if (which(unlist(lapply(aux, is, "data.frame"))) != 1 | !all(c("val", "well") %in% names(aux[[1]])) | length(aux)!=2) stop("The output of 'importFun' must be a list with 2 components;\n", "the first component should be a 'data.frame' with slots 'well' and 'val'.") ## auto-determine the plate format well <- as.character(importFun(f[1])[[1]]$well) let <- substr(well, 1, 1) num <- substr(well, 2, 3) let <- match(let, LETTERS) num <- as.integer(num) if(anyis.na(let))||anyis.na(num))) stop(sprintf("Malformated column 'well' in input file %s", f[1])) dimPlate <- c(nrow=max(let), ncol=max(num)) nrWell <- prod(dimPlate) if(verbose) cat(sprintf("%s: found data in %d x %d (%d well) format.\n", name, dimPlate[1], dimPlate[2], nrWell)) ## Should we check whether these are true? ## "96" = c(nrow=8, ncol=12), ## "384" = c(nrow=16, ncol=24), nrRep <- max(pd$Replicate) nrPlate <- max(pd$Plate) combo <- paste(pd$Plate, pd$Replicate) ## Channel: if not given, this implies that there is just one if("Channel" %in% colnames(pd)) { nrChannel <- max(pd$Channel) channel <- pd$Channel combo <- paste(combo, pd$Channel) } else { nrChannel <- 1L channel <- rep(1L, nrow(pd)) pd$Channel <- channel } whDup <- which(duplicated(combo)) if(length(whDup)>0L) { idx <- whDup[1:min(5L, length(whDup))] msg <- paste("The following rows are duplicated in the plateList table:\n", "\tPlate Replicate Channel\n", "\t", paste(idx, combo[idx], sep="\t", collapse="\n\t"), if(length(whDup)>5) sprintf("\n\t...and %d more.\n", length(whDup)-5), "\n", sep="") stop(msg) } xraw <- array(NA_real_, dim=c(nrWell, nrPlate, nrRep, nrChannel)) intensityFiles <- vector(mode="list", length=nrow(pd)) names(intensityFiles) <- pd[, "Filename"] status <- character(nrow(pd)) for(i in seq_len(nrow(pd))) { if(verbose) cat("\rReading ", i, ": ", pd$Filename[i], sep="") ff <- grep(pd[i, "Filename"], dfiles, ignore.case=TRUE) if (length(ff)!=1) { f <- file.path(path, pd[i, "Filename"]) status[i] <- sprintf("File not found: %s", f) } else { f <- file.path(path, dfiles[ff]) names(intensityFiles)[i] <- dfiles[ff] status[i] <- tryCatch({ out <- importFun(f) pos <- convertWellCoordinates(out[[1]]$well, dimPlate)$num intensityFiles[[i]] <- out[[2]] xraw[pos, pd$Plate[i], pd$Replicate[i], channel[i]] <- out[[1]]$val "OK" }, warning=function(e) paste(class(e)[1], e$message, sep=": "), error=function(e) paste(class(e)[1], e$message, sep=": ") ) ## tryCatch } ## else } ## for if(verbose) cat("\rRead", nrow(pd), "plates. \n\n") ## ---- Store the data as a "cellHTS" object ---- ## arrange the assayData slot: dat <- lapply(seq_len(nrChannel), function(ch) matrix(xraw[,,,ch], ncol=nrRep, nrow=nrWell*nrPlate)) names(dat) <- paste("ch", seq_len(nrChannel), sep="") adata <- do.call("assayDataNew", c(storage.mode="lockedEnvironment", dat)) ## arrange the phenoData slot: pdata <- new("AnnotatedDataFrame", data <- data.frame(replicate=seq_len(nrRep), assay=rep(name, nrRep), stringsAsFactors=FALSE), varMetadata=data.frame(labelDescription=c("Replicate number", "Biological assay"), channel=factor(rep("_ALL_", 2L), levels=c(names(dat), "_ALL_")), row.names=c("replicate", "assay"), stringsAsFactors=FALSE)) ## arrange the featureData slot: well <- convertWellCoordinates(seq_len(nrWell), pdim=dimPlate)$letnum fdata <- new("AnnotatedDataFrame", data <- data.frame(plate=rep(seq_len(nrPlate), each=nrWell), well=rep(well, nrPlate), controlStatus=factor(rep("unknown", nrWell*nrPlate)), stringsAsFactors=FALSE), varMetadata=data.frame(labelDescription=c("Plate number", "Well ID", "Well annotation"), row.names=c("plate", "well", "controlStatus"), stringsAsFactors=FALSE)) res <- new("cellHTS", assayData=adata, phenoData=pdata, featureData=fdata, plateList=cbind(pd[,1L,drop=FALSE], status=I(status), pd[,-1L,drop=FALSE]), intensityFiles=intensityFiles) ## if there is a batch column in the platelist file we want to import it if("Batch" %in% colnames(pd)){ bat <- pd$Batch[order(pd$Replicate, pd$Channel)] dim(bat) <- c(max(plate(res)), ncol(res), length(channelNames(res))) res at batch <- bat } ## output the possible errors that were encountered along the way: whHadProbs <- which(status!="OK") if(length(whHadProbs)>0 & verbose) { idx <- whHadProbs[1:min(5, length(whHadProbs))] msg <- paste("Please check the following problems encountered while reading the data:\n", "\tFilename \t Error\n", "\t", paste(plateList(res)$Filename[idx], status[idx], sep="\t", collapse="\n\t"), if(length(whHadProbs)>5) sprintf("\n\t...and %d more.\n", length(whHadProbs)-5), "\n", sep="") stop(msg) } return(res) } Yan Zhou wrote: > Dear Florian, > > I understand the different meaning of batch from the 2 cellHTS > packages. I just don't know how to add the "batch" slot.(we have a > large screen with screen done on different day,we want to use the > variance adjustment by batch function). I tried to add a "batch" > column in the plate configuration file, but doesn't seem to be taken > into cellHTS2 object. then I tried to add "batch" column in the plate > list file , still didn't do anything. In another word, I couldn't > figure out how to add the batch slot to the cellHTS object. please [[elided Yahoo spam]] > > yan > > Florian Hahne wrote: > >> Hi Yan, >> Ligia forwarded your mail to me. >> >> The batch concept is a little bit different in cellHTS2. Basically, >> we separated the ability to included several plate configurations and >> the batch-specific parameter estimation (e.g. in the normalizePlate >> function). For the former, you can use a regular expression syntax in >> your plate configuration file, while the latter has to be added >> manually into the new 'batch' slot. All of this is explained in much >> more detail in the package vignette: cellHTS2 - Main vignette >> (complete version): End-to-end analysis of cell-based screens >> >> Let me know if this doesn't get you anywhere. >> >> Florian >> >> ligia at ebi.ac.uk wrote: >> >>> Hi Florian, >>> >>> Hope you're doing fine. >>> Could you take care of this for me? >>> Cheers, >>> Ligia >>> >>> >>> ------------------------- Original Message ---------------------------- >>> Subject: batch info for cellHTS >>> From: "Yan Zhou" <yan.zhou at="" fccc.edu=""> >>> Date: Fri, October 17, 2008 19:14 >>> To: ligia at ebi.ac.uk >>> ------------------------------------------------------------------ -------- >>> >>> >>> Dear Ligia, >>> >>> I'm using the cellHTS2 package for HTS data analysis. For the old >>> cellHTS package, I knew how to incorporate the batch information. But >>> for the new cellHTS2 package, I couldn't have it done right. I attached >>> my plate configuration file and also plate list file with this email. I >>> was wondering whether you would have time to help me take a look, and >>> point me to the right directions. Thanks a lot for your time and >>> kind help. >>> >>> yan >>> >> >> >> > -- Florian Hahne, PhD Computational Biology Program Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-3148 fhahne at fhcrc.org ------------------------------ Message: 27 Date: Wed, 22 Oct 2008 19:19:26 -0400 From: "Mark Kimpel" <mwkimpel@gmail.com> Subject: [BioC] problem with Category package and custom annotationDbi To: "Bioconductor Newsgroup" <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <6b93d1830810221619x7ad5a565hf08bf287839f74f0 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Today I believe I successfuly built a annotation package for my Affy Rat Gene ST data using annotationDbi, at least I got no errors during the build and it loads properly. I get the following error output, however, when I try to run hyperGTest, package Category, on a vector of Entrez Gene IDs and a vector of the gene universe of the chip. I suspect I did something wrong when building the annotation package, but I have no clue what that could be. I've used this same code with chipsets whose annotation packages are built by the BioConductor team without issue. > params <- new("GOHyperGParams", geneIds = myEGs, + universeGeneIds = myGeneUniverse, + annotation = annotation(AOP$eSet), + ontology = "BP", pvalueCutoff = 0.05, conditional = TRUE, testDirection = "over") > params A GOHyperGParams instance category: GO annotation: ragene10stv1 > hyperGTest(params) Error in getUniverseHelper(probes, datPkg, entrezIds) : No Entrez Gene ids left in universe Enter a frame number, or 0 to exit 1: hyperGTest(params) 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT 3: is(object, Cl) 4: is(object, Cl) 5: universeBuilder(p) 6: universeBuilder(p) 7: getUniverseViaGo(p) 8: getUniverseHelper(probes, datPkg, entrezIds) Selection: 8 Called from: eval(expr, envir, enclos) Browse[1]> ls() [1] "datPkg" "entrezIds" "probes" "univ" Browse[1]> datPkg An object of class "ArabadopsisDatPkg" Slot "name": [1] "ragene10stv1" Browse[1]> entrezIds[1:5] [1] "65049" "60444" "313914" "140941" "306868" Browse[1]> probes[1:5] [1] "10701636" "10701643" "10701654" "10701663" "10701679" Browse[1]> univ character(0) Browse[1]> ------------------------------------------------------------ Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, & Mobile & VoiceMail (317) 399-1219 Home Skype: mkimpel "The real problem is not whether machines think but whether men do." -- B. F. Skinner ------------------------------ Message: 28 Date: Wed, 22 Oct 2008 16:41:38 -0700 From: Marc Carlson <mcarlson@fhcrc.org> Subject: Re: [BioC] problem with Category package and custom annotationDbi To: Mark Kimpel <mwkimpel at="" gmail.com=""> Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <48FFBA32.3030406 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1 Hi Mark, If the package you built and installed were called "MarkPackage.db", then what would be the output of "MarkPackage()" (right after you loaded it)? Also, what is the output of sessionInfo()? Marc Mark Kimpel wrote: > Today I believe I successfuly built a annotation package for my Affy > Rat Gene ST data using annotationDbi, at least I got no errors during > the build and it loads properly. I get the following error output, > however, when I try to run hyperGTest, package Category, on a vector > of Entrez Gene IDs and a vector of the gene universe of the chip. I > suspect I did something wrong when building the annotation package, > but I have no clue what that could be. I've used this same code with > chipsets whose annotation packages are built by the BioConductor team > without issue. > > >> params <- new("GOHyperGParams", geneIds = myEGs, >> > + universeGeneIds = myGeneUniverse, > + annotation = annotation(AOP$eSet), > + ontology = "BP", pvalueCutoff = 0.05, conditional > = TRUE, testDirection = "over") > >> params >> > A GOHyperGParams instance > category: GO > annotation: ragene10stv1 > >> hyperGTest(params) >> > Error in getUniverseHelper(probes, datPkg, entrezIds) : > No Entrez Gene ids left in universe > > Enter a frame number, or 0 to exit > > 1: hyperGTest(params) > 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT > 3: is(object, Cl) > 4: is(object, Cl) > 5: universeBuilder(p) > 6: universeBuilder(p) > 7: getUniverseViaGo(p) > 8: getUniverseHelper(probes, datPkg, entrezIds) > > Selection: 8 > Called from: eval(expr, envir, enclos) > Browse[1]> ls() > [1] "datPkg" "entrezIds" "probes" "univ" > Browse[1]> datPkg > An object of class "ArabadopsisDatPkg" > Slot "name": > [1] "ragene10stv1" > > Browse[1]> entrezIds[1:5] > [1] "65049" "60444" "313914" "140941" "306868" > Browse[1]> probes[1:5] > [1] "10701636" "10701643" "10701654" "10701663" "10701679" > Browse[1]> univ > character(0) > Browse[1]> > > ------------------------------------------------------------ > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry > Indiana University School of Medicine > > 15032 Hunter Court, Westfield, IN 46074 > > (317) 490-5129 Work, & Mobile & VoiceMail > (317) 399-1219 Home > Skype: mkimpel > > "The real problem is not whether machines think but whether men do." > -- B. F. Skinner > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > ------------------------------ Message: 29 Date: Wed, 22 Oct 2008 19:51:17 -0400 From: "Sean Davis" <sdavis2@mail.nih.gov> Subject: Re: [BioC] scale questions To: "Hui-Yi Chu" <huiyi.chu at="" gmail.com=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <264855a00810221651n656b5a8ak226daa013f282fab at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at="" gmail.com=""> wrote: > Dear List, > > I think this may be a simple question for you but I wanna make it sure for > further steps. > I have already done some of data pre-processing procedures for my affymetrix > yeast2 arrays. My next step is to get *ratios* from various conditions in wt > and mutant following by fold-change comparison. So my question is which step > I should scale my dataframe for comparison? > > Here are parts of my codes (codes with underline are the questions): > > wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17]) > wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18]) > wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2) > wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f)) > *wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure > > *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21]) > mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22]) > mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2) > mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f)) > *mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure* > > gg <- cbind(wt.pt.f, mut.pt.f) > *gg <- scale(gg)* *### not sure* > pt.f1 <- gg[,3]-gg[,1] > pt.f2 <- gg[,4]-gg[,2] > gg1 <- cbind(gg, pt.f1, pt.f2) > gg2 <- new("ExpressionSet", exprs=as.matrix(gg)) > ....... followed by further steps > > > As seen above, where should I scale the ratios? Scale wt and mut separately > or together before getting pt.f1 and pt.f2 ratios?? [[elided Yahoo spam]] I may be misunderstanding, but why do you want to scale the log ratios? Generally, you would not scale them at all. And I really cannot tell why you are forming ratios in the first place? Do you have replicates of any kind? Sean ------------------------------ Message: 30 Date: Wed, 22 Oct 2008 20:31:33 -0400 From: "Sean Davis" <sdavis2@mail.nih.gov> Subject: Re: [BioC] scale questions To: "Hui-Yi Chu" <huiyi.chu at="" gmail.com=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <264855a00810221731p1abc243cj1a59dc6726ebc7e0 at mail.gmail.com> Content-Type: text/plain; charset=UTF-8 On Wed, Oct 22, 2008 at 8:16 PM, Hui-Yi Chu <huiyi.chu at="" gmail.com=""> wrote: > Hi Sean, > > Yes, they are replicate in ratios. In other words, these ratios are from > untreatment and treatment of wt1, wt2, mut1,mut2, so total is 8 ratios as > below. > > untreatmen wt1, wt2, mut1, mut2 > treatment wt1, wt2, mut1, mut2 > > And I wanna get the second values like: > r1: untreatment mut1/wt1 > r2: untreatment mut2/wt2 > r3: treatment mut1/wt1 > r4: treatment mut2/wt2 > > Now we have 4 ratios. And then I will compare r3 vs r1, r4 vs r2 to get most > fold changes genes. (I know I need three replicates, but I cannot convince > my adviser, therefore, this is the strategy I can use so far. ) So the > question is should I scale the 8 ratios from wt and mut separately (twice) > or together(once) before I get the four values? Hi, Hui-Yi. You should not do any scaling. You can use the log fold changes directly. You seem to understand that this is not an optimal design, so I won't belabor that point. Sean > > On Wed, Oct 22, 2008 at 7:51 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: >> >> On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at="" gmail.com=""> wrote: >> > Dear List, >> > >> > I think this may be a simple question for you but I wanna make it sure >> > for >> > further steps. >> > I have already done some of data pre-processing procedures for my >> > affymetrix >> > yeast2 arrays. My next step is to get *ratios* from various conditions >> > in wt >> > and mutant following by fold-change comparison. So my question is which >> > step >> > I should scale my dataframe for comparison? >> > >> > Here are parts of my codes (codes with underline are the questions): >> > >> > wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17]) >> > wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18]) >> > wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2) >> > wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f)) >> > *wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure >> > >> > *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21]) >> > mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22]) >> > mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2) >> > mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f)) >> > *mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure* >> > >> > gg <- cbind(wt.pt.f, mut.pt.f) >> > *gg <- scale(gg)* *### not sure* >> > pt.f1 <- gg[,3]-gg[,1] >> > pt.f2 <- gg[,4]-gg[,2] >> > gg1 <- cbind(gg, pt.f1, pt.f2) >> > gg2 <- new("ExpressionSet", exprs=as.matrix(gg)) >> > ....... followed by further steps >> > >> > >> > As seen above, where should I scale the ratios? Scale wt and mut >> > separately >> > or together before getting pt.f1 and pt.f2 ratios?? [[elided Yahoo spam]] >> >> I may be misunderstanding, but why do you want to scale the log >> ratios? Generally, you would not scale them at all. And I really >> cannot tell why you are forming ratios in the first place? Do you >> have replicates of any kind? >> >> Sean > > ------------------------------ Message: 31 Date: Wed, 22 Oct 2008 21:26:16 -0400 From: "Mark Kimpel" <mwkimpel@gmail.com> Subject: Re: [BioC] problem with Category package and custom annotationDbi To: "Marc Carlson" <mcarlson at="" fhcrc.org=""> Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <6b93d1830810221826y71050c5ckb14dbca4644c9bf7 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 > ragene10stv1() Quality control information for ragene10stv1: This package has the following mappings: ragene10stv1ACCNUM has 0 mapped keys (of 29214 keys) ragene10stv1ALIAS2PROBE has 31410 mapped keys (of 31410 keys) ragene10stv1CHR has 20998 mapped keys (of 29214 keys) ragene10stv1CHRLENGTHS has 23 mapped keys (of 23 keys) ragene10stv1CHRLOC has 13092 mapped keys (of 29214 keys) ragene10stv1CHRLOCEND has 13092 mapped keys (of 29214 keys) ragene10stv1ENSEMBL has 15812 mapped keys (of 29214 keys) ragene10stv1ENSEMBL2PROBE has 14570 mapped keys (of 14570 keys) ragene10stv1ENTREZID has 21170 mapped keys (of 29214 keys) ragene10stv1ENZYME has 1586 mapped keys (of 29214 keys) ragene10stv1ENZYME2PROBE has 705 mapped keys (of 705 keys) ragene10stv1GENENAME has 21170 mapped keys (of 29214 keys) ragene10stv1GO has 13813 mapped keys (of 29214 keys) ragene10stv1GO2ALLPROBES has 9850 mapped keys (of 9850 keys) ragene10stv1GO2PROBE has 7430 mapped keys (of 7430 keys) ragene10stv1MAP has 20351 mapped keys (of 29214 keys) ragene10stv1PATH has 4357 mapped keys (of 29214 keys) ragene10stv1PATH2PROBE has 206 mapped keys (of 206 keys) ragene10stv1PFAM has 17187 mapped keys (of 29214 keys) ragene10stv1PMID has 12198 mapped keys (of 29214 keys) ragene10stv1PMID2PROBE has 36641 mapped keys (of 36641 keys) ragene10stv1PROSITE has 17187 mapped keys (of 29214 keys) ragene10stv1REFSEQ has 18683 mapped keys (of 29214 keys) ragene10stv1SYMBOL has 21170 mapped keys (of 29214 keys) ragene10stv1UNIGENE has 17585 mapped keys (of 29214 keys) ragene10stv1UNIPROT has 9826 mapped keys (of 29214 keys) Additional Information about this package: DB schema: RATCHIP_DB DB schema version: 1.0 Organism: Rattus norvegicus Date for NCBI data: 2008-Sep2 Date for GO data: 200808 Date for KEGG data: 2008-Sep2 Date for Golden Path data: 2006-Jun20 Date for IPI data: 2008-Sep02 Date for Ensembl data: 2008-Jul23 > sessionInfo() R version 2.8.0 (2008-10-20) x86_64-unknown-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] ragene10stv1.db_1.0.0 RSQLite_0.7-0 DBI_0.2-4 [4] AnnotationDbi_1.4.0 Biobase_2.2.0 graph_1.20.0 loaded via a namespace (and not attached): [1] cluster_1.11.11 > ------------------------------------------------------------ Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, & Mobile & VoiceMail (317) 399-1219 Home Skype: mkimpel "The real problem is not whether machines think but whether men do." -- B. F. Skinner ****************************************************************** On Wed, Oct 22, 2008 at 7:41 PM, Marc Carlson <mcarlson at="" fhcrc.org=""> wrote: > Hi Mark, > > If the package you built and installed were called "MarkPackage.db", > then what would be the output of "MarkPackage()" (right after you loaded > it)? Also, what is the output of sessionInfo()? > > Marc > > > > > Mark Kimpel wrote: >> Today I believe I successfuly built a annotation package for my Affy >> Rat Gene ST data using annotationDbi, at least I got no errors during >> the build and it loads properly. I get the following error output, >> however, when I try to run hyperGTest, package Category, on a vector >> of Entrez Gene IDs and a vector of the gene universe of the chip. I >> suspect I did something wrong when building the annotation package, >> but I have no clue what that could be. I've used this same code with >> chipsets whose annotation packages are built by the BioConductor team >> without issue. >> >> >>> params <- new("GOHyperGParams", geneIds = myEGs, >>> >> + universeGeneIds = myGeneUniverse, >> + annotation = annotation(AOP$eSet), >> + ontology = "BP", pvalueCutoff = 0.05, conditional >> = TRUE, testDirection = "over") >> >>> params >>> >> A GOHyperGParams instance >> category: GO >> annotation: ragene10stv1 >> >>> hyperGTest(params) >>> >> Error in getUniverseHelper(probes, datPkg, entrezIds) : >> No Entrez Gene ids left in universe >> >> Enter a frame number, or 0 to exit >> >> 1: hyperGTest(params) >> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT >> 3: is(object, Cl) >> 4: is(object, Cl) >> 5: universeBuilder(p) >> 6: universeBuilder(p) >> 7: getUniverseViaGo(p) >> 8: getUniverseHelper(probes, datPkg, entrezIds) >> >> Selection: 8 >> Called from: eval(expr, envir, enclos) >> Browse[1]> ls() >> [1] "datPkg" "entrezIds" "probes" "univ" >> Browse[1]> datPkg >> An object of class "ArabadopsisDatPkg" >> Slot "name": >> [1] "ragene10stv1" >> >> Browse[1]> entrezIds[1:5] >> [1] "65049" "60444" "313914" "140941" "306868" >> Browse[1]> probes[1:5] >> [1] "10701636" "10701643" "10701654" "10701663" "10701679" >> Browse[1]> univ >> character(0) >> Browse[1]> >> >> ------------------------------------------------------------ >> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry >> Indiana University School of Medicine >> >> 15032 Hunter Court, Westfield, IN 46074 >> >> (317) 490-5129 Work, & Mobile & VoiceMail >> (317) 399-1219 Home >> Skype: mkimpel >> >> "The real problem is not whether machines think but whether men do." >> -- B. F. Skinner >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > ------------------------------ Message: 32 Date: Thu, 23 Oct 2008 14:07:48 +1100 (AUS Eastern Daylight Time) From: Gordon K Smyth <smyth@wehi.edu.au> Subject: Re: [BioC] How to save result from limma To: Jenny Drnevich <drnevich at="" illinois.edu=""> Cc: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <pine.wnt.4.64.0810231405491.2800 at="" pc602.alpha.wehi.edu.au=""> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Dear Jenny, Thanks for the heads-up. I believe I have fixed it entirely now in limma 2.16.2. Try it out and see if you can break it. Best wishes Gordon On Wed, 22 Oct 2008, Jenny Drnevich wrote: > Hi Gordon, > > I just downloaded the new R 2.8.0 release and limma 2.16.0. I was checking > out the new sort="none" option in topTable and I found an error in the help > page for ?topTable. It doesn't list sort.by="none" as a possibility for > topTable or toptable, but does list it for topTableF. However, it's actually > the reverse; sort.by="none" works when using topTable with only 1 coef but > not with more than 1 coef. Just thought I'd let you and the archives know... > > Thanks again! > Jenny > > At 07:15 PM 8/19/2008, Gordon K Smyth wrote: > > >> On Tue, 19 Aug 2008, Jenny Drnevich wrote: >> >>> At 06:14 PM 8/13/2008, Gordon K Smyth wrote: >>>> OK, I've added sort="none" to the possibilities. >>>> Best wishes >>>> Gordon >>> >>> Hi Gordon, >>> >>> Should this change to topTable be up on BioC by now? I just updated my >>> packages on R 2.7.1 and the latest limma_2.14.5 does not have it. Neither >>> does the developmental version limma_2.15.10 on R 2.8.0 dev. Usually your >>> changes appear very quickly... >>> [[elided Yahoo spam]] >>> Jenny >> >> Not committed to BioC yet. I'm getting older and slower. Also, there will >> be a number of code additions in my next commit to BioC, which I'm still >> finalising. >> >> Best wishes >> Gordon > > Jenny Drnevich, Ph.D. > > Functional Genomics Bioinformatics Specialist > W.M. Keck Center for Comparative and Functional Genomics > Roy J. Carver Biotechnology Center > University of Illinois, Urbana-Champaign > > 330 ERML > 1201 W. Gregory Dr. > Urbana, IL 61801 > USA > > ph: 217-244-7355 > fax: 217-265-5066 > e-mail: drnevich at illinois.edu > ------------------------------ Message: 33 Date: Wed, 22 Oct 2008 22:28:59 -0500 From: "Wei,Caimiao" <caiwei@mdanderson.org> Subject: [BioC] Package "xps" "import.expr.scheme" error To: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <19D18D962A716B4EA5D26CB24F14DC26339D0DE943 at DCPWVMBXC1VS3.mdanderson.edu> Content-Type: text/plain I am importing chip definition and annotation files to create a ROOT scheme and get this error: Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir = scmdir, : error in function 'ImportExprSchemes' > libdir <- "/mypath/Affy/libraryfiles" > anndir <- "/mypath/Affy/Annotation" > scmdir <- "/mypath/CRAN/Workspaces/Schemes" > > scheme.hgu133p2.na26 <- import.expr.scheme(filename="Scheme_HGU133p2_na26", + filedir=scmdir,schemefile=paste(libdir,"HG- U133_Plus_2.cdf",sep="/"), + probefile=paste(libdir,"HG-U133_Plus_2.probe.tab",sep="/"), + annotfile=paste(anndir,"HG-U133_Plus_2.na26.annot.csv",sep="/")) Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir = scmdir, : error in function 'ImportExprSchemes' > sessionInfo() R version 2.7.2 (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] xps_1.0.2 > Thanks for any help! Caimiao [[alternative HTML version deleted]] ------------------------------ Message: 34 Date: Thu, 23 Oct 2008 02:49:47 -0400 From: Leon Peshkin <pesha@hms.harvard.edu> Subject: Re: [BioC] Lumi and Beadstudio 1.5.13 To: <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <9EF526FD-9EA0-4489-83D1-6F3E9B93FAE9 at hms.harvard.edu> Content-Type: text/plain; charset="US-ASCII"; format=flowed; delsp=yes Hi Pan, I was wondering if you could help me resolve the issue with lumi package, I am able to load Illumina data with lumiR, but then when I try background adjustment it fails: A0 <- lumiR("killme.txt", convertNuID =FALSE, inputAnnotation =FALSE) > B0 <- lumiB(A0,method='bgAdjust') Error in `[.data.frame`(control, , sampleNames(lumiBatch)) : undefined columns selected -Leon ------------------------------ Message: 35 Date: Thu, 23 Oct 2008 10:43:18 +0200 (CEST) From: Clara de Dessous Ch?ri <email@dessouscheri.emv1.com> Subject: [BioC] Offre exceptionnelle suite au probl?me technique To: <bioconductor at="" stat.math.ethz.ch=""> Message-ID: <20276360648.3897244.1224751398367 at schr1> Content-Type: text/plain Pour etre sur de recevoir tous nos emails, nous vous conseillons d'ajouter email at dessouscheri.emv1.com a votre carnet d'adresses. Si cet email ne s'affiche pas correctement, vous pouvez le visualiser grace a ce lien Copyright 2008 - Copyright Dessous Cheri, Tous droits reserves. Si vous ne souhaitez plus recevoir la newsletter de Dessous Cheri,utilisez le lien de desabonnement [[alternative HTML version deleted]] ------------------------------ _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor End of Bioconductor Digest, Vol 68, Issue 23

aCGH Sequencing miRNA Microarray Annotation Normalization GO Preprocessing Visualization • 2.3k views

ADD COMMENT • link updated 16.4 years ago by Marc Carlson ★ 7.2k • written 16.4 years ago by McGee, Monnie ▴ 300

Entering edit mode

Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 8.6 years ago

United States

Hi Monnie, This is pretty easy once you know about the revmap() function. Here is a quick example: library(hgu95av2.db) mget("1557", revmap(hgu95av2ENTREZID)) Also, if you want to know more, you might want to look at the AnnotationDbi vignette: http://www.bioconductor.org/packages/2.4/bioc/html/AnnotationDbi.html Marc McGee, Monnie wrote: > Here is the previous query with a more descriptive subject. > > > -----Original Message----- > From: McGee, Monnie > Sent: Thu 10/23/2008 11:14 AM > To: bioconductor at stat.math.ethz.ch > Subject: RE: Bioconductor Digest, Vol 68, Issue 23 > > Dear List, > > Is there an elegant way to obtain the name of a probe set from an Affymetrix platform (doesn't matter which one) corresponding to a given ENTREZ gene ID? It seems that it is fairly simple to obtain the entrez ID if you have a probe set, but the reverse problem seems non- trival -at least it is to me. > > There's no particular reason I need to know. I just want to know if it's possible. > > Thanks! > Monnie > > Monnie McGee, Ph.D. > Associate Professor > Department of Statistical Science > Southern Methodist University > Ph: 214-768-2462 > Fax: 214-768-4035 > > > > -----Original Message----- > From: bioconductor-bounces at stat.math.ethz.ch on behalf of bioconductor-request at stat.math.ethz.ch > Sent: Thu 10/23/2008 5:00 AM > To: bioconductor at stat.math.ethz.ch > Subject: Bioconductor Digest, Vol 68, Issue 23 > > Send Bioconductor mailing list submissions to > bioconductor at stat.math.ethz.ch > > To subscribe or unsubscribe via the World Wide Web, visit > https://stat.ethz.ch/mailman/listinfo/bioconductor > or, via email, send a message with subject or body 'help' to > bioconductor-request at stat.math.ethz.ch > > You can reach the person managing the list at > bioconductor-owner at stat.math.ethz.ch > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bioconductor digest..." > > > Today's Topics: > > 1. GOstat: listing genes from hyperGTest (Tim Smith) > 2. export toptables into Genespring (Pemmasani, Kalyani) > 3. Re: Limma contrasts question (James W. MacDonald) > 4. Re: GOstat: listing genes from hyperGTest (James W. MacDonald) > 5. Re: Limma contrasts question (Daniel Brewer) > 6. quality assessment and preprocessing for tiling array-based > CGH data (Leon Yee) > 7. GOstats and org.EcK12.eg.db (Robert Castelo) > 8. Re: quality assessment and preprocessing for tiling > array-based CGH data (Sean Davis) > 9. Re: GOstat: listing genes from hyperGTest (Tim Smith) > 10. Re: quality assessment and preprocessing for tiling > array-based CGH data (Leon Yee) > 11. Re: Beadarray and illumina methylation arrays (Mark Dunning) > 12. Re: quality assessment and preprocessing for tiling > array-based CGH data (Sean Davis) > 13. Problem using Rgraphviz (edge weights going missing). (Dan Bolser) > 14. Re: newbie problems with AnnBuilder (Mark Kimpel) > 15. Re: newbie problems with AnnBuilder (Sean Davis) > 16. Re: newbie problems with AnnBuilder (Mark Kimpel) > 17. Re: GOstats and org.EcK12.eg.db (Robert Gentleman) > 18. Re: quality assessment and preprocessing for tiling > array-based CGH data (Leon Yee) > 19. Bioconductor installation problem: unable to access > repository (Shinichiro Wachi) > 20. Re: quality assessment and preprocessing for tiling > array-based CGH data (Sean Davis) > 21. Re: GOstat: listing genes from hyperGTest (James W. MacDonald) > 22. Re: Bioconductor installation problem: unable to access > repository (Patrick Aboyoun) > 23. Bioconductor 2.3 is released (Patrick Aboyoun) > 24. Re: How to save result from limma (Jenny Drnevich) > 25. scale questions (Hui-Yi Chu) > 26. Re: [Fwd: batch info for cellHTS] (Florian Hahne) > 27. problem with Category package and custom annotationDbi > (Mark Kimpel) > 28. Re: problem with Category package and custom annotationDbi > (Marc Carlson) > 29. Re: scale questions (Sean Davis) > 30. Re: scale questions (Sean Davis) > 31. Re: problem with Category package and custom annotationDbi > (Mark Kimpel) > 32. Re: How to save result from limma (Gordon K Smyth) > 33. Package "xps" "import.expr.scheme" error (Wei,Caimiao) > 34. Re: Lumi and Beadstudio 1.5.13 (Leon Peshkin) > 35. Offre exceptionnelle suite au probl?me technique > (Clara de Dessous Ch?ri) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 22 Oct 2008 03:43:33 -0700 (PDT) > From: Tim Smith <tim_smith_666 at="" yahoo.com=""> > Subject: [BioC] GOstat: listing genes from hyperGTest > To: bioc <bioconductor at="" stat.math.ethz.ch=""> > Message-ID: <257981.79114.qm at web58005.mail.re3.yahoo.com> > Content-Type: text/plain > > > Hi, > > I > was performing a hyperGTest for genes in homo-sapiens. For a set of > input genes, this function returns some 'significant' GO terms. What I > wanted to now do was to co-relate each significant GO term (returned by > this function) with genes (from my set of input genes) associated with > that GO term. However, I think that I may be using the wrong > package/function to get the releveant set of genes. > > Currently, what I'm doing is finding the significant GO terms by using the following code: > > ----------------------- > ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs > > paramsGO <- new("GOHyperGParams", geneIds = genes1, > universeGeneIds = allGenes, annotation = "org.Hs.eg.db", > ontology = "BP", pvalueCutoff = 1, conditional = FALSE, > testDirection = "over") > > GO <- hyperGTest(paramsGO) > -------------------------- > This > gives me a set of significant GO terms. Now, I would like to find which > subset of genes in 'genes1' is associated with each of the significant > GO term. To do this I map all GO terms to their Entrez IDs using the > 'org.Hs.eg.db' package using the following: > > xx <- as.list(org.Hs.egGO2EG) > > to > get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't > this number small?) that map to at least one Entrez ID. So, from here I > look up which Entrez IDs are associated with my GO term of interest. > > My > problem is that often, the GO term from hyperGTest is not associated > with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described > above ), i.e. the GO term/ID is not in the list obtained from > 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by > hyperGTest, but does not appear to be associated with any Entrez IDs in > the org.Hs.eg.db package. Where could I be going wrong? > > I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]] > > Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here.... > > My sessionInfo() is: > -------------------------------- > R version 2.7.2 (2008-08-25) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United > States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines tools stats graphics grDevices utils datasets methods base > > other attached packages: > [1] > gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0 > gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0 > Category_2.6.0 > [8] RBGL_1.16.0 annotate_1.18.0 > xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0 > GO.db_2.2.0 KEGG.db_2.2.0 > [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0 > [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 MASS_7.2-44 > > > --------------------------------- > > > > [[alternative HTML version deleted]] > > > > ------------------------------ > > Message: 2 > Date: Wed, 22 Oct 2008 12:34:38 +0100 > From: "Pemmasani, Kalyani" <kalyani.pemmasani at="" nuigalway.ie=""> > Subject: [BioC] export toptables into Genespring > To: <bioconductor at="" stat.math.ethz.ch=""> > Message-ID: > <6B017AD2AE2F6F489087FC986588136B88FA42 at EVS1.ac.nuigalway.ie> > Content-Type: text/plain; charset="iso-8859-1" > > > Hi all, > > Is there a way to export toptables from LIMMA into Genespring software program (from Agilent technologies) for clustering? > > Best regards, > Kalyani > ------------------------------------------- > Kalyani Pemmasani > Marie Curie research fellow > National Diagnostics Centre > National University of Ireland > Galway, IRELAND > e.mail: kalyani.pemmasani at nuigalway.ie > Ph.no: +353(0)91492815 > Fax: +353 (0) 91586570 > > > > ------------------------------ > > Message: 3 > Date: Wed, 22 Oct 2008 09:07:16 -0400 > From: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> > Subject: Re: [BioC] Limma contrasts question > To: Daniel Brewer <daniel.brewer at="" icr.ac.uk=""> > Cc: bioconductor at stat.math.ethz.ch > Message-ID: <48FF2584.5010509 at med.umich.edu> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Daniel Brewer wrote: > > >> Hi Jim, >> >> Could you go into the maths of the contrast formulas a bit? I would >> like to get a really solid understanding of what it is doing for future >> analyses. >> > > Once you understand what the coefficients are, the contrasts are just > simple algebra. In your case, all of the coefficients are estimating the > difference between the sample and PC3M (e.g., Knockdown - PC3M). > > So the algebra is something like this: > > 2(Knockdown - PC3M) - (Scramble - PC3M) > = > 2Knockdown - 2PC3M - Scramble + PC3M > = > 2Knockdown - Scramble - PC3M > = > Knockdown - (Scramble + PC3M)/2 > > Which is knockdown minus the mean of the controls. > > Note that this will be the numerator of the resulting t-statistic. The > denominator will be sort of an average of the variability within each of > the three groups being compared. So the question being answered is 'What > genes are different in Knockdown as compared to the average of the > controls?'. However, there is nothing here to test if the two controls > are similar at all (and you might not care). > > So for instance, you might have a gene with average expression like this: > > Knockdown = 10 > PC3M = 4 > Scramble = 7 > > If the intra-group variability is small for each sample type, then you > will likely get a significant t-statistic even though the two controls > are probably significantly different as well. Which is why I mentioned > earlier that you might want to test the Scramble - PC3M contrast as well. > > Best, > > Jim > > > >> Many thanks >> >> Dan >> >> > >

ADD COMMENT • link 16.4 years ago Marc Carlson ★ 7.2k