Entering edit mode
McGee, Monnie
▴
300
@mcgee-monnie-1108
Last seen 10.4 years ago
Here is the previous query with a more descriptive subject.
-----Original Message-----
From: McGee, Monnie
Sent: Thu 10/23/2008 11:14 AM
To: bioconductor at stat.math.ethz.ch
Subject: RE: Bioconductor Digest, Vol 68, Issue 23
Dear List,
Is there an elegant way to obtain the name of a probe set from an
Affymetrix platform (doesn't matter which one) corresponding to a
given ENTREZ gene ID? It seems that it is fairly simple to obtain the
entrez ID if you have a probe set, but the reverse problem seems non-
trival -at least it is to me.
There's no particular reason I need to know. I just want to know if
it's possible.
Thanks!
Monnie
Monnie McGee, Ph.D.
Associate Professor
Department of Statistical Science
Southern Methodist University
Ph: 214-768-2462
Fax: 214-768-4035
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch on behalf of
bioconductor-request@stat.math.ethz.ch
Sent: Thu 10/23/2008 5:00 AM
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 68, Issue 23
Send Bioconductor mailing list submissions to
bioconductor at stat.math.ethz.ch
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request at stat.math.ethz.ch
You can reach the person managing the list at
bioconductor-owner at stat.math.ethz.ch
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
1. GOstat: listing genes from hyperGTest (Tim Smith)
2. export toptables into Genespring (Pemmasani, Kalyani)
3. Re: Limma contrasts question (James W. MacDonald)
4. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
5. Re: Limma contrasts question (Daniel Brewer)
6. quality assessment and preprocessing for tiling array-based
CGH data (Leon Yee)
7. GOstats and org.EcK12.eg.db (Robert Castelo)
8. Re: quality assessment and preprocessing for tiling
array-based CGH data (Sean Davis)
9. Re: GOstat: listing genes from hyperGTest (Tim Smith)
10. Re: quality assessment and preprocessing for tiling
array-based CGH data (Leon Yee)
11. Re: Beadarray and illumina methylation arrays (Mark Dunning)
12. Re: quality assessment and preprocessing for tiling
array-based CGH data (Sean Davis)
13. Problem using Rgraphviz (edge weights going missing). (Dan
Bolser)
14. Re: newbie problems with AnnBuilder (Mark Kimpel)
15. Re: newbie problems with AnnBuilder (Sean Davis)
16. Re: newbie problems with AnnBuilder (Mark Kimpel)
17. Re: GOstats and org.EcK12.eg.db (Robert Gentleman)
18. Re: quality assessment and preprocessing for tiling
array-based CGH data (Leon Yee)
19. Bioconductor installation problem: unable to access
repository (Shinichiro Wachi)
20. Re: quality assessment and preprocessing for tiling
array-based CGH data (Sean Davis)
21. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
22. Re: Bioconductor installation problem: unable to access
repository (Patrick Aboyoun)
23. Bioconductor 2.3 is released (Patrick Aboyoun)
24. Re: How to save result from limma (Jenny Drnevich)
25. scale questions (Hui-Yi Chu)
26. Re: [Fwd: batch info for cellHTS] (Florian Hahne)
27. problem with Category package and custom annotationDbi
(Mark Kimpel)
28. Re: problem with Category package and custom annotationDbi
(Marc Carlson)
29. Re: scale questions (Sean Davis)
30. Re: scale questions (Sean Davis)
31. Re: problem with Category package and custom annotationDbi
(Mark Kimpel)
32. Re: How to save result from limma (Gordon K Smyth)
33. Package "xps" "import.expr.scheme" error (Wei,Caimiao)
34. Re: Lumi and Beadstudio 1.5.13 (Leon Peshkin)
35. Offre exceptionnelle suite au probl?me technique
(Clara de Dessous Ch?ri)
----------------------------------------------------------------------
Message: 1
Date: Wed, 22 Oct 2008 03:43:33 -0700 (PDT)
From: Tim Smith <tim_smith_666@yahoo.com>
Subject: [BioC] GOstat: listing genes from hyperGTest
To: bioc <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <257981.79114.qm at web58005.mail.re3.yahoo.com>
Content-Type: text/plain
Hi,
I
was performing a hyperGTest for genes in homo-sapiens. For a set of
input genes, this function returns some 'significant' GO terms. What I
wanted to now do was to co-relate each significant GO term (returned
by
this function) with genes (from my set of input genes) associated with
that GO term. However, I think that I may be using the wrong
package/function to get the releveant set of genes.
Currently, what I'm doing is finding the significant GO terms by using
the following code:
-----------------------
### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
paramsGO <- new("GOHyperGParams", geneIds = genes1,
universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
testDirection = "over")
GO <- hyperGTest(paramsGO)
--------------------------
This
gives me a set of significant GO terms. Now, I would like to find
which
subset of genes in 'genes1' is associated with each of the significant
GO term. To do this I map all GO terms to their Entrez IDs using the
'org.Hs.eg.db' package using the following:
xx <- as.list(org.Hs.egGO2EG)
to
get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
this number small?) that map to at least one Entrez ID. So, from here
I
look up which Entrez IDs are associated with my GO term of interest.
My
problem is that often, the GO term from hyperGTest is not associated
with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
above ), i.e. the GO term/ID is not in the list obtained from
'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
hyperGTest, but does not appear to be associated with any Entrez IDs
in
the org.Hs.eg.db package. Where could I be going wrong?
I would give a set of genes so that the example is reproducible, but
[[elided Yahoo spam]]
Thanks for any comments/suggestions. I realize that I'm probably doing
something really stupid here....
My sessionInfo() is:
--------------------------------
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United
States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] grid splines tools stats graphics grDevices utils
datasets methods base
other attached packages:
[1]
gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
Category_2.6.0
[8] RBGL_1.16.0 annotate_1.18.0
xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
GO.db_2.2.0 KEGG.db_2.2.0
[15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8
DBI_0.2-4 genefilter_1.20.0 survival_2.34-1
affy_1.18.0
[22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
loaded via a namespace (and not attached):
[1] cluster_1.11.11 MASS_7.2-44
---------------------------------
[[alternative HTML version deleted]]
------------------------------
Message: 2
Date: Wed, 22 Oct 2008 12:34:38 +0100
From: "Pemmasani, Kalyani" <kalyani.pemmasani@nuigalway.ie>
Subject: [BioC] export toptables into Genespring
To: <bioconductor at="" stat.math.ethz.ch="">
Message-ID:
<6B017AD2AE2F6F489087FC986588136B88FA42 at
EVS1.ac.nuigalway.ie>
Content-Type: text/plain; charset="iso-8859-1"
Hi all,
Is there a way to export toptables from LIMMA into Genespring software
program (from Agilent technologies) for clustering?
Best regards,
Kalyani
-------------------------------------------
Kalyani Pemmasani
Marie Curie research fellow
National Diagnostics Centre
National University of Ireland
Galway, IRELAND
e.mail: kalyani.pemmasani at nuigalway.ie
Ph.no: +353(0)91492815
Fax: +353 (0) 91586570
------------------------------
Message: 3
Date: Wed, 22 Oct 2008 09:07:16 -0400
From: "James W. MacDonald" <jmacdon@med.umich.edu>
Subject: Re: [BioC] Limma contrasts question
To: Daniel Brewer <daniel.brewer at="" icr.ac.uk="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF2584.5010509 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Daniel Brewer wrote:
> Hi Jim,
>
> Could you go into the maths of the contrast formulas a bit? I would
> like to get a really solid understanding of what it is doing for
future
> analyses.
Once you understand what the coefficients are, the contrasts are just
simple algebra. In your case, all of the coefficients are estimating
the
difference between the sample and PC3M (e.g., Knockdown - PC3M).
So the algebra is something like this:
2(Knockdown - PC3M) - (Scramble - PC3M)
=
2Knockdown - 2PC3M - Scramble + PC3M
=
2Knockdown - Scramble - PC3M
=
Knockdown - (Scramble + PC3M)/2
Which is knockdown minus the mean of the controls.
Note that this will be the numerator of the resulting t-statistic. The
denominator will be sort of an average of the variability within each
of
the three groups being compared. So the question being answered is
'What
genes are different in Knockdown as compared to the average of the
controls?'. However, there is nothing here to test if the two controls
are similar at all (and you might not care).
So for instance, you might have a gene with average expression like
this:
Knockdown = 10
PC3M = 4
Scramble = 7
If the intra-group variability is small for each sample type, then you
will likely get a significant t-statistic even though the two controls
are probably significantly different as well. Which is why I mentioned
earlier that you might want to test the Scramble - PC3M contrast as
well.
Best,
Jim
>
> Many thanks
>
> Dan
>
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
------------------------------
Message: 4
Date: Wed, 22 Oct 2008 09:10:39 -0400
From: "James W. MacDonald" <jmacdon@med.umich.edu>
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
Cc: bioc <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <48FF264F.50404 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Tim,
Does probeSetSummary() do what you want?
Best,
Jim
Tim Smith wrote:
>
> Hi,
>
> I
> was performing a hyperGTest for genes in homo-sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What
I
> wanted to now do was to co-relate each significant GO term (returned
by
> this function) with genes (from my set of input genes) associated
with
> that GO term. However, I think that I may be using the wrong
> package/function to get the releveant set of genes.
>
> Currently, what I'm doing is finding the significant GO terms by
using the following code:
>
> -----------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
>
> paramsGO <- new("GOHyperGParams", geneIds = genes1,
> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> testDirection = "over")
>
> GO <- hyperGTest(paramsGO)
> --------------------------
> This
> gives me a set of significant GO terms. Now, I would like to find
which
> subset of genes in 'genes1' is associated with each of the
significant
> GO term. To do this I map all GO terms to their Entrez IDs using the
> 'org.Hs.eg.db' package using the following:
>
> xx <- as.list(org.Hs.egGO2EG)
>
> to
> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
> this number small?) that map to at least one Entrez ID. So, from
here I
> look up which Entrez IDs are associated with my GO term of interest.
>
> My
> problem is that often, the GO term from hyperGTest is not associated
> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
> above ), i.e. the GO term/ID is not in the list obtained from
> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up
by
> hyperGTest, but does not appear to be associated with any Entrez IDs
in
> the org.Hs.eg.db package. Where could I be going wrong?
>
> I would give a set of genes so that the example is reproducible, but
[[elided Yahoo spam]]
>
> Thanks for any comments/suggestions. I realize that I'm probably
doing something really stupid here....
>
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United
> States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid splines tools stats graphics grDevices
utils datasets methods base
>
> other attached packages:
> [1]
> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
> Category_2.6.0
> [8] RBGL_1.16.0 annotate_1.18.0
> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
> GO.db_2.2.0 KEGG.db_2.2.0
> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8
DBI_0.2-4 genefilter_1.20.0 survival_2.34-1
affy_1.18.0
> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44
>
>
> ---------------------------------
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
------------------------------
Message: 5
Date: Wed, 22 Oct 2008 14:50:06 +0100
From: Daniel Brewer <daniel.brewer@icr.ac.uk>
Subject: Re: [BioC] Limma contrasts question
To: "James W. MacDonald" <jmacdon at="" med.umich.edu="">,
bioconductor at stat.math.ethz.ch
Message-ID: <48FF2F8E.4020208 at icr.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1
James W. MacDonald wrote:
> Daniel Brewer wrote:
>
>> Hi Jim,
>>
>> Could you go into the maths of the contrast formulas a bit? I
would
>> like to get a really solid understanding of what it is doing for
future
>> analyses.
>
> Once you understand what the coefficients are, the contrasts are
just
> simple algebra. In your case, all of the coefficients are estimating
the
> difference between the sample and PC3M (e.g., Knockdown - PC3M).
>
> So the algebra is something like this:
>
> 2(Knockdown - PC3M) - (Scramble - PC3M)
> =
> 2Knockdown - 2PC3M - Scramble + PC3M
> =
> 2Knockdown - Scramble - PC3M
> =
> Knockdown - (Scramble + PC3M)/2
>
> Which is knockdown minus the mean of the controls.
>
> Note that this will be the numerator of the resulting t-statistic.
The
> denominator will be sort of an average of the variability within
each of
> the three groups being compared. So the question being answered is
'What
> genes are different in Knockdown as compared to the average of the
> controls?'. However, there is nothing here to test if the two
controls
> are similar at all (and you might not care).
>
> So for instance, you might have a gene with average expression like
this:
>
> Knockdown = 10
> PC3M = 4
> Scramble = 7
>
> If the intra-group variability is small for each sample type, then
you
> will likely get a significant t-statistic even though the two
controls
> are probably significantly different as well. Which is why I
mentioned
> earlier that you might want to test the Scramble - PC3M contrast as
well.
>
> Best,
>
> Jim
>
>
>>
>> Many thanks
>>
>> Dan
>>
>
Thanks for that, that is brilliant and has clarified everything. I
did
a comparison of the controls and there were no significant genes found
which is encouraging.
Many thanks
Dan
--
**************************************************************
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************
The Institute of Cancer Research: Royal Cancer Hospital, a charitable
Company Limited by Guarantee, Registered in England under Company No.
534147 with its Registered Office at 123 Old Brompton Road, London SW7
3RP.
This e-mail message is confidential and for use by the
a...{{dropped:2}}
------------------------------
Message: 6
Date: Wed, 22 Oct 2008 21:51:24 +0800
From: Leon Yee <yee.leon@gmail.com>
Subject: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: bioconductor at stat.math.ethz.ch
Message-ID: <48FF2FDC.6020203 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear all,
Is there any well-established routine for quality assessment and
preprocessing of array CGH data, especially tiling array-based CGH
data?
I found most of the quality assessment of array data are about
expression array, while few are related to array CGH data.
We are using agilent 244k CGH array of rat, and now we have the
text files produced by Feature Extraction, don't know whether they are
of good quality. Could anyone help provide some clues? Thanks in
advance!
After read.maimage(), we got the RGlist object, which contain
several components including R, G, Rb, Gb, and so on. The probes are
of
3 types: -1, 1 and 0. 0 means normal probe; -1 mean negative control,
i
guess, and the probe names are like (-)3xSLv1, NC1_00000002, etc[no
corresponding probe sequence]; 1 means positive control, i guess, and
the probe names are like DarkCorner, DCP_008001.0, RnCGHBrightCorner,
SRN_800002, etc[no corresponding probe sequence]. The number of -1 is
1275, while the number of 1 is 4217, each of which has its R, Rb, G,
Gb
values. Do we need these values for quality assessment and
normalization? How?
In addition, in the normal probes, we have 1000 probes repeating
3
times in the array. How could we use these data for quality assessment
and normalization?
Regards,
Leon
------------------------------
Message: 7
Date: Wed, 22 Oct 2008 15:48:22 +0200
From: Robert Castelo <robert.castelo@upf.edu>
Subject: [BioC] GOstats and org.EcK12.eg.db
To: bioconductor at stat.math.ethz.ch
Message-ID: <1224683302.5889.35.camel at llull.imim.es>
Content-Type: text/plain
dear list,
I cannot get to work GOstats with the annotation for E. coli in
org.EcK12.eg.db. Please find below the code that reproduces the
problem
including the error message, and my sessionInfo() at the end of this
email. I have included the same exercise with the human annotation
package org.Hs.eg.db which runs fine in my system. Any help with this
will be very much appreciated.
thanks!!
robert.
==========CODE STARTS HERE===========
library(org.Hs.eg.db)
library(org.EcK12.eg.db)
library(GOstats)
geneuniverse <- mappedkeys(org.EcK12.egSYMBOL)
set.seed(12345)
geneset <- sample(geneuniverse, size=100, replace=FALSE)
goHypGparams <- new("GOHyperGParams",
geneIds=geneset,
universeGeneIds=geneuniverse,
annotation="org.EcK12.eg.db", ontology="BP",
pvalueCutoff=1.0, conditional=TRUE,
testDirection="over")
goHypGcond <- hyperGTest(goHypGparams)
Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
variable "org.EcK12.egENTREZID" was not found
Error in mget(probes, ID2EntrezID(datPkg)) :
error in evaluating the argument 'envir' in selecting a method for
function 'mget'
geneuniverse <- mappedkeys(org.Hs.egSYMBOL)
set.seed(12345)
geneset <- sample(geneuniverse, size=100, replace=FALSE)
goHypGparams <- new("GOHyperGParams",
geneIds=geneset,
universeGeneIds=geneuniverse,
annotation="org.Hs.eg.db", ontology="BP",
pvalueCutoff=1.0, conditional=TRUE,
testDirection="over")
goHypGcond <- hyperGTest(goHypGparams)
sessionInfo()
R version 2.8.0 beta (2008-10-05 r46601)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US
.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N
AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTI
FICATION=C
attached base packages:
[1] splines tools stats graphics grDevices utils
datasets
[8] methods base
other attached packages:
[1] org.Hs.eg.db_2.2.6 GOstats_2.7.0 Category_2.7.6
[4] genefilter_1.21.5 survival_2.34-1 RBGL_1.17.2
[7] annotate_1.19.2 xtable_1.5-4 GO.db_2.2.5
[10] graph_1.19.6 org.EcK12.eg.db_2.2.6 AnnotationDbi_1.3.12
[13] RSQLite_0.7-0 DBI_0.2-4 Biobase_2.1.7
loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.3.6 XML_1.98-1
------------------------------
Message: 8
Date: Wed, 22 Oct 2008 10:02:35 -0400
From: "Sean Davis" <sdavis2@mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: "Leon Yee" <yee.leon at="" gmail.com="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810220702m3ee17381y221a8e5ec18ee6ff at
mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at="" gmail.com="">
wrote:
> Dear all,
>
> Is there any well-established routine for quality assessment and
> preprocessing of array CGH data, especially tiling array-based CGH
data? I
> found most of the quality assessment of array data are about
expression
> array, while few are related to array CGH data.
> We are using agilent 244k CGH array of rat, and now we have the
text
> files produced by Feature Extraction, don't know whether they are of
good
[[elided Yahoo spam]]
>
> After read.maimage(), we got the RGlist object, which contain
several
> components including R, G, Rb, Gb, and so on. The probes are of 3
types:
> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i
guess, and
> the probe names are like (-)3xSLv1, NC1_00000002, etc[no
corresponding probe
> sequence]; 1 means positive control, i guess, and the probe names
are like
> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
> corresponding probe sequence]. The number of -1 is 1275, while the
number
> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need
these
> values for quality assessment and normalization? How?
> In addition, in the normal probes, we have 1000 probes repeating
3 times
> in the array. How could we use these data for quality assessment and
> normalization?
You generally will not want to do any normalization besides a possible
shift of the center. Any linear normalization that affects the slope
of the M vs. A plot or nonlinear normalization will likely decrease
signal. As for quality control, a good, general measure to track is
the dlrs, a robust measure of the standard deviation.
dlrs <-
function(x) {
nx <- length(x)
if (nx<3) {
stop("Vector length>2 needed for computation")
}
tmp <- embed(x,2)
diffs <- tmp[,2]-tmp[,1]
dlrs <- IQR(diffs)/(sqrt(2)*1.34)
return(dlrs)
}
For agilent arrays, most of the dlrs should be around or under 0.2,
generally. However, this might vary a bit based on lab-to-lab
variation. In any case, if there is a significant outlier, that is
suspect. The input to the above function is the log ratios for a
single array arranged in chromosome and position order.
Sean
------------------------------
Message: 9
Date: Wed, 22 Oct 2008 07:24:50 -0700 (PDT)
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
To: "James W. MacDonald" <jmacdon at="" med.umich.edu="">
Cc: bioc <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <418040.17653.qm at web58004.mail.re3.yahoo.com>
Content-Type: text/plain
Thanks James. If I can tweak that function, I'll get exactly what I
want.
I tried what you suggested and got the following error:
---------------------------
### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
paramsGO <- new("GOHyperGParams", geneIds = genes1,
universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
testDirection = "over")
GO <- hyperGTest(paramsGO)
ps <- probeSetSummary(GO)
Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
variable "org.Hs.egENTREZID" was not found
--------------------------------
I guess the function would return the probe ids if I was using them,
but I have Entrez IDs as input.
Or am I doing something wrong?
thanks!
----- Original Message ----
From: James W. MacDonald <jmacdon@med.umich.edu>
Cc: bioc <bioconductor at="" stat.math.ethz.ch="">
Sent: Wednesday, October 22, 2008 9:10:39 AM
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
Hi Tim,
Does probeSetSummary() do what you want?
Best,
Jim
Tim Smith wrote:
>
> Hi,
>
> I
> was performing a hyperGTest for genes in homo-sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What
I
> wanted to now do was to co-relate each significant GO term (returned
by
> this function) with genes (from my set of input genes) associated
with
> that GO term. However, I think that I may be using the wrong
> package/function to get the releveant set of genes.
>
> Currently, what I'm doing is finding the significant GO terms by
using the following code:
>
> -----------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
>
> paramsGO <- new("GOHyperGParams", geneIds = genes1,
> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> testDirection = "over")
>
> GO <- hyperGTest(paramsGO)
> --------------------------
> This
> gives me a set of significant GO terms. Now, I would like to find
which
> subset of genes in 'genes1' is associated with each of the
significant
> GO term. To do this I map all GO terms to their Entrez IDs using the
> 'org.Hs.eg.db' package using the following:
>
> xx <- as.list(org.Hs.egGO2EG)
>
> to
> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
> this number small?) that map to at least one Entrez ID. So, from
here I
> look up which Entrez IDs are associated with my GO term of interest.
>
> My
> problem is that often, the GO term from hyperGTest is not associated
> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
> above ), i.e. the GO term/ID is not in the list obtained from
> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up
by
> hyperGTest, but does not appear to be associated with any Entrez IDs
in
> the org.Hs.eg.db package. Where could I be going wrong?
>
[[elided Yahoo spam]]
>
> Thanks for any comments/suggestions. I realize that I'm probably
doing something really stupid here....
>
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United
> States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid splines tools stats graphics grDevices
utils datasets methods base
>
> other attached packages:
> [1]
> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
> Category_2.6.0
> [8] RBGL_1.16.0 annotate_1.18.0
> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
> GO.db_2.2.0 KEGG.db_2.2.0
> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8
DBI_0.2-4 genefilter_1.20.0 survival_2.34-1
affy_1.18.0
> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44
>
>
> ---------------------------------
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
[[alternative HTML version deleted]]
------------------------------
Message: 10
Date: Wed, 22 Oct 2008 22:32:34 +0800
From: Leon Yee <yee.leon@gmail.com>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: Sean Davis <sdavis2 at="" mail.nih.gov="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF3982.9020005 at gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Sean Davis wrote:
> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at="" gmail.com="">
wrote:
>> Dear all,
>>
>> Is there any well-established routine for quality assessment and
>> preprocessing of array CGH data, especially tiling array-based CGH
data? I
>> found most of the quality assessment of array data are about
expression
>> array, while few are related to array CGH data.
>> We are using agilent 244k CGH array of rat, and now we have the
text
>> files produced by Feature Extraction, don't know whether they are
of good
[[elided Yahoo spam]]
>>
>> After read.maimage(), we got the RGlist object, which contain
several
>> components including R, G, Rb, Gb, and so on. The probes are of 3
types:
>> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i
guess, and
>> the probe names are like (-)3xSLv1, NC1_00000002, etc[no
corresponding probe
>> sequence]; 1 means positive control, i guess, and the probe names
are like
>> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
>> corresponding probe sequence]. The number of -1 is 1275, while the
number
>> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need
these
>> values for quality assessment and normalization? How?
>> In addition, in the normal probes, we have 1000 probes repeating
3 times
>> in the array. How could we use these data for quality assessment
and
>> normalization?
>
> You generally will not want to do any normalization besides a
possible
> shift of the center. Any linear normalization that affects the
slope
> of the M vs. A plot or nonlinear normalization will likely decrease
> signal. As for quality control, a good, general measure to track is
> the dlrs, a robust measure of the standard deviation.
>
>
> dlrs <-
> function(x) {
> nx <- length(x)
> if (nx<3) {
> stop("Vector length>2 needed for computation")
> }
> tmp <- embed(x,2)
> diffs <- tmp[,2]-tmp[,1]
> dlrs <- IQR(diffs)/(sqrt(2)*1.34)
> return(dlrs)
> }
>
> For agilent arrays, most of the dlrs should be around or under 0.2,
> generally. However, this might vary a bit based on lab-to-lab
> variation. In any case, if there is a significant outlier, that is
> suspect. The input to the above function is the log ratios for a
> single array arranged in chromosome and position order.
>
> Sean
>
Hi, Sean
Thanks for your advice. However, I have still several questions:
1. The input of dlrs is the log ratios, the log ration extracted
from the text file produced by Feature Extraction? or calculated from
RGlist --> MAlist ? I have searched the mailist and seen a post of
you
mentioned the difference of log ration from Feature Extraction and the
default M value from read.maimages.
2. I can get the log ratios of all features including control type
of -1 and 1, but these features don't have chromosome positions, does
this mean I don't need all of them for quality assessment?
3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will
not
get a proper mapping on the chromosome, so I should remove these
values
from the input of dlrs. Is it so?
4. How could I handle those 1000 probes repeating 3 times? They
will be mapped on the same chromosome position by three per group.
Regards,
Leon
------------------------------
Message: 11
Date: Wed, 22 Oct 2008 15:42:11 +0100
From: "Mark Dunning" <md392@cam.ac.uk>
Subject: Re: [BioC] Beadarray and illumina methylation arrays
To: "'Katrina bell'" <katrina.bell at="" mcri.edu.au="">,
<bioconductor at="" stat.math.ethz.ch="">
Message-ID: <000c01c93454$5dc9e720$195db560$@ac.uk>
Content-Type: text/plain; charset="us-ascii"
Hi Katrina,
I only have limited experience with methylation data, but hopefully I
might
be able to give you a few pointers.
-The error with readIllumina is quite hard to diagnose without seeing
the
example. I haven't seen any data from this type of Methylation array.
What
do the first few lines of the bead-level text files (.csv in your
case) look
like? It could be that the x and y coordinates are in a slightly
different
format to that we have seen before.
-Yes, 25% of outliers does seem a little high I'm afraid. Have you
also
looked at whereabouts the outliers are located on the arrays or made
some
imageplots? We have just developed a new function for automatic
artefact
detection called BASH that will be available in the forthcoming
Bioconductor
release. I could be interesting to run that on your data as Illumina
do
sometimes miss some beads in obvious artefacts and remove too many
beads on
the rest of the array. BASH should be a good compromise.
-Yes, currently the only way of reading methylation data into
beadarray is
by using the bead-level data.
-I'm not very familiar with the output of BeadStudio. Do you get
separate
detection values for the green and red channels? If so, then I don't
think
it would be problem if a bead-type was detected in one channel but not
the
other (since they are measure of either methylated or unmethylated
respectively). Bead types that are not detected in either channel
could be a
problem though.
-I haven't really seen many guidelines for normalization and this is
something I would like to look into. There is an obvious dye-bias that
needs
to be corrected and the background normalisation used by Illumina
might
actually help in this regard (although I wouldn't usually recommend it
for
other Illumina data). Quantile normalisation has been used for other
types
of two-colour Illumina data
(http://www.biomedcentral.com/1471-2105/9/409,
http://genome.cshlp.org/cgi/content/abstract/17/3/368) so it could
work
here.
Hope this helps,
Mark
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
Katrina bell
Sent: 21 October 2008 03:04
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Beadarray and illumina methylation arrays
Hello,
This is the first time I have used beadarray . I am using it for the
analysis of an illumina 27 methylation array and I am having a few
issues I
hope that you could help me with.
1. The first time I tried to load the methylation data, I didn't
write
in singleChannel=FALSE. It happily read in my 12 arrays with no
problems
what so ever. I tried plotting a few things which worked fine. Seeing
my
mistake, I then went back to reload my data with the red channel
(singleChannel=FALSE) and got the following error.
> BLData = readIllumina(arrayNames = targets$ChipBarcode,
textType=".csv",
targets=targets, backgroundMethod="none", singleChannel=FALSE)
Found 12 arrays
Reading pixels of ./4408100017_A_Grn.tif
Calculating background
Sharpening Image
Calculating foregound
Background correcting: method = none
Reading pixels of ./4408100017_A_Red.tif
Calculating background
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: .C("readBeadImage", as.character(tifFiles2[i]),
as.double(RedX[ord]),
as.double(RedY[ord]), as.integer(numBeads), foreGround = double(length
=
numBeads), backGround = double(length = numBeads),
as.integer(backgroundSize), as.integer(manip), as.integer(fground),
PACKAGE
= "beadarray")
2: readIllumina(arrayNames = targets$ChipBarcode, textType = ".csv",
targets = targets, backgroundMethod = "none", singleChannel = FALSE)
session info Below.
So I ended up loading in the data with images=FALSE, which worked, but
I
would like to be able to look at the background. Is there a way around
this
issue?
2. When I plotted the outliers (bar chart) I got an astounding 25% for
the
majority of my 12 samples, both in the red and green channel (unlogged
data). In addition, 3 of the samples had a peak of intensity at 5 in
the
green channel, leading me to believe that I have some real quality
control
issues with my samples. Any opinions/suggestions on these results
would be
most welcome.
3. Is it correct that readBeadSummaryData, is not set up for two
colour
arrays such as the methylation arrays? So the only way to look at
methylation data is through reading in BLData?
4. Some of my samples seem to have a large number of targets which
have a p
value detection rate above 0.05 (beadstudio output). Illumina have
indicated that they disregard these. If I can not read in the bead
summary
data from bead studio, I am assuming that these detection p values can
not
be taken into account in the analysis? Or are there other methods that
remove/down grade these less than optimal probes (most removed as
outliers?).
5. Has any one had any experience with normalisation of the
methylation
arrays? I know that many of the usual array methods are out of the
question
due to the assumption that most probes on the array will not be
differentially expressed is invalid. I read in a bioconductor list
someone
suggesting quantile normalisation? I would really appreciate any
feeback
from people who have tried this or other methods, especially if they
have
verified their methylation results.
Thanks for any help/advice you may be able to give.
Cheers
Katrina
> sessionInfo() below
R version 2.7.0 (2008-04-22)
x86_64-redhat-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US
.UTF-8
;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
LC_ADD
RESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] beadarray_1.8.0 affy_1.18.2 preprocessCore_1.2.1
[4] affyio_1.8.0 geneplotter_1.18.0 annotate_1.18.0
[7] xtable_1.5-2 AnnotationDbi_1.2.2 RSQLite_0.6-9
[10] DBI_0.2-4 lattice_0.17-6 Biobase_2.0.1
[13] limma_2.14.5
loaded via a namespace (and not attached):
[1] grid_2.7.0 KernSmooth_2.22-22 RColorBrewer_1.0-2
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 12
Date: Wed, 22 Oct 2008 10:46:24 -0400
From: "Sean Davis" <sdavis2@mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: "Leon Yee" <yee.leon at="" gmail.com="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810220746w6d5ecb11ica5542bf46042f20 at
mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 10:32 AM, Leon Yee <yee.leon at="" gmail.com="">
wrote:
> Sean Davis wrote:
>>
>> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at="" gmail.com="">
wrote:
>>>
>>> Dear all,
>>>
>>> Is there any well-established routine for quality assessment and
>>> preprocessing of array CGH data, especially tiling array-based CGH
data?
>>> I
>>> found most of the quality assessment of array data are about
expression
>>> array, while few are related to array CGH data.
>>> We are using agilent 244k CGH array of rat, and now we have the
text
>>> files produced by Feature Extraction, don't know whether they are
of good
[[elided Yahoo spam]]
>>>
>>> After read.maimage(), we got the RGlist object, which contain
several
>>> components including R, G, Rb, Gb, and so on. The probes are of 3
types:
>>> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i
guess, and
>>> the probe names are like (-)3xSLv1, NC1_00000002, etc[no
corresponding
>>> probe
>>> sequence]; 1 means positive control, i guess, and the probe names
are
>>> like
>>> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
>>> corresponding probe sequence]. The number of -1 is 1275, while
the
>>> number
>>> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we
need these
>>> values for quality assessment and normalization? How?
>>> In addition, in the normal probes, we have 1000 probes repeating
3
>>> times
>>> in the array. How could we use these data for quality assessment
and
>>> normalization?
>>
>> You generally will not want to do any normalization besides a
possible
>> shift of the center. Any linear normalization that affects the
slope
>> of the M vs. A plot or nonlinear normalization will likely decrease
>> signal. As for quality control, a good, general measure to track
is
>> the dlrs, a robust measure of the standard deviation.
>>
>>
>> dlrs <-
>> function(x) {
>> nx <- length(x)
>> if (nx<3) {
>> stop("Vector length>2 needed for computation")
>> }
>> tmp <- embed(x,2)
>> diffs <- tmp[,2]-tmp[,1]
>> dlrs <- IQR(diffs)/(sqrt(2)*1.34)
>> return(dlrs)
>> }
>>
>> For agilent arrays, most of the dlrs should be around or under 0.2,
>> generally. However, this might vary a bit based on lab-to-lab
>> variation. In any case, if there is a significant outlier, that is
>> suspect. The input to the above function is the log ratios for a
>> single array arranged in chromosome and position order.
>>
>> Sean
>>
>
> Hi, Sean
>
> Thanks for your advice. However, I have still several questions:
>
> 1. The input of dlrs is the log ratios, the log ration extracted
from the
> text file produced by Feature Extraction? or calculated from RGlist
-->
> MAlist ? I have searched the mailist and seen a post of you
mentioned the
> difference of log ration from Feature Extraction and the default M
value
> from read.maimages.
You can read the Agilent FE manual for more details, but you can
probably use either and come to very similar conclusions. If you use
the MAlist version, make sure to use only median centering or none for
normalization.
> 2. I can get the log ratios of all features including control type
of -1
> and 1, but these features don't have chromosome positions, does this
mean I
> don't need all of them for quality assessment?
We have not routinely used these probes, no. If an array fails
miserably, then these control probes might be useful for determining
the reason for the failure, though.
> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will
not get a
> proper mapping on the chromosome, so I should remove these values
from the
> input of dlrs. Is it so?
You can either remove them or treat chr2_random as a separate
chromosome.
> 4. How could I handle those 1000 probes repeating 3 times? They
will be
> mapped on the same chromosome position by three per group.
You could choose one at random or use a mean or median of them. My
guess is that they agree very closely with one another so the choice
should not affect the results much.
Sean
------------------------------
Message: 13
Date: Wed, 22 Oct 2008 15:54:25 +0100
From: "Dan Bolser" <dan.bolser@gmail.com>
Subject: [BioC] Problem using Rgraphviz (edge weights going missing).
To: bioconductor at stat.math.ethz.ch
Message-ID:
<2c8757af0810220754r3a16507l29dd77f43a5151d7 at
mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hi all,
After successfully installing Rgraphviz on this Centos 5.2 (RHEL 4)
box (basically with yum install *graphviz*** ..., then using the
standard BioC install procedure) and then navigating the packaging
error (that actually caused R to crash) described here:
http://lists.rpmforge.net/pipermail/users/2007-August/000958.html
(Just manually run dot -c as root)
I finally got to produce some nice graphs of my data. However, I found
that when I tried to write my graphs to file in dot format, the edge
weights go missing.
I load my graph like this:
library(Rgraphviz)
myNodes <-
paste(nodes$MAPC)
myG <- new("graphNEL", nodes=myNodes)
for(i in 1:nrow(dat)){
paste(dat[i,"A"], dat[i,"B"])
myG <-
addEdge(paste(dat[i,"A"]),
paste(dat[i,"B"]), myG, dat[i,"N"])
}
plot(myG, "neato")
and all well and good (except that I couldn't get the edge weights to
translate into line thickness - more on that below).
Then I (foolishly?) tried:
agwrite(agopen(myG,"test"), filename="test")
This produced a file, but the weight attribute of the edges (in dot
format?) were all clearly set to 1 (and not the range of values that I
have stored in dat$N). I confirmed that my loop was putting the
weights somewhere with the following one-liner:
lapply(myG at edgeData@data, function(l){l$weight})
but I don't really know what this means. So I think its a bug. The
graph should be written to file with the correct weight attributes.
AFAICT.
Poking around in the data structure I was confused by the apparent
number of edges...
ew <-
unlist(edgeWeights(myG))
length(ew)
Why is this not equal to the number of edges that I put in (and the
number reported by just printing the graph object)?
I tried to set the width of the edges plotted on the graph by using
the penwidth attribute, but it had no effect, and was not written to
file as an attribute of the graph.
## Try defining some EDGE attributes
eAttrs <- list()
## Get the edge weights
ew <-
unlist(edgeWeights(myG))
length(ew)
## Some need removing for some reason somehow...
ew <-
ew[setdiff(seq(along = ew), removedEdges(myG))]
length(ew)
## Label the attributes vector (so it can be used properly)
names(ew) <-
edgeNames(myG)
eAttrs$penwidth <-
ew
plot(myG, "neato", edgeAttrs=eAttrs)
toFile(agopen(myG, name="test"), filename="test", edgeAttrs=eAttrs)
So I read in the vignette,
" A list of all available attributes
is accessible
online at: http://www.graphviz.org/pub/scm/graphviz2/doc/info/attrs.
html. Note that there are some di?erences between default values and
also
some attributes will not have an e?ect in Rgraphviz. Please see the
man page
for graphvizAttributes for more details."
Where is the "man page for graphvizAttributes for more details" ??
It would be good to know what attributes Rgraphviz is ignoring and if
penwidth is one of them in particular.
Also the vignette didn't tell me who to contact in case of feedback,
bugs, questions, etc. Who else should I be bugging with this email?
Thanks for reading!
Dan.
--
R2spec --bioc -u
http://www.bioconductor.org/packages/release/bioc/src/contrib/Rgraphvi
z_1.18.1.tar.gz
------------------------------
Message: 14
Date: Wed, 22 Oct 2008 11:07:02 -0400
From: "Mark Kimpel" <mwkimpel@gmail.com>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Marc Carlson" <mcarlson at="" fhcrc.org="">
Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch="">
Message-ID:
<6b93d1830810220807l788e52f8wa4bafef9726892d0 at
mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Build with AnnotationDbi was uneventful, but I have been unable to
install the package or use it as is.
If the package is just placed in my site-library, I get:
'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
If I tar the package and try R CMD INSTALL, I get:
cannot extract package from 'ragene10stv1.db.tar.gz'
What approach should I be using?
Mark
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************
On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at="" fhcrc.org="">
wrote:
>
> Hi Mark,
>
> AnnBuilder is on its way out. Please have a look at the
SQLforge.pdf
> vignette which can be found here:
>
>
http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>
> If you have further questions after reading this, then please just
ask,
> and we should be able to help you out.
>
>
> Marc
>
>
>
> Mark Kimpel wrote:
> > I'm having problems getting AnnBuilder to work with the Affy Rat
Gene ST
> > array data supplied by Affy. Using the code chunk below, I am able
to get
> > AnnBuilder to create the annotation object, but it crashes, I
believe, when
> > it tries to save it. I should also mention that I had a previous
crash when
> > I had a madecdfenv package directory in place that used the same
name. I got
> > a "file lock" error, so I temporarily renamed the directory to see
if this
> > fixed the problem. As below, the error changed, but I still can't
get the
> > script to work.
> >
> > I suspect that there is a fundamental misunderstaning on my part
related to
> > how the annotation package should relate to the cdf package or
some naming
> > convention related to one or both.
> >
> > Mark
> >
> >
> >> require(AnnBuilder); require(makecdfenv)
> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-
entrez_gene.csv"
> >> myBaseType <- "ll"
> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
> >>
> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType
=
> >>
> > + myBaseType, pkgName =
> > substring(cleancdfname("RaGene-1_0-st-v1"),
> > + 1, (nchar(cleancdfname("RaGene-
1_0-st-v1")) - 3)),
> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism =
"Rattus
> > norvegicus", version = "1.0",
> > + author = list(authors = "Mark W Kimpel", maintainer =
> > + "mkimpel at iupui.edu"), fromWeb =
TRUE)
> > Read 1 item
> > Read 1 item
> > [1] "4450 2 2"
> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress,
envhook) :
> > cannot open file
> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such
file or
> > directory
> >
> > Enter a frame number, or 0 to exit
> >
> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
baseMapType =
> > myBaseTy
> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii,
compress,
> > envhook)
> > 5: function (e)
> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook)
> >
> > Selection: 1
> > Browse[1]> annotation[1:2,]
> > ENTREZID PROBE ACCNUM GO
> > [1,] "100008565" "10766774" NA NA
> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622
at IEA"
> > PMID SYMBOL
> > [1,] "16804107;17292978" "Slc39a4l"
> > [2,] "8889548" "Gnl3l"
> > GENENAME
CHR
> > [1,] "solute carrier family 39 (zinc transporter), member 4-like"
NA
> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like"
"X"
> > MAP REFSEQ UNIGENE CHRLOC
PATH
> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA"
NA
> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218
at X" NA
> > ENZYME PFAM PROSITE
> > [1,] NA "NA at IPI00817057" "NA at IPI00817057"
> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844"
> > ------------------------------------------------------------
> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
> > Indiana University School of Medicine
> >
> > 15032 Hunter Court, Westfield, IN 46074
> >
> > (317) 490-5129 Work, & Mobile & VoiceMail
> > (317) 399-1219 Home
> > Skype: mkimpel
> >
> > "The real problem is not whether machines think but whether men
do." -- B.
> > F. Skinner
> > ******************************************************************
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
>
------------------------------
Message: 15
Date: Wed, 22 Oct 2008 11:53:59 -0400
From: "Sean Davis" <sdavis2@mail.nih.gov>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Mark Kimpel" <mwkimpel at="" gmail.com="">
Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch="">
Message-ID:
<264855a00810220853m5afad74apfb3df17499f90f49 at
mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at="" gmail.com="">
wrote:
> Build with AnnotationDbi was uneventful, but I have been unable to
> install the package or use it as is.
>
> If the package is just placed in my site-library, I get:
> 'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
>
> If I tar the package and try R CMD INSTALL, I get:
> cannot extract package from 'ragene10stv1.db.tar.gz'
>
> What approach should I be using?
You can just
R CMD INSTALL ragene10stv1.db
where ragene10stv1.db is the directory that contains the package
(right above the DESCRIPTION file).
Sean
> On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at="" fhcrc.org=""> wrote:
>>
>> Hi Mark,
>>
>> AnnBuilder is on its way out. Please have a look at the
SQLforge.pdf
>> vignette which can be found here:
>>
>>
http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>>
>> If you have further questions after reading this, then please just
ask,
>> and we should be able to help you out.
>>
>>
>> Marc
>>
>>
>>
>> Mark Kimpel wrote:
>> > I'm having problems getting AnnBuilder to work with the Affy Rat
Gene ST
>> > array data supplied by Affy. Using the code chunk below, I am
able to get
>> > AnnBuilder to create the annotation object, but it crashes, I
believe, when
>> > it tries to save it. I should also mention that I had a previous
crash when
>> > I had a madecdfenv package directory in place that used the same
name. I got
>> > a "file lock" error, so I temporarily renamed the directory to
see if this
>> > fixed the problem. As below, the error changed, but I still can't
get the
>> > script to work.
>> >
>> > I suspect that there is a fundamental misunderstaning on my part
related to
>> > how the annotation package should relate to the cdf package or
some naming
>> > convention related to one or both.
>> >
>> > Mark
>> >
>> >
>> >> require(AnnBuilder); require(makecdfenv)
>> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-
entrez_gene.csv"
>> >> myBaseType <- "ll"
>> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
>> >>
>> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType
=
>> >>
>> > + myBaseType, pkgName =
>> > substring(cleancdfname("RaGene-1_0-st-v1"),
>> > + 1, (nchar(cleancdfname("RaGene-
1_0-st-v1")) - 3)),
>> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism =
"Rattus
>> > norvegicus", version = "1.0",
>> > + author = list(authors = "Mark W Kimpel", maintainer =
>> > + "mkimpel at iupui.edu"), fromWeb =
TRUE)
>> > Read 1 item
>> > Read 1 item
>> > [1] "4450 2 2"
>> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress,
envhook) :
>> > cannot open file
>> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No
such file or
>> > directory
>> >
>> > Enter a frame number, or 0 to exit
>> >
>> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
baseMapType =
>> > myBaseTy
>> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
>> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
>> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii,
compress,
>> > envhook)
>> > 5: function (e)
>> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress,
envhook)
>> >
>> > Selection: 1
>> > Browse[1]> annotation[1:2,]
>> > ENTREZID PROBE ACCNUM GO
>> > [1,] "100008565" "10766774" NA NA
>> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622
at IEA"
>> > PMID SYMBOL
>> > [1,] "16804107;17292978" "Slc39a4l"
>> > [2,] "8889548" "Gnl3l"
>> > GENENAME
CHR
>> > [1,] "solute carrier family 39 (zinc transporter), member 4-like"
NA
>> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like"
"X"
>> > MAP REFSEQ UNIGENE CHRLOC
PATH
>> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA"
NA
>> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274"
"-40301218 at X" NA
>> > ENZYME PFAM PROSITE
>> > [1,] NA "NA at IPI00817057" "NA at IPI00817057"
>> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844"
>> > ------------------------------------------------------------
>> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>> > Indiana University School of Medicine
>> >
>> > 15032 Hunter Court, Westfield, IN 46074
>> >
>> > (317) 490-5129 Work, & Mobile & VoiceMail
>> > (317) 399-1219 Home
>> > Skype: mkimpel
>> >
>> > "The real problem is not whether machines think but whether men
do." -- B.
>> > F. Skinner
>> >
******************************************************************
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 16
Date: Wed, 22 Oct 2008 12:05:25 -0400
From: "Mark Kimpel" <mwkimpel@gmail.com>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Sean Davis" <sdavis2 at="" mail.nih.gov="">
Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch="">
Message-ID:
<6b93d1830810220905m46a4e41kfe052fd768d458e5 at
mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Interesting, it works from a command line outside of R, but when I had
tried R> system('R CMD INSTALL ragene10stv1.db') I received an error
message. I have used this approach successfully with other packages to
avoid leaving R and starting a shell, but with this package I get:
'* Installing to library '/home/mkimpel/R_HOME/site-library-2.8.0'
ERROR: no packages specified'
Well, thanks, it now is installed. Any comment on why my system()
approach might not have worked?
Mark
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************
On Wed, Oct 22, 2008 at 11:53 AM, Sean Davis <sdavis2 at="" mail.nih.gov="">
wrote:
> On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at="" gmail.com=""> wrote:
>> Build with AnnotationDbi was uneventful, but I have been unable to
>> install the package or use it as is.
>>
>> If the package is just placed in my site-library, I get:
>> 'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
>>
>> If I tar the package and try R CMD INSTALL, I get:
>> cannot extract package from 'ragene10stv1.db.tar.gz'
>>
>> What approach should I be using?
>
> You can just
>
> R CMD INSTALL ragene10stv1.db
>
> where ragene10stv1.db is the directory that contains the package
> (right above the DESCRIPTION file).
>
> Sean
>
>
>> On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at="" fhcrc.org=""> wrote:
>>>
>>> Hi Mark,
>>>
>>> AnnBuilder is on its way out. Please have a look at the
SQLforge.pdf
>>> vignette which can be found here:
>>>
>>>
http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>>>
>>> If you have further questions after reading this, then please just
ask,
>>> and we should be able to help you out.
>>>
>>>
>>> Marc
>>>
>>>
>>>
>>> Mark Kimpel wrote:
>>> > I'm having problems getting AnnBuilder to work with the Affy Rat
Gene ST
>>> > array data supplied by Affy. Using the code chunk below, I am
able to get
>>> > AnnBuilder to create the annotation object, but it crashes, I
believe, when
>>> > it tries to save it. I should also mention that I had a previous
crash when
>>> > I had a madecdfenv package directory in place that used the same
name. I got
>>> > a "file lock" error, so I temporarily renamed the directory to
see if this
>>> > fixed the problem. As below, the error changed, but I still
can't get the
>>> > script to work.
>>> >
>>> > I suspect that there is a fundamental misunderstaning on my part
related to
>>> > how the annotation package should relate to the cdf package or
some naming
>>> > convention related to one or both.
>>> >
>>> > Mark
>>> >
>>> >
>>> >> require(AnnBuilder); require(makecdfenv)
>>> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-
entrez_gene.csv"
>>> >> myBaseType <- "ll"
>>> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
>>> >>
>>> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
baseMapType =
>>> >>
>>> > + myBaseType, pkgName =
>>> > substring(cleancdfname("RaGene-1_0-st-v1"),
>>> > + 1, (nchar(cleancdfname("RaGene-
1_0-st-v1")) - 3)),
>>> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism =
"Rattus
>>> > norvegicus", version = "1.0",
>>> > + author = list(authors = "Mark W Kimpel", maintainer
=
>>> > + "mkimpel at iupui.edu"), fromWeb =
TRUE)
>>> > Read 1 item
>>> > Read 1 item
>>> > [1] "4450 2 2"
>>> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress,
envhook) :
>>> > cannot open file
>>> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No
such file or
>>> > directory
>>> >
>>> > Enter a frame number, or 0 to exit
>>> >
>>> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
baseMapType =
>>> > myBaseTy
>>> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
>>> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
>>> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii,
compress,
>>> > envhook)
>>> > 5: function (e)
>>> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress,
envhook)
>>> >
>>> > Selection: 1
>>> > Browse[1]> annotation[1:2,]
>>> > ENTREZID PROBE ACCNUM GO
>>> > [1,] "100008565" "10766774" NA NA
>>> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622
at IEA"
>>> > PMID SYMBOL
>>> > [1,] "16804107;17292978" "Slc39a4l"
>>> > [2,] "8889548" "Gnl3l"
>>> > GENENAME
CHR
>>> > [1,] "solute carrier family 39 (zinc transporter), member
4-like" NA
>>> > [2,] "guanine nucleotide binding protein-like 3
(nucleolar)-like" "X"
>>> > MAP REFSEQ UNIGENE CHRLOC
PATH
>>> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA"
NA
>>> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274"
"-40301218 at X" NA
>>> > ENZYME PFAM PROSITE
>>> > [1,] NA "NA at IPI00817057" "NA at IPI00817057"
>>> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844"
>>> > ------------------------------------------------------------
>>> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>>> > Indiana University School of Medicine
>>> >
>>> > 15032 Hunter Court, Westfield, IN 46074
>>> >
>>> > (317) 490-5129 Work, & Mobile & VoiceMail
>>> > (317) 399-1219 Home
>>> > Skype: mkimpel
>>> >
>>> > "The real problem is not whether machines think but whether men
do." -- B.
>>> > F. Skinner
>>> >
******************************************************************
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > _______________________________________________
>>> > Bioconductor mailing list
>>> > Bioconductor at stat.math.ethz.ch
>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> > Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >
>>> >
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
------------------------------
Message: 17
Date: Wed, 22 Oct 2008 09:49:20 -0700
From: Robert Gentleman <rgentlem@fhcrc.org>
Subject: Re: [BioC] GOstats and org.EcK12.eg.db
To: Robert Castelo <robert.castelo at="" upf.edu="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF5990.20501 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Robert,
Yet another one that I will need to write some specialized code
for.
I should get it done by the end of the week, and will push it to
both
release and devel. I will post an email when it is done,
best wishes
Robert
Robert Castelo wrote:
> dear list,
>
> I cannot get to work GOstats with the annotation for E. coli in
> org.EcK12.eg.db. Please find below the code that reproduces the
problem
> including the error message, and my sessionInfo() at the end of this
> email. I have included the same exercise with the human annotation
> package org.Hs.eg.db which runs fine in my system. Any help with
this
> will be very much appreciated.
>
> thanks!!
> robert.
> ==========CODE STARTS HERE===========
>
> library(org.Hs.eg.db)
> library(org.EcK12.eg.db)
> library(GOstats)
>
> geneuniverse <- mappedkeys(org.EcK12.egSYMBOL)
> set.seed(12345)
> geneset <- sample(geneuniverse, size=100, replace=FALSE)
>
>
> goHypGparams <- new("GOHyperGParams",
> geneIds=geneset,
> universeGeneIds=geneuniverse,
> annotation="org.EcK12.eg.db", ontology="BP",
> pvalueCutoff=1.0, conditional=TRUE,
> testDirection="over")
>
> goHypGcond <- hyperGTest(goHypGparams)
>
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
> variable "org.EcK12.egENTREZID" was not found
> Error in mget(probes, ID2EntrezID(datPkg)) :
> error in evaluating the argument 'envir' in selecting a method for
> function 'mget'
>
> geneuniverse <- mappedkeys(org.Hs.egSYMBOL)
> set.seed(12345)
> geneset <- sample(geneuniverse, size=100, replace=FALSE)
>
>
> goHypGparams <- new("GOHyperGParams",
> geneIds=geneset,
> universeGeneIds=geneuniverse,
> annotation="org.Hs.eg.db", ontology="BP",
> pvalueCutoff=1.0, conditional=TRUE,
> testDirection="over")
>
> goHypGcond <- hyperGTest(goHypGparams)
>
>
> sessionInfo()
> R version 2.8.0 beta (2008-10-05 r46601)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_
US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC
_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDEN
TIFICATION=C
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils
> datasets
> [8] methods base
>
> other attached packages:
> [1] org.Hs.eg.db_2.2.6 GOstats_2.7.0 Category_2.7.6
> [4] genefilter_1.21.5 survival_2.34-1 RBGL_1.17.2
> [7] annotate_1.19.2 xtable_1.5-4 GO.db_2.2.5
> [10] graph_1.19.6 org.EcK12.eg.db_2.2.6
AnnotationDbi_1.3.12
> [13] RSQLite_0.7-0 DBI_0.2-4 Biobase_2.1.7
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 GSEABase_1.3.6 XML_1.98-1
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
------------------------------
Message: 18
Date: Thu, 23 Oct 2008 01:14:09 +0800
From: Leon Yee <yee.leon@gmail.com>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: Sean Davis <sdavis2 at="" mail.nih.gov="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF5F61.5060906 at gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Sean Davis wrote:
>> Hi, Sean
>>
>> Thanks for your advice. However, I have still several questions:
>>
>> 1. The input of dlrs is the log ratios, the log ration extracted
from the
>> text file produced by Feature Extraction? or calculated from RGlist
-->
>> MAlist ? I have searched the mailist and seen a post of you
mentioned the
>> difference of log ration from Feature Extraction and the default M
value
>> from read.maimages.
>
> You can read the Agilent FE manual for more details, but you can
> probably use either and come to very similar conclusions. If you
use
> the MAlist version, make sure to use only median centering or none
for
> normalization.
>
>> 2. I can get the log ratios of all features including control
type of -1
>> and 1, but these features don't have chromosome positions, does
this mean I
>> don't need all of them for quality assessment?
>
> We have not routinely used these probes, no. If an array fails
> miserably, then these control probes might be useful for determining
> the reason for the failure, though.
>
>> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will
not get a
>> proper mapping on the chromosome, so I should remove these values
from the
>> input of dlrs. Is it so?
>
> You can either remove them or treat chr2_random as a separate
chromosome.
>
>> 4. How could I handle those 1000 probes repeating 3 times? They
will be
>> mapped on the same chromosome position by three per group.
>
> You could choose one at random or use a mean or median of them. My
> guess is that they agree very closely with one another so the choice
> should not affect the results much.
Hi, Sean
Thank you very much for your detailed reply and help.
Where can I get the references or official documentations about
dlrs method?
In addition, we have design our array with dye-swap [test-cy3 vs
ref-cy5, and test-cy5 vs ref-cy3]. Is there any method for utilizing
the
information here for quality assessment?
Best wishes!
Leon
------------------------------
Message: 19
Date: Wed, 22 Oct 2008 17:13:10 +0000 (UTC)
From: Shinichiro Wachi <swachi@ucdavis.edu>
Subject: [BioC] Bioconductor installation problem: unable to access
repository
To: bioconductor at stat.math.ethz.ch
Message-ID: <loom.20081022t170334-485 at="" post.gmane.org="">
Content-Type: text/plain; charset=us-ascii
I am installing Bioconductor on a new machine (Intel Mac running OSX
10.5.5), R
version is 2.8.0.
sudo R is used for this session.
These are the error messages I get:
source("http://bioconductor.org/biocLite.R")
biocinstall()
Running biocinstall version 2.3.8 with R version 2.8.0 (under
development)
Your version of R requires version 2.3 of Bioconductor.
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//contr
ib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//contr
ib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://cran.fhcrc.org/bin/macosx//contrib/2.8
Warning message:
package 'Biobase' is not available
http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES
will produce a web page displaying packages. (I read a post that asked
for this.
I am assuming this was a quick server diagnostics).
Is there a problem with the server, or is this a temporary glitch in
the
installer? Is there a workaround?
Much thanks.
Shin
------------------------------
Message: 20
Date: Wed, 22 Oct 2008 13:29:58 -0400
From: "Sean Davis" <sdavis2@mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: "Leon Yee" <yee.leon at="" gmail.com="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810221029i18a08bc3nb1a34d786a6b30a6 at
mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 1:14 PM, Leon Yee <yee.leon at="" gmail.com="">
wrote:
> Sean Davis wrote:
>>>
>>> Hi, Sean
>>>
>>> Thanks for your advice. However, I have still several questions:
>>>
>>> 1. The input of dlrs is the log ratios, the log ration extracted
from
>>> the
>>> text file produced by Feature Extraction? or calculated from
RGlist -->
>>> MAlist ? I have searched the mailist and seen a post of you
mentioned
>>> the
>>> difference of log ration from Feature Extraction and the default M
value
>>> from read.maimages.
>>
>> You can read the Agilent FE manual for more details, but you can
>> probably use either and come to very similar conclusions. If you
use
>> the MAlist version, make sure to use only median centering or none
for
>> normalization.
>>
>>> 2. I can get the log ratios of all features including control
type of -1
>>> and 1, but these features don't have chromosome positions, does
this mean
>>> I
>>> don't need all of them for quality assessment?
>>
>> We have not routinely used these probes, no. If an array fails
>> miserably, then these control probes might be useful for
determining
>> the reason for the failure, though.
>>
>>> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will
not get
>>> a
>>> proper mapping on the chromosome, so I should remove these values
from
>>> the
>>> input of dlrs. Is it so?
>>
>> You can either remove them or treat chr2_random as a separate
chromosome.
>>
>>> 4. How could I handle those 1000 probes repeating 3 times? They
will be
>>> mapped on the same chromosome position by three per group.
>>
>> You could choose one at random or use a mean or median of them. My
>> guess is that they agree very closely with one another so the
choice
>> should not affect the results much.
>
> Hi, Sean
>
> Thank you very much for your detailed reply and help.
>
> Where can I get the references or official documentations about
dlrs
> method?
It is a standard robust estimator of the variance and is not specific
to microarrays. If you look at the code, it simply subtracts the
difference between adjacent probes and then normalizes the result. If
the array is "noisy", the dlrs will be high. This assumes that the
contribution due to large copy number changes is negligible which is
likely true since even the most abnormal cancer samples have fewer
than 1000 breaks.
> In addition, we have design our array with dye-swap [test-cy3 vs
ref-cy5,
> and test-cy5 vs ref-cy3]. Is there any method for utilizing the
information
> here for quality assessment?
Not that I know of, but you could certainly look at correlations
between replicates, etc. Our experience with Agilent CGH arrays is
that the contribution due to dye bias is small compared to changes due
to copy number.
Sean
Sean
------------------------------
Message: 21
Date: Wed, 22 Oct 2008 13:40:01 -0400
From: "James W. MacDonald" <jmacdon@med.umich.edu>
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
Cc: bioc <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <48FF6571.5090806 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Tim,
Yeah, probeSetSummary() is probably not what you want, if you are not
starting with an Affy chip. There are some gymnastics required to map
things back to the original Affy chip that you won't need to do. In
addition, if you are not using a conditional hypergeometric analysis,
it
should be pretty simple to get what you want without even needing to
parse things out of the GOHyperGResult object. An example:
## fake up some data
> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)]
> univ <- Lkeys(org.Hs.egGO)
> param <- new("GOHyperGParams", geneIds = geneIds,
universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP")
> hyp <- hyperGTest(param)
> summary(hyp, categorySize=10)
GOBPID Pvalue OddsRatio ExpCount Count Size
Term
1 GO:0007338 0.002723500 29.25101 0.07808304 2 54 single
fertilization
2 GO:0009566 0.002925855 28.16374 0.08097501 2 56
fertilization
So we have two terms of interest. Getting the Entrez Gene IDs from the
input set that map to these terms is easy:
> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))]
[1] "100131137" "10007"
Now you might also want to know which 54 Entrez Gene IDs map to that
particular GO term. Since you are not conditioning, this includes that
particular GO term and all its offspring.
> offspring <- get("GO:0007338", GOBPOFFSPRING)
> egids <- unique(unlist(mget(c("GO:0007338", offspring),
revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE))
> egids[!is.na(egids)]
[1] "1047" "4179" "4240" "4486" "4809"
"5016"
[7] "6674" "7783" "7784" "7802" "7993"
"8747"
[13] "8748" "8852" "9082" "10007" "10361"
"22917"
[19] "26476" "53340" "57055" "57829" "64100"
"93185"
[25] "158062" "442868" "100131137" "49" "410"
"2683"
[31] "3010" "4184" "6677" "7142" "7455"
"8857"
[37] "11055" "124626" "2054" "2741" "10343"
"10566"
[43] "27297" "152015" "3074" "167" "928"
"2515"
[49] "5104" "23553" "284359" "164684" "7141"
"79400"
Best,
Jim
Tim Smith wrote:
> Thanks James. If I can tweak that function, I'll get exactly what I
want.
>
> I tried what you suggested and got the following error:
>
> ---------------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
>
> paramsGO <- new("GOHyperGParams", geneIds = genes1,
> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> testDirection = "over")
>
> GO <- hyperGTest(paramsGO)
> ps <- probeSetSummary(GO)
>
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
> variable "org.Hs.egENTREZID" was not found
> --------------------------------
>
> I guess the function would return the probe ids if I was using them,
but I have Entrez IDs as input.
>
> Or am I doing something wrong?
>
> thanks!
>
>
>
>
>
> ----- Original Message ----
> From: James W. MacDonald <jmacdon at="" med.umich.edu="">
>
> Cc: bioc <bioconductor at="" stat.math.ethz.ch="">
> Sent: Wednesday, October 22, 2008 9:10:39 AM
> Subject: Re: [BioC] GOstat: listing genes from hyperGTest
>
> Hi Tim,
>
> Does probeSetSummary() do what you want?
>
> Best,
>
> Jim
>
>
>
> Tim Smith wrote:
>>
>> Hi,
>>
>> I
>> was performing a hyperGTest for genes in homo-sapiens. For a set of
>> input genes, this function returns some 'significant' GO terms.
What I
>> wanted to now do was to co-relate each significant GO term
(returned by
>> this function) with genes (from my set of input genes) associated
with
>> that GO term. However, I think that I may be using the wrong
>> package/function to get the releveant set of genes.
>>
>> Currently, what I'm doing is finding the significant GO terms by
using the following code:
>>
>> -----------------------
>> ### 'genes1' are the Entrez IDs of my genes of interest, and
'allGenes' is the universe of Entrez IDs
>>
>> paramsGO <- new("GOHyperGParams", geneIds = genes1,
>> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
>> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
>> testDirection = "over")
>>
>> GO <- hyperGTest(paramsGO)
>> --------------------------
>> This
>> gives me a set of significant GO terms. Now, I would like to find
which
>> subset of genes in 'genes1' is associated with each of the
significant
>> GO term. To do this I map all GO terms to their Entrez IDs using
the
>> 'org.Hs.eg.db' package using the following:
>>
>> xx <- as.list(org.Hs.egGO2EG)
>>
>> to
>> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms
(isn't
>> this number small?) that map to at least one Entrez ID. So, from
here I
>> look up which Entrez IDs are associated with my GO term of
interest.
>>
>> My
>> problem is that often, the GO term from hyperGTest is not
associated
>> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
>> above ), i.e. the GO term/ID is not in the list obtained from
>> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up
by
>> hyperGTest, but does not appear to be associated with any Entrez
IDs in
>> the org.Hs.eg.db package. Where could I be going wrong?
>>
> [[elided Yahoo spam]]
>> Thanks for any comments/suggestions. I realize that I'm probably
doing something really stupid here....
>>
>> My sessionInfo() is:
>> --------------------------------
>> R version 2.7.2 (2008-08-25)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United
>> States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grid splines tools stats graphics grDevices
utils datasets methods base
>>
>> other attached packages:
>> [1]
>> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
>> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
>> Category_2.6.0
>> [8] RBGL_1.16.0 annotate_1.18.0
>> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
>> GO.db_2.2.0 KEGG.db_2.2.0
>> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8
DBI_0.2-4 genefilter_1.20.0 survival_2.34-1
affy_1.18.0
>> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.11.11 MASS_7.2-44
>>
>>
>> ---------------------------------
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
------------------------------
Message: 22
Date: Wed, 22 Oct 2008 10:48:02 -0700
From: Patrick Aboyoun <paboyoun@fhcrc.org>
Subject: Re: [BioC] Bioconductor installation problem: unable to
access repository
To: Shinichiro Wachi <swachi at="" ucdavis.edu="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF6752.1020901 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Shin,
I appears that R 2.8 has changed the way it regulates Mac OS X binary
packages for users running Mac OS X 10.5 (Leopard). We have just
become aware of this change and will be adjusting the Bioconductor Mac
OS X repositories accordingly over the next few days to adjust to
these changes. The good news is that R 2.8 supports binary packages
for both Mac OS X 10.4 (Tiger) and Mac OS X 10.5 (Leopard). I'll send
out an e-mail to this group when the Mac OS X 10.5 (Leopard) packages
are available for BioC 2.3.
Patrick
Shinichiro Wachi wrote:
> I am installing Bioconductor on a new machine (Intel Mac running OSX
10.5.5), R
> version is 2.8.0.
>
> sudo R is used for this session.
>
> These are the error messages I get:
>
> source("http://bioconductor.org/biocLite.R")
> biocinstall()
>
> Running biocinstall version 2.3.8 with R version 2.8.0 (under
development)
> Your version of R requires version 2.3 of Bioconductor.
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//con
trib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//con
trib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://cran.fhcrc.org/bin/macosx//contrib/2.8
> Warning message:
> package 'Biobase' is not available
>
> http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES
> will produce a web page displaying packages. (I read a post that
asked for this.
> I am assuming this was a quick server diagnostics).
>
> Is there a problem with the server, or is this a temporary glitch in
the
> installer? Is there a workaround?
>
> Much thanks.
>
> Shin
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 23
Date: Wed, 22 Oct 2008 13:36:30 -0700
From: Patrick Aboyoun <paboyoun@fhcrc.org>
Subject: [BioC] Bioconductor 2.3 is released
To: BioC list <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <48FF8ECE.4060201 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Bioconductors:
We are pleased to announce the release of Bioconductor 2.3. This
release
includes 36 new software packages, and many changes to existing
packages. Bioconductor 2.3 is comprised of 294 software packages and
is
compatible with the recently released R 2.8.0.
Please visit http://bioconductor.org for details and downloads.
IMPORTANT NOTE FOR MAC USERS: R 2.8.0 is using a new Mac OS X binary
package distribution system and the CRAN and BioC repositories need to
catch up with this change. If you are using Mac OS X, please refrain
from migrating to R-2.8.0 until these new binary package repositories
are put in place, or use 'type="source"' when installing packages
using
biocLite.
Contents
========
o Release Highlight
o Getting Started with Bioconductor 2.3
o New Software Packages
o Software Packages in 2.2 that didn't make it to 2.3
Release Highlight
=================
This release contains a septet of packages (BSgenome, Biostrings,
ShortRead, IRanges, HilbertVis, HilbertVisGUI, and rtracklayer) that
are
suited to analyze 'next generation' high-throughput DNA sequence data.
The BSgenome package provides the backbone for representing genome
sequences from many different organisms including human, mouse, rat,
dog, chimp, chicken, cow, fruit fly, honey bee, yeast, E. coli, C.
elegans, and arabidopsis. The Biostrings package performs fast or
flexible alignments between reads and genomes. The ShortRead package
provides tools for importation/exportation and quality assurance of
common data formats. The IRanges package offers an emerging
infrastructure for managing very large data objects and for range-
based
data representation. The packages HilbertVis and HilbertVisGUI display
data with space-filling (Hilbert) curves that maintain the spatial
information implied by the linearity of chromosomes. The rtracklayer
package provides an interface to genome browsers and their annotation
tracks.
Getting Started with Bioconductor 2.3
=====================================
IMPORTANT: MAC USERS: see the important note above.
To install Bioconductor 2.3
1. Install R 2.8.0. Bioconductor 2.3 has been designed expressly for
this version of R.
2. Follow the instructions here:
http://bioconductor.org/docs/install
Please visit http://bioconductor.org for details and downloads.
New Packages
============
The following packages are new in this release of Bioconductor; visit
http://bioconductor.org/packages/release/Software.html
for links to all package descriptions.
affyContam
Structured corruption of cel file data to demonstrate QA
effectiveness
Agi4x44PreProcess
Preprocesses Agilent 4x44 array data
ArrayExpress
Accesses the ArrayExpress microarray database at EBI
arrayMvout
Analyzes AffyBatch instances
ArrayTools
Quality assessment and differentially gene expression detection for
Affymetrix GeneChips
BicARE
Biclustering Analysis and Results Exploration
CGHbase
Base functions and classes for arrayCGH data analysis
CGHregions
Dimension reduction for arrayCGH data with minimal information loss
ChemmineR
Compound Data Mining Framework
CMA
Synthesis of microarray-based classification
DFP
Supervised technique for identifying differentially expressed genes
using Fuzzy Patterns (FPs).
domainsignatures
Finds significantly enriched gene classifications based on their
InterPro domain structure
dualKS
Training and classifying gene expression data sets using a
Kolmogorov-Smirnov rank-sum based algorithm
edgeR
Estimates and tests for differential expression in multiple digital
gene expression libraries
HELP
Pipeline for analyzing HELP microarray data that includes graphical
and
mathematical tools
HilbertVis
Functions to visualize long vectors of integer data by means of
Hilbert
curves
HilbertVisGUI
An interactive tool to visualize long vectors of integer data by
means
of Hilbert curves
IRanges
Infrastructure for managing large data objects and range-based data
representations
ITALICS
Normalizes Affymetrix GeneChip Human Mapping 100K and 500K set
iterativeBMA
Bayesian Model Averaging (BMA) of classification models of 2-class
microarray samples
iterativeBMAsurv
Uses Bayesian Model Averaging (BMA) of survival analysis models of
microarray data
KCsmart
Multi-sample aCGH analysis package using kernel convolution
logitT
Implements the Logit-t algorithm
LPEadj
Extends the LPE algorithm
MEDME
Determines absolute and relative DNA methylation scores from MeDIP
enrichment measurements
miRNApath
Provides pathway enrichment techniques for miRNA expression data
microRNA
Accesses different data resources for microRNAs
minet
Implements methods for inferring mutual information networks from
data.
multiscan
Estimates gene expressions from several laser scans of the same
microarray
parody
Provides routines for univariate and multivariate outlier detection
PLPE
Performs tests for paired high-throughput data
RNAither
Analyzes cell-based RNAi screens
RpsiXML
Queries, data structure and interface to visualization of interaction
datasets
SIM
Finds associations between DNA copy number and gene expression
ShortRead
Representation of high-throughput, short-read sequencing data
xmapbridge
Plots graphs in the X:Map genome browser
Software Packages in 2.2 that didn't make it to 2.3
===================================================
1. SemSim
2. widgetInvoke
[[elided Yahoo spam]]
The Biocore Team
------------------------------
Message: 24
Date: Wed, 22 Oct 2008 16:04:53 -0500
From: Jenny Drnevich <drnevich@illinois.edu>
Subject: Re: [BioC] How to save result from limma
To: Gordon K Smyth <smyth at="" wehi.edu.au="">
Cc: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <200810222104.m9ML4ss3012169 at expredir5.cites.uiuc.edu>
Content-Type: text/plain; charset="us-ascii"; format=flowed
Hi Gordon,
I just downloaded the new R 2.8.0 release and limma 2.16.0. I was
checking out the new sort="none" option in topTable and I found an
error in the help page for ?topTable. It doesn't list sort.by="none"
as a possibility for topTable or toptable, but does list it for
topTableF. However, it's actually the reverse; sort.by="none" works
when using topTable with only 1 coef but not with more than 1 coef.
Just thought I'd let you and the archives know...
Thanks again!
Jenny
At 07:15 PM 8/19/2008, Gordon K Smyth wrote:
>On Tue, 19 Aug 2008, Jenny Drnevich wrote:
>
>>At 06:14 PM 8/13/2008, Gordon K Smyth wrote:
>>>OK, I've added sort="none" to the possibilities.
>>>Best wishes
>>>Gordon
>>
>>Hi Gordon,
>>
>>Should this change to topTable be up on BioC by now? I just updated
>>my packages on R 2.7.1 and the latest limma_2.14.5 does not have
>>it. Neither does the developmental version limma_2.15.10 on R 2.8.0
>>dev. Usually your changes appear very quickly...
>>
[[elided Yahoo spam]]
>>Jenny
>
>Not committed to BioC yet. I'm getting older and slower. Also,
>there will be a number of code additions in my next commit to BioC,
>which I'm still finalising.
>
>Best wishes
>Gordon
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
------------------------------
Message: 25
Date: Wed, 22 Oct 2008 17:39:24 -0400
From: "Hui-Yi Chu" <huiyi.chu@gmail.com>
Subject: [BioC] scale questions
To: bioconductor at stat.math.ethz.ch
Message-ID:
<aaeddd3f0810221439r19fe9f56rc06ffc812fa8405e at="" mail.gmail.com="">
Content-Type: text/plain
Dear List,
I think this may be a simple question for you but I wanna make it sure
for
further steps.
I have already done some of data pre-processing procedures for my
affymetrix
yeast2 arrays. My next step is to get *ratios* from various conditions
in wt
and mutant following by fold-change comparison. So my question is
which step
I should scale my dataframe for comparison?
Here are parts of my codes (codes with underline are the questions):
wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
*wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure
*mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
*mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure*
gg <- cbind(wt.pt.f, mut.pt.f)
*gg <- scale(gg)* *### not sure*
pt.f1 <- gg[,3]-gg[,1]
pt.f2 <- gg[,4]-gg[,2]
gg1 <- cbind(gg, pt.f1, pt.f2)
gg2 <- new("ExpressionSet", exprs=as.matrix(gg))
....... followed by further steps
As seen above, where should I scale the ratios? Scale wt and mut
separately
or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]
Hui-Yi
[[alternative HTML version deleted]]
------------------------------
Message: 26
Date: Wed, 22 Oct 2008 15:09:08 -0700
From: Florian Hahne <fhahne@fhcrc.org>
Subject: Re: [BioC] [Fwd: batch info for cellHTS]
To: Yan Zhou <yan.zhou at="" fccc.edu="">
Cc: Bioconductor_help <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <48FFA484.7050405 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Yan,
the idea was that the user constructs the batch array manually and
assigns it to the batch slot using the accessor method, e.g.
bt <- array(rep(1:2, each=5, dim=c(5,2,1))
batch(x) <- bt
I agree that it would be useful to have that functionality directly in
the import functions. The only natural place where the batch
information
could go is in the platelist file. In the plateconf file, we don't
have
the notion of plate replicates or samples any more.
Attached you find a slightly modified readPlateList function which
evaluates an (optional) "Batch" column in the platelist file.
Hope that is the functionality you where looking for. Please let me
know
if you have further input.
Thanks,
Florian
PS: I forwarded this conversation to the mailing list. Others might
benefit from it...
readPlateList <- function(filename,
path=dirname(filename),
name,
importFun,
verbose=interactive())
{
file <- basename(filename)
dfiles <- dir(path)
if(!(is.character(path)&&length(path)==1))
stop("'path' must be character of length 1")
pd <- read.table(file.path(path, file), sep="\t", header=TRUE,
as.is=TRUE)
checkColumns(pd, file, mandatory=c("Filename", "Plate",
"Replicate"),
numeric=c("Plate", "Replicate", "Channel", "Batch"))
## consistency check for "importFun"
if (!missing(importFun)) {
if (!is(importFun, "function"))
stop("'importFun' should be a function to use to read the
raw data files")
} else {
## default function (compatible with the file format of the
plate reader)
importFun <- function(f) {
txt <- readLines(f, warn=FALSE)
sp <- strsplit(txt, "\t")
well <- sapply(sp, "[", 2)
val <- sapply(sp, "[", 3)
out <- list(data.frame(well=I(well), val=as.numeric(val)),
txt=I(txt))
return(out)
}
}
## check if the data files are in the given directory
a <- unlist(sapply(pd$Filename, function(z) grep(z, dfiles,
ignore.case=TRUE)))
if (length(a)==0)
stop(sprintf("None of the files were found in the given
'path':
%s", path))
f <- file.path(path, dfiles[a])
## check if 'importFun' gives the output in the desired form
aux <- importFun(f[1])
if (which(unlist(lapply(aux, is, "data.frame"))) != 1 |
!all(c("val", "well") %in% names(aux[[1]])) | length(aux)!=2)
stop("The output of 'importFun' must be a list with 2
components;\n",
"the first component should be a 'data.frame' with slots
'well' and 'val'.")
## auto-determine the plate format
well <- as.character(importFun(f[1])[[1]]$well)
let <- substr(well, 1, 1)
num <- substr(well, 2, 3)
let <- match(let, LETTERS)
num <- as.integer(num)
if(anyis.na(let))||anyis.na(num)))
stop(sprintf("Malformated column 'well' in input file %s",
f[1]))
dimPlate <- c(nrow=max(let), ncol=max(num))
nrWell <- prod(dimPlate)
if(verbose)
cat(sprintf("%s: found data in %d x %d (%d well) format.\n",
name,
dimPlate[1], dimPlate[2], nrWell))
## Should we check whether these are true?
## "96" = c(nrow=8, ncol=12),
## "384" = c(nrow=16, ncol=24),
nrRep <- max(pd$Replicate)
nrPlate <- max(pd$Plate)
combo <- paste(pd$Plate, pd$Replicate)
## Channel: if not given, this implies that there is just one
if("Channel" %in% colnames(pd)) {
nrChannel <- max(pd$Channel)
channel <- pd$Channel
combo <- paste(combo, pd$Channel)
} else {
nrChannel <- 1L
channel <- rep(1L, nrow(pd))
pd$Channel <- channel
}
whDup <- which(duplicated(combo))
if(length(whDup)>0L) {
idx <- whDup[1:min(5L, length(whDup))]
msg <- paste("The following rows are duplicated in the
plateList
table:\n",
"\tPlate Replicate Channel\n", "\t",
paste(idx, combo[idx], sep="\t",
collapse="\n\t"),
if(length(whDup)>5) sprintf("\n\t...and %d
more.\n",
length(whDup)-5),
"\n",
sep="")
stop(msg)
}
xraw <- array(NA_real_, dim=c(nrWell, nrPlate, nrRep, nrChannel))
intensityFiles <- vector(mode="list", length=nrow(pd))
names(intensityFiles) <- pd[, "Filename"]
status <- character(nrow(pd))
for(i in seq_len(nrow(pd))) {
if(verbose)
cat("\rReading ", i, ": ", pd$Filename[i], sep="")
ff <- grep(pd[i, "Filename"], dfiles, ignore.case=TRUE)
if (length(ff)!=1) {
f <- file.path(path, pd[i, "Filename"])
status[i] <- sprintf("File not found: %s", f)
} else {
f <- file.path(path, dfiles[ff])
names(intensityFiles)[i] <- dfiles[ff]
status[i] <- tryCatch({
out <- importFun(f)
pos <- convertWellCoordinates(out[[1]]$well,
dimPlate)$num
intensityFiles[[i]] <- out[[2]]
xraw[pos, pd$Plate[i], pd$Replicate[i], channel[i]] <-
out[[1]]$val
"OK"
}, warning=function(e) paste(class(e)[1], e$message,
sep=": "),
error=function(e)
paste(class(e)[1],
e$message, sep=": ")
) ## tryCatch
} ## else
} ## for
if(verbose)
cat("\rRead", nrow(pd), "plates. \n\n")
## ---- Store the data as a "cellHTS" object ----
## arrange the assayData slot:
dat <- lapply(seq_len(nrChannel), function(ch)
matrix(xraw[,,,ch], ncol=nrRep,
nrow=nrWell*nrPlate))
names(dat) <- paste("ch", seq_len(nrChannel), sep="")
adata <- do.call("assayDataNew",
c(storage.mode="lockedEnvironment",
dat))
## arrange the phenoData slot:
pdata <- new("AnnotatedDataFrame",
data <- data.frame(replicate=seq_len(nrRep),
assay=rep(name, nrRep),
stringsAsFactors=FALSE),
varMetadata=data.frame(labelDescription=c("Replicate
number",
"Biological
assay"),
channel=factor(rep("_ALL_",
2L),
levels=c(names(dat), "_ALL_")),
row.names=c("replicate",
"assay"),
stringsAsFactors=FALSE))
## arrange the featureData slot:
well <- convertWellCoordinates(seq_len(nrWell),
pdim=dimPlate)$letnum
fdata <- new("AnnotatedDataFrame",
data <- data.frame(plate=rep(seq_len(nrPlate),
each=nrWell),
well=rep(well, nrPlate),
controlStatus=factor(rep("unknown",
nrWell*nrPlate)),
stringsAsFactors=FALSE),
varMetadata=data.frame(labelDescription=c("Plate
number", "Well ID",
"Well
annotation"),
row.names=c("plate", "well",
"controlStatus"),
stringsAsFactors=FALSE))
res <- new("cellHTS",
assayData=adata,
phenoData=pdata,
featureData=fdata,
plateList=cbind(pd[,1L,drop=FALSE], status=I(status),
pd[,-1L,drop=FALSE]),
intensityFiles=intensityFiles)
## if there is a batch column in the platelist file we want to
import it
if("Batch" %in% colnames(pd)){
bat <- pd$Batch[order(pd$Replicate, pd$Channel)]
dim(bat) <- c(max(plate(res)), ncol(res),
length(channelNames(res)))
res at batch <- bat
}
## output the possible errors that were encountered along the way:
whHadProbs <- which(status!="OK")
if(length(whHadProbs)>0 & verbose) {
idx <- whHadProbs[1:min(5, length(whHadProbs))]
msg <- paste("Please check the following problems encountered
while reading the data:\n",
"\tFilename \t Error\n", "\t",
paste(plateList(res)$Filename[idx], status[idx],
sep="\t", collapse="\n\t"),
if(length(whHadProbs)>5) sprintf("\n\t...and %d
more.\n",
length(whHadProbs)-5), "\n", sep="")
stop(msg)
}
return(res)
}
Yan Zhou wrote:
> Dear Florian,
>
> I understand the different meaning of batch from the 2 cellHTS
> packages. I just don't know how to add the "batch" slot.(we have a
> large screen with screen done on different day,we want to use the
> variance adjustment by batch function). I tried to add a "batch"
> column in the plate configuration file, but doesn't seem to be taken
> into cellHTS2 object. then I tried to add "batch" column in the
plate
> list file , still didn't do anything. In another word, I couldn't
> figure out how to add the batch slot to the cellHTS object. please
[[elided Yahoo spam]]
>
> yan
>
> Florian Hahne wrote:
>
>> Hi Yan,
>> Ligia forwarded your mail to me.
>>
>> The batch concept is a little bit different in cellHTS2. Basically,
>> we separated the ability to included several plate configurations
and
>> the batch-specific parameter estimation (e.g. in the normalizePlate
>> function). For the former, you can use a regular expression syntax
in
>> your plate configuration file, while the latter has to be added
>> manually into the new 'batch' slot. All of this is explained in
much
>> more detail in the package vignette: cellHTS2 - Main vignette
>> (complete version): End-to-end analysis of cell-based screens
>>
>> Let me know if this doesn't get you anywhere.
>>
>> Florian
>>
>> ligia at ebi.ac.uk wrote:
>>
>>> Hi Florian,
>>>
>>> Hope you're doing fine.
>>> Could you take care of this for me?
>>> Cheers,
>>> Ligia
>>>
>>>
>>> ------------------------- Original Message
----------------------------
>>> Subject: batch info for cellHTS
>>> From: "Yan Zhou" <yan.zhou at="" fccc.edu="">
>>> Date: Fri, October 17, 2008 19:14
>>> To: ligia at ebi.ac.uk
>>> ------------------------------------------------------------------
--------
>>>
>>>
>>> Dear Ligia,
>>>
>>> I'm using the cellHTS2 package for HTS data analysis. For the old
>>> cellHTS package, I knew how to incorporate the batch information.
But
>>> for the new cellHTS2 package, I couldn't have it done right. I
attached
>>> my plate configuration file and also plate list file with this
email. I
>>> was wondering whether you would have time to help me take a look,
and
>>> point me to the right directions. Thanks a lot for your time and
>>> kind help.
>>>
>>> yan
>>>
>>
>>
>>
>
--
Florian Hahne, PhD
Computational Biology Program
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-3148
fhahne at fhcrc.org
------------------------------
Message: 27
Date: Wed, 22 Oct 2008 19:19:26 -0400
From: "Mark Kimpel" <mwkimpel@gmail.com>
Subject: [BioC] problem with Category package and custom annotationDbi
To: "Bioconductor Newsgroup" <bioconductor at="" stat.math.ethz.ch="">
Message-ID:
<6b93d1830810221619x7ad5a565hf08bf287839f74f0 at
mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Today I believe I successfuly built a annotation package for my Affy
Rat Gene ST data using annotationDbi, at least I got no errors during
the build and it loads properly. I get the following error output,
however, when I try to run hyperGTest, package Category, on a vector
of Entrez Gene IDs and a vector of the gene universe of the chip. I
suspect I did something wrong when building the annotation package,
but I have no clue what that could be. I've used this same code with
chipsets whose annotation packages are built by the BioConductor team
without issue.
> params <- new("GOHyperGParams", geneIds = myEGs,
+ universeGeneIds = myGeneUniverse,
+ annotation = annotation(AOP$eSet),
+ ontology = "BP", pvalueCutoff = 0.05, conditional
= TRUE, testDirection = "over")
> params
A GOHyperGParams instance
category: GO
annotation: ragene10stv1
> hyperGTest(params)
Error in getUniverseHelper(probes, datPkg, entrezIds) :
No Entrez Gene ids left in universe
Enter a frame number, or 0 to exit
1: hyperGTest(params)
2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase",
"hyperGT
3: is(object, Cl)
4: is(object, Cl)
5: universeBuilder(p)
6: universeBuilder(p)
7: getUniverseViaGo(p)
8: getUniverseHelper(probes, datPkg, entrezIds)
Selection: 8
Called from: eval(expr, envir, enclos)
Browse[1]> ls()
[1] "datPkg" "entrezIds" "probes" "univ"
Browse[1]> datPkg
An object of class "ArabadopsisDatPkg"
Slot "name":
[1] "ragene10stv1"
Browse[1]> entrezIds[1:5]
[1] "65049" "60444" "313914" "140941" "306868"
Browse[1]> probes[1:5]
[1] "10701636" "10701643" "10701654" "10701663" "10701679"
Browse[1]> univ
character(0)
Browse[1]>
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
------------------------------
Message: 28
Date: Wed, 22 Oct 2008 16:41:38 -0700
From: Marc Carlson <mcarlson@fhcrc.org>
Subject: Re: [BioC] problem with Category package and custom
annotationDbi
To: Mark Kimpel <mwkimpel at="" gmail.com="">
Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <48FFBA32.3030406 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1
Hi Mark,
If the package you built and installed were called "MarkPackage.db",
then what would be the output of "MarkPackage()" (right after you
loaded
it)? Also, what is the output of sessionInfo()?
Marc
Mark Kimpel wrote:
> Today I believe I successfuly built a annotation package for my Affy
> Rat Gene ST data using annotationDbi, at least I got no errors
during
> the build and it loads properly. I get the following error output,
> however, when I try to run hyperGTest, package Category, on a vector
> of Entrez Gene IDs and a vector of the gene universe of the chip. I
> suspect I did something wrong when building the annotation package,
> but I have no clue what that could be. I've used this same code with
> chipsets whose annotation packages are built by the BioConductor
team
> without issue.
>
>
>> params <- new("GOHyperGParams", geneIds = myEGs,
>>
> + universeGeneIds = myGeneUniverse,
> + annotation = annotation(AOP$eSet),
> + ontology = "BP", pvalueCutoff = 0.05,
conditional
> = TRUE, testDirection = "over")
>
>> params
>>
> A GOHyperGParams instance
> category: GO
> annotation: ragene10stv1
>
>> hyperGTest(params)
>>
> Error in getUniverseHelper(probes, datPkg, entrezIds) :
> No Entrez Gene ids left in universe
>
> Enter a frame number, or 0 to exit
>
> 1: hyperGTest(params)
> 2: .valueClassTest(standardGeneric("hyperGTest"),
"HyperGResultBase", "hyperGT
> 3: is(object, Cl)
> 4: is(object, Cl)
> 5: universeBuilder(p)
> 6: universeBuilder(p)
> 7: getUniverseViaGo(p)
> 8: getUniverseHelper(probes, datPkg, entrezIds)
>
> Selection: 8
> Called from: eval(expr, envir, enclos)
> Browse[1]> ls()
> [1] "datPkg" "entrezIds" "probes" "univ"
> Browse[1]> datPkg
> An object of class "ArabadopsisDatPkg"
> Slot "name":
> [1] "ragene10stv1"
>
> Browse[1]> entrezIds[1:5]
> [1] "65049" "60444" "313914" "140941" "306868"
> Browse[1]> probes[1:5]
> [1] "10701636" "10701643" "10701654" "10701663" "10701679"
> Browse[1]> univ
> character(0)
> Browse[1]>
>
> ------------------------------------------------------------
> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN 46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219 Home
> Skype: mkimpel
>
> "The real problem is not whether machines think but whether men do."
> -- B. F. Skinner
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
------------------------------
Message: 29
Date: Wed, 22 Oct 2008 19:51:17 -0400
From: "Sean Davis" <sdavis2@mail.nih.gov>
Subject: Re: [BioC] scale questions
To: "Hui-Yi Chu" <huiyi.chu at="" gmail.com="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810221651n656b5a8ak226daa013f282fab at
mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at="" gmail.com="">
wrote:
> Dear List,
>
> I think this may be a simple question for you but I wanna make it
sure for
> further steps.
> I have already done some of data pre-processing procedures for my
affymetrix
> yeast2 arrays. My next step is to get *ratios* from various
conditions in wt
> and mutant following by fold-change comparison. So my question is
which step
> I should scale my dataframe for comparison?
>
> Here are parts of my codes (codes with underline are the questions):
>
> wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
> wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
> wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
> wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
> *wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure
>
> *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
> mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
> mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
> mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
> *mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure*
>
> gg <- cbind(wt.pt.f, mut.pt.f)
> *gg <- scale(gg)* *### not sure*
> pt.f1 <- gg[,3]-gg[,1]
> pt.f2 <- gg[,4]-gg[,2]
> gg1 <- cbind(gg, pt.f1, pt.f2)
> gg2 <- new("ExpressionSet", exprs=as.matrix(gg))
> ....... followed by further steps
>
>
> As seen above, where should I scale the ratios? Scale wt and mut
separately
> or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]
I may be misunderstanding, but why do you want to scale the log
ratios? Generally, you would not scale them at all. And I really
cannot tell why you are forming ratios in the first place? Do you
have replicates of any kind?
Sean
------------------------------
Message: 30
Date: Wed, 22 Oct 2008 20:31:33 -0400
From: "Sean Davis" <sdavis2@mail.nih.gov>
Subject: Re: [BioC] scale questions
To: "Hui-Yi Chu" <huiyi.chu at="" gmail.com="">
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810221731p1abc243cj1a59dc6726ebc7e0 at
mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 8:16 PM, Hui-Yi Chu <huiyi.chu at="" gmail.com="">
wrote:
> Hi Sean,
>
> Yes, they are replicate in ratios. In other words, these ratios are
from
> untreatment and treatment of wt1, wt2, mut1,mut2, so total is 8
ratios as
> below.
>
> untreatmen wt1, wt2, mut1, mut2
> treatment wt1, wt2, mut1, mut2
>
> And I wanna get the second values like:
> r1: untreatment mut1/wt1
> r2: untreatment mut2/wt2
> r3: treatment mut1/wt1
> r4: treatment mut2/wt2
>
> Now we have 4 ratios. And then I will compare r3 vs r1, r4 vs r2 to
get most
> fold changes genes. (I know I need three replicates, but I cannot
convince
> my adviser, therefore, this is the strategy I can use so far. ) So
the
> question is should I scale the 8 ratios from wt and mut separately
(twice)
> or together(once) before I get the four values?
Hi, Hui-Yi.
You should not do any scaling. You can use the log fold changes
directly. You seem to understand that this is not an optimal design,
so I won't belabor that point.
Sean
>
> On Wed, Oct 22, 2008 at 7:51 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote:
>>
>> On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at="" gmail.com=""> wrote:
>> > Dear List,
>> >
>> > I think this may be a simple question for you but I wanna make it
sure
>> > for
>> > further steps.
>> > I have already done some of data pre-processing procedures for my
>> > affymetrix
>> > yeast2 arrays. My next step is to get *ratios* from various
conditions
>> > in wt
>> > and mutant following by fold-change comparison. So my question is
which
>> > step
>> > I should scale my dataframe for comparison?
>> >
>> > Here are parts of my codes (codes with underline are the
questions):
>> >
>> > wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
>> > wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
>> > wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
>> > wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
>> > *wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure
>> >
>> > *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
>> > mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
>> > mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
>> > mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
>> > *mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure*
>> >
>> > gg <- cbind(wt.pt.f, mut.pt.f)
>> > *gg <- scale(gg)* *### not sure*
>> > pt.f1 <- gg[,3]-gg[,1]
>> > pt.f2 <- gg[,4]-gg[,2]
>> > gg1 <- cbind(gg, pt.f1, pt.f2)
>> > gg2 <- new("ExpressionSet", exprs=as.matrix(gg))
>> > ....... followed by further steps
>> >
>> >
>> > As seen above, where should I scale the ratios? Scale wt and mut
>> > separately
>> > or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]
>>
>> I may be misunderstanding, but why do you want to scale the log
>> ratios? Generally, you would not scale them at all. And I really
>> cannot tell why you are forming ratios in the first place? Do you
>> have replicates of any kind?
>>
>> Sean
>
>
------------------------------
Message: 31
Date: Wed, 22 Oct 2008 21:26:16 -0400
From: "Mark Kimpel" <mwkimpel@gmail.com>
Subject: Re: [BioC] problem with Category package and custom
annotationDbi
To: "Marc Carlson" <mcarlson at="" fhcrc.org="">
Cc: Bioconductor Newsgroup <bioconductor at="" stat.math.ethz.ch="">
Message-ID:
<6b93d1830810221826y71050c5ckb14dbca4644c9bf7 at
mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> ragene10stv1()
Quality control information for ragene10stv1:
This package has the following mappings:
ragene10stv1ACCNUM has 0 mapped keys (of 29214 keys)
ragene10stv1ALIAS2PROBE has 31410 mapped keys (of 31410 keys)
ragene10stv1CHR has 20998 mapped keys (of 29214 keys)
ragene10stv1CHRLENGTHS has 23 mapped keys (of 23 keys)
ragene10stv1CHRLOC has 13092 mapped keys (of 29214 keys)
ragene10stv1CHRLOCEND has 13092 mapped keys (of 29214 keys)
ragene10stv1ENSEMBL has 15812 mapped keys (of 29214 keys)
ragene10stv1ENSEMBL2PROBE has 14570 mapped keys (of 14570 keys)
ragene10stv1ENTREZID has 21170 mapped keys (of 29214 keys)
ragene10stv1ENZYME has 1586 mapped keys (of 29214 keys)
ragene10stv1ENZYME2PROBE has 705 mapped keys (of 705 keys)
ragene10stv1GENENAME has 21170 mapped keys (of 29214 keys)
ragene10stv1GO has 13813 mapped keys (of 29214 keys)
ragene10stv1GO2ALLPROBES has 9850 mapped keys (of 9850 keys)
ragene10stv1GO2PROBE has 7430 mapped keys (of 7430 keys)
ragene10stv1MAP has 20351 mapped keys (of 29214 keys)
ragene10stv1PATH has 4357 mapped keys (of 29214 keys)
ragene10stv1PATH2PROBE has 206 mapped keys (of 206 keys)
ragene10stv1PFAM has 17187 mapped keys (of 29214 keys)
ragene10stv1PMID has 12198 mapped keys (of 29214 keys)
ragene10stv1PMID2PROBE has 36641 mapped keys (of 36641 keys)
ragene10stv1PROSITE has 17187 mapped keys (of 29214 keys)
ragene10stv1REFSEQ has 18683 mapped keys (of 29214 keys)
ragene10stv1SYMBOL has 21170 mapped keys (of 29214 keys)
ragene10stv1UNIGENE has 17585 mapped keys (of 29214 keys)
ragene10stv1UNIPROT has 9826 mapped keys (of 29214 keys)
Additional Information about this package:
DB schema: RATCHIP_DB
DB schema version: 1.0
Organism: Rattus norvegicus
Date for NCBI data: 2008-Sep2
Date for GO data: 200808
Date for KEGG data: 2008-Sep2
Date for Golden Path data: 2006-Jun20
Date for IPI data: 2008-Sep02
Date for Ensembl data: 2008-Jul23
> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US
.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_N
AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTI
FICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] ragene10stv1.db_1.0.0 RSQLite_0.7-0 DBI_0.2-4
[4] AnnotationDbi_1.4.0 Biobase_2.2.0 graph_1.20.0
loaded via a namespace (and not attached):
[1] cluster_1.11.11
>
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************
On Wed, Oct 22, 2008 at 7:41 PM, Marc Carlson <mcarlson at="" fhcrc.org="">
wrote:
> Hi Mark,
>
> If the package you built and installed were called "MarkPackage.db",
> then what would be the output of "MarkPackage()" (right after you
loaded
> it)? Also, what is the output of sessionInfo()?
>
> Marc
>
>
>
>
> Mark Kimpel wrote:
>> Today I believe I successfuly built a annotation package for my
Affy
>> Rat Gene ST data using annotationDbi, at least I got no errors
during
>> the build and it loads properly. I get the following error output,
>> however, when I try to run hyperGTest, package Category, on a
vector
>> of Entrez Gene IDs and a vector of the gene universe of the chip. I
>> suspect I did something wrong when building the annotation package,
>> but I have no clue what that could be. I've used this same code
with
>> chipsets whose annotation packages are built by the BioConductor
team
>> without issue.
>>
>>
>>> params <- new("GOHyperGParams", geneIds = myEGs,
>>>
>> + universeGeneIds = myGeneUniverse,
>> + annotation = annotation(AOP$eSet),
>> + ontology = "BP", pvalueCutoff = 0.05,
conditional
>> = TRUE, testDirection = "over")
>>
>>> params
>>>
>> A GOHyperGParams instance
>> category: GO
>> annotation: ragene10stv1
>>
>>> hyperGTest(params)
>>>
>> Error in getUniverseHelper(probes, datPkg, entrezIds) :
>> No Entrez Gene ids left in universe
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: hyperGTest(params)
>> 2: .valueClassTest(standardGeneric("hyperGTest"),
"HyperGResultBase", "hyperGT
>> 3: is(object, Cl)
>> 4: is(object, Cl)
>> 5: universeBuilder(p)
>> 6: universeBuilder(p)
>> 7: getUniverseViaGo(p)
>> 8: getUniverseHelper(probes, datPkg, entrezIds)
>>
>> Selection: 8
>> Called from: eval(expr, envir, enclos)
>> Browse[1]> ls()
>> [1] "datPkg" "entrezIds" "probes" "univ"
>> Browse[1]> datPkg
>> An object of class "ArabadopsisDatPkg"
>> Slot "name":
>> [1] "ragene10stv1"
>>
>> Browse[1]> entrezIds[1:5]
>> [1] "65049" "60444" "313914" "140941" "306868"
>> Browse[1]> probes[1:5]
>> [1] "10701636" "10701643" "10701654" "10701663" "10701679"
>> Browse[1]> univ
>> character(0)
>> Browse[1]>
>>
>> ------------------------------------------------------------
>> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>> Indiana University School of Medicine
>>
>> 15032 Hunter Court, Westfield, IN 46074
>>
>> (317) 490-5129 Work, & Mobile & VoiceMail
>> (317) 399-1219 Home
>> Skype: mkimpel
>>
>> "The real problem is not whether machines think but whether men
do."
>> -- B. F. Skinner
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>
------------------------------
Message: 32
Date: Thu, 23 Oct 2008 14:07:48 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth@wehi.edu.au>
Subject: Re: [BioC] How to save result from limma
To: Jenny Drnevich <drnevich at="" illinois.edu="">
Cc: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <pine.wnt.4.64.0810231405491.2800 at="" pc602.alpha.wehi.edu.au="">
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Dear Jenny,
Thanks for the heads-up. I believe I have fixed it entirely now in
limma
2.16.2. Try it out and see if you can break it.
Best wishes
Gordon
On Wed, 22 Oct 2008, Jenny Drnevich wrote:
> Hi Gordon,
>
> I just downloaded the new R 2.8.0 release and limma 2.16.0. I was
checking
> out the new sort="none" option in topTable and I found an error in
the help
> page for ?topTable. It doesn't list sort.by="none" as a possibility
for
> topTable or toptable, but does list it for topTableF. However, it's
actually
> the reverse; sort.by="none" works when using topTable with only 1
coef but
> not with more than 1 coef. Just thought I'd let you and the archives
know...
>
> Thanks again!
> Jenny
>
> At 07:15 PM 8/19/2008, Gordon K Smyth wrote:
>
>
>> On Tue, 19 Aug 2008, Jenny Drnevich wrote:
>>
>>> At 06:14 PM 8/13/2008, Gordon K Smyth wrote:
>>>> OK, I've added sort="none" to the possibilities.
>>>> Best wishes
>>>> Gordon
>>>
>>> Hi Gordon,
>>>
>>> Should this change to topTable be up on BioC by now? I just
updated my
>>> packages on R 2.7.1 and the latest limma_2.14.5 does not have it.
Neither
>>> does the developmental version limma_2.15.10 on R 2.8.0 dev.
Usually your
>>> changes appear very quickly...
>>>
[[elided Yahoo spam]]
>>> Jenny
>>
>> Not committed to BioC yet. I'm getting older and slower. Also,
there will
>> be a number of code additions in my next commit to BioC, which I'm
still
>> finalising.
>>
>> Best wishes
>> Gordon
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at illinois.edu
>
------------------------------
Message: 33
Date: Wed, 22 Oct 2008 22:28:59 -0500
From: "Wei,Caimiao" <caiwei@mdanderson.org>
Subject: [BioC] Package "xps" "import.expr.scheme" error
To: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch="">
Message-ID:
<19D18D962A716B4EA5D26CB24F14DC26339D0DE943 at
DCPWVMBXC1VS3.mdanderson.edu>
Content-Type: text/plain
I am importing chip definition and annotation files to create a ROOT
scheme and get this error:
Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir
= scmdir, :
error in function 'ImportExprSchemes'
> libdir <- "/mypath/Affy/libraryfiles"
> anndir <- "/mypath/Affy/Annotation"
> scmdir <- "/mypath/CRAN/Workspaces/Schemes"
>
> scheme.hgu133p2.na26 <-
import.expr.scheme(filename="Scheme_HGU133p2_na26",
+ filedir=scmdir,schemefile=paste(libdir,"HG-
U133_Plus_2.cdf",sep="/"),
+ probefile=paste(libdir,"HG-U133_Plus_2.probe.tab",sep="/"),
+ annotfile=paste(anndir,"HG-U133_Plus_2.na26.annot.csv",sep="/"))
Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir
= scmdir, :
error in function 'ImportExprSchemes'
> sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xps_1.0.2
>
Thanks for any help!
Caimiao
[[alternative HTML version deleted]]
------------------------------
Message: 34
Date: Thu, 23 Oct 2008 02:49:47 -0400
From: Leon Peshkin <pesha@hms.harvard.edu>
Subject: Re: [BioC] Lumi and Beadstudio 1.5.13
To: <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <9EF526FD-9EA0-4489-83D1-6F3E9B93FAE9 at hms.harvard.edu>
Content-Type: text/plain; charset="US-ASCII"; format=flowed; delsp=yes
Hi Pan,
I was wondering if you could help me resolve the issue with lumi
package,
I am able to load Illumina data with lumiR, but then when I try
background adjustment
it fails:
A0 <- lumiR("killme.txt", convertNuID =FALSE, inputAnnotation
=FALSE)
> B0 <- lumiB(A0,method='bgAdjust')
Error in `[.data.frame`(control, , sampleNames(lumiBatch)) :
undefined columns selected
-Leon
------------------------------
Message: 35
Date: Thu, 23 Oct 2008 10:43:18 +0200 (CEST)
From: Clara de Dessous Ch?ri <email@dessouscheri.emv1.com>
Subject: [BioC] Offre exceptionnelle suite au probl?me technique
To: <bioconductor at="" stat.math.ethz.ch="">
Message-ID: <20276360648.3897244.1224751398367 at schr1>
Content-Type: text/plain
Pour etre sur de recevoir tous nos emails, nous vous conseillons
d'ajouter
email at dessouscheri.emv1.com a votre carnet d'adresses.
Si cet email ne s'affiche pas correctement, vous pouvez le visualiser
grace a ce
lien
Copyright 2008 - Copyright Dessous Cheri, Tous droits reserves.
Si vous ne souhaitez plus recevoir la newsletter de Dessous
Cheri,utilisez
le lien de desabonnement
[[alternative HTML version deleted]]
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 68, Issue 23