I am hoping to get appropriate GO mappings for a list of genes used in
a microarray experiment with a view to identifying significantly
regulated processes.
I was planning on using the Bioconductor package GOstats to identify
these processes; however, the organism under study is not a supported
organism. I have attempted to use the blast2GO software to generate
the gene to GO mapping, but this approach seems to be very
time-consuming (after generating the corresponding .fasta files, it took
over 1 hour to BLAST just 10 genes).
Currently, the gene identifiers I am using are simply the gene names,
but it shouldn't be too difficult to derive a list of corresponding
alternative identifiers (assuming they are publicly available) should
it be advantageous to the GO mapping process.
Is there any faster way to achieve this gene to GO mapping (either
through Bioconductor packages or otherwise)?
Any assistance is appreciated.
Joseph
--
Sent via the guest posting facility at bioconductor.org.
Hi Joseph,
What's the organism? You might be able to create an org-level package
using makeOrgPackageFromNCBI() in the AnnotationForge package.
Best,
Jim
On 2/4/2014 7:09 PM, Joseph Shaw [guest] wrote:
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Hi Jim,
Thanks for your reply!
The organism is Campylobacter jejuni (strain: NCTC11168). How can I
check if this is a viable option?
According to the reference manual for AnnotationForge, the
makeOrgPackageFromNCBI() function makes an organism package from
annotations available from NCBI. However, the function arguments
require an author and a maintainer, and I'm not sure exactly what
these refer to.
Also, the function returns nothing; if this is the case, how can you
access the created organism package?
Joseph
On Wed, Feb 5, 2014 at 3:51 PM, James W. MacDonald <jmacdon at uw.edu>
wrote:
Hi Joseph,
You can check to see if it is a viable option by just giving it a
shot.
Note that the author and maintainer should in general be you, so you
would replace my oh so very droll versions with your name and email.
Also note that if you are on Windows, you need to add type = "source"
to the call to install.packages().
> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.0.1", author = "me <me at mine.com>",
+     maintainer = "me <me at mine.com>", outputDir = ".", tax_id = "192222",
+     genus = "Campylobacter", species = "jejuni")
Loading required package: GO.db
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
discarding data from other organisms
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
discarding data from other organisms
Populating gene2accession table:
table gene2accession filled
Getting data for gene2refseq.gz
discarding data from other organisms
Populating gene2refseq table:
table gene2refseq filled
Getting data for gene2unigene
discarding data from other organisms
Populating gene2unigene table:
table gene2unigene filled
Getting data for gene_info.gz
discarding data from other organisms
Populating gene_info table:
table gene_info filled
Getting data for gene2go.gz
discarding data from other organisms
Populating gene2go table:
Getting blast2GO data as a substitute for gene2go
table metadata filled
table map_metadata filled
table gene2go filled
table metadata filled
table map_metadata filled
Populating genes table:
genes table filled
Populating gene_info_temp table:
gene_info_temp table filled
Populating alias table:
alias table filled
Populating chromosomes table:
chromosomes table filled
Populating pubmed table:
pubmed table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating unigene table:
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Populating go_bp table:
go_bp table filled
Populating go_mf table:
go_mf table filled
Populating go_cc table:
go_cc table filled
Populating go_bp_all table:
go_bp_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_cc_all table:
go_cc_all table filled
dropping table gene2pubmeddropping table gene2accessiondropping table
gene2refseqdropping table gene2unigenedropping table gene_infodropping
table gene2go
Making GO views
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.Cjejuni.eg.db
[1] TRUE
Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
no values found for table unigene in this data chunk.
That built the package, but we still need to install it:
> install.packages("org.Cjejuni.eg.db", repos = NULL)
* installing *source* package ‘org.Cjejuni.eg.db’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (org.Cjejuni.eg.db)
> library(org.Cjejuni.eg.db)
> head(toTable(org.Cjejuni.egGO))
  gene_id      go_id Evidence Ontology
1  904332 GO:0006281      IEA       BP
2  904332 GO:0030420      IEA       BP
3  904333 GO:0006935      IEA       BP
4  904333 GO:0007165      IEA       BP
5  904334 GO:0006401      IEA       BP
6  904335 GO:0006549      IEA       BP
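The package is keyed on Entrez Gene IDs (the gene_id column above), but Joseph's list uses gene names, so those names have to be converted first. Below is a minimal base-R sketch. It assumes a two-column table with gene_id and symbol columns, which is the shape toTable(org.Cjejuni.egSYMBOL) is expected to return. The helper name map_symbols_to_entrez() is hypothetical. The thread does not establish whether every name on the array matches an NCBI symbol, so the sketch reports any names it cannot map.

```r
# Hypothetical helper: map gene names (symbols) to Entrez Gene IDs using a
# data.frame with 'gene_id' and 'symbol' columns, e.g. the assumed result of
# toTable(org.Cjejuni.egSYMBOL). Unmatched names trigger a warning.
map_symbols_to_entrez <- function(symbols, symbol_table) {
  symbols <- unique(as.character(symbols))
  keep <- as.character(symbol_table$symbol) %in% symbols
  hits <- data.frame(symbol  = as.character(symbol_table$symbol[keep]),
                     gene_id = as.character(symbol_table$gene_id[keep]),
                     stringsAsFactors = FALSE)
  unmapped <- setdiff(symbols, hits$symbol)
  if (length(unmapped) > 0) {
    warning(length(unmapped), " gene name(s) had no Entrez Gene ID, e.g. ",
            paste(head(unmapped, 5), collapse = ", "))
  }
  hits
}

# Example usage with the org package (not run here; myGeneNames is a
# placeholder for your own vector of gene names):
# library(org.Cjejuni.eg.db)
# sym_tab <- toTable(org.Cjejuni.egSYMBOL)
# ids <- map_symbols_to_entrez(myGeneNames, sym_tab)$gene_id
```

One symbol can map to more than one ID. The helper keeps all matching rows rather than picking one.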
Best,
Jim
On Wednesday, February 05, 2014 1:31:22 PM, Joseph Shaw wrote:
Hi Jim,
> You can check to see if it is a viable option by just giving it a
> shot.
I have tried calling makeOrgPackageFromNCBI() as described in your
previous mail (with my own details for the author and maintainer
arguments). However, the call doesn't complete. The steps shown below
finish, but it appears to get no further.
> Loading required package: GO.db
>
> Getting data for gene2pubmed.gz
> Loading required package: RCurl
> Loading required package: bitops
> discarding data from other organisms
> Populating gene2pubmed table:
> table gene2pubmed filled
> Getting data for gene2accession.gz
I'm not sure whether the function has failed or is still running.
Could you tell me roughly how long it should take to complete? For
reference, I'm running OS X on a 1.8 GHz processor with 4 GB of
memory.
Joseph
Hi Joseph,
Please don't take conversations off-list.
On Friday, February 07, 2014 9:00:06 PM, Joseph Shaw wrote:
> Hi Jim,
>
> Thanks for all your assistance. I really appreciate it!
>
> Unfortunately, when I attempt to run
>
>> install.packages("org.Cjejuni_0.0.1.tar.gz", repos = NULL, type = "source")
>
> I get the error
>
>> Error : package 'AnnotationDbi' 1.24.0 was found, but >= 1.25.2 is required by 'org.Cjejuni.eg.db'
>
> I have since attempted to reinstall and update the AnnotationDbi
> package on my system to a compatible iteration, but the process
> results in the same error.
Hmm. Weird. I seem to have one iteration of a devel AnnotationDbi
package in my release BioC install.
You could probably just unpack (ungzip and untar) that file, edit the
DESCRIPTION file by hand so that it requires AnnotationDbi >= 1.24.0,
and then install using
install.packages("org.Cjejuni.eg.db", type = "source", repos = NULL)
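The workaround above (unpack, edit DESCRIPTION, reinstall) can also be done from within R. Below is a minimal sketch that uses only base R's read.dcf() and write.dcf(). The helper name relax_dependency() is hypothetical. It assumes the version requirement appears in the Depends or Imports field in the usual "AnnotationDbi (>= x.y.z)" form. Loosening a requirement like this is only safe if the package does not rely on features added in the newer version.

```r
# Hypothetical helper: rewrite the minimum-version requirement for one
# dependency in a package DESCRIPTION file, in place.
relax_dependency <- function(desc_file, pkg = "AnnotationDbi",
                             new_version = "1.24.0") {
  desc <- read.dcf(desc_file)
  # Matches e.g. "AnnotationDbi (>= 1.25.2)"; pkg is treated as a regex.
  pattern <- paste0(pkg, "[[:space:]]*\\(>=[[:space:]]*[0-9.-]+\\)")
  replacement <- paste0(pkg, " (>= ", new_version, ")")
  for (field in intersect(c("Depends", "Imports"), colnames(desc))) {
    desc[1, field] <- gsub(pattern, replacement, desc[1, field])
  }
  write.dcf(desc, file = desc_file)
  invisible(desc)
}

# Example workflow (not run here; the tarball name is the one quoted above):
# exdir <- tempfile("orgpkg")
# untar("org.Cjejuni_0.0.1.tar.gz", exdir = exdir)
# pkg_dir <- file.path(exdir, "org.Cjejuni.eg.db")
# relax_dependency(file.path(pkg_dir, "DESCRIPTION"))
# install.packages(pkg_dir, repos = NULL, type = "source")
```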
>
> On a separate but related note, is it possible to restrict the list of
> gene annotations from org.Cjejuni.eg.db used in the GO analysis (i.e.
> the GSEAGOHyperGParams()* function) to include only the probes used
> in the experiment (i.e. create two subsets: a gene universe and a
> collection of genes identified as differentially expressed)?
>
> (*The GSEAGOHyperGParams() function is used in the unsupported model
> organisms vignette, but the author simply uses the entire gene mapping
> as the gene universe and selects the first 500 genes as differentially
> expressed. Ideally, I would like to include genes in the universe
> based on gene IDs, but this might not be the most efficient way.)
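On the universe question: one way to restrict the universe to the genes measured on the array is to intersect the array's Entrez Gene IDs with the IDs that have GO annotation. The selected set is then intersected with that universe. Below is a minimal base-R sketch. The helper go_gene_sets() and the ID vectors are hypothetical. It assumes the array's gene names have already been mapped to Entrez Gene IDs.

```r
# Hypothetical helper: build the universe and selected gene sets for a GO
# hypergeometric test from the genes actually present on the array.
#   array_ids     : Entrez Gene IDs of all genes measured on the array
#   de_ids        : Entrez Gene IDs of the differentially expressed genes
#   annotated_ids : IDs with GO annotation, e.g. mappedkeys(org.Cjejuni.egGO)
go_gene_sets <- function(array_ids, de_ids, annotated_ids) {
  array_ids <- unique(as.character(array_ids))
  de_ids <- unique(as.character(de_ids))
  universe <- intersect(array_ids, as.character(annotated_ids))
  selected <- intersect(de_ids, universe)
  n_dropped <- length(setdiff(de_ids, selected))
  if (n_dropped > 0) {
    message(n_dropped, " DE gene(s) are not in the annotated universe and were dropped")
  }
  list(universe = universe, selected = selected)
}

# Example usage with GOstats (not run here; arrayIds and deIds are placeholders):
# sets <- go_gene_sets(arrayIds, deIds, mappedkeys(org.Cjejuni.egGO))
# p <- new("GOHyperGParams", geneIds = sets$selected,
#          universeGeneIds = sets$universe, annotation = "org.Cjejuni.eg.db",
#          ontology = "BP", conditional = TRUE)
# summary(hyperGTest(p))
```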
You are reading the wrong vignette. While this is technically an
'unsupported organism', you have an org package, so you can just use
the regular infrastructure:
> library(GOstats)
> univ <- Lkeys(org.Cjejuni.egACCNUM)
> gns <- univ[sample(1:1670, 100)] ## here I am just selecting genes at random
> p <- new("GOHyperGParams", geneIds = gns, universeGeneIds = univ,
+     ontology = "BP", annotation = "org.Cjejuni.eg.db", conditional = TRUE)
> hyp <- hyperGTest(p)
> summary(hyp)
      GOBPID      Pvalue OddsRatio  ExpCount Count Size                  Term
1 GO:0012501 0.003677779       Inf 0.1221239     2    2 programmed cell death
2 GO:0016265 0.003677779       Inf 0.1221239     2    2                 death
I get an infinite odds ratio here because I randomly selected the only
two apoptosis genes on the array. Yay for me!
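For anyone wondering where the Pvalue, ExpCount and infinite OddsRatio come from, here is a minimal base-R sketch. It performs the standard (unconditional) hypergeometric over-representation calculation for a single GO term, set up as a 2x2 table. It assumes GOstats reports the conventional sample odds ratio. With conditional = TRUE, GOstats also adjusts counts for significant child terms, so its figures need not match this sketch exactly. go_term_stats() is a hypothetical name.

```r
# Hypothetical helper: hypergeometric statistics for one GO term.
#   count      : selected genes annotated to the term
#   size       : universe genes annotated to the term
#   n_selected : selected genes in the (annotated) universe
#   n_universe : genes in the (annotated) universe
go_term_stats <- function(count, size, n_selected, n_universe) {
  in_term_not_sel <- size - count        # term genes that were not selected
  sel_not_in_term <- n_selected - count  # selected genes outside the term
  neither <- n_universe - size - sel_not_in_term
  list(
    # P(X >= count) when drawing n_selected genes from the universe
    pvalue = phyper(count - 1, size, n_universe - size, n_selected,
                    lower.tail = FALSE),
    # Expected number of selected genes in the term under the null
    exp_count = size * n_selected / n_universe,
    # 2x2 odds ratio; division by zero gives Inf when every term gene was
    # selected (in_term_not_sel == 0), as in the output above
    odds_ratio = (count * neither) / (sel_not_in_term * in_term_not_sel)
  )
}
```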
Best,
Jim
>
> Relevant Vignette:
> http://www.bioconductor.org/packages/devel/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf
>
> Joseph
>
> On Fri, Feb 7, 2014 at 7:03 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> See attached.
>>
>>
>> On 2/6/2014 8:32 PM, Joseph Shaw wrote:
Hi Joseph,
Here is a newer tarball, built with all release packages, so it
should install properly for you without modification.
And Jim is right: you should be able to proceed from here now that you
have an org package. Basically, we didn't use to have as many tools
for making org packages, so being unsupported used to be a much more
serious problem than it is today.
Hope this helps,
Marc
On 02/10/2014 08:44 AM, James W. MacDonald wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: org.Cjejuni.eg.db_0.1.tar.gz
Type: application/x-gzip
Size: 5874006 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140210/823b90c3/attachment-0001.gz>
Hi all,
Thank you so much. Your assistance has been invaluable!
Joseph
On Mon, Feb 10, 2014 at 7:37 PM, Marc Carlson <mcarlson at fhcrc.org>
wrote: