goseq/nullp with non-native identifiers
Ravi Karra ▴ 140
@ravi-karra-4463
Last seen 10.4 years ago
Hello,

I am trying to use goseq to find enriched GO terms for zebrafish RNA-seq data and am looking for advice on manually providing gene length information and GO annotation to goseq. My RNA-seq data is mapped to danRer7 Ensembl gene IDs. Unfortunately, danRer7 does not appear to be supported by goseq's built-ins for Ensembl gene IDs.

> supportedGenomes()[68,]
        db   species      date                 name AvailableGeneIDs
68 danRer7 Zebrafish Jul. 2010 Sanger Institute Zv9

> pwf = nullp(gene.vector, "danRer7", "ensGene")
Error in getlength(names(DEgenes), genome, id) :
  Length information for genome danRer7 and gene ID ensGene is not in the geneLenDataBase database. You will have to specify bias.data manually.

I would like to manually supply the gene length information by:

> zv9txs = makeTranscriptDbFromBiomart(biomart = "ensembl", dataset = "drerio_gene_ensembl")
> txsByGene = transcriptsBy(zv9txs, "gene")
> lengthData = median(width(txsByGene))

and GO data (using biomaRt):

> zv9 = useDataset("drerio_gene_ensembl", mart = useMart("ensembl"))
> GOmap = getBM(filters = "ensembl_gene_id", attributes = c("ensembl_gene_id", "go_id"), values = gene.universe, mart = zv9)

How can I input this GO data and gene length data into the nullp function of goseq to calculate a probability weighting function?
Thanks and sessionInfo() below,
Ravi

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.8.3 AnnotationDbi_1.18.1 Biobase_2.16.0 GenomicRanges_1.8.13
[5] IRanges_1.14.4 BiocGenerics_0.2.0 goseq_1.8.0 geneLenDataBase_0.99.9
[9] BiasedUrn_1.04 biomaRt_2.12.0

loaded via a namespace (and not attached):
[1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 DBI_0.2-5 grid_2.15.1
[6] hwriter_1.3 lattice_0.20-10 Matrix_1.0-6 mgcv_1.7-20 nlme_3.1-104
[11] RCurl_1.91-1 Rsamtools_1.8.6 RSQLite_0.11.1 rtracklayer_1.16.3 ShortRead_1.14.4
[16] stats4_2.15.1 tools_2.15.1 XML_3.9-4 zlibbioc_1.2.0
Annotation GO zebrafish goseq • 3.2k views
@alicia-oshlack-4634
Last seen 10.4 years ago
Hi Ravi,

You can use your own length data and GO categories by:

pwf = nullp(gene.vector, bias.data = lengthData)
go = goseq(pwf, gene2cat = GOmap)

Cheers,
Alicia

> Date: Sun, 2 Sep 2012 09:39:48 -0400
> From: Ravi Karra <ravi.karra at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] goseq/nullp with non-native identifiers
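For anyone following the same route, here is a minimal sketch of getting the two inputs into shape before making the calls suggested in the reply. It rests on a few assumptions that are not stated in the thread: that gene.vector is a named 0/1 vector, that lengthData is a numeric vector named by gene ID, and that getBM returns an empty string in go_id for genes without annotation. The gene IDs and lengths below are made up, and align_bias_data is a hypothetical helper, not part of goseq. The point it illustrates is that bias.data is matched to gene.vector by position, so it should be reordered by gene name first.

```r
# Toy stand-ins for the objects in the thread (all IDs and values are made up).
gene.vector <- c(ENSDARG00000000001 = 1, ENSDARG00000000002 = 0,
                 ENSDARG00000000003 = 0, ENSDARG00000000004 = 1)
lengthData <- c(ENSDARG00000000003 = 2100, ENSDARG00000000001 = 1500,
                ENSDARG00000000002 = 900)  # no length for gene ...004
GOmap <- data.frame(
  ensembl_gene_id = c("ENSDARG00000000001", "ENSDARG00000000001",
                      "ENSDARG00000000002", "ENSDARG00000000003"),
  go_id = c("GO:0008150", "GO:0003674", "", "GO:0008150"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: put the lengths in the same order as gene.vector.
# Genes with no length entry come back as NA.
align_bias_data <- function(de, lengths) {
  aligned <- lengths[names(de)]
  names(aligned) <- names(de)
  aligned
}

bias <- align_bias_data(gene.vector, lengthData)

# Assumption: unannotated genes show up with an empty go_id; drop those rows.
GOclean <- GOmap[!is.na(GOmap$go_id) & GOmap$go_id != "", ]

# With real data, the calls from the reply would then be:
# pwf <- nullp(gene.vector, bias.data = bias)
# go  <- goseq(pwf, gene2cat = GOclean)
```

The goseq calls are left commented out because four toy genes are too few to fit a weighting function. How nullp treats NA lengths is worth checking in its documentation for the installed version before relying on it.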