Question

goseq/nullp with non-native identifiers

Ravi Karra ▴ 140

@ravi-karra-4463

Last seen 10.7 years ago

Hello,

I am trying to use goseq to find enriched GO terms for zebrafish RNA-seq data and am looking for advice on manually providing gene length information and GO annotation to goseq. My RNA-seq data is mapped to danRer7 Ensembl gene IDs. Unfortunately, danRer7 does not appear to be supported by goseq's built-ins for Ensembl gene IDs.

> supportedGenomes()[68,]
        db   species      date                 name AvailableGeneIDs
68 danRer7 Zebrafish Jul. 2010 Sanger Institute Zv9

> pwf = nullp(gene.vector, "danRer7", "ensGene")
Error in getlength(names(DEgenes), genome, id) :
  Length information for genome danRer7 and gene ID ensGene is not in the geneLenDataBase database. You will have to specify bias.data manually.

I would like to manually supply the gene length information by:

> zv9txs = makeTranscriptDbFromBiomart(biomart = "ensembl", dataset = "drerio_gene_ensembl")
> txsByGene = transcriptsBy(zv9txs, "gene")
> lengthData = median(width(txsByGene))

and GO data (using biomaRt):

> zv9 = useDataset("drerio_gene_ensembl", mart = useMart("ensembl"))
> GOmap = getBM(filters = "ensembl_gene_id", attributes = c("ensembl_gene_id", "go_id"), values = gene.universe, mart = zv9)

How can I input this GO data and gene length data into the nullp function of goseq to calculate a probability weighting function?
Thanks and sessionInfo() below,
Ravi

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.8.3 AnnotationDbi_1.18.1 Biobase_2.16.0 GenomicRanges_1.8.13
[5] IRanges_1.14.4 BiocGenerics_0.2.0 goseq_1.8.0 geneLenDataBase_0.99.9
[9] BiasedUrn_1.04 biomaRt_2.12.0

loaded via a namespace (and not attached):
[1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 DBI_0.2-5 grid_2.15.1
[6] hwriter_1.3 lattice_0.20-10 Matrix_1.0-6 mgcv_1.7-20 nlme_3.1-104
[11] RCurl_1.91-1 Rsamtools_1.8.6 RSQLite_0.11.1 rtracklayer_1.16.3 ShortRead_1.14.4
[16] stats4_2.15.1 tools_2.15.1 XML_3.9-4 zlibbioc_1.2.0

Annotation GO zebrafish goseq • 3.3k views

updated 12.7 years ago by Alicia Oshlack ▴ 100 • written 12.7 years ago by Ravi Karra ▴ 140

score 0 · Answer 1 · 2012-09-04

Hi Ravi,

You can use your own length data and GO categories by:

    pwf = nullp(gene.vector, bias.data = lengthData)
    go = goseq(pwf, gene2cat = GOmap)

Cheers,
Alicia

On 3/09/12 8:00 PM, "bioconductor-request at r-project.org" wrote:

> Date: Sun, 2 Sep 2012 09:39:48 -0400
> From: Ravi Karra
> To: bioconductor at r-project.org
> Subject: [BioC] goseq/nullp with non-native identifiers
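A practical detail when supplying bias.data by hand: as far as I know, nullp expects the values in the same order as gene.vector and with the same length, and getBM may return an empty go_id for genes without annotation. The following is a minimal sketch of that preparation step, not part of the original answer. The gene IDs, lengths, and GO terms are made-up stand-ins for gene.vector, lengthData, and GOmap, and the names bias, GOmap.clean, and gene2cat are introduced here for illustration. The goseq calls are the ones from the answer, left commented so the snippet runs in base R.

```r
# Toy stand-ins (hypothetical values) for the objects in the question.
gene.vector <- c(ENSDARG0000000001 = 1, ENSDARG0000000002 = 0,
                 ENSDARG0000000003 = 0, ENSDARG0000000004 = 1)

# Lengths in a different order than gene.vector, with one gene missing.
lengthData <- c(ENSDARG0000000003 = 2100, ENSDARG0000000001 = 1500,
                ENSDARG0000000002 = 900)

# Two-column gene-to-GO table shaped like the getBM() result in the question.
GOmap <- data.frame(
  ensembl_gene_id = c("ENSDARG0000000001", "ENSDARG0000000001",
                      "ENSDARG0000000002", "ENSDARG0000000004"),
  go_id           = c("GO:0008150", "GO:0003674", "", "GO:0008150"),
  stringsAsFactors = FALSE
)

# Reorder the lengths to match gene.vector; genes without a length become NA.
bias <- unname(lengthData[names(gene.vector)])
stopifnot(length(bias) == length(gene.vector))

# Drop rows with no GO term, then build a named list: gene ID -> GO IDs.
GOmap.clean <- GOmap[!is.na(GOmap$go_id) & GOmap$go_id != "", ]
gene2cat <- split(GOmap.clean$go_id, GOmap.clean$ensembl_gene_id)

# With goseq installed, the calls from the answer would then be:
# library(goseq)
# pwf <- nullp(gene.vector, bias.data = bias)
# go  <- goseq(pwf, gene2cat = gene2cat)
```

Indexing lengthData by names(gene.vector) keeps the pairing explicit, so a gene that is absent from the length table shows up as NA instead of silently shifting the remaining values.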