Hi all,
I'm using ReportingTools after an RNA-seq differential expression analysis and am having trouble adding GO annotation for an unsupported model organism (sheep, i.e. org.Oa.eg.db).
Ovis aries (sheep) is not supported by the AnnotationForge package, so I'm following the instructions in "How To Use GOstats and Category to do Hypergeometric testing with unsupported model organisms" by M. Carlson.
I get the following error when I try to build the GOFrame (the GOFrame() call in the code below). It seems to be caused by the <NA> values.
Can someone help me deal with this error?
Thanks in advance,
Carine
> rm(list=ls())
> library("GOstats", lib.loc="~/R/win-library/3.3")
> hub<-AnnotationHub()
snapshotDate(): 2016-06-06
> query(hub,c("Ovis aries","OrgDb"))
AnnotationHub with 1 record
# snapshotDate(): 2016-06-06
# names(): AH48021
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Ovis aries
# $rdataclass: OrgDb
# $title: org.Ovis_aries.eg.sqlite
# $description: NCBI gene ID based annotations about Ovis_aries
# $taxonomyid: 9940
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uniprot.org/pub/databases/unip...
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: NCBI, Gene, Annotation
# retrieve record with 'object[["AH48021"]]'
> sheep<-hub[["AH48021"]]
loading from cache 'C:/Users/cagenet/Documents/AppData/.AnnotationHub/54327'
> keytypes(sheep)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENTREZID" "EVIDENCE" "EVIDENCEALL"
[7] "GENENAME" "GID" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL"
[13] "PMID" "REFSEQ" "SYMBOL" "UNIGENE"
> columns(sheep)
[1] "ACCNUM" "ALIAS" "CHR" "ENSEMBL" "ENTREZID" "EVIDENCE"
[7] "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL" "ONTOLOGY"
[13] "ONTOLOGYALL" "PMID" "REFSEQ" "SYMBOL" "UNIGENE"
> sheepEID<-(keys(sheep,"ENTREZID"))# all ENTREZID
> sheep.eg.GO<-select(sheep, sheepEID, c("GO","EVIDENCE"),"ENTREZID")
'select()' returned 1:many mapping between keys and columns
> goframeData=data.frame(sheep.eg.GO[,c(2,3,1)]) # reorder the columns to GO, EVIDENCE, ENTREZID
> head(goframeData)
GO EVIDENCE ENTREZID
1 <NA> <NA> 100034665
2 GO:0030669 IEA 100034666
3 GO:0010008 IEA 100034666
4 GO:0033162 IEA 100034666
5 GO:0005507 IEA 100034666
6 GO:0016716 IEA 100034666
> goFrame=GOFrame(goframeData,organism="Ovis aries")
Error in .testGOFrame(x, organism) : invalid GO Evidence codes: 'NA'
> goframeData<- goframeData[-which(row(goframeData)=="\<NA>"),]
Error: '\<' is an unrecognized escape in character string starting ""\<"
> goframeData<- goframeData[-which(row(goframeData)=="/<NA>"),]
>
> head(goframeData)
[1] GO EVIDENCE ENTREZID
<0 rows> (or 0-length row.names)
> sessionInfo()
R version 3.3.1 RC (2016-06-17 r70798)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GO.db_3.3.0 RSQLite_1.0.0 DBI_0.4-1
[4] AnnotationForge_1.14.2 org.Mm.eg.db_3.3.0 AnnotationHub_2.4.2
[7] GOstats_2.38.1 graph_1.50.0 Category_2.38.0
[10] Matrix_1.2-6 AnnotationDbi_1.34.3 edgeR_3.14.0
[13] limma_3.28.14 HTSFilter_1.12.0 ReportingTools_2.12.2
[16] knitr_1.13 biomaRt_2.28.0 DESeq2_1.12.3
[19] SummarizedExperiment_1.2.3 Biobase_2.32.0 GenomicRanges_1.24.2
[22] GenomeInfoDb_1.8.2 IRanges_2.6.1 S4Vectors_0.10.1
[25] BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 splines_3.3.1 R.utils_2.3.0
[4] Formula_1.2-1 shiny_0.13.2 interactiveDisplayBase_1.10.3
[7] latticeExtra_0.6-28 RBGL_1.48.1 BSgenome_1.40.1
[10] Rsamtools_1.24.0 lattice_0.20-33 biovizBase_1.20.0
[13] chron_2.3-47 digest_0.6.9 RColorBrewer_1.1-2
[16] XVector_0.12.0 colorspace_1.2-6 ggbio_1.20.1
[19] R.oo_1.20.0 htmltools_0.3.5 httpuv_1.3.3
[22] plyr_1.8.4 OrganismDbi_1.14.1 GSEABase_1.34.0
[25] XML_3.98-1.4 genefilter_1.54.2 zlibbioc_1.18.0
[28] xtable_1.8-2 scales_0.4.0 BiocParallel_1.6.2
[31] annotate_1.50.0 ggplot2_2.1.0 PFAM.db_3.3.0
[34] GenomicFeatures_1.24.3 nnet_7.3-12 mime_0.4
[37] survival_2.39-5 magrittr_1.5 evaluate_0.9
[40] R.methodsS3_1.7.1 GGally_1.2.0 hwriter_1.3.2
[43] foreign_0.8-66 BiocInstaller_1.22.3 rsconnect_0.4.3
[46] tools_3.3.1 data.table_1.9.6 stringr_1.0.0
[49] munsell_0.4.3 locfit_1.5-9.1 cluster_2.0.4
[52] ensembldb_1.4.7 Biostrings_2.40.2 DESeq_1.24.0
[55] grid_3.3.1 RCurl_1.95-4.8 dichromat_2.0-0
[58] VariantAnnotation_1.18.1 bitops_1.0-6 gtable_0.2.0
[61] curl_0.9.7 reshape_0.8.5 reshape2_1.4.1
[64] R6_2.1.2 GenomicAlignments_1.8.3 gridExtra_2.2.1
[67] rtracklayer_1.32.1 Hmisc_3.17-4 stringi_1.1.1
[70] Rcpp_0.12.5 geneplotter_1.50.0 rpart_4.1-10
[73] acepack_1.3-3.3
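A minimal base-R sketch of the cleanup the GOFrame() error calls for, assuming goframeData has the three columns shown by head() above (GO, EVIDENCE, ENTREZID). The small data frame here is made up to mimic that output; with the real data you would run only the filtering line and then call GOFrame(goframeData, organism="Ovis aries") as before. Note that <NA> is only how print() displays a missing value in a character or factor column: the cell holds NA, not the string "<NA>", so it has to be detected with is.na() or complete.cases().

```r
# Made-up stand-in for the reordered select() output (GO, EVIDENCE, ENTREZID)
goframeData <- data.frame(
  GO       = c(NA, "GO:0030669", "GO:0010008"),
  EVIDENCE = c(NA, "IEA", "IEA"),
  ENTREZID = c("100034665", "100034666", "100034666"),
  stringsAsFactors = FALSE
)

# print() renders the missing character values as <NA>, but they are real NAs
print(goframeData)

# Keep only the rows that have no missing value in any column
goframeData <- goframeData[complete.cases(goframeData), ]

# A more targeted alternative: drop rows whose GO or EVIDENCE is missing
# goframeData <- goframeData[!is.na(goframeData$GO) & !is.na(goframeData$EVIDENCE), ]

print(goframeData)
```

complete.cases() is used here because every column must be filled for a GO/evidence/gene triple to be meaningful; the commented is.na() version does the same job while naming the columns explicitly.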
Yes, I know, but my problem is that I don't know how to get rid of them (I'm a newbie). So I tried
or
goframeData<- goframeData[-which(goframeData$EVIDENCE=="<NA>"),]
But my goframeData is empty.
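Why that attempt empties the data frame can be sketched with made-up values: comparing NA with anything, even the string "<NA>", gives NA; which() silently skips NAs and returns integer(0); and indexing with -integer(0) selects nothing rather than everything. A logical index built with is.na() avoids both traps.

```r
# Made-up column mimicking goframeData$EVIDENCE
evidence <- c(NA, "IEA", "IEA")

# NA compared with anything (even the string "<NA>") is NA, not TRUE/FALSE
cmp <- evidence == "<NA>"            # NA FALSE FALSE

# which() skips NA, so no position is returned
idx <- which(cmp)                    # integer(0)

# -integer(0) is still integer(0), so negative indexing keeps nothing
dropped <- evidence[-idx]            # character(0)

# Same effect on a data frame: zero rows survive
df <- data.frame(GO = c(NA, "GO:0030669", "GO:0010008"),
                 EVIDENCE = evidence, stringsAsFactors = FALSE)
emptied <- df[-which(df$EVIDENCE == "<NA>"), ]   # 0 rows

# A logical index built with is.na() keeps exactly the non-missing rows
kept <- df[!is.na(df$EVIDENCE), ]    # 2 rows
```

The general rule: x[-which(cond), ] is only safe when which() is guaranteed to find at least one match, so a logical index such as x[!cond, ] is the more robust idiom.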
In the end I took a different route (downloaded the file, edited it in Excel and imported it again).
Carine
There are two lessons here. First, when you ask a question, be sure that you are as precise as possible. You asked for help with the error, rather than saying that you understood there were NA values but didn't know how to get rid of them. So I answered the question you posed, rather than the question you had.
Second, in R, nothing is equal to NA (you cannot be equal to something that is not available), so there is a function, is.na, that you can use to test for NA values.
But this is a bit of a chicken/egg problem, no? How do you figure out what you need if you don't know what you need? In my experience there are two methods that are useful in this context. The first is apropos, which will give you all the functions whose names match what you query on. Do note that this is one of the rare instances where R is case-insensitive (so you can search on "na", "NA", "Na" or even "nA" if you want to be crazy), and also that 'na' is going to match many things, so it may be a bit of an eyeballometric exercise. For legibility, I am going to cut out all the extra returned values. So hypothetically you could have used apropos to find is.na, and gone with that.
An even more powerful method is to use Google. Almost any conceivable R question has already been asked and answered somewhere online, and a query of the form 'R remove NA rows' will almost surely come up with multiple reasonable links.
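A short illustration of the two tools mentioned above. The pattern given to apropos() is a regular expression (the "^is\\.na" used here is chosen just for this example), and apropos() matches with ignore.case = TRUE by default, which is why the case of the query does not matter. The exact list it returns depends on which packages are attached, so it is not reproduced here.

```r
# Equality cannot detect a missing value: the comparison is itself NA
eq <- NA == NA                      # NA

# is.na() is the dedicated test for missingness
miss <- is.na(c(1, NA, 3))          # FALSE TRUE FALSE

# Search attached packages for object names matching a regular expression;
# apropos() uses ignore.case = TRUE by default
hits <- apropos("^is\\.na")
print(hits)
```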
Hi James,
You were right: my problem was that I was not as precise as I could have been. Thanks for your advice. I didn't know about the useful apropos. I did use Google and found some answers, BUT as a newbie I thought the value was written "<NA>" and not just NA. Of course, now everything works fine, except that I wasted my time and yours... Sorry.