remove NA from named character vector


Iain Gallagher ▴ 930

@iain-gallagher-2532

Last seen 9.8 years ago

United Kingdom

Hi List

This is likely a trivial problem but it's annoying me. I am mapping from Bos taurus Ensembl IDs to symbols. I can do this in biomaRt, but using the org.Bt.eg.db package means I'm not tied to an internet connection.

A toy example:

library(org.Bt.eg.db)
ens <- c('ENSBTAG00000004218', 'ENSBTAG00000004270', 'ENSBTAG00000004578', 'ENSBTAG00000004608')
egs <- unlist(mget(ens, revmap(org.Bt.egENSEMBL), ifnotfound=NA))

egs
ENSBTAG00000004218 ENSBTAG00000004270 ENSBTAG00000004578 ENSBTAG00000004608
          "617660"           "407106"                 NA        "100138951"

# a named character vector with one NA

# now get symbols
syms <- unlist(mget(egs, org.Bt.egSYMBOL, ifnotfound=NA))
# throws an error - fair enough - need to drop the NA

which(egs == NA)
# gives named integer(0) - hmm

class(egs)
# gives [1] "character" - so I'm quite confused now.

NA %in% egs
# gives [1] TRUE

How do I identify which entries in 'egs' are NA so I can remove them? It's trivial here but the dataset I'm working with is in the thousands.

Thanks

iain

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] org.Bt.eg.db_2.5.0   RSQLite_0.9-4        DBI_0.2-5
[4] AnnotationDbi_1.14.1 Biobase_2.10.0

Bos taurus biomaRt


Iain Gallagher ▴ 930


Hi Axel

I'm sure I knew that! Leaky brain!

Thanks

i

--- On Fri, 22/7/11, axel.klenk at actelion.com wrote:

> you cannot test for NA using the == operator, you'll have
> to use is.na()


Axel Klenk ★ 1.1k

@axel-klenk-3224

Last seen 15 minutes ago

UPF, Barcelona, Spain

Hi Iain,

you cannot test for NA using the == operator; you'll have to use is.na(), e.g.

which(is.na(egs))

or, if you just want to get rid of them:

na.omit(egs)

HTH,

- axel

Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil / Switzerland
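To make the point above concrete, here is a small self-contained sketch. It assumes a hand-typed copy of the `egs` vector from the question (so it runs without org.Bt.eg.db); the variable names `cmp`, `na_pos`, `egs_clean` and `egs_omit` are only for illustration.

```r
# Hand-typed stand-in for the 'egs' vector shown in the question
# (an assumption for illustration; not fetched from org.Bt.eg.db).
egs <- c(ENSBTAG00000004218 = "617660",
         ENSBTAG00000004270 = "407106",
         ENSBTAG00000004578 = NA,
         ENSBTAG00000004608 = "100138951")

# Comparing anything with NA gives NA, never TRUE, so which()
# finds no TRUE element and returns an empty result.
cmp <- egs == NA
which(cmp)

# is.na() is the test that actually detects missing values.
na_pos <- which(is.na(egs))

# Logical subsetting drops the NA and keeps the names.
egs_clean <- egs[!is.na(egs)]

# na.omit() drops the NA too, but also attaches an "na.action"
# attribute recording which elements were removed.
egs_omit <- na.omit(egs)
attributes(egs_omit)
```

`NA %in% egs` returns TRUE because `%in%` is built on `match()`, which treats NA as matching NA, unlike `==`. If the extra attribute from `na.omit()` gets in the way of a later lookup, the plain subset `egs[!is.na(egs)]` is probably the simpler thing to pass on to `mget()`.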
