Rkeys function from AnnotationDbi returns all Rkeys for a subset
Leon Yee
Dear all,

I ran into a problem when using Rkeys() from the AnnotationDbi package. Using [] I get a subset of an AnnDbBimap object, but when I call Rkeys() on this subset, it returns all of the Rkeys of the original set.

> library("hgu95av2.db")
> ids = ls(hgu95av2PATH)
> as.list(hgu95av2PATH[ids[1:5]])
$`1000_at`
 [1] "04010" "04012" "04150" "04350" "04360" "04370" "04510" "04520" "04540"
[10] "04620" "04650" "04664" "04720" "04730" "04810" "04910" "04912" "04916"
[19] "04930" "05210" "05211" "05212" "05213" "05214" "05215" "05216" "05218"
[28] "05219" "05220" "05221" "05223"

$`1001_at`
[1] NA

$`1002_f_at`
[1] "00590" "00591" "00830" "00902" "00903" "00980" "00982"

$`1003_s_at`
[1] "04060"

$`1004_at`
[1] "04060"

> Rkeys(hgu95av2PATH[ids[1:5]])
# I expected this to output only the PATH ids shown by the previous
# command, but it output all the PATH ids.
  [1] "00010" "00020" "00030" "00031" "00040" "00051" "00052" "00053" "00061"
 [10] "00062" "00071" "00072" "00100" "00120" "00130" "00140" "00150" "00190"
 [19] "00220" "00230" "00232" "00240" "00251" "00252" "00260" "00271" "00272"
 [28] "00280" "00281" "00290" "00300" "00310" "00330" "00340" "00350" "00360"
 [37] "00361" "00363" "00380" "00400" "00401" "00410" "00430" "00440" "00450"
 [46] "00460" "00471" "00472" "00480" "00500" "00510" "00511" "00512" "00520"
 [55] "00521" "00530" "00531" "00532" "00533" "00534" "00550" "00561" "00562"
 [64] "00563" "00564" "00565" "00590" "00591" "00592" "00600" "00601" "00602"
 [73] "00603" "00604" "00620" "00624" "00625" "00630" "00632" "00640" "00641"
 [82] "00643" "00650" "00660" "00670" "00680" "00710" "00720" "00730" "00740"
 [91] "00750" "00760" "00770" "00780" "00785" "00790" "00791" "00830" "00860"
[100] "00900" "00902" "00903" "00910" "00920" "00930" "00940" "00950" "00960"
[109] "00970" "00980" "00982" "00983" "01030" "01031" "01032" "01040" "01430"
[118] "01510" "02010" "03010" "03020" "03022" "03030" "03050" "03060" "03320"
[127] "03410" "03420" "03430" "03440" "03450" "04010" "04012" "04020" "04060"
[136] "04070" "04080" "04110" "04115" "04120" "04130" "04140" "04150" "04210"
[145] "04310" "04330" "04340" "04350" "04360" "04370" "04510" "04512" "04514"
[154] "04520" "04530" "04540" "04610" "04612" "04614" "04620" "04630" "04640"
[163] "04650" "04660" "04662" "04664" "04670" "04710" "04720" "04730" "04740"
[172] "04742" "04810" "04910" "04912" "04916" "04920" "04930" "04940" "04950"
[181] "05010" "05020" "05030" "05040" "05050" "05060" "05110" "05120" "05130"
[190] "05131" "05210" "05211" "05212" "05213" "05214" "05215" "05216" "05217"
[199] "05218" "05219" "05220" "05221" "05222" "05223" "05310" "05320" "05322"
[208] "05330" "05332" "05340"

> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] hgu95av2.db_2.2.5   RSQLite_0.7-1       DBI_0.2-4
[4] AnnotationDbi_1.4.1 Biobase_2.2.1

Can anybody help? Thanks a lot!

Leon
laurent
Wouldn't mappedRkeys() be what you are looking for?

L.
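To make the Rkeys()/mappedRkeys() distinction concrete without needing hgu95av2.db installed, here is a minimal base-R sketch. The list-based `bimap`, `subset_left()` and the `toy_*` functions are hypothetical stand-ins, not AnnotationDbi's implementation; they only mimic the behaviour described in this thread, where `[` restricts the left keys and their edges but leaves the right keys untouched. With the real package, the corresponding calls would presumably be `Rkeys(hgu95av2PATH[ids[1:5]])` versus `mappedRkeys(hgu95av2PATH[ids[1:5]])`.

```r
# Hypothetical toy model of a bimap (NOT AnnotationDbi's AnnDbBimap class):
# left keys = probe ids, right keys = pathway ids, edges = known associations.
bimap <- list(
  left  = c("1000_at", "1001_at", "1002_f_at"),
  right = c("00010", "00590", "04010", "04060"),
  edges = data.frame(left  = c("1000_at", "1002_f_at"),
                     right = c("04010", "00590"),
                     stringsAsFactors = FALSE)
)

# Subset by left keys: keep the chosen left keys and their edges,
# but keep the full set of right keys, as described in the thread.
subset_left <- function(m, keys) {
  list(left  = m$left[m$left %in% keys],
       right = m$right,
       edges = m$edges[m$edges$left %in% keys, , drop = FALSE])
}

toy_Lkeys       <- function(m) m$left                 # includes unmapped keys
toy_Rkeys       <- function(m) m$right                # every right key
toy_mappedRkeys <- function(m) unique(m$edges$right)  # only connected right keys

sub <- subset_left(bimap, c("1000_at", "1001_at"))
toy_Lkeys(sub)        # "1000_at" "1001_at"  (1001_at has no edge)
toy_Rkeys(sub)        # "00010" "00590" "04010" "04060"
toy_mappedRkeys(sub)  # "04010"
```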
James W. MacDonald
Hi Leon,

I don't believe Rkeys() is intended to return a subset; it is intended to give you all the right keys of the object. But I am not sure why you want to use Rkeys() to do what you have already done using as.list(hgu95av2PATH[ids]). Perhaps I misunderstand?

Another way is to use mget() as always:

mget(ids[1:5], hgu95av2PATH)

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
Hi James and Laurent,

Yes, as Laurent pointed out, mappedRkeys() is what I'm looking for. I just feel that the name Rkeys() is somewhat misleading: Lkeys() returns only the Lkeys of the subset (including those mapped to NA), while Rkeys() returns all of the Rkeys of the whole set. Or maybe "[]" is not a real subsetting operation?

Thank you very much.

Best regards,
Leon
Leon Yee wrote:
> Or maybe "[]" is not a real subsetting operation?

What is a "real subsetting operation", then? That just depends on the definition one gives to it. Here the subset operation takes a subset of the "mapping", that is of the vertices in the bipartite graph, without eliminating the unconnected edges. I suppose this choice can be defended by the fact that edges in an AnnDbBimap object can exist without any associated edge, which makes sense. For example, in a microarray context some probes can be on the array without any known association, yet it is practical to have such probe IDs defined in one part (left or right) of the Bimap.

L.

--
Laurent Gautier, Ph.D.
DMAC, Head
Building 301, CBS - Technical University of Denmark
DK-2800 Lyngby
tel: +45 45 25 61 45
I inverted "edge" and "vertex" in parts of my previous email. Some people will have unconsciously corrected it; the others will be very confused. Here is what it should read:

Here the subset operation takes a subset of the "mapping", that is of the edges in the bipartite graph, without eliminating the unconnected vertices. I suppose this choice can be defended by the fact that vertices in an AnnDbBimap object can exist without any associated edge, which makes sense. For example, in a microarray context some probes can be on the array without any known association, yet it is practical to have such probe IDs defined in one part (left or right) of the Bimap.
I believe that Laurent has the correct interpretation of our motives. These mappings are all based on database joins behind the scenes, so it will frequently be the case that things are not connected, and often these unconnected things are of interest (and sometimes they are not). The Lkeys() and Rkeys() functions just give all of the left or all of the right keys, whether or not they are mapped to anything on the other side. mappedRkeys() and mappedLkeys() are what you want if you only want keys that actually "connect" to something.

Marc
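Marc's point that these maps sit on top of database joins can be sketched with base R's `merge()`. The three data frames below are a hypothetical miniature of a probe-to-pathway schema: the table and column names are invented, and only the probe/pathway pairs are taken from the output earlier in the thread. An inner join keeps only connected pairs, which is roughly the idea behind mappedLkeys()/mappedRkeys(), while a left join keeps every probe and fills in NA, much like `1001_at` appears in the as.list() output.

```r
# Hypothetical miniature schema (NOT the real hgu95av2.db tables):
# one table of probes, one of pathways, and a join table linking them.
probes <- data.frame(probe_id = c("1000_at", "1001_at", "1002_f_at"),
                     stringsAsFactors = FALSE)
paths  <- data.frame(path_id = c("00590", "04010", "04060"),
                     stringsAsFactors = FALSE)
probe_path <- data.frame(probe_id = c("1000_at", "1002_f_at"),
                         path_id  = c("04010", "00590"),
                         stringsAsFactors = FALSE)

# Inner join: only probes that actually connect to a pathway.
inner <- merge(probes, probe_path, by = "probe_id")

# Left join: every probe is kept; unmapped probes get NA for path_id.
left_join <- merge(probes, probe_path, by = "probe_id", all.x = TRUE)

all_L    <- probes$probe_id         # analogous to Lkeys()
all_R    <- paths$path_id           # analogous to Rkeys()
mapped_L <- unique(inner$probe_id)  # analogous to mappedLkeys()
mapped_R <- unique(inner$path_id)   # analogous to mappedRkeys()

mapped_L                  # "1000_at" "1002_f_at"
setdiff(all_R, mapped_R)  # "04060": a right key with no edge
left_join                 # 1001_at is present, with path_id NA
```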
The first motivation for keeping keys that are not mapped to anything was backward compatibility with the old environment-based annotations. For example, the hgu95av2PMID map in the hgu95av2 package is a "real" environment containing one symbol per probeset id, and the value of the symbols that are not mapped to a PubMed id is set to NA. This allows all *direct* maps (i.e. maps that go from probeset ids to some other ids) to have the same set of keys (the set of all probeset ids defined for the chip). I personally find this a nice property because it makes the set of maps defined in a given package more coherent.

Cheers,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
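Hervé's backward-compatibility argument can be illustrated with plain R environments. The helper `make_env_map()`, the three-probe subset and the second, symbol-like map with placeholder values `GENE_A`/`GENE_B` are all hypothetical; the pathway values come from the output earlier in the thread. Because every direct map is built over the same probe ids, with NA for unmapped ones, `ls()` gives the same keys for both maps and `mget()` behaves uniformly across them.

```r
# All probe ids "on the chip" in this toy example (a hypothetical subset).
probe_ids <- c("1000_at", "1001_at", "1003_s_at")

# Build an environment with one symbol per probe id; ids with no value get NA,
# mimicking the old environment-based annotation maps.
make_env_map <- function(values) {
  env <- new.env()
  for (id in probe_ids) {
    val <- values[[id]]  # NULL when the id is not mapped
    assign(id, if (is.null(val)) NA else val, envir = env)
  }
  env
}

path_env <- make_env_map(list("1000_at" = c("04010", "04012"),
                              "1003_s_at" = "04060"))
# Hypothetical second direct map with placeholder gene symbols.
sym_env <- make_env_map(list("1000_at" = "GENE_A", "1001_at" = "GENE_B"))

# Both direct maps share exactly the same set of keys.
identical(ls(path_env), ls(sym_env))  # TRUE

# mget() works the same way for every map; unmapped ids come back as NA.
vals <- mget(probe_ids, envir = path_env)
vals[["1001_at"]]  # NA

# Keep only the keys that are actually mapped (roughly what mappedLkeys() gives).
names(Filter(function(v) !all(is.na(v)), vals))  # "1000_at" "1003_s_at"
```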