AnnBuilder results question.
@johan-lindberg-815
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20051020/ 60503334/attachment.pl
rgentleman ★ 5.5k
@rgentleman-7725
Hi,

Johan Lindberg wrote:
> Hi all. I have a question about something that puzzles me.
>
> I have a set of genes in an index vector that I am interested in.
> ######################################
> >>de.idx.Avs
>
>    RG89-F4  RG249-E4  RG317-E7   RG18-B1  RG130-F1   RG88-E1  RG301-B5
>        856      2528      2666      3413      3638      4226      6687
>   RG279-A2  RG121-A5  RG145-A8  RG205-A2   RG18-F2  RG90-B11  RG170-F8
>       7313      7679      7729      7845      8822      8971      9130
>  RG248-E11   RG18-A2   RG42-A2   RG94-A5  RG306-A5 RG299-F12   RG89-F3
>       9960     10173     10221     10327     10751     11416     11670
>   RG289-B9   RG7-A12   RG31-A3 RG243-E12  RG265-E3  RG305-A6   RG64-B9
>      12073     12183     12225     12656     13374     13455     13645
>   RG200-F3   RG95-C1   RG95-G7 RG211-C10  RG283-G4  RG88-D10  RG122-D1
>      13914     17761     17766     17999     18140     19103     19845
>  RG202-H10  RG206-D7 RG124-C10 RG252-C10   RG18-C1   RG22-G1   RG22-G4
>      20012     20017     20527     20783     20989     20998     21000
>  RG238-G10  RG147-H5  RG283-H2  RG185-H2   RG95-C2 RG215-C11  RG231-G2
>      21436     21924     22194     22678     23169     23415     23442
>    RG97-C2 RG145-C11   RG16-D8  RG184-D2 RG202-H11  RG120-C8  RG220-C8
>      23853     23955     24365     24697     25420     25925     26125
>   RG261-D3  RG305-H3   RG7-C12   RG95-G3   RG95-G6  RG151-C3  RG231-G3
>      28237     28326     28407     28578     28580     28689     28850
>   RG293-C9   RG18-D3  RG230-H9   RG18-C3  RG170-G6
>      29657     30453     30882     31805     32112
> ######################################
>
> The names in the vector are the unique identifiers on the chip and the
> number is the location on the chip.
>
> If I use my home-brewed package to this chip and retrieve geneIDs and
> accessionnumbers I use:

this was built with AnnBuilder?

> ######################################
> Vec.Acc <- unlist(mget(names(de.idx.Avs),Hum30kbatch1to5ACCNUM))
> Vec.GeneN <- unlist(mget(names(de.idx.Avs),Hum30kbatch1to5GENENAME))
> ######################################
>
> But if I look at the length of those vectors:
> ######################################
> >>length(Vec.Acc)
>
> [1] 68
> >>length(Vec.GeneN)
>
> [1] 69
> ######################################
> They are not of the same length. I think this depends on some error in the
> annotation, because if I just look at the 4:th item in the names(de.idx.Avs)
> vector
> Its:
> ######################################
> >>names(de.idx.Avs)[4]
>
> [1] "RG18-B1"
> >>names(unlist(mget(names(de.idx.Avs[4]),Hum30kbatch1to5ACCNUM)))
>
> [1] "RG18-B1"
> >>names(unlist(mget(names(de.idx.Avs[4]),Hum30kbatch1to5GENENAME)))
>
> [1] "RG18-B11" "RG18-B12"
> ######################################
>
> Then two items are returned from the Hum30kbatch1to5GENENAME environment but
> only one from the Hum30kbatch1to5ACCNUM environment. I would guess it had
> something to do with the mget functions discriminatory ability between
> "RG18-B1" and "RG18-B12" or "RG18-B11" but since it works for
> Hum30kbatch1to5ACCNUM I do not know.

I am not sure what the issue is. Some mappings are one to many, it appears as if this (GENENAME) is one such case (there are lots of others). So you must deal with this. Your annotation package says that the id, RG18-B1 (I think that is the fourth entry in your vector) is mapped to two gene names, but only one ACCNUM. It has nothing to do with mget or any other function, that I can see.

>
> What also puzzles me is that
>
> ######################################
> >>unlist(mget("RG18-B1",Hum30kbatch1to5GENENAME))
>
> RG18-B11
>
> "myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila)"
>
> RG18-B12
>
> " translocated to, 10"
>

So this says, that in your GENENAME environment (or hash table) the symbol RG18-B1 is mapped to two different names. Do you know if the mappings (RG18-B11 and RG18-B12) are right? And then if so, you need to decide which of the two sets of names are right these ones, or the ones below (or is it the name that is right, and the symbol wrong).

> ######################################
> >>unlist(mget("RG18-B11",Hum30kbatch1to5GENENAME))
>
> RG18-B11
> "multiple coagulation factor deficiency 2"

this is really an odd way to go about it, use
Hum30kbatch1to5GENENAME$"RG18-B11"
it is slightly easier to follow, and mget is "multi-get", and intended for use when you want more than one thing, same goes for the example below.

> ######################################
> >>unlist(mget("RG18-B12",Hum30kbatch1to5GENENAME))
>
> RG18-B12
> "transportin 1"
> ######################################
>
> Different GENENAME:s are returned for "RG18-B11" depending if I use
> "RG18-B1" or "RG18-B11". Do you know which one is correct?

It does seem that something is confused (but since you built them yourself and we don't have them it will be kind of hard to debug). There are lots of possible places where problems can have arisen, but it would be nice to know where. Unfortunately much of the work needed ends up on you.

Best wishes,
Robert

>
> Any advice someone?
>
> Best regards
>
> // Johan L
>
> >>sessionInfo()
>
> R version 2.1.1, 2005-06-20, i386-pc-mingw32
>
> attached base packages:
> [1] "splines"   "tools"     "methods"   "stats"     "graphics"
> [6] "grDevices" "utils"     "datasets"  "base"
>
> other attached packages:
>          marray        hgu95av2         GOstats        multtest
>         "1.6.3"         "1.8.4"         "1.1.1"         "1.7.3"
>      genefilter        survival          xtable            RBGL
>         "1.6.3"          "2.18"         "1.2-5"        "1.3.13"
>           graph           Ruuid         cluster Hum30kbatch1to5
>         "1.5.9"         "1.5.3"        "1.10.0"         "1.1.0"
>     hgu133plus2         annaffy            KEGG              GO
>         "1.7.0"        "1.0.18"         "1.8.1"         "1.8.2"
>           gcrma     matchprobes            affy         maanova
>         "1.1.4"        "1.0.22"         "1.6.7"        "0.98-3"
>             kth           aroma            R.io      R.graphics
>         "0.4.5"          "0.85"          "0.62"          "0.62"
>        R.colors         R.basic         R.utils            R.oo
>           "0.4"          "0.62"          "0.62"          "0.62"
>           limma      reposTools        annotate         Biobase
>         "2.0.3"        "1.5.19"        "1.5.16"        "1.5.12"
>

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
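A small illustration of the one-to-many point, as a sketch only. The environment `toyGENENAME` and its contents are made up for this example and stand in for an annotation environment such as Hum30kbatch1to5GENENAME; they are not taken from the real package. When one key in the list returned by mget holds more than one value, base R's unlist() builds the result names by appending 1, 2, ... to the key. A key "RG18-B1" with two values therefore shows up under the names "RG18-B11" and "RG18-B12", even though no lookup of those other probe ids took place.

```r
# Hypothetical toy environment standing in for a GENENAME annotation
# environment; the ids and values are invented for illustration.
toyGENENAME <- new.env()
assign("RG89-F4", "single gene name", envir = toyGENENAME)
assign("RG18-B1", c("first mapped name", "second mapped name"),
       envir = toyGENENAME)
assign("RG18-B11", "a different probe entirely", envir = toyGENENAME)

ids <- c("RG89-F4", "RG18-B1")
res <- mget(ids, envir = toyGENENAME)   # a named list, one element per id

# How many values does each id map to? Anything above 1 is one-to-many.
lens <- sapply(res, length)

# unlist() flattens the list and appends 1, 2, ... to the name of any
# element that holds more than one value.
flat <- unlist(res)
print(names(flat))

# One possible way to keep exactly one entry per id: collapse multiple
# values into a single string (a choice made here for illustration).
one_per_probe <- sapply(res, paste, collapse = "; ")
print(one_per_probe)
```

Checking `lens` before calling unlist() shows which ids are responsible for a length mismatch between two lookups, and indexing the list directly, for example `res[["RG18-B1"]]`, returns the values without any renaming.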