Question

Building the tomato annotation library(Affy)

Groot, Philip de ▴ 630

Hello list,

I found out that the Affymetrix tomato array is supported in Bioconductor, except that no annotation library is available. Using the GenBank IDs from the Affymetrix csv file, I tried to build the annotation library myself, and I get the following error every time:

    Error in loadFromUrl(srcUrl) : URL NArefLink.txt.gz is incorrect or the target site is not responding!
    In addition: Warning messages:
    1: Organism Lycopersicon esculentum is not supported by GoldenPath (GP). in: getUCSCUrl(organism)
    2: Organism Lycopersicon esculentum is not supported by GoldenPath (GP). in: getUCSCUrl(organism)
    [1] "1250 2 2"
    Error in sort.list(unique.default(x), na.last = TRUE) :
      'x' must be atomic for 'sort.list'
    Have you called 'sort' on a list?
    In addition: There were 50 or more warnings (use warnings() to see the first 50)
    > warnings()
    Warning messages:
    1: Failed to parse Golden Path data because of: Error in loadFromUrl(srcUrl) : URL NArefLink.txt.gz is incorrect or the target site is not responding! in: getAnnData(srcObjs)
    2: NAs introduced by coercion

(The "NAs introduced by coercion" warning occurs only with R-2.5.0 / BioC 2.0, not with the previous R/BioC installation; both installations use AnnBuilder 1.14.0 and run on SuSE Linux Professional 9.)

Apparently, the problem is that "ABPkgBuilder" tries to sort something that is not there (the GoldenPath data), and this prevents the annotation library from being built. It should still be possible to build "something" without this information (or am I wrong in this assumption?).

Does anyone have an idea how to solve this problem?

Regards,
Dr Philip de Groot
Wageningen University

Annotation Organism AnnBuilder • 1.5k views

score 0 · Answer 1 · 2007-05-09

Dear Dr Philip de Groot,

Thanks for the report, and sorry for the late reply.

The "refLink.txt" error was intended, but we have changed it to a more informative message.

The "sort.list" error was a bug. It happens when parsing KEGG pathway/enzyme data. The KEGG data file usually contains both pathway and enzyme data for a given organism, but for tomato it only has pathway data (actually ESTs only). This broke the code.

We have updated AnnBuilder. Please try the latest version in the bioc 2.1 repository or download it from http://bioconductor.org/packages/2.1/bioc/html/AnnBuilder.html

A test run for the Affymetrix tomato array shows that the annotation is very sparse. Here is the QC data, just FYI:

    Mappings found for probe based rda files:
        tomatoACCNUM found 10198 of 10209
        tomatoCHR found 0 of 10209
        tomatoENTREZID found 1288 of 10209
        tomatoENZYME found 0 of 10209
        tomatoGENENAME found 1288 of 10209
        tomatoMAP found 0 of 10209
        tomatoPATH found 0 of 10209
        tomatoPMID found 778 of 10209
        tomatoREFSEQ found 2 of 10209
        tomatoSYMBOL found 1288 of 10209
        tomatoUNIGENE found 1288 of 10209
    Mappings found for non-probe based rda files:
        tomatoORGANISM found 1
        tomatoPMID2PROBE found 359

We only used the GenBank IDs from the Affymetrix csv file, just as you did. You can also extract the Entrez IDs from the csv file and pass them as "otherSrc" to ABPkgBuilder. This may increase the annotation coverage.

Good luck,
Martin and Nianhua
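The suggestion above to extract the Entrez IDs from the Affymetrix csv file can be sketched outside R as follows. This is only a minimal sketch under several assumptions that are not stated in the thread: the column names ("Probe Set ID", "Entrez Gene"), the "---" placeholder for missing values, the "///" separator for multiple IDs, and the leading "#" comment lines all need to be checked against the actual csv file. The two-column, tab-delimited output (probe ID, Entrez ID) is likewise an assumption about what "otherSrc" expects; the AnnBuilder documentation is the authority on that. The probe IDs in the demo are made up.

```python
import csv
import io

MISSING = "---"    # assumed placeholder for "no value" in the Affymetrix csv
MULTI_SEP = "///"  # assumed separator when one cell lists several IDs


def probe_to_entrez(lines, probe_col="Probe Set ID", entrez_col="Entrez Gene"):
    """Return (probe_id, entrez_id) pairs; column names are assumptions."""
    # Skip leading comment lines, which are assumed to start with '#'.
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    pairs = []
    for row in csv.DictReader(data_lines):
        probe = (row.get(probe_col) or "").strip()
        cell = (row.get(entrez_col) or "").strip()
        if not probe or not cell or cell == MISSING:
            continue
        # Simplification: keep only the first ID when several are listed.
        first_id = cell.split(MULTI_SEP)[0].strip()
        if first_id.isdigit():
            pairs.append((probe, first_id))
    return pairs


def write_two_column(pairs, path):
    """Write the pairs as a tab-delimited, two-column file without a header."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(pairs)


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    demo = io.StringIO(
        "#%hypothetical header comment\n"
        '"Probe Set ID","Gene Symbol","Entrez Gene"\n'
        '"probe_1_at","geneA","111"\n'
        '"probe_2_at","---","---"\n'
        '"probe_3_at","geneB","222 /// 333"\n'
    )
    for probe, entrez in probe_to_entrez(demo):
        print(probe, entrez, sep="\t")
```

Keeping only the first ID per probe is a deliberate simplification; probes with several Entrez IDs may deserve manual review before the file is used for a build.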

score 0 · Answer 2 · 2007-05-11

Hello Martin and Nianhua,

Thank you very much for solving the issues in AnnBuilder! I successfully built the tomato annotation library and even improved it a little by adding the Affy Entrez IDs via the otherSrc argument. This is my tomatoQC():

    Quality control information for tomato
    Date built: Created: Thu May 10 17:33:37 2007
    Number of probes: 10209
    Probe number missmatch: None
    Probe missmatch: None
    Mappings found for probe based rda files:
        tomatoACCNUM found 10198 of 10209
        tomatoCHR found 0 of 10209
        tomatoENTREZID found 1296 of 10209
        tomatoENZYME found 0 of 10209
        tomatoGENENAME found 1296 of 10209
        tomatoMAP found 0 of 10209
        tomatoPATH found 0 of 10209
        tomatoPMID found 783 of 10209
        tomatoREFSEQ found 2 of 10209
        tomatoSYMBOL found 1296 of 10209
        tomatoUNIGENE found 1296 of 10209
    Mappings found for non-probe based rda files:
        tomatoORGANISM found 1
        tomatoPMID2PROBE found 360

This is a minimal improvement over the first build (for example, 1296 instead of 1288 probes with an Entrez ID). Using UniGene (instead of gb) as the base mapping did not improve it; on the contrary.

The only problem I have now is that the GO annotation is completely missing, whereas it is available in the Affymetrix annotation file. Furthermore, the CHRLOC environment is missing (among other environments, I guess), and this causes an inconsistency: if the information cannot be retrieved, why not include a vector containing only NAs? Then the (now missing) environments would at least exist, and scripts (in this case, mine) would not break on them. For the moment I have solved it by checking whether tomato is being analysed, but including at least an empty vector would be a "nicer" solution (in my opinion).

As for the annotation of the tomato array being poor: well, we expected this. Thank you anyway for helping us out!
Regards,
Philip
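To put the "minimal improvement" between the two builds into numbers, the QC lines quoted in this thread can be compared programmatically. The following is a minimal sketch that assumes the "NAME found N of M" layout shown in the two QC listings; the helper names are made up for illustration and are not part of AnnBuilder. Only three of the QC lines from each build are copied in to keep the example short.

```python
import re

# Matches entries such as "tomatoENTREZID found 1288 of 10209".
QC_LINE = re.compile(r"(\w+) found (\d+) of (\d+)")


def parse_qc(text):
    """Map each environment name to a (found, total) tuple."""
    return {name: (int(found), int(total))
            for name, found, total in QC_LINE.findall(text)}


def coverage_gain(before, after):
    """Per environment: (extra probes mapped, coverage after in percent)."""
    gains = {}
    for name, (found_after, total) in after.items():
        found_before = before.get(name, (0, total))[0]
        percent = 100.0 * found_after / total if total else 0.0
        gains[name] = (found_after - found_before, round(percent, 1))
    return gains


# QC lines copied from the two builds reported in this thread.
GENBANK_ONLY = """
tomatoENTREZID found 1288 of 10209
tomatoPMID found 778 of 10209
tomatoREFSEQ found 2 of 10209
"""
WITH_ENTREZ_IDS = """
tomatoENTREZID found 1296 of 10209
tomatoPMID found 783 of 10209
tomatoREFSEQ found 2 of 10209
"""

if __name__ == "__main__":
    gains = coverage_gain(parse_qc(GENBANK_ONLY), parse_qc(WITH_ENTREZ_IDS))
    for name, (extra, percent) in gains.items():
        print(f"{name}: +{extra} probes, {percent}% of probes covered")
```

Entries without an "of M" part, such as tomatoORGANISM, are deliberately ignored by the pattern because no coverage can be computed for them.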