metadata for Affymetrix Poplar array
Dick Beyer ★ 1.4k
@dick-beyer-26
Last seen 10.3 years ago
Will there be a metadata file available for the Affymetrix poplar array? There is poplarcdf and poplarprobe, but no poplar, "Affymetrix Poplar Genome Array Annotation Data (poplar)".

Thanks much,
Dick

Richard P. Beyer, Ph.D.
University of Washington
Env. & Occ. Health Sci., Box 354695
4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
Tel.: (206) 616 7378
Fax: (206) 685 4696
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
@james-w-macdonald-5106
Last seen 1 day ago
United States
Dick Beyer wrote:
> Will there be a metadata file available for the Affymetrix poplar
> array? There is poplarcdf and poplarprobe, but no poplar,
> "Affymetrix Poplar Genome Array Annotation Data (poplar)".

I don't think poplar is popular enough ;-D

Seriously though, there is a limit on the number of annotation packages that can be made (time is money and all that), so for a lot of chips there probably will never be an annotation package.

The expectation is that the end user will use AnnBuilder or possibly biomaRt to do the annotation themselves (not that I think it would be trivial in this case).

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
Hi Jim,

Thanks for responding so quickly. Into the trenches with AnnBuilder!

Cheers,
Dick

On Fri, 16 Feb 2007, James W. MacDonald wrote:
> The expectation is that the end user will use AnnBuilder or possibly biomaRt to
> do the annotation themselves (not that I think it would be trivial in this
> case).
Nianhua Li ▴ 870
@nianhua-li-1606
Last seen 10.3 years ago
Hi, Dick,

AnnBuilder won't work for poplar right away. Here is a mini guide. You can also follow these instructions to enable AnnBuilder for other organisms. (Dick, I am sorry, but it seems almost hopeless for poplar.)

A term definition before we start:

organism name: I will use this term throughout the email. The organism name for human is "Homo sapiens". Function "ABPkgBuilder" has an argument "organism". So, if you want to build annotation for human genes, give "Homo sapiens" as the value for "organism". The function will use the argument value to find data at UCSC Genome Database, IPI, KEGG and UniGene.

1. Make sure the organism is supported by Entrez Gene:

1.1 Search http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy for the NCBI taxonomy id with your organism name. Populus sp. is 3697.

1.2 Check whether the taxonomy id is included in the files at ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ . You want to check gene2accession, gene2pubmed, gene2refseq, gene2unigene and mim2gene. Check "README" to see if your organism is included in gene2go. Poplar is not on the list, so you won't get GO annotation.

2. Check KEGG:

Find your organism in ftp.genome.ad.jp/pub/kegg/tarfiles/genome and make sure the organism name is consistent with the value in field "DEFINITION" in this file. Populus sp. is not on the list, but there are "Populus tremula" and "Populus balsamifera". (KEGG is temporarily down right now.)

3. Check UCSC Genome Database:

Go to the UCSC Genome Database site ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/ and find the folder name corresponding to your organism. Poplar is not available, so you won't get information about chromosome location (CHRLOC). If your organism is supported, modify function "getUCSCUrl" in file "getSrcUrl.R" to add the folder name. For example, this is for chicken. "GALLUS GALLUS" is the organism name in upper case. "Gallus_gallus" is the folder name. (The diff below runs from the new file to the old one, so the line marked "-" is the one that was added.)

=================================================================
--- getSrcUrl.R (new)
+++ getSrcUrl.R (old)
@@ -65,7 +65,6 @@
                   "DANIO RERIO" = "Danio_Rerio",
                   "CAENORHABDITIS ELEGANS" = "Caenorhabditis_elegans",
                   "DROSOPHILA MELANOGASTER" = "Drosophila_melanogaster",
-                  "GALLUS GALLUS" = "Gallus_gallus",
                   NA)
     if(is.na(key)){
         warning(paste("Organism", organism, "is not supported by GoldenPath (GP)."))
==================================================================

Similarly, add the folder name to function "getPubDataGoldenPath" in file "downloadSourceData.R".

4. Check UniGene (only necessary when you use "ug" or "gbNRef" as baseMapType to invoke ABPkgBuilder):

Look at function "UGSciNames" in file "getSrcUrl.R" and check whether your organism is on the list. If not, visit ftp://ftp.ncbi.nih.gov/repository/UniGene, find the folder for your organism, go inside the folder, and find *.data.gz. I can't find "Populus sp.", but there are "Populus_trichocarpa" and "Populus_tremula_x_Populus_tremuloides". The file for Populus_trichocarpa is "Pth.data.gz". "Pth" is the "UGSciName". Add it to the R function. Make sure it is mapped to the organism name.

5. Check IPI:

Go to ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/ and find the folder corresponding to your organism. Poplar is not supported, so you won't get cross-references between gene and PFAM (and PROSITE). If your organism is supported, modify function "speciesNorganism" in file "IPI.R" to add your organism. For example, the mapping for human is c("human", "Homo sapiens"): "human" is the folder name in all lower case, and "Homo sapiens" is the organism name.

Hope this is helpful and good luck!

nianhua
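Step 1.2 of the guide above (checking whether a taxonomy id occurs in one of the Entrez Gene files) could be sketched in base R roughly as follows. This is only an illustration: "hasTaxId" is a hypothetical helper, not an AnnBuilder function, and it assumes the file is tab-delimited with the taxonomy id in the first column and a header line starting with "#". The sample rows are made up so that the example runs without a download.

```r
# Hypothetical helper (not part of AnnBuilder): TRUE if taxId appears in the
# first column of a tab-delimited, gene2accession-style file.
hasTaxId <- function(file, taxId) {
  # comment.char = "#" skips the header line, assumed to start with "#"
  tab <- read.table(file, sep = "\t", header = FALSE, comment.char = "#",
                    quote = "", colClasses = "character")
  as.character(taxId) %in% tab[[1]]
}

# Illustrative rows standing in for a real file (tax_id, GeneID, status)
sampleLines <- c("#tax_id\tGeneID\tstatus",
                 "9606\t1\tREVIEWED",
                 "9031\t374073\tPROVISIONAL")
tmp <- tempfile(fileext = ".txt")
writeLines(sampleLines, tmp)

hasTaxId(tmp, 9606)  # human is present in the sample rows
hasTaxId(tmp, 3697)  # the poplar id from step 1.1 is absent from the sample
```

The real files are large, so in practice it may be preferable to filter them with a command-line tool first and only read the matching rows into R.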
Nianhua Li ▴ 870
@nianhua-li-1606
Last seen 10.3 years ago
Hi, Dick,

Here is some additional information:

You can extract a probeset-to-EntrezGene mapping from Affymetrix's annotation file, give it as "otherSrc" and feed it to ABPkgBuilder:

> ABPkgBuilder(baseName="affy_poplar_GeneBank_for_AnnBuilder.txt",
>              baseMapType="gbNRef",
>              pkgName="poplar",
>              pkgPath=".",
>              organism="Populus trichocarpa",
>              version="1.12.0",
>              otherSrc=c(
>                  EG="affy_poplar_Entrez_for_AnnBuilder.txt"),
>              author=list(
>                  authors="Dick Beyer",
>                  maintainer="Dick Beyer..."
>              )
> )

AnnBuilder will use the GenBank mapping as the primary source to find Entrez Gene mappings for the probesets. If any probeset doesn't have a mapping, AnnBuilder will use the file given as "otherSrc" as a supplement, so you can get better annotation coverage.

> I am not sure if this whole approach will ultimately be correct as the
> Affy poplar array has 13 different Populus species on it, with Populus
> trichocarpa only one of them.

This won't be a big problem in your case. AnnBuilder extracts annotations from Entrez Gene by using Entrez Gene IDs, not taxonomy IDs. The organism argument will only affect the following annotations: pathway from KEGG, PROSITE and PFAM cross-references from IPI, and chromosome location from UCSC Genome. Neither IPI nor UCSC supports any Populus species. KEGG supports Populus tremula (aspen) (EST) (eptp) and Populus balsamifera (poplar) (EST) (epba), but only has gene-pathway mappings for epba. The mapping is for ESTs, not for genes, so it may not match any Entrez Gene IDs at all. If you want to use this mapping, give "Populus balsamifera (poplar) (EST)" as organism. I am not sure whether you need the whole string or just the Latin name part. But then it will conflict with UniGene, because UniGene only supports Populus_trichocarpa and Populus_tremula_x_Populus_tremuloides. UniGene is less important. It is only used as a supplemental source for the probeset to Entrez Gene mapping. If you give the probeset-to-EntrezGene mapping as the baseName and set baseMapType to "ll", you can bypass UniGene.

To summarize, there are two options:

1. Use the above script to invoke AnnBuilder and add "Populus_trichocarpa=Pth" to function "UGSciNames" in file "getSrcUrl.R".

2. Change organism to "Populus balsamifera" and use the "probeset-to-EntrezGene" mapping as baseName and "ll" as baseMapType.

The bottom line is that you can get gene name, gene symbol, chromosome, cytogenetic band, pubmed, unigene, refseq, and entrez gene for your probesets.

Let me know if you need any help, and good luck.

nianhua
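The fallback described above (primary mapping first, the "otherSrc" file only for probesets left unmapped) can be illustrated with a small base R sketch. This is not AnnBuilder's internal code; "fillFromSupplement" is a hypothetical name, and the probeset and Entrez Gene IDs are made up for the example.

```r
# Hypothetical illustration only; this is not how AnnBuilder is implemented.
# Primary probeset-to-Entrez Gene mapping; NA marks unmapped probesets.
primary <- c(probeA = "1001", probeB = NA, probeC = "1003", probeD = NA)

# Supplementary mapping, playing the role of the "otherSrc" file
supplement <- c(probeB = "2002", probeC = "9999")

fillFromSupplement <- function(primary, supplement) {
  merged <- primary
  unmapped <- names(primary)[is.na(primary)]
  # Only probesets missing from the primary source take a supplementary value
  hits <- intersect(unmapped, names(supplement))
  merged[hits] <- supplement[hits]
  merged
}

merged <- fillFromSupplement(primary, supplement)
print(merged)
mean(!is.na(merged))  # fraction of probesets that now have an Entrez Gene ID
```

Note that probeC keeps its primary value even though the supplement disagrees, which mirrors the description of "otherSrc" as a supplement rather than an override.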