FW: PLANdbAffy + Alternative Exon Annotation +XPS, aroma, oligo, RMAExpress


Ramil Nurtdinov

@ramil-nurtdinov-4381


Dear colleagues,

My experience with R/Bioconductor and the Affymetrix Human Exon 1.0 ST array started with the oligo package. Unfortunately, for my 19 HuExon 1.0 arrays R asks for approximately 6-7 GB of memory, while the RMA algorithm in Affymetrix Expression Console takes 40 minutes on my Sony Vaio notebook. Second, there was no good annotation for this chip in R, except X:Map, my competitor for the paper :))

So I solved the first problem with Expression Console, and for the second problem we developed PLANdbAffy:
http://nar.oxfordjournals.org/content/38/suppl_1/D726.long

Now I am finishing the Ensembl plus hg19 version of the database. I understand that Bioconductor is widely used in the scientific world, but my load is rather big because of many new projects.

If somebody gives me the format for the annotation, I can make the corresponding database summary file.

Yours sincerely,
Ramil Nurtdinov, PhD

On 12/7/10, B.Misovic at lumc.nl <b.misovic at lumc.nl> wrote:
> Dear Ramil,
>
> I see I forgot to add you to the email below, which I sent to the Bioconductor mailing list and to our collaborators in Poland... just in case you have some comments.
>
> Best,
> Branko
>
> ________________________________
> From: Misovic, B. (TOXGEN)
> Sent: 07 December 2010 15:09
> To: 'roman.jaksik at polsl.pl'; 'bioconductor at r-project.org'
> Cc: 'cstrato'
> Subject: PLANdbAffy + Alternative Exon Annotation +XPS,aroma,oligo,RMAExpress
>
> Dear Roman, all,
>
> Recently we tried your version of the annotation files for the Gene 1.0 ST array that your team built from the PLANdbAffy DB. I encountered some problems, so I hope you can help.
>
> You provide nice CDF and Affy PGF/CLF files, but the PGF/CLF files were not useful in the Bioconductor packages for Affy Exon/Gene type arrays, namely oligo & XPS, as they require an annotation file in CSV format. I tried the annotation CSV file from Affymetrix and after that the one from the PLANdbAffy DB. The PLANdbAffy CSV file is very different from the Affymetrix one, so import is not possible (actually, the "CSV" file on the website is TAB-delimited instead of comma-delimited, so the problem already starts there, and it requires reformatting).
>
> Christian from XPS was kind enough to inform me that:
>
>> ... PLANdbAffy annotation columns have nothing to do with the Affymetrix annotation columns. Thus xps will not read these annotation files. Alternative annotation files must contain exactly the same columns as the Affymetrix annotation files.
>>
>> For whole genome and exon arrays it is not possible to use only the PGF-files w/o the annotation files, since I extract most of the important information from the probeset-annotation file first, so this file is absolutely essential. For example, column "level" contains the information Core/Extended/Full; see the corresponding annotation README files for an explanation of all columns.
>>
>> The xps error you get simply says that their PGF-file does not contain the AFFX controls, so maybe adding the AFFX controls to their PGF-file might help. However, as you mention, they use their own probeset IDs, which will not match the probeset IDs of the Affymetrix annotation files, so it may not work anyhow.
>>
>> It is not quite clear to me why they created their own PGF-file. The Affymetrix PGF-file contains only 1-4 probes for each probeset, where each exon consists of one or more probesets, thus the probability that a probe within a probeset is not correct should be pretty small. However, a probeset could be mapped to a wrong exon/gene or to no gene at all, so it should be sufficient to correct the Affymetrix annotation files.
>
> Tools like RMAExpress, EC, and aroma.affymetrix can work with the CDF only. So after using RMAExpress (in command-line mode) I did get an expression matrix out, but I could not link the 19532 probeset IDs to the PLANdbAffy annotation CSV file to collect basic gene information. What I did was: first, load the full annotation file (not filtered) from PLANdbAffy:
> http://affymetrix2.bioinf.fbb.msu.ru/files.html
> and search the 2nd column (Probe_Sets) for the IDs obtained after RMA, and I found 0... Then I tried the 1st column (the Probes) and found 8664... but I would expect the opposite situation?
>
> So Roman, can you please:
> 1) advise how to get the real IDs after an RMAExpress run?
> 2) say whether you plan to build an annotation CSV file as Affymetrix does, so that other Bioconductor software such as oligo & XPS can use it?
> 3) comment on Christian's feedback.
>
> Btw, Christian, how come RMAExpress, EC, and aroma.affymetrix can work with CDFs only, while oligo & XPS require extra annotation? From what I gather (after peeking into the CDF and PGF files), they show which probes belong to which probe_set. So for probe_set-level analysis (or more exon-like analysis), the PGF/CLF files alone seem to be enough?
>
> For the BioC list, just to bring attention to this article & DB:
>
> PLANdbAffy: probe-level annotation database for Affymetrix expression microarrays, Ramil N. Nurtdinov et al.
> http://nar.oxfordjournals.org/content/38/suppl_1/D726.full
> http://affymetrix2.bioinf.fbb.msu.ru/
>
> Maybe some of the BioC experts have comments about it?
>
> Best,
> Branko
>
> --------------------------
> Branislav Misovic,
> Department of Toxicogenetics
> Leiden University Medical Center
> Einthovenweg 20, 2333 ZC Leiden
> PO.box 9600, Building2,Room:T3-11
> 2300 RC Leiden
> The Netherlands
> Phone: +31 71 526 9636
> Mob: 0653135855
> E-mail:
> b.misovic at lumc.nl
> braniti at gmail.com
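The lookup Branko describes (matching RMAExpress row IDs against each column of the PLANdbAffy table) can be sketched as a per-column hit count. This is a minimal sketch assuming the downloaded file is tab-delimited with a header row; the column names and rows below are a made-up toy, not the real PLANdbAffy layout. A non-zero count only shows that the strings coincide: when two ID types are both plain integers in overlapping ranges, some matches are expected by chance.

```python
import csv
import io
from collections import Counter


def count_id_hits(handle, ids):
    """Count, per column, how many distinct values from `ids` occur in a table.

    `handle` is an open text file that is assumed to be tab-delimited with a
    header row. Returns a Counter mapping column name -> number of distinct hits.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    seen = {name: set() for name in header}
    for row in reader:
        for name, value in zip(header, row):
            if value in ids:
                seen[name].add(value)
    return Counter({name: len(found) for name, found in seen.items()})


# Toy, hypothetical probe-level table: one row per probe.
table = io.StringIO(
    "Probes\tProbe_Sets\tentrezgene_id\n"
    "101\t9001\t1\n"
    "102\t9001\t1\n"
    "103\t9002\t10\n"
)
# Hypothetical RMAExpress row names; "102" collides with a probe ID by chance.
rma_ids = {"1", "10", "102"}
print(count_id_hits(table, rma_ids))
```

In this toy case the gene-level column gets the most hits and the probe column gets one coincidental hit, which is the pattern to look out for when the expression IDs are gene identifiers rather than probeset IDs.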



cstrato

@cstrato-908


Austria

Dear Ramil,

Please let me mention that handling 19 HuExon arrays on your notebook using one of the Bioconductor packages should not be a problem, as you can see on the Bioconductor workflows site:
http://www.bioconductor.org/help/workflows/oligo-arrays/#pre-processing-resources
It says, for example, that xps "will run on conventional desktop computers".

Regarding the format for annotation: package xps requires the annotation.csv file format from Affymetrix. Other BioC packages usually require one of the metadata packages found at:
http://www.bioconductor.org/help/bioc-views/release/data/annotation/
To my knowledge, these metadata packages are usually created with package AnnotationDbi.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
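Christian's requirement (xps reads the Affymetrix annotation.csv layout, so a substitute must carry the same columns) and Branko's delimiter problem can both be checked before import. A minimal Python sketch, under assumptions: the Affymetrix file is taken to start with `#%` metadata lines followed by a quoted, comma-separated header row, and only `probeset_id` and `level` are checked, a deliberately small, hypothetical subset of the real column list (the annotation README files describe the full one). The sample rows are invented.

```python
import csv
import io
from collections import Counter

# Hypothetical minimal subset of columns; real Affymetrix files have many more.
REQUIRED = ("probeset_id", "level")


def tally_levels(handle, required=REQUIRED):
    """Check required columns, then count probesets per 'level' value.

    Lines starting with '#%' (assumed metadata header) are skipped; the rest
    is parsed as comma-separated values with a header row.
    """
    data_lines = (line for line in handle if not line.startswith("#%"))
    reader = csv.DictReader(data_lines)
    fields = reader.fieldnames or []
    missing = [name for name in required if name not in fields]
    if missing:
        raise ValueError(f"missing columns {missing}; found {fields}")
    return Counter(row["level"] for row in reader)


sample = io.StringIO(
    "#%create_date=example\n"
    '"probeset_id","seqname","level"\n'
    '"1","chr1","core"\n'
    '"2","chr1","extended"\n'
    '"3","chr2","core"\n'
)
print(tally_levels(sample))  # Counter({'core': 2, 'extended': 1})
```

A tab-delimited file with the same column names fails this check, because the comma-based reader sees the whole header as a single field. That mirrors the reformatting Branko had to do, although, as Christian points out, changing the delimiter alone would not make the PLANdbAffy columns match the Affymetrix ones.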


branislav misovic

@branislav-misovic-4248


Netherlands/Amsterdam

Thank you, Roman, for the clarification.

>> What I did was: first, load the full annotation file (not filtered) from PLANdbAffy:
>> http://affymetrix2.bioinf.fbb.msu.ru/files.html
>> and search the 2nd column (Probe_Sets) for the IDs obtained after RMA, and I found 0... Then I tried the 1st column (the Probes) and found 8664... but I would
> In the first situation there were no matches because of the inconsistency in the ID type; in the second you probably got 8664 matches by chance, since the probe ID values share the same range as EntrezGene IDs. You should try the 14th column of the HuGene-1_0.flat file (entrezgene_id), although again the file is probe-specific and it might not be of much help, depending on what annotation data you are looking for.

Now, using the Entrez Gene ID as the identifier, I can use several Bioconductor databases to build my gene information etc., so no problem... Please mention this important Entrez Gene detail in your readme.txt file.

>> Do you plan to build an annotation CSV file as Affymetrix does, so that other Bioconductor software such as oligo & XPS can use it?
> I can do it very quickly if you could specify what annotation data you are interested in, although most of the EntrezGene data is directly available through NCBI's FTP server.

Btw, are you on the BioC list? If not, please join: http://www.bioconductor.org/help/mailing-list/
The answer is in this thread:
http://thread.gmane.org/gmane.science.biology.informatics.conductor/32337/focus=32347
and there is more here:
http://thread.gmane.org/gmane.science.biology.informatics.conductor/32333

Ciao,
Branko

________________________________
From: Roman Jaksik [mailto:Roman.Jaksik@polsl.pl]
Sent: 08 December 2010 18:48
To: Ramil Nurtdinov; Misovic, B. (TOXGEN); bioconductor@r-project.org
Cc: Joanna Polańska
Subject: RE: FW: PLANdbAffy + Alternative Exon Annotation +XPS,aroma,oligo,RMAExpress

Dear colleagues,

The main purpose of the annotation files provided by Affymetrix is to link probeset IDs with the corresponding genes, represented by various identifiers, genomic location, etc. Probesets included in the custom CDF file created from PLANdbAffy are gene-specific, and the probeset IDs are in fact EntrezGene identifiers (as in the custom CDFs described in Dai 2005); therefore I never suspected that there would be a need for an annotation file.

As stated by Christian from XPS, the PLANdbAffy database files available on the authors' website have nothing to do with the Affymetrix library files (aside from the inconsistency in CSV formatting). Official annotation files are probeset-specific, and their first column contains probeset (or so-called transcript cluster) IDs. PLANdbAffy database files describe individual probes: the first column contains probe IDs assigned by Affymetrix based on their location (x+y*array_size), and the second contains the official probeset IDs. By combining the RMAExpress result files with a PLANdbAffy database file, there is a big risk that you might end up linking probesets to probes.

> What I did was: first, load the full annotation file (not filtered) from PLANdbAffy:
> http://affymetrix2.bioinf.fbb.msu.ru/files.html
> and search the 2nd column (Probe_Sets) for the IDs obtained after RMA, and I found 0... Then I tried the 1st column (the Probes) and found 8664... but I would expect the opposite situation?

In the first situation there were no matches because of the inconsistency in the ID type; in the second you probably got 8664 matches by chance, since the probe ID values share the same range as EntrezGene IDs. You should try the 14th column of the HuGene-1_0.flat file (entrezgene_id), although again the file is probe-specific and it might not be of much help, depending on what annotation data you are looking for.
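Roman's point that the first PLANdbAffy column holds location-derived probe IDs (x+y*array_size) can be illustrated with two small helpers. This is a sketch, not Affymetrix reference code: it takes the formula as quoted in the thread, assumes zero-based x and y with `array_size` equal to the number of columns on the chip, and the width of 1000 used in the example is hypothetical. Because such IDs are dense integers starting near zero, they can overlap numerically with Entrez Gene IDs, which fits the chance matches described here.

```python
from __future__ import annotations


def probe_id(x: int, y: int, array_size: int) -> int:
    """Location-based probe ID, x + y * array_size (zero-based x and y assumed)."""
    if not (0 <= x < array_size) or y < 0:
        raise ValueError("coordinates fall outside the array")
    return x + y * array_size


def probe_xy(pid: int, array_size: int) -> tuple[int, int]:
    """Inverse of probe_id: recover (x, y) from a location-based ID."""
    if pid < 0:
        raise ValueError("probe IDs are non-negative")
    y, x = divmod(pid, array_size)
    return x, y


# Hypothetical chip width of 1000 columns, for illustration only.
pid = probe_id(25, 40, 1000)
print(pid, probe_xy(pid, 1000))  # 40025 (25, 40)
```

If a real library file turns out to use one-based coordinates or IDs, the same two functions apply after shifting by one; that offset is exactly the kind of detail to confirm against the files before joining tables on probe IDs.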
> Do you plan to build an annotation CSV file as Affymetrix does, so that other Bioconductor software such as oligo & XPS can use it?

I can do it very quickly if you could specify what annotation data you are interested in, although most of the EntrezGene data is directly available through NCBI's FTP server.

>> It is not quite clear to me why they created their own PGF-file. The Affymetrix PGF-file contains only 1-4 probes for each probeset, where each exon consists of one or more probesets, thus the probability that a probe within a probeset is not correct should be pretty small. However, a probeset could be mapped to a wrong exon/gene or to no gene at all, so it should be sufficient to correct the Affymetrix annotation files.

Some of the exon-specific sets in the official library files can reach as many as 25 probes (e.g., 8124458), while transcript-specific ones can exceed 100 (e.g., 7900710). Why the original library files were remapped is a very long story, and Ramil probably has more to say about that. The biggest threat is not probes that fail to map to a specific region, but probes with low specificity, which are capable of binding to other gene products. In our study we also tried exon- and transcript-level analysis based on the official library files, but they turned out to be of much lower quality.

In case of any additional questions, please feel free to send me an email; I will do my best to help as soon as possible.

A side note to Ramil Nurtdinov: please take a look at probe ID 576313 in both the HuGene-1_0.flat and HuGene-1_0.full.flat files.
Kind regards,
Roman Jaksik
Institute of Automatic Control
Silesian University of Technology
Akademicka 16
44-100 Gliwice
Poland
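Roman's figures on probeset size (exon-level sets of up to 25 probes, transcript-level sets of over 100) can be checked against a PGF file by counting probe lines per probeset. The sketch below assumes the usual PGF text layout: `#%` header lines, then probeset lines with no leading tab, atom lines with one leading tab, and probe lines with two, each beginning with its ID. The tiny example file is invented, and the field positions should be confirmed against the `#%header` lines of a real PGF before relying on them.

```python
import io
from collections import Counter


def probes_per_probeset(handle):
    """Count probe lines under each probeset in a PGF-style text file."""
    counts = Counter()
    current = None
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#%' header lines
        if line.startswith("\t\t"):
            # Probe level (two leading tabs): attribute to the current probeset.
            if current is None:
                raise ValueError("probe line before any probeset line")
            counts[current] += 1
        elif line.startswith("\t"):
            continue  # atom level (one leading tab): groups probes, not counted
        else:
            # Probeset level (no leading tab): first field is taken as the ID.
            current = line.split("\t", 1)[0]
            counts.setdefault(current, 0)
    return counts


pgf = io.StringIO(
    "#%chip_type=example\n"
    "#%header0=probeset_id\ttype\n"
    "#%header1=\tatom_id\n"
    "#%header2=\t\tprobe_id\ttype\n"
    "1001\tmain\n"
    "\t1\n"
    "\t\t11\tpm:st\n"
    "\t\t12\tpm:st\n"
    "\t2\n"
    "\t\t13\tpm:st\n"
    "1002\tmain\n"
    "\t3\n"
    "\t\t14\tpm:st\n"
)
print(probes_per_probeset(pgf).most_common(1))  # [('1001', 3)]
```

Applied to the official HuGene 1.0 ST PGF, sorting the result with `most_common()` would show how far real probeset sizes depart from the 1-4 probes per probeset mentioned earlier in the thread.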
