Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays
@andreas-heider-4538
Dear mailing list,

I know this has come up on the list a couple of times, and I think I have read all of it, but I still can't get it right. So here is my problem:

I want to be able to work with Mouse Exon 1.0 ST arrays (NOT Mouse Gene 1.0 ST) in a similar fashion to, e.g., HG-U133 arrays. That means I ultimately want the data accessible as an ExpressionSet object with a proper Bioconductor annotation assigned. This should include gene symbols, RefSeq IDs and Entrez IDs.

I can import the data as an AffyBatch and generate an ExpressionSet with the help of the Xmap/exonmap-supplied CDF, but there is no annotation attached to it.

OR

I can import the CEL files with the "oligo" package as an exon array object and generate an ExpressionSet from it. However, in that case it still has no annotation.

Surprisingly, the Bioconductor website has all the packages needed to deal with Mouse Gene 1.0 ST arrays, but the information needed to work with Mouse Exon 1.0 ST arrays seems to be missing!

What am I doing wrong here? Has anyone else had such problems?

Thanks in advance for your effort,
Andreas
@james-w-macdonald-5106
Hi Andreas,

The problem here is that you want to do something that AFAIK isn't easy to do. The Gene ST arrays allow you to summarize all the probes that interrogate a particular transcript (i.e., all the exon-level probesets are collapsed to transcript level, and then you summarize). However, for the Exon ST arrays that isn't the case, unless there is something in xps to allow for that - I know next to nothing about that package, so Christian Stratowa will have to chime in if I am missing something.

For the Exon chips, you are always summarizing at the same probeset level, where there are <= 4 probes per probeset, and there can be any number of probesets that interrogate a given exon. Lots of these probesets interrogate regions that aren't even transcribed, according to current knowledge of the genome. When you choose core, extended or full probesets, you are just changing the number of probesets being used, not summarizing at a different level as with the Gene ST chip.

So when you say you want gene symbols, RefSeq IDs and gene IDs, what exactly are you after? If a given probeset is in the intron of a gene, do you want to annotate it as being part of that gene? How about if it is in the UTR (or really close to the UTR)? What do you want to do with the probesets where one or more of the probes binds in multiple positions in the genome? These are all questions that the exonmap package tries to consider, and it gets really complicated. That's why Affy went with the Gene ST chips - they unleashed the Exon chips on us and couldn't sell them because people were saying WTF do I do with this thing?

I don't think there is an easy or obvious answer to your question. If you were to come up with what you think are reasonable answers to my questions, then it wouldn't be much work to extract the chr, start, end from the pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g., findOverlaps()) to decide what regions are being interrogated, and annotate from there.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
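The overlap step Jim outlines can be sketched with toy coordinates. This is only an illustration, not a recipe for the real array: the probeset and gene ranges, the IDs and the symbols are all made up, and how the real chr/start/end values are pulled out of pd.moex.1.0.st.v1 is not shown. The GRanges() constructor and findOverlaps() used here come from the GenomicRanges/IRanges packages, which GenomicFeatures builds on.

```r
# Toy coordinates only; real ranges would come from the platform design
# package and from a gene model of your choice (both assumed, not shown).
library(GenomicRanges)

# Hypothetical probeset locations (chromosome, start, end)
probesets <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(100, 5000, 300), end = c(125, 5025, 325)),
  probeset_id = c("ps1", "ps2", "ps3")
)

# Hypothetical gene regions with symbols
genes <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges = IRanges(start = c(50, 1000), end = c(1000, 2000)),
  symbol = c("GeneA", "GeneB")
)

# Each hit pairs a probeset (query) with a gene region (subject) it overlaps
hits <- findOverlaps(probesets, genes)

annotated <- data.frame(
  probeset_id = probesets$probeset_id[queryHits(hits)],
  symbol = genes$symbol[subjectHits(hits)],
  stringsAsFactors = FALSE
)

# Probesets with no hit lie outside every region and stay unannotated
unannotated <- setdiff(probesets$probeset_id, annotated$probeset_id)
```

Whether the "gene" ranges should include introns, UTRs or some flanking sequence is exactly the decision Jim asks about; in this sketch that choice is encoded entirely in how the `genes` object is built.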
Dear Andreas,

As Jim already mentioned, package xps is able to preprocess MoExon 1.0 ST arrays at the probeset and the gene level, see also my earlier reply to a similar question:
https://www.stat.math.ethz.ch/pipermail/bioconductor/2012-June/045958.html

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
Yes, you are right! The target argument of rma() can be used to collapse to transcript or probeset level. However, the problem is still there: I am left with a nice ExpressionSet object that has values mapped to transcripts (if I decide so), but they are only annotated by something like 4701234, which is a probeset/transcript name. That wouldn't be a problem if, as is normally the case, such an identifier could be easily translated via Bioconductor's annotation packages.

But here comes the most significant part: there is no annotation package available that includes MoEx 1.0 ST identifiers!

I am trying to get my package to work on these Exon arrays, and the package expects a proper annotation package such as, e.g., "mouse4302" to be attached to the annotation slot of the ExpressionSet.

I'm still puzzled.
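The translation step described here, from an identifier such as 4701234 to a gene symbol, comes down to a table lookup once some mapping table exists. A minimal base R sketch under that assumption follows; the lookup table is hypothetical, and every ID other than 4701234, as well as every symbol and Entrez ID, is made up for illustration.

```r
# Hypothetical lookup table; in practice it would be derived from the
# Affymetrix annotation for the array, not typed in by hand.
lookup <- data.frame(
  transcript_id = c("4701234", "4701240", "4701255"),
  symbol = c("GeneA", "GeneB", "GeneC"),
  entrez_id = c("1001", "1002", "1003"),
  stringsAsFactors = FALSE
)

# IDs as they might appear as feature names of a summarized ExpressionSet
ids <- c("4701255", "4701234", "4709999")

# match() keeps the order of `ids` and yields NA for IDs missing from the table
idx <- match(ids, lookup$transcript_id)

annotated <- data.frame(
  transcript_id = ids,
  symbol = lookup$symbol[idx],
  entrez_id = lookup$entrez_id[idx],
  stringsAsFactors = FALSE
)
```

Using match() rather than merge() keeps one row per feature in the original order, which matters if the result is later attached to the ExpressionSet as feature-level data.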
Dear Andreas,

Please note that I am talking only about package xps, which contains its own annotation, based on the Affymetrix annotation files, in this case on the files "MoEx-1_0-st-v1.na32.mm9.probeset.csv" and "MoEx-1_0-st-v1.na32.mm9.transcript.csv", respectively. Thus with xps you can do rma() on the transcript level and get the transcript annotation.

Package xps first creates a "scheme" file (see e.g. script "script4schemes.R") which contains the Affymetrix annotation files for probesets and transcripts, including the MoEx 1.0 ST identifiers.

Best regards
Christian
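For readers who want to look at an Affymetrix annotation CSV outside of xps, the sketch below reads a small in-memory stand-in with base R. The "#%" header lines, the column names and every value are assumptions made for illustration; the real MoEx-1_0-st-v1.na32.mm9.transcript.csv should be inspected before relying on any of them.

```r
# Toy stand-in for an Affymetrix transcript annotation CSV. The "#%" header
# lines and the column names are assumptions about the file layout.
csv_text <- c(
  "#%chip_type=MoEx-1_0-st-v1",
  "#%genome-version=mm9",
  '"transcript_cluster_id","seqname","start","stop","gene_assignment"',
  '"4701234","chr1","100","900","NM_000001 // GeneA // toy gene A // 1 A1 // 1001"',
  '"4701240","chr2","300","700","---"'
)

# comment.char = "#" skips the header lines; reading everything as character
# avoids IDs being converted to numbers
annot <- read.csv(text = csv_text, comment.char = "#",
                  colClasses = "character")

# "---" is assumed to mark an empty field; turn it into a proper NA
annot$gene_assignment[annot$gene_assignment == "---"] <- NA

# Genomic position of one transcript cluster
annot[annot$transcript_cluster_id == "4701234", c("seqname", "start", "stop")]
```

For a file on disk the same call would take the file path as its first argument instead of `text =`.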
FWIW, remember that you can obtain the contents of the annotation files (the NA32 Affymetrix files) with:

library(Biobase)
library(oligo)
raw = read.celfiles(list.celfiles())
eset = rma(raw, target='transcript')
featureData(eset) = getNetAffx(eset, 'transcript')
head(fData(eset))

b
Please correct the code below to:

eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)

and if you want results at the exon level:

eset = rma(raw, target='probeset')
featureData(eset) = getNetAffx(eset, 'probeset') ## getNetAffx() takes the summarized ExpressionSet, not the raw data

Apologies for the mistake below.

b

On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> FWIW, remember that you can obtain the contents of the annotation
> files (the NA32 Affymetrix files) with:
>
> library(Biobase)
> library(oligo)
> raw = read.celfiles(list.celfiles())
> eset = rma(raw, target='transcript')
> featureData(eset) = getNetAffx(eset, 'transcript')
> head(fData(eset))
>
> b
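Once annotation of this kind is attached, gene symbols and Entrez IDs typically still have to be pulled out of a packed text field. The sketch below assumes that each entry has the layout "accession // symbol // description // cytoband // Entrez ID" and that multiple entries are joined by " /// ". That layout, the helper name parse_gene_assignment() and the example string are all assumptions for illustration, so the real column contents should be checked first.

```r
# Assumed layout of one annotation entry:
#   accession // symbol // description // cytoband // Entrez ID
# with several entries per transcript cluster joined by " /// ".
parse_gene_assignment <- function(x) {
  # Empty result for missing or placeholder values
  if (is.na(x) || x == "---") {
    return(data.frame(accession = character(0), symbol = character(0),
                      entrez_id = character(0), stringsAsFactors = FALSE))
  }
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(entries, " // ", fixed = TRUE)
  data.frame(
    accession = vapply(fields, function(f) f[1], character(1)),
    symbol = vapply(fields, function(f) f[2], character(1)),
    entrez_id = vapply(fields, function(f) f[5], character(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up example string in the assumed layout
example <- paste(
  "NM_000001 // GeneA // toy gene A // 1 A1 // 1001",
  "ENSMUST00000000001 // GeneA // toy gene A // 1 A1 // 1001",
  sep = " /// "
)

parsed <- parse_gene_assignment(example)

# One transcript cluster can list several accessions for the same gene
unique(parsed$symbol)
```

A cluster that maps to more than one distinct symbol is the ambiguous case raised earlier in the thread; this helper only exposes it and leaves the decision on how to resolve it to the caller.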
Hi,

I wasn't sure whether this was worth starting a new thread, since my question is closely related to this one.

Is there any plan to include the "comprehensive" exon array mappings?

E.g. for rat, if one goes here:

http://www.affymetrix.com/estore/browse/products.jsp?productId=131489&categoryId=35748&productName=GeneChip-Rat-Exon-1.0-ST-Array#1_1

then to the Technical Documentation tab, and downloads the "Rat Exon 1.0 ST Array Probeset, and Meta Probeset Files, core, full, extended and comprehensive rn4" data:

http://www.affymetrix.com/Auth/support/downloads/library_files/RaEx-1_0-st-v1.r2.dt1.rn4.ps.zip

there are the core/extended/full ps and mps files. However, there is also a comprehensive mps file. Full, core and extended are from 2006. The comprehensive file is from 2010 (and gets updated more regularly), so perhaps it would be a better file to use for getNetAffx?

Apologies if this has been covered before. I am never sure what the best way is to analyse exon array data at the gene level.

Thanks,

Jim

On 13 June 2012 21:37, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> please correct the code below to:
>
> eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)
>
> and if you want results at the exon level
>
> eset = rma(raw, target='probeset')
> featureData(eset) = getNetAffx(raw, 'probeset')
>
> apologies for the mistake below.
>
> b
>
> On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> > FWIW, remember that you can obtain the contents of the annotation
> > files (the NA32 Affymetrix files) with:
> >
> > library(Biobase)
> > library(oligo)
> > raw = read.celfiles(list.celfiles())
> > eset = rma(raw, target='transcript')
> > featureData(eset) = getNetAffx(eset, 'transcript')
> > head(fData(eset))
> >
> > b
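For anyone weighing the comprehensive mps file against the core/extended/full ones, it can help to look at which probesets each meta-probeset groups together. The following is a minimal sketch in base R, not part of oligo. It assumes the .mps file is tab-delimited, has header lines starting with `#%`, and carries the columns probeset_id, transcript_cluster_id, probeset_list and probe_count. The IDs and the chip_type line are made up for illustration, so check a real file before relying on this layout.

```r
# Toy stand-in for a meta-probeset (.mps) file. The column layout and all
# IDs here are assumptions for illustration, not copied from a real file.
mps_lines <- c(
  "#%chip_type=RaEx-1_0-st-v1",
  "probeset_id\ttranscript_cluster_id\tprobeset_list\tprobe_count",
  "7000001\t7000001\t5000001 5000002 5000003\t12",
  "7000002\t7000002\t5000004 5000005\t8"
)
mps_file <- tempfile(fileext = ".mps")
writeLines(mps_lines, mps_file)

# Read the table, skipping the '#%' header lines.
mps <- read.delim(mps_file, comment.char = "#", stringsAsFactors = FALSE)

# Expand the space-separated probeset_list into one row per probeset,
# giving a probeset -> transcript cluster lookup table.
members <- strsplit(mps$probeset_list, " ", fixed = TRUE)
ps2tc <- data.frame(
  probeset = unlist(members),
  transcript_cluster = rep(mps$transcript_cluster_id, lengths(members)),
  stringsAsFactors = FALSE
)
print(ps2tc)
```

Comparing `table(ps2tc$transcript_cluster)` between two mps files would show how many probesets each one assigns per transcript cluster.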
Dear Jim,

I pulled all relevant annotation via biomaRt, as BioMart has all mappings of exon array probeset IDs to e.g. ENTREZID or GENESYMBOL. Then you can go on from that.

Cheers,
Andreas
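The biomaRt lookup described above might look roughly like the sketch below. It is not the exact code used in this thread. The function name `fetch_probeset_annotation` is hypothetical, and the attribute names ("affy_moex_1_0_st_v1", "entrezgene_id", "mgi_symbol", "refseq_mrna") are assumptions about the current Ensembl mart that should be confirmed with `biomaRt::listAttributes(mart)`; older releases used different names. Calling it requires the biomaRt package and network access.

```r
# Sketch only: the function needs biomaRt and a network connection when called.
# All attribute names are assumptions; verify them with listAttributes(mart).
fetch_probeset_annotation <- function(probeset_ids,
                                      array_attr = "affy_moex_1_0_st_v1",
                                      dataset = "mmusculus_gene_ensembl") {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("the biomaRt package is required")
  }
  # Connect to the Ensembl mart for the chosen species dataset.
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  # One row per probeset/gene/transcript combination; a probeset can map to
  # several genes, and many probesets map to none.
  biomaRt::getBM(
    attributes = c(array_attr, "entrezgene_id", "mgi_symbol", "refseq_mrna"),
    filters = array_attr,
    values = probeset_ids,
    mart = mart
  )
}
```

A call such as `annot <- fetch_probeset_annotation(rownames(expr_matrix))`, where `expr_matrix` is your own probeset-level matrix, would return a data frame to merge against the expression values.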
Thanks for the pointer Andreas,

How did you go from probe sets for a given gene to the transcript level? And how did you know whether a probe set was "core", "extended" or "full" confidence?

Also, how did you summarise the probe set expression levels to make a transcript? Using biomart I get ~25k unique Ensembl genes mapping to probe set IDs, which is much higher than when I follow the oligo pipeline, perform RMA at core/extended/full level, and use getNetAffx for annotation.

Thanks,

Jim
Ok, sorry, that was the "short answer". Here comes the longer one:

1. get a CDF for the chip, get it at http://annmap.picr.man.ac.uk/download/
2. load CEL files using the standard affy package
3. assign the downloaded CDF to your AffyBatch object
4. calculate RMA or whatever you want (NOTE: this will get you all probesets, with no restriction to e.g. "core")
5. pull the whole set of identifiers from biomaRt and annotate your expression matrix with this information
6. "collapse" probesets targeting the same identifier to their mean, median or medpolish, whatever suits your needs best, via functions such as "recast" or "aggregate"
7. have fun with your new expression matrix!

Hope that helps. I also needed some time to figure out the individual steps.
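Step 6 above (collapsing probesets that target the same identifier) can be sketched in base R with `aggregate`. This is only an illustration on made-up numbers: the matrix `expr`, the mapping `map` and the gene names are hypothetical stand-ins for the RMA output and the biomaRt table, and the median is just one of the summaries mentioned.

```r
# Toy log2 expression matrix: 5 probesets x 3 samples (values are made up).
expr <- matrix(
  c(7.1, 7.3, 7.0,
    6.9, 7.2, 7.4,
    3.0, 3.1, 2.9,
    9.5, 9.7, 9.6,
    9.1, 9.0, 9.4),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("ps", 1:5), paste0("sample", 1:3))
)

# Hypothetical probeset-to-gene mapping, e.g. as pulled from biomaRt.
map <- data.frame(
  probeset = paste0("ps", 1:5),
  gene = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  stringsAsFactors = FALSE
)

# Align the mapping to the matrix rows, then take the per-gene median
# of the probeset values within each sample.
genes <- map$gene[match(rownames(expr), map$probeset)]
collapsed <- aggregate(as.data.frame(expr), by = list(gene = genes), FUN = median)

# Turn the result back into a gene x sample matrix.
gene_expr <- as.matrix(collapsed[, -1])
rownames(gene_expr) <- collapsed$gene
print(gene_expr)
```

Probesets that map to several identifiers, or to none, need a decision before this step; the sketch assumes a clean one-to-one mapping from probeset to gene.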
Thanks Andreas!

That's really useful information, I will have a look.

Out of interest, did you look at the distribution of expression levels for the different probe sets? If you are including all probe sets, I would guess that a large number of predicted/intronic probe sets that aren't expressed could bias your gene-level estimate, i.e. if their proportion is above the breakdown point of the summarisation/aggregation method. Although perhaps the CDF from annmap takes care of that?

Cheers!

Jim
?[[alternative HTML version deleted]] >> >> > >>> >> >> > >>> _______________________________________________ >> >> > >>> Bioconductor mailing list >> >> > >>> Bioconductor at r-project.org >> >> > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> > >>> Search the archives: >> >> > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >> >> >> > >> >> >> > >> -- >> >> > >> James W. MacDonald, M.S. >> >> > >> Biostatistician >> >> > >> University of Washington >> >> > >> Environmental and Occupational Health Sciences >> >> > >> 4225 Roosevelt Way NE, # 100 >> >> > >> Seattle WA 98105-6099 >> >> > >> >> >> > >> >> >> > >> _______________________________________________ >> >> > >> Bioconductor mailing list >> >> > >> Bioconductor at r-project.org >> >> > >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> > >> Search the archives: >> >> > >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >> >> > _______________________________________________ >> >> > Bioconductor mailing list >> >> > Bioconductor at r-project.org >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> > Search the archives: >> >> > http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > > >
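For readers following the overlap-based annotation idea quoted above, here is a minimal sketch of what "decide what regions are being interrogated" could look like. It is written in base R so that it runs without any Bioconductor package; in practice findOverlaps() on genomic range objects would do the same job far more efficiently. All coordinates, probeset IDs and gene names below are made up for illustration, and the rule used (annotate a probeset only if it overlaps an exon, leave everything else unannotated) is just one possible answer to the intron/UTR questions, not a recommendation from the thread.

```r
# Hypothetical probeset coordinates (in practice these would come from the
# platform design package; the values here are invented).
probesets <- data.frame(
  id    = c("ps1", "ps2", "ps3"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100, 950, 500),
  end   = c(125, 975, 525),
  stringsAsFactors = FALSE
)

# Hypothetical exon annotation (also invented).
exons <- data.frame(
  gene  = c("GeneA", "GeneA", "GeneB"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(90, 400, 100),
  end   = c(200, 600, 300),
  stringsAsFactors = FALSE
)

# Pair every probeset with every exon on the same chromosome, then keep the
# pairs whose closed intervals overlap.
pairs <- merge(probesets, exons, by = "chr", suffixes = c(".ps", ".ex"))
hits  <- pairs[pairs$start.ps <= pairs$end.ex & pairs$end.ps >= pairs$start.ex, ]

# Probesets without an overlapping exon (intronic, intergenic) stay NA.
probesets$gene <- hits$gene[match(probesets$id, hits$id)]
print(probesets)
```

Changing the overlap rule (for example padding the exon boundaries to catch probesets close to a UTR) changes which probesets end up annotated, which is exactly why the questions above have to be answered first.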
The AnnMap CDF should take care of that.

2012/6/27 James Perkins <jperkins@biochem.ucl.ac.uk>
> Thanks Andreas! That's really useful information, I will have a look.
>
> Out of interest, did you look at the distribution of expression levels for the different probe-sets? If you are including all probe-sets, I would guess that if there were a lot of predicted/intronic probe sets that aren't expressed, that could bias your gene-level estimation, i.e. if the proportion is above the break-down point of the summarisation/aggregation method.
>
> Although perhaps the CDF from annmap takes care of that?
>
> Cheers!
>
> Jim
>
> On 27 June 2012 16:45, Andreas Heider <aheider@trm.uni-leipzig.de> wrote:
> > Ok, sorry, that was the "short answer". Here comes the longer one:
> > 1. get a CDF for the chip, get it at http://annmap.picr.man.ac.uk/download/
> > 2. load CEL files using the standard affy package
> > 3. assign the downloaded CDF to your AffyBatch object
> > 4. calculate RMA or whatever you want (NOTE: this will get you all probesets, no restrictions as in e.g. "core")
> > 5. pull the whole set of identifiers from biomaRt and annotate your expression matrix with this information
> > 6. "collapse" probesets targeting the same identifier to their mean, median or medpolish, whatever suits your needs best, via functions such as "recast" or "aggregate"
> > 7. have fun with your new expression matrix!
> >
> > Hope that helps, I also needed some time to figure out the individual steps.
> >
> > 2012/6/27 James Perkins <jperkins@biochem.ucl.ac.uk>
> >> Thanks for the pointer Andreas,
> >>
> >> How did you go from probe sets for a given gene to the transcript level? And how did you know if it was "core", "extended", "full" confidence?
> >>
> >> Also, how did you summarise the probeset expression levels to make a transcript? Using biomart I get ~25k unique ensembl genes mapping to probe set ids, which is much higher than when I follow the oligo pipeline and perform RMA at core/extended/full level, and use getNetAffx for annotation.
> >>
> >> Thanks,
> >>
> >> Jim
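As an illustration of the "collapse" step (step 6) and of the break-down point concern raised in the quoted question, here is a minimal sketch in base R. It assumes the expression values are already summarised per probeset and that a probeset-to-gene table has been pulled from somewhere such as biomaRt; the matrix, the probeset IDs and the gene names are all invented for the example, and `aggregate` is only one of the functions mentioned in the thread.

```r
# Toy log2 expression matrix: 6 probesets x 3 samples (values are made up).
expr <- matrix(
  c(8.0, 8.2, 7.9,
    8.4, 8.1, 8.3,
    2.1, 2.0, 2.2,   # e.g. an intronic probeset that is not expressed
    5.0, 5.5, 5.2,
    5.4, 5.1, 5.6,
    9.0, 9.1, 8.8),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("ps", 1:6), c("s1", "s2", "s3"))
)

# Hypothetical probeset -> gene mapping, standing in for a biomaRt result.
map <- data.frame(
  probeset = paste0("ps", 1:6),
  symbol   = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Attach the gene symbol to every probeset row.
ann <- merge(
  map,
  data.frame(probeset = rownames(expr), expr, stringsAsFactors = FALSE),
  by = "probeset"
)
samples <- colnames(expr)

# Collapse probesets that target the same gene, once by median and once by mean.
gene_median <- aggregate(ann[, samples], by = list(symbol = ann$symbol), FUN = median)
gene_mean   <- aggregate(ann[, samples], by = list(symbol = ann$symbol), FUN = mean)

print(gene_median)
print(gene_mean)
```

In this toy case one of the three GeneA probesets is unexpressed: the median is unaffected, while the mean is pulled down noticeably. If more than half of a gene's probesets were unexpressed, the median would follow them too, which is the break-down point issue in a nutshell.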
Also remember that this will be influenced by your selection of identifiers in biomart!

2012/6/27 Andreas Heider <aheider@trm.uni-leipzig.de>
> The AnnMap CDF should take care of that.
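To make the point about identifier selection concrete, here is a small sketch in base R. The mapping table is entirely invented, and the column names `ensembl_gene_id` and `entrezgene` are only assumed to resemble what a biomaRt query might return. It shows that the same probesets collapse to a different number of "genes" depending on which identifier is used, and that probesets lacking a given identifier drop out.

```r
# Hypothetical probeset annotation with two identifier types (made-up IDs).
map <- data.frame(
  probeset        = c("ps1", "ps2", "ps3", "ps4", "ps5", "ps6"),
  ensembl_gene_id = c("ens_A", "ens_A", "ens_B", "ens_C", "ens_D", "ens_D"),
  entrezgene      = c("101", "101", "101", NA, "104", "104"),
  stringsAsFactors = FALSE
)

# Number of distinct genes obtained with each identifier type.
n_ensembl <- length(unique(na.omit(map$ensembl_gene_id)))
n_entrez  <- length(unique(na.omit(map$entrezgene)))

# Probesets that would be lost when collapsing on Entrez IDs.
lost_with_entrez <- map$probeset[is.na(map$entrezgene)]

cat("distinct Ensembl genes:", n_ensembl, "\n")
cat("distinct Entrez genes: ", n_entrez, "\n")
cat("probesets without Entrez ID:", lost_with_entrez, "\n")
```

Here two Ensembl genes share one Entrez ID and one probeset has no Entrez ID at all, so the collapsed matrix would have four rows with one identifier and two with the other.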
Could you expand on that a little? Do you mean you can change the level of confidence of the ps ids mapping to the ENSEMBL gene using biomart?

On 27 June 2012 17:30, Andreas Heider <aheider at trm.uni-leipzig.de> wrote:
> Also remember that this will be influenced by your selection of identifiers in biomart!
>
> 2012/6/27 Andreas Heider <aheider at trm.uni-leipzig.de>
>> The AnnMap CDF should take care of that.
>>
>>> >> >> > > On 13 June 2012 15:47, James W. MacDonald <jmacdon at uw.edu> wrote:
>>> >> >> > >> The problem here is that you want to do something that AFAIK isn't easy to do. The Gene ST arrays allow you to summarize all the probes that interrogate a particular transcript (e.g., all the exon-level probesets are collapsed to transcript level, and then you summarize). However, for the Exon ST arrays that isn't the case, unless there is something in xps to allow for that - I know next to nothing about that package, so Cristian Stratowa will have to chime in if I am missing something.
>>> >> >> > >>
>>> >> >> > >> For the Exon chips, you are always summarizing at the same probeset level, where there are <= 4 probes per probeset, and there can be any number of probesets that interrogate a given exon. Lots of these probesets interrogate regions that aren't even transcribed, according to current knowledge of the genome. When you choose core, extended or full probesets, you are just changing the number of probesets being used, not summarizing at a different level as with the Gene ST chip.
>>> >> >> > >>
>>> >> >> > >> So when you say you want gene symbols, refseq ids and gene ids, what exactly are you after? If a given probeset is in the intron of a gene do you want to annotate it as being part of that gene? How about if it is in the UTR (or really close to the UTR)?
What do you want to do with the >>> >> >> > >> probesets >>> >> >> > >> where >>> >> >> > >> one or more of the probes binds in multiple positions in the >>> >> >> > >> genome? >>> >> >> > >> These >>> >> >> > >> are all questions that the exonmap package tries to consider, >>> >> >> > >> and >>> >> >> > >> it >>> >> >> > >> gets >>> >> >> > >> really complicated. That's why Affy went with the Gene ST >>> >> >> > >> chips - >>> >> >> > >> they >>> >> >> > >> unleashed the Exon chips on us and couldn't sell them because >>> >> >> > >> people >>> >> >> > >> were >>> >> >> > >> saying WTF do I do with this thing? >>> >> >> > >> >>> >> >> > >> I don't think there is an easy or obvious answer to your >>> >> >> > >> question. >>> >> >> > >> If >>> >> >> > >> you >>> >> >> > >> were to come up with what you think are reasonable answers to >>> >> >> > >> my >>> >> >> > >> questions, >>> >> >> > >> then it wouldn't be much work to extract the chr, start, end >>> >> >> > >> from >>> >> >> > >> the >>> >> >> > >> pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g., >>> >> >> > >> ?findOverlaps()) to decide what regions are being >>> >> >> > >> interrogated, >>> >> >> > >> and >>> >> >> > >> annotate >>> >> >> > >> from there. >>> >> >> > >> >>> >> >> > >> Best, >>> >> >> > >> >>> >> >> > >> Jim >>> >> >> > >> >>> >> >> > >> >>> >> >> > >> >>> >> >> > >>> >>> >> >> > >>> I can import it as a AffyBatch and generate an ExpressionSet >>> >> >> > >>> with >>> >> >> > >>> the help >>> >> >> > >>> of the Xmap/exonmap supplied CDF, but there is no annotation >>> >> >> > >>> attached to >>> >> >> > >>> it. >>> >> >> > >>> >>> >> >> > >>> OR >>> >> >> > >>> >>> >> >> > >>> I can import the CEL files with the "oligo" package as a Exon >>> >> >> > >>> Array >>> >> >> > >>> object >>> >> >> > >>> and generate an ExpressionSet from it. >>> >> >> > >>> However in that case it still have no annotation. 
>>> >> >> > >>> >>> >> >> > >>> Surprisingly on the Bioconductor website there are all >>> >> >> > >>> packages >>> >> >> > >>> needed to >>> >> >> > >>> deal with Mouse Gene 1.0 ST arrays but the informtion to work >>> >> >> > >>> with >>> >> >> > >>> Mouse >>> >> >> > >>> Exon 1.0 ST arrays seems missing! >>> >> >> > >>> >>> >> >> > >>> What am I doing wrong here? Has someone else had such >>> >> >> > >>> problems? >>> >> >> > >>> >>> >> >> > >>> Thanks in advance for your effort, >>> >> >> > >>> Andreas >>> >> >> > >>> >>> >> >> > >>> ? ? ? ?[[alternative HTML version deleted]] >>> >> >> > >>> >>> >> >> > >>> _______________________________________________ >>> >> >> > >>> Bioconductor mailing list >>> >> >> > >>> Bioconductor at r-project.org >>> >> >> > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >> >> > >>> Search the archives: >>> >> >> > >>> >>> >> >> > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > >> >>> >> >> > >> >>> >> >> > >> -- >>> >> >> > >> James W. MacDonald, M.S. >>> >> >> > >> Biostatistician >>> >> >> > >> University of Washington >>> >> >> > >> Environmental and Occupational Health Sciences >>> >> >> > >> 4225 Roosevelt Way NE, # 100 >>> >> >> > >> Seattle WA 98105-6099 >>> >> >> > >> >>> >> >> > >> >>> >> >> > >> _______________________________________________ >>> >> >> > >> Bioconductor mailing list >>> >> >> > >> Bioconductor at r-project.org >>> >> >> > >> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >> >> > >> Search the archives: >>> >> >> > >> >>> >> >> > >> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > >>> >> >> > _______________________________________________ >>> >> >> > Bioconductor mailing list >>> >> >> > Bioconductor at r-project.org >>> >> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >> >> > Search the archives: >>> >> >> > http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > >>> >> > >>> > >>> > >> >> >
For example, you could use information on exon contribution, such as Ensembl exon IDs.

2012/6/27 James Perkins <jperkins@biochem.ucl.ac.uk>
> Could you expand on that a little? Do you mean you can change the
> level of confidence of the ps ids mapping to the ENSEMBL gene using
> biomart?
>
> On 27 June 2012 17:30, Andreas Heider <aheider@trm.uni-leipzig.de> wrote:
> > Also remember that this will be influenced by your selection of
> > identifiers in biomart!
> >
> > 2012/6/27 Andreas Heider <aheider@trm.uni-leipzig.de>
> >> The AnnMap CDF should take care of that.
> >>
> >> 2012/6/27 James Perkins <jperkins@biochem.ucl.ac.uk>
> >>> Thanks Andreas! That's really useful information, I will have a look.
> >>>
> >>> Out of interest, did you look at the distribution of expression levels
> >>> for the different probe sets? If you are including all probe sets, I
> >>> would guess that if there were a lot of predicted/intronic probe sets
> >>> that aren't expressed, that could bias your gene-level estimation, i.e.
> >>> if the proportion is above the breakdown point of the
> >>> summarisation/aggregation method.
> >>>
> >>> Although perhaps the CDF from annmap takes care of that?
> >>>
> >>> Cheers!
> >>>
> >>> Jim
> >>>
> >>> On 27 June 2012 16:45, Andreas Heider <aheider@trm.uni-leipzig.de> wrote:
> >>> > Ok, sorry, that was the "short answer". Here comes the longer one:
> >>> > 1. get a CDF for the chip, get it at
> >>> > http://annmap.picr.man.ac.uk/download/
> >>> > 2. load CEL files using the standard affy package
> >>> > 3. assign the downloaded CDF to your AffyBatch object
> >>> > 4. calculate RMA or whatever you want (NOTE: this will get you all
> >>> > probesets, no restrictions as in e.g. "core")
> >>> > 5. pull the whole set of identifiers from biomaRt and annotate your
> >>> > expression matrix with this information
> >>> > 6. "collapse" probesets targeting the same identifier to their mean,
> >>> > median or medpolish, whatever suits your needs best, via functions
> >>> > such as "recast" or "aggregate"
> >>> > 7. have fun with your new expression matrix!
> >>> >
> >>> > Hope that helps, I needed also some time to figure out the individual
> >>> > steps.
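As a rough illustration of the "collapse" step (step 6) in Andreas's procedure, here is a minimal base-R sketch using `aggregate()` with the median. It is not taken from any poster's actual code: the expression matrix, the probeset IDs and the probeset-to-symbol table are all invented, and the table merely stands in for whatever mapping you pull from biomaRt.

```r
# Toy expression matrix: rows are probesets, columns are samples.
# All IDs and values are made up for illustration.
expr <- matrix(
  c(7.1, 7.3,
    6.9, 7.0,
    7.4, 7.2,
    5.0, 5.2,
    5.4, 5.6),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4", "ps5"),
                  c("sampleA", "sampleB"))
)

# Hypothetical probeset -> gene symbol table, standing in for a biomaRt result.
annot <- data.frame(
  probeset = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  symbol   = c("GeneX", "GeneX", "GeneX", "GeneY", "GeneY"),
  stringsAsFactors = FALSE
)

# Align the annotation to the matrix rows and drop unannotated probesets.
symbols <- annot$symbol[match(rownames(expr), annot$probeset)]
keep <- !is.na(symbols)

# Collapse all probesets sharing a symbol to their per-sample median.
collapsed <- aggregate(as.data.frame(expr[keep, , drop = FALSE]),
                       by = list(symbol = symbols[keep]),
                       FUN = median)

# Turn the result back into a gene-by-sample matrix.
gene_expr <- as.matrix(collapsed[, -1, drop = FALSE])
rownames(gene_expr) <- collapsed$symbol
print(gene_expr)
```

Swapping `median` for `mean` changes the summary; the median is somewhat more tolerant of a minority of unexpressed (e.g. intronic) probesets, which is the concern Jim raises about the breakdown point.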
Hi Jim,

I'll make sure to add the comprehensive MPS as soon as I get more info about it from the specialists...

However, note that the contents of the MPS files are not used by getNetAffx(), which only uses the probeset/transcript annotation file...

Thanks,

benilton

On 27 June 2012 15:00, James Perkins <jperkins at biochem.ucl.ac.uk> wrote:
> Hi,
>
> I wasn't sure if this was worth starting a new thread for this, since
> my question is very much related to this thread...
>
> Is there any plan to include the "comprehensive" exon array mappings?
>
> E.g. for rat:
>
> If one goes here
>
> http://www.affymetrix.com/estore/browse/products.jsp?productId=131489&categoryId=35748&productName=GeneChip-Rat-Exon-1.0-ST-Array#1_1
>
> Then to Technical Documentation tab
>
> And downloads the
>
> "Rat Exon 1.0 ST Array Probeset, and Meta Probeset Files, core, full,
> extended and comprehensive rn4" data
>
> http://www.affymetrix.com/Auth/support/downloads/library_files/RaEx-1_0-st-v1.r2.dt1.rn4.ps.zip
>
> There are the core/extended/full ps and mps files here.
>
> However there is also a comprehensive mps file.
>
> Full, core and extended are from 2006.
>
> The comprehensive is from 2010 (and gets updated more regularly), so
> perhaps would be a better file to use for getNetAffx ?
>
> Apologies if this has been covered before. I am never sure of what is
> the best way to analyse exon array data at the gene level.
>
> Thanks,
>
> Jim
>
> On 13 June 2012 21:37, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>> please correct the code below to:
>>
>> eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)
>>
>> and if you want results at the exon level
>>
>> eset = rma(raw, target='probeset')
>> featureData(eset) = getNetAffx(raw, 'probeset')
>>
>> apologies for the mistake below.
>>
>> b
>>
>> On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>> > FWIW, remember that you can obtain the contents of the annotation
>> > files (the NA32 Affymetrix files) with:
>> >
>> > library(Biobase)
>> > library(oligo)
>> > raw = read.celfiles(list.celfiles())
>> > eset = rma(raw, target='transcript')
>> > featureData(eset) = getNetAffx(eset, 'transcript')
>> > head(fData(eset))
>> >
>> > b
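For reference, the oligo workflow discussed in this thread can be pulled together into one small function. This is only a sketch under stated assumptions: the function name `annotate_exon_arrays` and its arguments are hypothetical, and it assumes oligo plus the matching platform design package (for Mouse Exon arrays, pd.moex.1.0.st.v1) are installed and that `cel_dir` contains your CEL files. Nothing runs until the function is called.

```r
# Hypothetical helper (not part of oligo): summarize Exon ST CEL files with
# RMA and attach the NetAffx annotation as featureData.
annotate_exon_arrays <- function(cel_dir, target = "core") {
  library(oligo)  # also attaches oligoClasses and Biobase

  # Read every CEL file found in the directory.
  cel_files <- list.celfiles(cel_dir, full.names = TRUE)
  raw <- read.celfiles(cel_files)

  # target = "core", "extended" or "full" summarizes to transcript clusters;
  # target = "probeset" keeps the exon-level probesets.
  eset <- rma(raw, target = target)

  # Pick the NetAffx annotation level that matches the summarization level.
  type <- if (target == "probeset") "probeset" else "transcript"

  # getNetAffx() is documented as taking the summarized ExpressionSet,
  # so `eset` (not `raw`) is passed here.
  featureData(eset) <- getNetAffx(eset, type)
  eset
}
```

A call such as `eset <- annotate_exon_arrays("path/to/cels", target = "core")` followed by `head(fData(eset))` would then show the annotation columns; the path is a placeholder.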
Sorry, I meant at the rma(target=) level rather than the getNetAffx level; I *assume* rma(target=) uses the mps files to map between ps and transcripts?

Cheers,

Jim

On 27 June 2012 17:27, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> Hi Jim,
>
> I'll make sure to add the comprehensive MPS as soon as I get more info
> about it from the specialists...
>
> However, note that the contents of the MPS files are not used by
> getNetAffx(), which only uses the probeset/transcript annotation
> file...
>
> Thanks,
>
> benilton
That's correct... the summarisation step does use the MPS... and I'll add support for our next release. b
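For readers following the distinction made in this exchange, here is a minimal sketch of the two steps, based on the corrected calls posted earlier in the thread. It assumes the CEL files are in the current working directory and that the matching platform design package (pd.moex.1.0.st.v1 for the Mouse Exon 1.0 ST array) is installed; the object names (cel_files, eset_gene, eset_exon) are only illustrative. The target= argument of rma() selects the meta-probeset definitions used for summarisation, while getNetAffx() only attaches the NetAffx annotation afterwards.

```r
library(Biobase)
library(oligo)

# Assumption: the CEL files are in the current working directory.
cel_files <- list.celfiles()
stopifnot(length(cel_files) > 0)
raw <- read.celfiles(cel_files)

# Gene-level summarisation: the meta-probeset (MPS) definitions decide
# which probesets are combined into each transcript cluster.
# 'core' can be swapped for 'extended' or 'full', whichever is available.
eset_gene <- rma(raw, target = "core")

# Annotation is a separate step: this attaches the NetAffx transcript
# annotation and does not read the MPS files.
featureData(eset_gene) <- getNetAffx(eset_gene, "transcript")

# Exon-level (probeset-level) summarisation and annotation.
eset_exon <- rma(raw, target = "probeset")
featureData(eset_exon) <- getNetAffx(eset_exon, "probeset")

head(fData(eset_gene))
```

Whether a 'comprehensive' target can be passed here depends on the support mentioned above being added in a later release, so it is left out of the sketch.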
Great, Thanks, I'll look out for it! And thanks a lot Andreas for the suggestion of using ensembl exon ids, that sounds good, thanks for all your help. Cheers! Jim