Question

Loading quantiles normalized root data in XPS

0

Entering edit mode

Paul Geeleher ★ 1.3k

@paul-geeleher-2679

Last seen 10.7 years ago

Hi, I've used xps to quantiles normalize (at probe level) some Affy Exon Array data. I now have a "root" file called "exon_quantiles.root", but if I try to load it the same was I'd load my raw data (using the scheme file I created for Affy exon arrays) I get the error below? I can load my raw data just fine though. Any ideas? Do I perhaps need a different "root scheme" file for this normalized data? Unfortunately, I haven't been able to find an answer. > scheme.huex10stv2 <- root.scheme("huex10stv2.root") > data_qn <- root.data(scheme.huex10stv2, "exon_quantiles.root") Error in if (chipname != treetitle) { : argument is of length zero Hope someone can help, Paul. > sessionInfo() R version 2.11.0 (2010-04-22) x86_64-redhat-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base -- Paul Geeleher (PhD Student) School of Mathematics, Statistics and Applied Mathematics National University of Ireland Galway Ireland -- www.bioinformaticstutorials.com

probe affy xps probe affy xps • 2.3k views

ADD COMMENT • link updated 13.2 years ago by cstrato ★ 3.9k • written 13.2 years ago by Paul Geeleher ★ 1.3k

score 0 · Answer 1 · 2012-02-01

0

Entering edit mode

cstrato ★ 3.9k

@cstrato-908

Last seen 6.6 years ago

Austria

Dear Paul, Please have a look at the help ?root.expr. If I understand you correctly, you did only do quantile normalization? To see the tree names in your file you should do: > getTreeNames("exon_quantiles.root") You will probably see trees with extension "int", see help ?validTreetype. To load these trees you need to do: > data_qn <- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") Please let me know if this did solve your problem. Best regards Christian _._._._._._._._._._._._._._._._._._ C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a V.i.e.n.n.a A.u.s.t.r.i.a e.m.a.i.l: cstrato at aon.at _._._._._._._._._._._._._._._._._._ On 2/1/12 7:07 PM, Paul Geeleher wrote: > Hi, > > I've used xps to quantiles normalize (at probe level) some Affy Exon > Array data. I now have a "root" file called "exon_quantiles.root", but > if I try to load it the same was I'd load my raw data (using the > scheme file I created for Affy exon arrays) I get the error below? I > can load my raw data just fine though. Any ideas? Do I perhaps need a > different "root scheme" file for this normalized data? Unfortunately, > I haven't been able to find an answer. > >> scheme.huex10stv2<- root.scheme("huex10stv2.root") >> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") > Error in if (chipname != treetitle) { : argument is of length zero > > Hope someone can help, > > Paul. > > > >> sessionInfo() > R version 2.11.0 (2010-04-22) > x86_64-redhat-linux-gnu > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > >

ADD COMMENT • link 13.2 years ago cstrato ★ 3.9k

0

Entering edit mode

Hi Christian, Thanks for your quick reply. I check what kind of trees I have using "getTreeNames()" as you'd suggested, it seems they are of type "cqu" rather than "int", this is presumably because my analysis required no background correction step? So I then tried: > data_qn <- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") but that gives me a huge number of errors that look like this: Error in <tfile::cd>: Unknown directory PreprocesSet Error: Could not get directory <preprocesset>. Error in <tfile::cd>: Unknown directory PreprocesSet Error: Could not get directory <preprocesset>. Error in <tfile::cd>: Unknown directory PreprocesSet Error: Could not get directory <preprocesset>. Error: Could not get tree <exportset>. Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : error in function ?ExportData? This file "exon_quantiles.root" definitely exists in the current working directory though... Thanks again for your help! Paul. On Wed, Feb 1, 2012 at 9:01 PM, cstrato <cstrato at="" aon.at=""> wrote: > Dear Paul, > > Please have a look at the help ?root.expr. > > If I understand you correctly, you did only do quantile normalization? > > To see the tree names in your file you should do: >> getTreeNames("exon_quantiles.root") > > You will probably see trees with extension "int", see help ?validTreetype. > > To load these trees you need to do: >> data_qn <- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") > > Please let me know if this did solve your problem. > > Best regards > Christian > _._._._._._._._._._._._._._._._._._ > C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a > V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a > e.m.a.i.l: ? ? ? ?cstrato at aon.at > _._._._._._._._._._._._._._._._._._ > > > > > On 2/1/12 7:07 PM, Paul Geeleher wrote: >> >> Hi, >> >> I've used xps to quantiles normalize (at probe level) some Affy Exon >> Array data. I now have a "root" file called "exon_quantiles.root", but >> if I try to load it the same was I'd load my raw data (using the >> scheme file I created for Affy exon arrays) I get the error below? I >> can load my raw data just fine though. Any ideas? Do I perhaps need a >> different "root scheme" file for this normalized data? Unfortunately, >> I haven't been able to find an answer. >> >>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >> >> Error in if (chipname != treetitle) { : argument is of length zero >> >> Hope someone can help, >> >> Paul. >> >> >> >>> sessionInfo() >> >> R version 2.11.0 (2010-04-22) >> x86_64-redhat-linux-gnu >> >> locale: >> ?[1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C >> ?[3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8 >> ?[5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8 >> ?[7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C >> ?[9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base >> >> >> > -- Paul Geeleher (PhD Student) School of Mathematics, Statistics and Applied Mathematics National University of Ireland Galway Ireland -- www.bioinformaticstutorials.com

ADD REPLY • link 13.2 years ago Paul Geeleher ★ 1.3k

0

Entering edit mode

Dear Paul, The functions root.data(), root.call() and root.expr() were created to allow you access to the corresponding root files just in case that you did not save your R session. In the cases where you compute expression levels stepwise, or only part of them such as normalize.quantiles(), as seems to be the matter in your case, there is no corresponding root.xxx() function to access the root file directly. In these cases you need to save your R session to have continued access to the resulting root file. Please note that saving the R session is the usual case to have access to the root files. Best regards Christian On 2/2/12 1:12 PM, Paul Geeleher wrote: > Hi Christian, > > Thanks for your quick reply. I check what kind of trees I have using > "getTreeNames()" as you'd suggested, it seems they are of type "cqu" > rather than "int", this is presumably because my analysis required no > background correction step? > > So I then tried: >> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") > > but that gives me a huge number of errors that look like this: > > Error in<tfile::cd>: Unknown directory PreprocesSet > Error: Could not get directory<preprocesset>. > Error in<tfile::cd>: Unknown directory PreprocesSet > Error: Could not get directory<preprocesset>. > Error in<tfile::cd>: Unknown directory PreprocesSet > Error: Could not get directory<preprocesset>. > Error: Could not get tree<exportset>. > Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : > error in function ?ExportData? > > > This file "exon_quantiles.root" definitely exists in the current > working directory though... Thanks again for your help! > > Paul. > > > > On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >> Dear Paul, >> >> Please have a look at the help ?root.expr. >> >> If I understand you correctly, you did only do quantile normalization? >> >> To see the tree names in your file you should do: >>> getTreeNames("exon_quantiles.root") >> >> You will probably see trees with extension "int", see help ?validTreetype. >> >> To load these trees you need to do: >>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >> >> Please let me know if this did solve your problem. >> >> Best regards >> Christian >> _._._._._._._._._._._._._._._._._._ >> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >> V.i.e.n.n.a A.u.s.t.r.i.a >> e.m.a.i.l: cstrato at aon.at >> _._._._._._._._._._._._._._._._._._ >> >> >> >> >> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>> >>> Hi, >>> >>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>> Array data. I now have a "root" file called "exon_quantiles.root", but >>> if I try to load it the same was I'd load my raw data (using the >>> scheme file I created for Affy exon arrays) I get the error below? I >>> can load my raw data just fine though. Any ideas? Do I perhaps need a >>> different "root scheme" file for this normalized data? Unfortunately, >>> I haven't been able to find an answer. >>> >>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>> >>> Error in if (chipname != treetitle) { : argument is of length zero >>> >>> Hope someone can help, >>> >>> Paul. >>> >>> >>> >>>> sessionInfo() >>> >>> R version 2.11.0 (2010-04-22) >>> x86_64-redhat-linux-gnu >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> >>> >> > > >

ADD REPLY • link 13.2 years ago cstrato ★ 3.9k

0

Entering edit mode

Hi Christian, Thanks for your quick and informative reply. I have re-run the analysis and saved the R objects as you suggest. The next thing I'm trying to do is to obtain the expression levels of the probes, but this doesn't seem to be working for me: > a <- validData(data_hapMap) Error in .local(object, ...) : slot "data" has no data Based on the documentation I think validData() is the correct function. I've also performed probeset level DABG and I'm trying to set individual probes which belong to probesets with DABG < .05 to 0 in the expression matrix. But it seems I can't see the expression matrix using validData(). Perhaps there is another function. Any ideas? Thank you again for your help with this, I'm very grateful! Paul. On 2/2/12, cstrato <cstrato at="" aon.at=""> wrote: > Dear Paul, > > The functions root.data(), root.call() and root.expr() were created to > allow you access to the corresponding root files just in case that you > did not save your R session. > > In the cases where you compute expression levels stepwise, or only part > of them such as normalize.quantiles(), as seems to be the matter in your > case, there is no corresponding root.xxx() function to access the root > file directly. In these cases you need to save your R session to have > continued access to the resulting root file. > > Please note that saving the R session is the usual case to have access > to the root files. > > Best regards > Christian > > > On 2/2/12 1:12 PM, Paul Geeleher wrote: >> Hi Christian, >> >> Thanks for your quick reply. I check what kind of trees I have using >> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >> rather than "int", this is presumably because my analysis required no >> background correction step? >> >> So I then tried: >>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >> >> but that gives me a huge number of errors that look like this: >> >> Error in<tfile::cd>: Unknown directory PreprocesSet >> Error: Could not get directory<preprocesset>. >> Error in<tfile::cd>: Unknown directory PreprocesSet >> Error: Could not get directory<preprocesset>. >> Error in<tfile::cd>: Unknown directory PreprocesSet >> Error: Could not get directory<preprocesset>. >> Error: Could not get tree<exportset>. >> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : >> error in function ?ExportData? >> >> >> This file "exon_quantiles.root" definitely exists in the current >> working directory though... Thanks again for your help! >> >> Paul. >> >> >> >> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >>> Dear Paul, >>> >>> Please have a look at the help ?root.expr. >>> >>> If I understand you correctly, you did only do quantile normalization? >>> >>> To see the tree names in your file you should do: >>>> getTreeNames("exon_quantiles.root") >>> >>> You will probably see trees with extension "int", see help >>> ?validTreetype. >>> >>> To load these trees you need to do: >>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>> >>> Please let me know if this did solve your problem. >>> >>> Best regards >>> Christian >>> _._._._._._._._._._._._._._._._._._ >>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >>> V.i.e.n.n.a A.u.s.t.r.i.a >>> e.m.a.i.l: cstrato at aon.at >>> _._._._._._._._._._._._._._._._._._ >>> >>> >>> >>> >>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>> >>>> Hi, >>>> >>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>> Array data. I now have a "root" file called "exon_quantiles.root", but >>>> if I try to load it the same was I'd load my raw data (using the >>>> scheme file I created for Affy exon arrays) I get the error below? I >>>> can load my raw data just fine though. Any ideas? Do I perhaps need a >>>> different "root scheme" file for this normalized data? Unfortunately, >>>> I haven't been able to find an answer. >>>> >>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>> >>>> Error in if (chipname != treetitle) { : argument is of length zero >>>> >>>> Hope someone can help, >>>> >>>> Paul. >>>> >>>> >>>> >>>>> sessionInfo() >>>> >>>> R version 2.11.0 (2010-04-22) >>>> x86_64-redhat-linux-gnu >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> >>>> >>> >> >> >> > -- Paul Geeleher (PhD Student) School of Mathematics, Statistics and Applied Mathematics National University of Ireland Galway Ireland -- www.bioinformaticstutorials.com

ADD REPLY • link 13.2 years ago Paul Geeleher ★ 1.3k

0

Entering edit mode

Dear Paul, I am afraid that it is not quite clear to me what you want to do. What do you mean with "obtain the expression levels of the probes"? Do you mean "probes" (i.e. oligos) or do you mean "probesets" as in your question about DABG? How did you create "data_hapMap"? Maybe you could send me your complete code otherwise I am only able to guess. I guess that "data_hapMap" contains the raw data. For these the slot "data" is empty to save memory. So you need to use either attachData() or attachInten(). However since you are using exon arrays you may not have enough RAM, so it would be better to use function export() or export.data(), or attach only a subset, see help ?attachData. See also vignette xps.pdf (chapter 2.3). When you talk about "expression matrix", how did you create it? Maybe you could use function validExpr(), but w/o seeing your code it is hard to tell. For DABG there are functions pvalData() and presCall(), see the examples in help ?dabg.call. Best regards Christian On 2/6/12 4:33 PM, Paul Geeleher wrote: > Hi Christian, > > Thanks for your quick and informative reply. > > I have re-run the analysis and saved the R objects as you suggest. The > next thing I'm trying to do is to obtain the expression levels of the > probes, but this doesn't seem to be working for me: > >> a<- validData(data_hapMap) > Error in .local(object, ...) : slot "data" has no data > > Based on the documentation I think validData() is the correct function. > > I've also performed probeset level DABG and I'm trying to set > individual probes which belong to probesets with DABG< .05 to 0 in > the expression matrix. > > But it seems I can't see the expression matrix using validData(). > Perhaps there is another function. Any ideas? > > Thank you again for your help with this, I'm very grateful! > > Paul. > > On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote: >> Dear Paul, >> >> The functions root.data(), root.call() and root.expr() were created to >> allow you access to the corresponding root files just in case that you >> did not save your R session. >> >> In the cases where you compute expression levels stepwise, or only part >> of them such as normalize.quantiles(), as seems to be the matter in your >> case, there is no corresponding root.xxx() function to access the root >> file directly. In these cases you need to save your R session to have >> continued access to the resulting root file. >> >> Please note that saving the R session is the usual case to have access >> to the root files. >> >> Best regards >> Christian >> >> >> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>> Hi Christian, >>> >>> Thanks for your quick reply. I check what kind of trees I have using >>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>> rather than "int", this is presumably because my analysis required no >>> background correction step? >>> >>> So I then tried: >>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >>> >>> but that gives me a huge number of errors that look like this: >>> >>> Error in<tfile::cd>: Unknown directory PreprocesSet >>> Error: Could not get directory<preprocesset>. >>> Error in<tfile::cd>: Unknown directory PreprocesSet >>> Error: Could not get directory<preprocesset>. >>> Error in<tfile::cd>: Unknown directory PreprocesSet >>> Error: Could not get directory<preprocesset>. >>> Error: Could not get tree<exportset>. >>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : >>> error in function ?ExportData? >>> >>> >>> This file "exon_quantiles.root" definitely exists in the current >>> working directory though... Thanks again for your help! >>> >>> Paul. >>> >>> >>> >>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>> Dear Paul, >>>> >>>> Please have a look at the help ?root.expr. >>>> >>>> If I understand you correctly, you did only do quantile normalization? >>>> >>>> To see the tree names in your file you should do: >>>>> getTreeNames("exon_quantiles.root") >>>> >>>> You will probably see trees with extension "int", see help >>>> ?validTreetype. >>>> >>>> To load these trees you need to do: >>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>>> >>>> Please let me know if this did solve your problem. >>>> >>>> Best regards >>>> Christian >>>> _._._._._._._._._._._._._._._._._._ >>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >>>> V.i.e.n.n.a A.u.s.t.r.i.a >>>> e.m.a.i.l: cstrato at aon.at >>>> _._._._._._._._._._._._._._._._._._ >>>> >>>> >>>> >>>> >>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>> >>>>> Hi, >>>>> >>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>>> Array data. I now have a "root" file called "exon_quantiles.root", but >>>>> if I try to load it the same was I'd load my raw data (using the >>>>> scheme file I created for Affy exon arrays) I get the error below? I >>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a >>>>> different "root scheme" file for this normalized data? Unfortunately, >>>>> I haven't been able to find an answer. >>>>> >>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>> >>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>> >>>>> Hope someone can help, >>>>> >>>>> Paul. >>>>> >>>>> >>>>> >>>>>> sessionInfo() >>>>> >>>>> R version 2.11.0 (2010-04-22) >>>>> x86_64-redhat-linux-gnu >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> >>>>> >>>> >>> >>> >>> >> > >

ADD REPLY • link 13.2 years ago cstrato ★ 3.9k

0

Entering edit mode

Hi Christian, apologies for the lack of clarity in my previous email, I'll try to clear this up, see replies below: On Mon, Feb 6, 2012 at 7:51 PM, cstrato <cstrato at="" aon.at=""> wrote: > Dear Paul, > > I am afraid that it is not quite clear to me what you want to do. > > What do you mean with "obtain the expression levels of the probes"? Do you > mean "probes" (i.e. oligos) or do you mean "probesets" as in your question > about DABG? I've read in and quantiles normalized probe level data for 176 arrays and what I'm trying to do now is set every probe (individual probe, not probeset) that is not detected above background to zero. But from what I've read dabg.call() can only create p-values for probeset level? So as compromise I'm now trying to set the expression of all *individual probes* which belong to a *probeset* not detected above background to zero. I hope that makes sense. > > How did you create "data_hapMap"? Maybe you could send me your complete code > otherwise I am only able to guess. "data_hapMap" was a mistake, sorry I copied the wrong object name it should have been "data_qn" which is quantiles normalized probe level data that I created like this: data_hapMap <- root.data(scheme.huex10stv2, "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.root ") #read raw data data_qn <- normalize.quantiles(data_hapMap, "exon_quantiles", filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* level So "data_qn" is the object I want to work with, based on your advice I've figured out that to access the expression levels in "data_qn" I need to use "attachInten()", then I can use "intensities()" to access the expression levels: treenames <- unlist(treeNames(data_qn)) data_qn <- attachInten(data_qn, treenames=treenames[1:2]) #attach the first two samples > head(intensity(data_qn)) X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN 1 0 0 0.0000 0.000 2 1 0 0.0000 0.000 3 2 0 0.0000 0.000 4 3 0 0.0000 0.000 5 4 0 0.0000 0.000 6 5 0 85.3523 121.733 Does the X column in the output above represent the probe ID? If so and I have a mapping of probeset IDs to corresponding probe IDs, it should be fairly straightforward to set probes that are not detected above background to zero? Perhaps there is a more straightforward way of doing this though? > > I guess that "data_hapMap" contains the raw data. For these the slot "data" > is empty to save memory. So you need to use either attachData() or > attachInten(). However since you are using exon arrays you may not have > enough RAM, so it would be better to use function export() or export.data(), > or attach only a subset, see help ?attachData. See also vignette xps.pdf > (chapter 2.3). I think RAM shouldn't be an issue if I attach the samples one at a time? (I actually have access to a machine with 32gigs RAM but ideally would like to get what I'm doing to run on a standard desktop, which is actually why I'm using XPS!). > > When you talk about "expression matrix", how did you create it? Maybe you > could use function validExpr(), but w/o seeing your code it is hard to tell. > For DABG there are functions pvalData() and presCall(), see the examples in > help ?dabg.call. Yes I've managed to use dabg.call() at probeset level and access the p-values using pvalData() alright. Thanks again for all of your help and patience! Kind Regards, Paul. > > Best regards > Christian > > > > On 2/6/12 4:33 PM, Paul Geeleher wrote: >> >> Hi Christian, >> >> Thanks for your quick and informative reply. >> >> I have re-run the analysis and saved the R objects as you suggest. The >> next thing I'm trying to do is to obtain the expression levels of the >> probes, but this doesn't seem to be working for me: >> >>> a<- validData(data_hapMap) >> >> Error in .local(object, ...) : slot "data" has no data >> >> Based on the documentation I think validData() is the correct function. >> >> I've also performed probeset level DABG and I'm trying to set >> individual probes which belong to probesets with DABG< ?.05 to 0 in >> the expression matrix. >> >> But it seems I can't see the expression matrix using validData(). >> Perhaps there is another function. Any ideas? >> >> Thank you again for your help with this, I'm very grateful! >> >> Paul. >> >> On 2/2/12, cstrato<cstrato at="" aon.at=""> ?wrote: >>> >>> Dear Paul, >>> >>> The functions root.data(), root.call() and root.expr() were created to >>> allow you access to the corresponding root files just in case that you >>> did not save your R session. >>> >>> In the cases where you compute expression levels stepwise, or only part >>> of them such as normalize.quantiles(), as seems to be the matter in your >>> case, there is no corresponding root.xxx() function to access the root >>> file directly. In these cases you need to save your R session to have >>> continued access to the resulting root file. >>> >>> Please note that saving the R session is the usual case to have access >>> to the root files. >>> >>> Best regards >>> Christian >>> >>> >>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>> >>>> Hi Christian, >>>> >>>> Thanks for your quick reply. I check what kind of trees I have using >>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>> rather than "int", this is presumably because my analysis required no >>>> background correction step? >>>> >>>> So I then tried: >>>>> >>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >>>> >>>> >>>> but that gives me a huge number of errors that look like this: >>>> >>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>> Error: Could not get directory<preprocesset>. >>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>> Error: Could not get directory<preprocesset>. >>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>> Error: Could not get directory<preprocesset>. >>>> Error: Could not get tree<exportset>. >>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", ?: >>>> ? ?error in function ?ExportData? >>>> >>>> >>>> This file "exon_quantiles.root" definitely exists in the current >>>> working directory though... Thanks again for your help! >>>> >>>> Paul. >>>> >>>> >>>> >>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> ? wrote: >>>>> >>>>> Dear Paul, >>>>> >>>>> Please have a look at the help ?root.expr. >>>>> >>>>> If I understand you correctly, you did only do quantile normalization? >>>>> >>>>> To see the tree names in your file you should do: >>>>>> >>>>>> getTreeNames("exon_quantiles.root") >>>>> >>>>> >>>>> You will probably see trees with extension "int", see help >>>>> ?validTreetype. >>>>> >>>>> To load these trees you need to do: >>>>>> >>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>>>> >>>>> >>>>> Please let me know if this did solve your problem. >>>>> >>>>> Best regards >>>>> Christian >>>>> _._._._._._._._._._._._._._._._._._ >>>>> C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a >>>>> V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a >>>>> e.m.a.i.l: ? ? ? ?cstrato at aon.at >>>>> _._._._._._._._._._._._._._._._._._ >>>>> >>>>> >>>>> >>>>> >>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but >>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>> scheme file I created for Affy exon arrays) I get the error below? I >>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a >>>>>> different "root scheme" file for this normalized data? Unfortunately, >>>>>> I haven't been able to find an answer. >>>>>> >>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>> >>>>>> >>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>> >>>>>> Hope someone can help, >>>>>> >>>>>> Paul. >>>>>> >>>>>> >>>>>> >>>>>>> sessionInfo() >>>>>> >>>>>> >>>>>> R version 2.11.0 (2010-04-22) >>>>>> x86_64-redhat-linux-gnu >>>>>> >>>>>> locale: >>>>>> ? [1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C >>>>>> ? [3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8 >>>>>> ? [5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8 >>>>>> ? [7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C >>>>>> ? [9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C >>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>> >>>>>> attached base packages: >>>>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >> >> > -- Paul Geeleher (PhD Student) School of Mathematics, Statistics and Applied Mathematics National University of Ireland Galway Ireland -- www.bioinformaticstutorials.com

ADD REPLY • link 13.2 years ago Paul Geeleher ★ 1.3k

0

Entering edit mode

Actually doing what I'm talking about relies on being able to modify values in "intensity(data_qn)", which I was assuming one could do but testing it I don't think I can!? > head(intensity(data_qn)) X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN 1 0 0 0.0000 0.000 2 1 0 0.0000 0.000 3 2 0 0.0000 0.000 4 3 0 0.0000 0.000 5 4 0 0.0000 0.000 6 5 0 85.3523 121.733 > intensity(data_qn)[6,3] [1] 85.3523 > intensity(data_qn)[6,3] <- 0 Error in dimnames(x) : 'x' is missing > head(intensity(data_qn)) X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN 1 0 0 0.0000 0.000 2 1 0 0.0000 0.000 3 2 0 0.0000 0.000 4 3 0 0.0000 0.000 5 4 0 0.0000 0.000 6 5 0 85.3523 121.733 This means a different approach may be necessary. What I can do is extract the probe level estimates for each gene, which I think you outlined in a previous email to the list (https://mailman.stat.ethz.ch/pipermail/bioconductor/2011-September/04 0963.html) and from there set those probes which aren't detected above background to zero (once I've figured out how to map probesets to probes). I can the do what I need to do with this data (which is basically to fit a model based on individual probe intensities for each gene) and do this for every gene, which won't involve modifying the intensities in data_qn and as I'm only "attaching" probe level data for one gene at a time, shouldn't use that much memory. Paul On Tue, Feb 7, 2012 at 12:52 PM, Paul Geeleher <paulgeeleher at="" gmail.com=""> wrote: > Hi Christian, apologies for the lack of clarity in my previous email, > I'll try to clear this up, see replies below: > > On Mon, Feb 6, 2012 at 7:51 PM, cstrato <cstrato at="" aon.at=""> wrote: >> Dear Paul, >> >> I am afraid that it is not quite clear to me what you want to do. >> >> What do you mean with "obtain the expression levels of the probes"? Do you >> mean "probes" (i.e. oligos) or do you mean "probesets" as in your question >> about DABG? > > I've read in and quantiles normalized probe level data for 176 arrays > and what I'm trying to do now is set every probe (individual probe, > not probeset) that is not detected above background to zero. But from > what I've read dabg.call() can only create p-values for probeset > level? So as compromise I'm now trying to set the expression of all > *individual probes* which belong to a *probeset* not detected above > background to zero. I hope that makes sense. > >> >> How did you create "data_hapMap"? Maybe you could send me your complete code >> otherwise I am only able to guess. > > "data_hapMap" was a mistake, sorry I copied the wrong object name it > should have been "data_qn" which is quantiles normalized probe level > data that I created like this: > > data_hapMap <- root.data(scheme.huex10stv2, > "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.ro ot") > #read raw data > data_qn <- normalize.quantiles(data_hapMap, "exon_quantiles", > filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* > level > > So "data_qn" is the object I want to work with, based on your advice > I've figured out that to access the expression levels in "data_qn" I > need to use "attachInten()", then I can use "intensities()" to access > the expression levels: > > treenames <- unlist(treeNames(data_qn)) > data_qn <- attachInten(data_qn, treenames=treenames[1:2]) #attach the > first two samples > >> head(intensity(data_qn)) > ?X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN > 1 0 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 > 2 1 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 > 3 2 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 > 4 3 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 > 5 4 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 > 6 5 0 ? ? ? ? ? ?85.3523 ? ? ? ? ? ?121.733 > > Does the X column in the output above represent the probe ID? If so > and I have a mapping of probeset IDs to corresponding probe IDs, it > should be fairly straightforward to set probes that are not detected > above background to zero? Perhaps there is a more straightforward way > of doing this though? > > >> >> I guess that "data_hapMap" contains the raw data. For these the slot "data" >> is empty to save memory. So you need to use either attachData() or >> attachInten(). However since you are using exon arrays you may not have >> enough RAM, so it would be better to use function export() or export.data(), >> or attach only a subset, see help ?attachData. See also vignette xps.pdf >> (chapter 2.3). > > I think RAM shouldn't be an issue if I attach the samples one at a > time? (I actually have access to a machine with 32gigs RAM but ideally > would like to get what I'm doing to run on a standard desktop, which > is actually why I'm using XPS!). > >> >> When you talk about "expression matrix", how did you create it? Maybe you >> could use function validExpr(), but w/o seeing your code it is hard to tell. >> For DABG there are functions pvalData() and presCall(), see the examples in >> help ?dabg.call. > > Yes I've managed to use dabg.call() at probeset level and access the > p-values using pvalData() alright. > > Thanks again for all of your help and patience! > > Kind Regards, > > Paul. > > >> >> Best regards >> Christian >> >> >> >> On 2/6/12 4:33 PM, Paul Geeleher wrote: >>> >>> Hi Christian, >>> >>> Thanks for your quick and informative reply. >>> >>> I have re-run the analysis and saved the R objects as you suggest. The >>> next thing I'm trying to do is to obtain the expression levels of the >>> probes, but this doesn't seem to be working for me: >>> >>>> a<- validData(data_hapMap) >>> >>> Error in .local(object, ...) : slot "data" has no data >>> >>> Based on the documentation I think validData() is the correct function. >>> >>> I've also performed probeset level DABG and I'm trying to set >>> individual probes which belong to probesets with DABG< ?.05 to 0 in >>> the expression matrix. >>> >>> But it seems I can't see the expression matrix using validData(). >>> Perhaps there is another function. Any ideas? >>> >>> Thank you again for your help with this, I'm very grateful! >>> >>> Paul. >>> >>> On 2/2/12, cstrato<cstrato at="" aon.at=""> ?wrote: >>>> >>>> Dear Paul, >>>> >>>> The functions root.data(), root.call() and root.expr() were created to >>>> allow you access to the corresponding root files just in case that you >>>> did not save your R session. >>>> >>>> In the cases where you compute expression levels stepwise, or only part >>>> of them such as normalize.quantiles(), as seems to be the matter in your >>>> case, there is no corresponding root.xxx() function to access the root >>>> file directly. In these cases you need to save your R session to have >>>> continued access to the resulting root file. >>>> >>>> Please note that saving the R session is the usual case to have access >>>> to the root files. >>>> >>>> Best regards >>>> Christian >>>> >>>> >>>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>>> >>>>> Hi Christian, >>>>> >>>>> Thanks for your quick reply. I check what kind of trees I have using >>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>>> rather than "int", this is presumably because my analysis required no >>>>> background correction step? >>>>> >>>>> So I then tried: >>>>>> >>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >>>>> >>>>> >>>>> but that gives me a huge number of errors that look like this: >>>>> >>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>> Error: Could not get directory<preprocesset>. >>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>> Error: Could not get directory<preprocesset>. >>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>> Error: Could not get directory<preprocesset>. >>>>> Error: Could not get tree<exportset>. >>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", ?: >>>>> ? ?error in function ?ExportData? >>>>> >>>>> >>>>> This file "exon_quantiles.root" definitely exists in the current >>>>> working directory though... Thanks again for your help! >>>>> >>>>> Paul. >>>>> >>>>> >>>>> >>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> ? wrote: >>>>>> >>>>>> Dear Paul, >>>>>> >>>>>> Please have a look at the help ?root.expr. >>>>>> >>>>>> If I understand you correctly, you did only do quantile normalization? >>>>>> >>>>>> To see the tree names in your file you should do: >>>>>>> >>>>>>> getTreeNames("exon_quantiles.root") >>>>>> >>>>>> >>>>>> You will probably see trees with extension "int", see help >>>>>> ?validTreetype. >>>>>> >>>>>> To load these trees you need to do: >>>>>>> >>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>>>>> >>>>>> >>>>>> Please let me know if this did solve your problem. >>>>>> >>>>>> Best regards >>>>>> Christian >>>>>> _._._._._._._._._._._._._._._._._._ >>>>>> C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a >>>>>> V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a >>>>>> e.m.a.i.l: ? ? ? ?cstrato at aon.at >>>>>> _._._._._._._._._._._._._._._._._._ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but >>>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>>> scheme file I created for Affy exon arrays) I get the error below? I >>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a >>>>>>> different "root scheme" file for this normalized data? Unfortunately, >>>>>>> I haven't been able to find an answer. >>>>>>> >>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>>> >>>>>>> >>>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>>> >>>>>>> Hope someone can help, >>>>>>> >>>>>>> Paul. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> sessionInfo() >>>>>>> >>>>>>> >>>>>>> R version 2.11.0 (2010-04-22) >>>>>>> x86_64-redhat-linux-gnu >>>>>>> >>>>>>> locale: >>>>>>> ? [1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C >>>>>>> ? [3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8 >>>>>>> ? [5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8 >>>>>>> ? [7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C >>>>>>> ? [9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C >>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>>> >>>>>>> attached base packages: >>>>>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >> > > > > -- > Paul Geeleher (PhD Student) > School of Mathematics, Statistics and Applied Mathematics > National University of Ireland > Galway > Ireland > -- > www.bioinformaticstutorials.com -- Paul Geeleher (PhD Student) School of Mathematics, Statistics and Applied Mathematics National University of Ireland Galway Ireland -- www.bioinformaticstutorials.com

ADD REPLY • link 13.2 years ago Paul Geeleher ★ 1.3k

0

Entering edit mode

Since I did not implement operators [] for function intensity() you need to use a dataframe to set values to zero and then use intensity()<-value to replace the object. However, please read the help file for intensity<- carfully, since this operation is dangerous (and I have not tested it for exon arrays). I think it is best to work with the dataframe itself, since you want to fit your own model, anyhow. Christian On 2/7/12 2:17 PM, Paul Geeleher wrote: > Actually doing what I'm talking about relies on being able to modify > values in "intensity(data_qn)", which I was assuming one could do but > testing it I don't think I can!? > >> head(intensity(data_qn)) > X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN > 1 0 0 0.0000 0.000 > 2 1 0 0.0000 0.000 > 3 2 0 0.0000 0.000 > 4 3 0 0.0000 0.000 > 5 4 0 0.0000 0.000 > 6 5 0 85.3523 121.733 > >> intensity(data_qn)[6,3] > [1] 85.3523 >> intensity(data_qn)[6,3]<- 0 > Error in dimnames(x) : 'x' is missing > >> head(intensity(data_qn)) > X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN > 1 0 0 0.0000 0.000 > 2 1 0 0.0000 0.000 > 3 2 0 0.0000 0.000 > 4 3 0 0.0000 0.000 > 5 4 0 0.0000 0.000 > 6 5 0 85.3523 121.733 > > This means a different approach may be necessary. > > What I can do is extract the probe level estimates for each gene, > which I think you outlined in a previous email to the list > (https://mailman.stat.ethz.ch/pipermail/bioconductor/2011-September/ 040963.html) > and from there set those probes which aren't detected above background > to zero (once I've figured out how to map probesets to probes). I can > the do what I need to do with this data (which is basically to fit a > model based on individual probe intensities for each gene) and do this > for every gene, which won't involve modifying the intensities in > data_qn and as I'm only "attaching" probe level data for one gene at a > time, shouldn't use that much memory. > > Paul > > On Tue, Feb 7, 2012 at 12:52 PM, Paul Geeleher<paulgeeleher at="" gmail.com=""> wrote: >> Hi Christian, apologies for the lack of clarity in my previous email, >> I'll try to clear this up, see replies below: >> >> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> wrote: >>> Dear Paul, >>> >>> I am afraid that it is not quite clear to me what you want to do. >>> >>> What do you mean with "obtain the expression levels of the probes"? Do you >>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your question >>> about DABG? >> >> I've read in and quantiles normalized probe level data for 176 arrays >> and what I'm trying to do now is set every probe (individual probe, >> not probeset) that is not detected above background to zero. But from >> what I've read dabg.call() can only create p-values for probeset >> level? So as compromise I'm now trying to set the expression of all >> *individual probes* which belong to a *probeset* not detected above >> background to zero. I hope that makes sense. >> >>> >>> How did you create "data_hapMap"? Maybe you could send me your complete code >>> otherwise I am only able to guess. >> >> "data_hapMap" was a mistake, sorry I copied the wrong object name it >> should have been "data_qn" which is quantiles normalized probe level >> data that I created like this: >> >> data_hapMap<- root.data(scheme.huex10stv2, >> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.r oot") >> #read raw data >> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles", >> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* >> level >> >> So "data_qn" is the object I want to work with, based on your advice >> I've figured out that to access the expression levels in "data_qn" I >> need to use "attachInten()", then I can use "intensities()" to access >> the expression levels: >> >> treenames<- unlist(treeNames(data_qn)) >> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the >> first two samples >> >>> head(intensity(data_qn)) >> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN >> 1 0 0 0.0000 0.000 >> 2 1 0 0.0000 0.000 >> 3 2 0 0.0000 0.000 >> 4 3 0 0.0000 0.000 >> 5 4 0 0.0000 0.000 >> 6 5 0 85.3523 121.733 >> >> Does the X column in the output above represent the probe ID? If so >> and I have a mapping of probeset IDs to corresponding probe IDs, it >> should be fairly straightforward to set probes that are not detected >> above background to zero? Perhaps there is a more straightforward way >> of doing this though? >> >> >>> >>> I guess that "data_hapMap" contains the raw data. For these the slot "data" >>> is empty to save memory. So you need to use either attachData() or >>> attachInten(). However since you are using exon arrays you may not have >>> enough RAM, so it would be better to use function export() or export.data(), >>> or attach only a subset, see help ?attachData. See also vignette xps.pdf >>> (chapter 2.3). >> >> I think RAM shouldn't be an issue if I attach the samples one at a >> time? (I actually have access to a machine with 32gigs RAM but ideally >> would like to get what I'm doing to run on a standard desktop, which >> is actually why I'm using XPS!). >> >>> >>> When you talk about "expression matrix", how did you create it? Maybe you >>> could use function validExpr(), but w/o seeing your code it is hard to tell. >>> For DABG there are functions pvalData() and presCall(), see the examples in >>> help ?dabg.call. >> >> Yes I've managed to use dabg.call() at probeset level and access the >> p-values using pvalData() alright. >> >> Thanks again for all of your help and patience! >> >> Kind Regards, >> >> Paul. >> >> >>> >>> Best regards >>> Christian >>> >>> >>> >>> On 2/6/12 4:33 PM, Paul Geeleher wrote: >>>> >>>> Hi Christian, >>>> >>>> Thanks for your quick and informative reply. >>>> >>>> I have re-run the analysis and saved the R objects as you suggest. The >>>> next thing I'm trying to do is to obtain the expression levels of the >>>> probes, but this doesn't seem to be working for me: >>>> >>>>> a<- validData(data_hapMap) >>>> >>>> Error in .local(object, ...) : slot "data" has no data >>>> >>>> Based on the documentation I think validData() is the correct function. >>>> >>>> I've also performed probeset level DABG and I'm trying to set >>>> individual probes which belong to probesets with DABG< .05 to 0 in >>>> the expression matrix. >>>> >>>> But it seems I can't see the expression matrix using validData(). >>>> Perhaps there is another function. Any ideas? >>>> >>>> Thank you again for your help with this, I'm very grateful! >>>> >>>> Paul. >>>> >>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote: >>>>> >>>>> Dear Paul, >>>>> >>>>> The functions root.data(), root.call() and root.expr() were created to >>>>> allow you access to the corresponding root files just in case that you >>>>> did not save your R session. >>>>> >>>>> In the cases where you compute expression levels stepwise, or only part >>>>> of them such as normalize.quantiles(), as seems to be the matter in your >>>>> case, there is no corresponding root.xxx() function to access the root >>>>> file directly. In these cases you need to save your R session to have >>>>> continued access to the resulting root file. >>>>> >>>>> Please note that saving the R session is the usual case to have access >>>>> to the root files. >>>>> >>>>> Best regards >>>>> Christian >>>>> >>>>> >>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>>>> >>>>>> Hi Christian, >>>>>> >>>>>> Thanks for your quick reply. I check what kind of trees I have using >>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>>>> rather than "int", this is presumably because my analysis required no >>>>>> background correction step? >>>>>> >>>>>> So I then tried: >>>>>>> >>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >>>>>> >>>>>> >>>>>> but that gives me a huge number of errors that look like this: >>>>>> >>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>> Error: Could not get directory<preprocesset>. >>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>> Error: Could not get directory<preprocesset>. >>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>> Error: Could not get directory<preprocesset>. >>>>>> Error: Could not get tree<exportset>. >>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : >>>>>> error in function ?ExportData? >>>>>> >>>>>> >>>>>> This file "exon_quantiles.root" definitely exists in the current >>>>>> working directory though... Thanks again for your help! >>>>>> >>>>>> Paul. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>>>>> >>>>>>> Dear Paul, >>>>>>> >>>>>>> Please have a look at the help ?root.expr. >>>>>>> >>>>>>> If I understand you correctly, you did only do quantile normalization? >>>>>>> >>>>>>> To see the tree names in your file you should do: >>>>>>>> >>>>>>>> getTreeNames("exon_quantiles.root") >>>>>>> >>>>>>> >>>>>>> You will probably see trees with extension "int", see help >>>>>>> ?validTreetype. >>>>>>> >>>>>>> To load these trees you need to do: >>>>>>>> >>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>>>>>> >>>>>>> >>>>>>> Please let me know if this did solve your problem. >>>>>>> >>>>>>> Best regards >>>>>>> Christian >>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a >>>>>>> e.m.a.i.l: cstrato at aon.at >>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but >>>>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>>>> scheme file I created for Affy exon arrays) I get the error below? I >>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a >>>>>>>> different "root scheme" file for this normalized data? Unfortunately, >>>>>>>> I haven't been able to find an answer. >>>>>>>> >>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>>>> >>>>>>>> >>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>>>> >>>>>>>> Hope someone can help, >>>>>>>> >>>>>>>> Paul. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> sessionInfo() >>>>>>>> >>>>>>>> >>>>>>>> R version 2.11.0 (2010-04-22) >>>>>>>> x86_64-redhat-linux-gnu >>>>>>>> >>>>>>>> locale: >>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>>>> >>>>>>>> attached base packages: >>>>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> >> >> -- >> Paul Geeleher (PhD Student) >> School of Mathematics, Statistics and Applied Mathematics >> National University of Ireland >> Galway >> Ireland >> -- >> www.bioinformaticstutorials.com > > >

ADD REPLY • link 13.2 years ago cstrato ★ 3.9k

0

Entering edit mode

Dear Paul, see replies below: On 2/7/12 1:52 PM, Paul Geeleher wrote: > Hi Christian, apologies for the lack of clarity in my previous email, > I'll try to clear this up, see replies below: > > On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> wrote: >> Dear Paul, >> >> I am afraid that it is not quite clear to me what you want to do. >> >> What do you mean with "obtain the expression levels of the probes"? Do you >> mean "probes" (i.e. oligos) or do you mean "probesets" as in your question >> about DABG? > > I've read in and quantiles normalized probe level data for 176 arrays > and what I'm trying to do now is set every probe (individual probe, > not probeset) that is not detected above background to zero. But from > what I've read dabg.call() can only create p-values for probeset > level? So as compromise I'm now trying to set the expression of all > *individual probes* which belong to a *probeset* not detected above > background to zero. I hope that makes sense. > Did you read the exon array whitepapers which you can download from Affymetrix? Every probeset consists of 1-4 probes only, and every exon consists usually of 1-2 probesets. Each gene has a transcript_cluster_id, which consists of one or more probeset_ids. (You can see the mapping between ids using function export.scheme(..,treetype="anp",..) Since the smallest unit is the probeset, function dabg.call() will only work at the probeset (and transcript) level. If you set all probes of a probeset to zero you may loose an entire exon. >> >> How did you create "data_hapMap"? Maybe you could send me your complete code >> otherwise I am only able to guess. > > "data_hapMap" was a mistake, sorry I copied the wrong object name it > should have been "data_qn" which is quantiles normalized probe level > data that I created like this: > > data_hapMap<- root.data(scheme.huex10stv2, > "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.ro ot") > #read raw data > data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles", > filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* > level > > So "data_qn" is the object I want to work with, based on your advice > I've figured out that to access the expression levels in "data_qn" I > need to use "attachInten()", then I can use "intensities()" to access > the expression levels: > > treenames<- unlist(treeNames(data_qn)) > data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the > first two samples > >> head(intensity(data_qn)) > X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN > 1 0 0 0.0000 0.000 > 2 1 0 0.0000 0.000 > 3 2 0 0.0000 0.000 > 4 3 0 0.0000 0.000 > 5 4 0 0.0000 0.000 > 6 5 0 85.3523 121.733 > > Does the X column in the output above represent the probe ID? If so > and I have a mapping of probeset IDs to corresponding probe IDs, it > should be fairly straightforward to set probes that are not detected > above background to zero? Perhaps there is a more straightforward way > of doing this though? > No, (X,Y) are the coordinates of the single probes on the exon array. You can use function export.scheme(..,treetype="scm",..) to get the mapping between (X,Y) and an internal PROBESET_ID, e.g.: UNIT_ID X Y ProbeLength Mask EXON_ID PROBESET_ID 31 986 1674 25 512 31 31 31 1092 677 25 512 31 31 31 796 1862 25 512 31 31 31 917 193 25 512 31 31 31 341 1677 25 512 32 32 31 144 2250 25 512 32 32 31 689 262 25 512 32 32 31 579 1670 25 512 32 32 Then you can use export.scheme(..,treetype="pbs",..) to map PROBESET_ID (=UNIT_ID) to the ProbesetID, e.g.: UNIT_ID ProbesetID NumCells NumAtoms NumSubunits UnitType 31 2315101 4 4 1 512 32 2315102 4 4 1 512 As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to the Affymetrix ProbesetIDs 2315101 and 2315102, respectively. You could also use functions indexUnits(), and unitID2probesetID() or unitID2transcriptID(), respectively. Best regards Christian > >> >> I guess that "data_hapMap" contains the raw data. For these the slot "data" >> is empty to save memory. So you need to use either attachData() or >> attachInten(). However since you are using exon arrays you may not have >> enough RAM, so it would be better to use function export() or export.data(), >> or attach only a subset, see help ?attachData. See also vignette xps.pdf >> (chapter 2.3). > > I think RAM shouldn't be an issue if I attach the samples one at a > time? (I actually have access to a machine with 32gigs RAM but ideally > would like to get what I'm doing to run on a standard desktop, which > is actually why I'm using XPS!). > >> >> When you talk about "expression matrix", how did you create it? Maybe you >> could use function validExpr(), but w/o seeing your code it is hard to tell. >> For DABG there are functions pvalData() and presCall(), see the examples in >> help ?dabg.call. > > Yes I've managed to use dabg.call() at probeset level and access the > p-values using pvalData() alright. > > Thanks again for all of your help and patience! > > Kind Regards, > > Paul. > > >> >> Best regards >> Christian >> >> >> >> On 2/6/12 4:33 PM, Paul Geeleher wrote: >>> >>> Hi Christian, >>> >>> Thanks for your quick and informative reply. >>> >>> I have re-run the analysis and saved the R objects as you suggest. The >>> next thing I'm trying to do is to obtain the expression levels of the >>> probes, but this doesn't seem to be working for me: >>> >>>> a<- validData(data_hapMap) >>> >>> Error in .local(object, ...) : slot "data" has no data >>> >>> Based on the documentation I think validData() is the correct function. >>> >>> I've also performed probeset level DABG and I'm trying to set >>> individual probes which belong to probesets with DABG< .05 to 0 in >>> the expression matrix. >>> >>> But it seems I can't see the expression matrix using validData(). >>> Perhaps there is another function. Any ideas? >>> >>> Thank you again for your help with this, I'm very grateful! >>> >>> Paul. >>> >>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote: >>>> >>>> Dear Paul, >>>> >>>> The functions root.data(), root.call() and root.expr() were created to >>>> allow you access to the corresponding root files just in case that you >>>> did not save your R session. >>>> >>>> In the cases where you compute expression levels stepwise, or only part >>>> of them such as normalize.quantiles(), as seems to be the matter in your >>>> case, there is no corresponding root.xxx() function to access the root >>>> file directly. In these cases you need to save your R session to have >>>> continued access to the resulting root file. >>>> >>>> Please note that saving the R session is the usual case to have access >>>> to the root files. >>>> >>>> Best regards >>>> Christian >>>> >>>> >>>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>>> >>>>> Hi Christian, >>>>> >>>>> Thanks for your quick reply. I check what kind of trees I have using >>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>>> rather than "int", this is presumably because my analysis required no >>>>> background correction step? >>>>> >>>>> So I then tried: >>>>>> >>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >>>>> >>>>> >>>>> but that gives me a huge number of errors that look like this: >>>>> >>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>> Error: Could not get directory<preprocesset>. >>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>> Error: Could not get directory<preprocesset>. >>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>> Error: Could not get directory<preprocesset>. >>>>> Error: Could not get tree<exportset>. >>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : >>>>> error in function ?ExportData? >>>>> >>>>> >>>>> This file "exon_quantiles.root" definitely exists in the current >>>>> working directory though... Thanks again for your help! >>>>> >>>>> Paul. >>>>> >>>>> >>>>> >>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>>>> >>>>>> Dear Paul, >>>>>> >>>>>> Please have a look at the help ?root.expr. >>>>>> >>>>>> If I understand you correctly, you did only do quantile normalization? >>>>>> >>>>>> To see the tree names in your file you should do: >>>>>>> >>>>>>> getTreeNames("exon_quantiles.root") >>>>>> >>>>>> >>>>>> You will probably see trees with extension "int", see help >>>>>> ?validTreetype. >>>>>> >>>>>> To load these trees you need to do: >>>>>>> >>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>>>>> >>>>>> >>>>>> Please let me know if this did solve your problem. >>>>>> >>>>>> Best regards >>>>>> Christian >>>>>> _._._._._._._._._._._._._._._._._._ >>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >>>>>> V.i.e.n.n.a A.u.s.t.r.i.a >>>>>> e.m.a.i.l: cstrato at aon.at >>>>>> _._._._._._._._._._._._._._._._._._ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but >>>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>>> scheme file I created for Affy exon arrays) I get the error below? I >>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a >>>>>>> different "root scheme" file for this normalized data? Unfortunately, >>>>>>> I haven't been able to find an answer. >>>>>>> >>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>>> >>>>>>> >>>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>>> >>>>>>> Hope someone can help, >>>>>>> >>>>>>> Paul. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> sessionInfo() >>>>>>> >>>>>>> >>>>>>> R version 2.11.0 (2010-04-22) >>>>>>> x86_64-redhat-linux-gnu >>>>>>> >>>>>>> locale: >>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>>> >>>>>>> attached base packages: >>>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >> > > >

ADD REPLY • link 13.2 years ago cstrato ★ 3.9k

0

Entering edit mode

Hi Christian, Thanks for your replies. As per your suggestions I've been been able to map between xy co-ordinates, probe IDs and PROBESET_IDS. I've been able to extract the expression level for a given gene (for a subset of samples) using this: id <- symbol2unitID(scheme.huex10stv2 , symbol="CDH11", unittype="transcript", as.list =TRUE) treenames <- unlist(treeNames(data_qn)) data_qn <- attachInten(data_qn, treenames=treenames[1:3]) xyIntens <- validData(data_qn, unitID=unlist(id)) With "validData()" is that it seems I can only get the expression levels for samples which are "attached". This means I'd probably have to attach each sample separately (which will take a long time) as I don't have enough memory to attach all samples. I'm wondering if there is a way to extract this data without having to attach the data? If something similar is possible for the dabg data (at probeset level) that would also be a big help in speeding up my pipeline (which will be complete if I can implement these last two steps)! Thanks again for all your help its much appreciated, Paul. On Tue, Feb 7, 2012 at 8:48 PM, cstrato <cstrato at="" aon.at=""> wrote: > Dear Paul, > > see replies below: > > > On 2/7/12 1:52 PM, Paul Geeleher wrote: >> >> Hi Christian, apologies for the lack of clarity in my previous email, >> I'll try to clear this up, see replies below: >> >> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> ?wrote: >>> >>> Dear Paul, >>> >>> I am afraid that it is not quite clear to me what you want to do. >>> >>> What do you mean with "obtain the expression levels of the probes"? Do >>> you >>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your >>> question >>> about DABG? >> >> >> I've read in and quantiles normalized probe level data for 176 arrays >> and what I'm trying to do now is set every probe (individual probe, >> not probeset) that is not detected above background to zero. But from >> what I've read dabg.call() can only create p-values for probeset >> level? So as compromise I'm now trying to set the expression of all >> *individual probes* which belong to a *probeset* not detected above >> background to zero. I hope that makes sense. >> > > > Did you read the exon array whitepapers which you can download from > Affymetrix? > > Every probeset consists of 1-4 probes only, and every exon consists usually > of 1-2 probesets. Each gene has a transcript_cluster_id, which consists of > one or more probeset_ids. (You can see the mapping between ids using > function export.scheme(..,treetype="anp",..) > > Since the smallest unit is the probeset, function dabg.call() will only work > at the probeset (and transcript) level. If you set all probes of a probeset > to zero you may loose an entire exon. > > > >>> >>> How did you create "data_hapMap"? Maybe you could send me your complete >>> code >>> otherwise I am only able to guess. >> >> >> "data_hapMap" was a mistake, sorry I copied the wrong object name it >> should have been "data_qn" which is quantiles normalized probe level >> data that I created like this: >> >> data_hapMap<- root.data(scheme.huex10stv2, >> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.r oot") >> #read raw data >> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles", >> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* >> level >> >> So "data_qn" is the object I want to work with, based on your advice >> I've figured out that to access the expression levels in "data_qn" I >> need to use "attachInten()", then I can use "intensities()" to access >> the expression levels: >> >> treenames<- unlist(treeNames(data_qn)) >> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the >> first two samples >> >>> head(intensity(data_qn)) >> >> ? X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN >> 1 0 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 >> 2 1 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 >> 3 2 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 >> 4 3 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 >> 5 4 0 ? ? ? ? ? ? 0.0000 ? ? ? ? ? ? ?0.000 >> 6 5 0 ? ? ? ? ? ?85.3523 ? ? ? ? ? ?121.733 >> >> Does the X column in the output above represent the probe ID? If so >> and I have a mapping of probeset IDs to corresponding probe IDs, it >> should be fairly straightforward to set probes that are not detected >> above background to zero? Perhaps there is a more straightforward way >> of doing this though? >> > > No, (X,Y) are the coordinates of the single probes on the exon array. You > can use function export.scheme(..,treetype="scm",..) to get the mapping > between (X,Y) and an internal PROBESET_ID, e.g.: > > UNIT_ID X ? ? ? Y ? ? ? ProbeLength ? ? Mask ? ?EXON_ID PROBESET_ID > 31 ? ? ?986 ? ? 1674 ? ?25 ? ? ?512 ? ? 31 ? ? ?31 > 31 ? ? ?1092 ? ?677 ? ? 25 ? ? ?512 ? ? 31 ? ? ?31 > 31 ? ? ?796 ? ? 1862 ? ?25 ? ? ?512 ? ? 31 ? ? ?31 > 31 ? ? ?917 ? ? 193 ? ? 25 ? ? ?512 ? ? 31 ? ? ?31 > 31 ? ? ?341 ? ? 1677 ? ?25 ? ? ?512 ? ? 32 ? ? ?32 > 31 ? ? ?144 ? ? 2250 ? ?25 ? ? ?512 ? ? 32 ? ? ?32 > 31 ? ? ?689 ? ? 262 ? ? 25 ? ? ?512 ? ? 32 ? ? ?32 > 31 ? ? ?579 ? ? 1670 ? ?25 ? ? ?512 ? ? 32 ? ? ?32 > > Then you can use export.scheme(..,treetype="pbs",..) to map PROBESET_ID > (=UNIT_ID) to the ProbesetID, e.g.: > > UNIT_ID ProbesetID ? ? ?NumCells ? ? ? ?NumAtoms ? ? ? ?NumSubunits > UnitType > 31 ? ? ?2315101 4 ? ? ? 4 ? ? ? 1 ? ? ? 512 > 32 ? ? ?2315102 4 ? ? ? 4 ? ? ? 1 ? ? ? 512 > > As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to the > Affymetrix ProbesetIDs 2315101 and 2315102, respectively. > > You could also use functions indexUnits(), and unitID2probesetID() or > unitID2transcriptID(), respectively. > > Best regards > Christian > > > >> >>> >>> I guess that "data_hapMap" contains the raw data. For these the slot >>> "data" >>> is empty to save memory. So you need to use either attachData() or >>> attachInten(). However since you are using exon arrays you may not have >>> enough RAM, so it would be better to use function export() or >>> export.data(), >>> or attach only a subset, see help ?attachData. See also vignette xps.pdf >>> (chapter 2.3). >> >> >> I think RAM shouldn't be an issue if I attach the samples one at a >> time? (I actually have access to a machine with 32gigs RAM but ideally >> would like to get what I'm doing to run on a standard desktop, which >> is actually why I'm using XPS!). >> >>> >>> When you talk about "expression matrix", how did you create it? Maybe you >>> could use function validExpr(), but w/o seeing your code it is hard to >>> tell. >>> For DABG there are functions pvalData() and presCall(), see the examples >>> in >>> help ?dabg.call. >> >> >> Yes I've managed to use dabg.call() at probeset level and access the >> p-values using pvalData() alright. >> >> Thanks again for all of your help and patience! >> >> Kind Regards, >> >> Paul. >> >> >>> >>> Best regards >>> Christian >>> >>> >>> >>> On 2/6/12 4:33 PM, Paul Geeleher wrote: >>>> >>>> >>>> Hi Christian, >>>> >>>> Thanks for your quick and informative reply. >>>> >>>> I have re-run the analysis and saved the R objects as you suggest. The >>>> next thing I'm trying to do is to obtain the expression levels of the >>>> probes, but this doesn't seem to be working for me: >>>> >>>>> a<- validData(data_hapMap) >>>> >>>> >>>> Error in .local(object, ...) : slot "data" has no data >>>> >>>> Based on the documentation I think validData() is the correct function. >>>> >>>> I've also performed probeset level DABG and I'm trying to set >>>> individual probes which belong to probesets with DABG< ? ?.05 to 0 in >>>> the expression matrix. >>>> >>>> But it seems I can't see the expression matrix using validData(). >>>> Perhaps there is another function. Any ideas? >>>> >>>> Thank you again for your help with this, I'm very grateful! >>>> >>>> Paul. >>>> >>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> ? ?wrote: >>>>> >>>>> >>>>> Dear Paul, >>>>> >>>>> The functions root.data(), root.call() and root.expr() were created to >>>>> allow you access to the corresponding root files just in case that you >>>>> did not save your R session. >>>>> >>>>> In the cases where you compute expression levels stepwise, or only part >>>>> of them such as normalize.quantiles(), as seems to be the matter in >>>>> your >>>>> case, there is no corresponding root.xxx() function to access the root >>>>> file directly. In these cases you need to save your R session to have >>>>> continued access to the resulting root file. >>>>> >>>>> Please note that saving the R session is the usual case to have access >>>>> to the root files. >>>>> >>>>> Best regards >>>>> Christian >>>>> >>>>> >>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>>>> >>>>>> >>>>>> Hi Christian, >>>>>> >>>>>> Thanks for your quick reply. I check what kind of trees I have using >>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>>>> rather than "int", this is presumably because my analysis required no >>>>>> background correction step? >>>>>> >>>>>> So I then tried: >>>>>>> >>>>>>> >>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >>>>>> >>>>>> >>>>>> >>>>>> but that gives me a huge number of errors that look like this: >>>>>> >>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>> Error: Could not get directory<preprocesset>. >>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>> Error: Could not get directory<preprocesset>. >>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>> Error: Could not get directory<preprocesset>. >>>>>> Error: Could not get tree<exportset>. >>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", ?: >>>>>> ? ?error in function ?ExportData? >>>>>> >>>>>> >>>>>> This file "exon_quantiles.root" definitely exists in the current >>>>>> working directory though... Thanks again for your help! >>>>>> >>>>>> Paul. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> ? ? wrote: >>>>>>> >>>>>>> >>>>>>> Dear Paul, >>>>>>> >>>>>>> Please have a look at the help ?root.expr. >>>>>>> >>>>>>> If I understand you correctly, you did only do quantile >>>>>>> normalization? >>>>>>> >>>>>>> To see the tree names in your file you should do: >>>>>>>> >>>>>>>> >>>>>>>> getTreeNames("exon_quantiles.root") >>>>>>> >>>>>>> >>>>>>> >>>>>>> You will probably see trees with extension "int", see help >>>>>>> ?validTreetype. >>>>>>> >>>>>>> To load these trees you need to do: >>>>>>>> >>>>>>>> >>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>>>>>> >>>>>>> >>>>>>> >>>>>>> Please let me know if this did solve your problem. >>>>>>> >>>>>>> Best regards >>>>>>> Christian >>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>> C.h.r.i.s.t.i.a.n ? S.t.r.a.t.o.w.a >>>>>>> V.i.e.n.n.a ? ? ? ? ? A.u.s.t.r.i.a >>>>>>> e.m.a.i.l: ? ? ? ?cstrato at aon.at >>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", >>>>>>>> but >>>>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>>>> scheme file I created for Affy exon arrays) I get the error below? I >>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need >>>>>>>> a >>>>>>>> different "root scheme" file for this normalized data? >>>>>>>> Unfortunately, >>>>>>>> I haven't been able to find an answer. >>>>>>>> >>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>>>> >>>>>>>> Hope someone can help, >>>>>>>> >>>>>>>> Paul. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> sessionInfo() >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> R version 2.11.0 (2010-04-22) >>>>>>>> x86_64-redhat-linux-gnu >>>>>>>> >>>>>>>> locale: >>>>>>>> ? [1] LC_CTYPE=en_US.UTF-8 ? ? ? LC_NUMERIC=C >>>>>>>> ? [3] LC_TIME=en_US.UTF-8 ? ? ? ?LC_COLLATE=en_US.UTF-8 >>>>>>>> ? [5] LC_MONETARY=C ? ? ? ? ? ? ?LC_MESSAGES=en_US.UTF-8 >>>>>>>> ? [7] LC_PAPER=en_US.UTF-8 ? ? ? LC_NAME=C >>>>>>>> ? [9] LC_ADDRESS=C ? ? ? ? ? ? ? LC_TELEPHONE=C >>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>>>> >>>>>>>> attached base packages: >>>>>>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> >> > -- Paul Geeleher (PhD Student) School of Mathematics, Statistics and Applied Mathematics National University of Ireland Galway Ireland -- www.bioinformaticstutorials.com

ADD REPLY • link 13.2 years ago Paul Geeleher ★ 1.3k

0

Entering edit mode

Dear Paul, You could export the whole data as table using: export(data_qn, treename="*", treetype="cqu", varlist="fInten", outfile="data_cqu.txt",as.dataframe=FALSE) Then you have a huge text-file, which you can then use for your purposes. However, since you do not have enough memory to work with R you would need to either import only parts of the table into R, or use other languages such as perl for further analysis. For the DABG data you could use function export.call(). Alternatively, you could also work completely from within ROOT, using C++ macros. Examples how to use xps from within ROOT are shown in directory xps/examples/macro4XPS.C Best regards Christian On 2/14/12 1:54 PM, Paul Geeleher wrote: > Hi Christian, > > Thanks for your replies. > > As per your suggestions I've been been able to map between xy > co-ordinates, probe IDs and PROBESET_IDS. > > I've been able to extract the expression level for a given gene (for a > subset of samples) using this: > > id<- symbol2unitID(scheme.huex10stv2 , symbol="CDH11", > unittype="transcript", as.list =TRUE) > treenames<- unlist(treeNames(data_qn)) > data_qn<- attachInten(data_qn, treenames=treenames[1:3]) > xyIntens<- validData(data_qn, unitID=unlist(id)) > > With "validData()" is that it seems I can only get the expression > levels for samples which are "attached". This means I'd probably have > to attach each sample separately (which will take a long time) as I > don't have enough memory to attach all samples. > > I'm wondering if there is a way to extract this data without having to > attach the data? If something similar is possible for the dabg data > (at probeset level) that would also be a big help in speeding up my > pipeline (which will be complete if I can implement these last two > steps)! > > Thanks again for all your help its much appreciated, > > Paul. > > > On Tue, Feb 7, 2012 at 8:48 PM, cstrato<cstrato at="" aon.at=""> wrote: >> Dear Paul, >> >> see replies below: >> >> >> On 2/7/12 1:52 PM, Paul Geeleher wrote: >>> >>> Hi Christian, apologies for the lack of clarity in my previous email, >>> I'll try to clear this up, see replies below: >>> >>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>> >>>> Dear Paul, >>>> >>>> I am afraid that it is not quite clear to me what you want to do. >>>> >>>> What do you mean with "obtain the expression levels of the probes"? Do >>>> you >>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your >>>> question >>>> about DABG? >>> >>> >>> I've read in and quantiles normalized probe level data for 176 arrays >>> and what I'm trying to do now is set every probe (individual probe, >>> not probeset) that is not detected above background to zero. But from >>> what I've read dabg.call() can only create p-values for probeset >>> level? So as compromise I'm now trying to set the expression of all >>> *individual probes* which belong to a *probeset* not detected above >>> background to zero. I hope that makes sense. >>> >> >> >> Did you read the exon array whitepapers which you can download from >> Affymetrix? >> >> Every probeset consists of 1-4 probes only, and every exon consists usually >> of 1-2 probesets. Each gene has a transcript_cluster_id, which consists of >> one or more probeset_ids. (You can see the mapping between ids using >> function export.scheme(..,treetype="anp",..) >> >> Since the smallest unit is the probeset, function dabg.call() will only work >> at the probeset (and transcript) level. If you set all probes of a probeset >> to zero you may loose an entire exon. >> >> >> >>>> >>>> How did you create "data_hapMap"? Maybe you could send me your complete >>>> code >>>> otherwise I am only able to guess. >>> >>> >>> "data_hapMap" was a mistake, sorry I copied the wrong object name it >>> should have been "data_qn" which is quantiles normalized probe level >>> data that I created like this: >>> >>> data_hapMap<- root.data(scheme.huex10stv2, >>> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel. root") >>> #read raw data >>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles", >>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* >>> level >>> >>> So "data_qn" is the object I want to work with, based on your advice >>> I've figured out that to access the expression levels in "data_qn" I >>> need to use "attachInten()", then I can use "intensities()" to access >>> the expression levels: >>> >>> treenames<- unlist(treeNames(data_qn)) >>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the >>> first two samples >>> >>>> head(intensity(data_qn)) >>> >>> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN >>> 1 0 0 0.0000 0.000 >>> 2 1 0 0.0000 0.000 >>> 3 2 0 0.0000 0.000 >>> 4 3 0 0.0000 0.000 >>> 5 4 0 0.0000 0.000 >>> 6 5 0 85.3523 121.733 >>> >>> Does the X column in the output above represent the probe ID? If so >>> and I have a mapping of probeset IDs to corresponding probe IDs, it >>> should be fairly straightforward to set probes that are not detected >>> above background to zero? Perhaps there is a more straightforward way >>> of doing this though? >>> >> >> No, (X,Y) are the coordinates of the single probes on the exon array. You >> can use function export.scheme(..,treetype="scm",..) to get the mapping >> between (X,Y) and an internal PROBESET_ID, e.g.: >> >> UNIT_ID X Y ProbeLength Mask EXON_ID PROBESET_ID >> 31 986 1674 25 512 31 31 >> 31 1092 677 25 512 31 31 >> 31 796 1862 25 512 31 31 >> 31 917 193 25 512 31 31 >> 31 341 1677 25 512 32 32 >> 31 144 2250 25 512 32 32 >> 31 689 262 25 512 32 32 >> 31 579 1670 25 512 32 32 >> >> Then you can use export.scheme(..,treetype="pbs",..) to map PROBESET_ID >> (=UNIT_ID) to the ProbesetID, e.g.: >> >> UNIT_ID ProbesetID NumCells NumAtoms NumSubunits >> UnitType >> 31 2315101 4 4 1 512 >> 32 2315102 4 4 1 512 >> >> As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to the >> Affymetrix ProbesetIDs 2315101 and 2315102, respectively. >> >> You could also use functions indexUnits(), and unitID2probesetID() or >> unitID2transcriptID(), respectively. >> >> Best regards >> Christian >> >> >> >>> >>>> >>>> I guess that "data_hapMap" contains the raw data. For these the slot >>>> "data" >>>> is empty to save memory. So you need to use either attachData() or >>>> attachInten(). However since you are using exon arrays you may not have >>>> enough RAM, so it would be better to use function export() or >>>> export.data(), >>>> or attach only a subset, see help ?attachData. See also vignette xps.pdf >>>> (chapter 2.3). >>> >>> >>> I think RAM shouldn't be an issue if I attach the samples one at a >>> time? (I actually have access to a machine with 32gigs RAM but ideally >>> would like to get what I'm doing to run on a standard desktop, which >>> is actually why I'm using XPS!). >>> >>>> >>>> When you talk about "expression matrix", how did you create it? Maybe you >>>> could use function validExpr(), but w/o seeing your code it is hard to >>>> tell. >>>> For DABG there are functions pvalData() and presCall(), see the examples >>>> in >>>> help ?dabg.call. >>> >>> >>> Yes I've managed to use dabg.call() at probeset level and access the >>> p-values using pvalData() alright. >>> >>> Thanks again for all of your help and patience! >>> >>> Kind Regards, >>> >>> Paul. >>> >>> >>>> >>>> Best regards >>>> Christian >>>> >>>> >>>> >>>> On 2/6/12 4:33 PM, Paul Geeleher wrote: >>>>> >>>>> >>>>> Hi Christian, >>>>> >>>>> Thanks for your quick and informative reply. >>>>> >>>>> I have re-run the analysis and saved the R objects as you suggest. The >>>>> next thing I'm trying to do is to obtain the expression levels of the >>>>> probes, but this doesn't seem to be working for me: >>>>> >>>>>> a<- validData(data_hapMap) >>>>> >>>>> >>>>> Error in .local(object, ...) : slot "data" has no data >>>>> >>>>> Based on the documentation I think validData() is the correct function. >>>>> >>>>> I've also performed probeset level DABG and I'm trying to set >>>>> individual probes which belong to probesets with DABG< .05 to 0 in >>>>> the expression matrix. >>>>> >>>>> But it seems I can't see the expression matrix using validData(). >>>>> Perhaps there is another function. Any ideas? >>>>> >>>>> Thank you again for your help with this, I'm very grateful! >>>>> >>>>> Paul. >>>>> >>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote: >>>>>> >>>>>> >>>>>> Dear Paul, >>>>>> >>>>>> The functions root.data(), root.call() and root.expr() were created to >>>>>> allow you access to the corresponding root files just in case that you >>>>>> did not save your R session. >>>>>> >>>>>> In the cases where you compute expression levels stepwise, or only part >>>>>> of them such as normalize.quantiles(), as seems to be the matter in >>>>>> your >>>>>> case, there is no corresponding root.xxx() function to access the root >>>>>> file directly. In these cases you need to save your R session to have >>>>>> continued access to the resulting root file. >>>>>> >>>>>> Please note that saving the R session is the usual case to have access >>>>>> to the root files. >>>>>> >>>>>> Best regards >>>>>> Christian >>>>>> >>>>>> >>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>>>>> >>>>>>> >>>>>>> Hi Christian, >>>>>>> >>>>>>> Thanks for your quick reply. I check what kind of trees I have using >>>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>>>>> rather than "int", this is presumably because my analysis required no >>>>>>> background correction step? >>>>>>> >>>>>>> So I then tried: >>>>>>>> >>>>>>>> >>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu") >>>>>>> >>>>>>> >>>>>>> >>>>>>> but that gives me a huge number of errors that look like this: >>>>>>> >>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>> Error: Could not get directory<preprocesset>. >>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>> Error: Could not get directory<preprocesset>. >>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>> Error: Could not get directory<preprocesset>. >>>>>>> Error: Could not get tree<exportset>. >>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : >>>>>>> error in function ?ExportData? >>>>>>> >>>>>>> >>>>>>> This file "exon_quantiles.root" definitely exists in the current >>>>>>> working directory though... Thanks again for your help! >>>>>>> >>>>>>> Paul. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Dear Paul, >>>>>>>> >>>>>>>> Please have a look at the help ?root.expr. >>>>>>>> >>>>>>>> If I understand you correctly, you did only do quantile >>>>>>>> normalization? >>>>>>>> >>>>>>>> To see the tree names in your file you should do: >>>>>>>>> >>>>>>>>> >>>>>>>>> getTreeNames("exon_quantiles.root") >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> You will probably see trees with extension "int", see help >>>>>>>> ?validTreetype. >>>>>>>> >>>>>>>> To load these trees you need to do: >>>>>>>>> >>>>>>>>> >>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int") >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Please let me know if this did solve your problem. >>>>>>>> >>>>>>>> Best regards >>>>>>>> Christian >>>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >>>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a >>>>>>>> e.m.a.i.l: cstrato at aon.at >>>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon >>>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", >>>>>>>>> but >>>>>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>>>>> scheme file I created for Affy exon arrays) I get the error below? I >>>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need >>>>>>>>> a >>>>>>>>> different "root scheme" file for this normalized data? >>>>>>>>> Unfortunately, >>>>>>>>> I haven't been able to find an answer. >>>>>>>>> >>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>>>>> >>>>>>>>> Hope someone can help, >>>>>>>>> >>>>>>>>> Paul. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> sessionInfo() >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> R version 2.11.0 (2010-04-22) >>>>>>>>> x86_64-redhat-linux-gnu >>>>>>>>> >>>>>>>>> locale: >>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>>>>> >>>>>>>>> attached base packages: >>>>>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >>> >> > > >

ADD REPLY • link 13.2 years ago cstrato ★ 3.9k

0

Entering edit mode

Hi Christian, Thanks again for your very helpful replies. I think using the C++ macros might be the way I go once I get everything ironed out. I have what I think is one last question. When try to convert gene symbol to internal ID: allInternalGeneIds[[i]] <- symbol2unitID(scheme.huex10stv2 , symbol=geneSym, unittype="transcript", as.list =TRUE) I get the following message: slot ?unitname? is empty, importing data from ?HuEx-1_0-st-v2.idx? ... Which leads me to believe there is a way to attach "unitname"? I've been unable to find how to to this though? I think if you know if this is possible it could speed up my pipeline quite a bit, as I have to repeat this step for every gene! Thanks again, Paul. On 2/14/12, cstrato <cstrato at="" aon.at=""> wrote: > Dear Paul, > > You could export the whole data as table using: > > export(data_qn, treename="*", treetype="cqu", varlist="fInten", > outfile="data_cqu.txt",as.dataframe=FALSE) > > Then you have a huge text-file, which you can then use for your > purposes. However, since you do not have enough memory to work with R > you would need to either import only parts of the table into R, or use > other languages such as perl for further analysis. > > For the DABG data you could use function export.call(). > > Alternatively, you could also work completely from within ROOT, using > C++ macros. Examples how to use xps from within ROOT are shown in > directory xps/examples/macro4XPS.C > > Best regards > Christian > > > On 2/14/12 1:54 PM, Paul Geeleher wrote: >> Hi Christian, >> >> Thanks for your replies. >> >> As per your suggestions I've been been able to map between xy >> co-ordinates, probe IDs and PROBESET_IDS. >> >> I've been able to extract the expression level for a given gene (for a >> subset of samples) using this: >> >> id<- symbol2unitID(scheme.huex10stv2 , symbol="CDH11", >> unittype="transcript", as.list =TRUE) >> treenames<- unlist(treeNames(data_qn)) >> data_qn<- attachInten(data_qn, treenames=treenames[1:3]) >> xyIntens<- validData(data_qn, unitID=unlist(id)) >> >> With "validData()" is that it seems I can only get the expression >> levels for samples which are "attached". This means I'd probably have >> to attach each sample separately (which will take a long time) as I >> don't have enough memory to attach all samples. >> >> I'm wondering if there is a way to extract this data without having to >> attach the data? If something similar is possible for the dabg data >> (at probeset level) that would also be a big help in speeding up my >> pipeline (which will be complete if I can implement these last two >> steps)! >> >> Thanks again for all your help its much appreciated, >> >> Paul. >> >> >> On Tue, Feb 7, 2012 at 8:48 PM, cstrato<cstrato at="" aon.at=""> wrote: >>> Dear Paul, >>> >>> see replies below: >>> >>> >>> On 2/7/12 1:52 PM, Paul Geeleher wrote: >>>> >>>> Hi Christian, apologies for the lack of clarity in my previous email, >>>> I'll try to clear this up, see replies below: >>>> >>>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>>> >>>>> Dear Paul, >>>>> >>>>> I am afraid that it is not quite clear to me what you want to do. >>>>> >>>>> What do you mean with "obtain the expression levels of the probes"? Do >>>>> you >>>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your >>>>> question >>>>> about DABG? >>>> >>>> >>>> I've read in and quantiles normalized probe level data for 176 arrays >>>> and what I'm trying to do now is set every probe (individual probe, >>>> not probeset) that is not detected above background to zero. But from >>>> what I've read dabg.call() can only create p-values for probeset >>>> level? So as compromise I'm now trying to set the expression of all >>>> *individual probes* which belong to a *probeset* not detected above >>>> background to zero. I hope that makes sense. >>>> >>> >>> >>> Did you read the exon array whitepapers which you can download from >>> Affymetrix? >>> >>> Every probeset consists of 1-4 probes only, and every exon consists >>> usually >>> of 1-2 probesets. Each gene has a transcript_cluster_id, which consists >>> of >>> one or more probeset_ids. (You can see the mapping between ids using >>> function export.scheme(..,treetype="anp",..) >>> >>> Since the smallest unit is the probeset, function dabg.call() will only >>> work >>> at the probeset (and transcript) level. If you set all probes of a >>> probeset >>> to zero you may loose an entire exon. >>> >>> >>> >>>>> >>>>> How did you create "data_hapMap"? Maybe you could send me your complete >>>>> code >>>>> otherwise I am only able to guess. >>>> >>>> >>>> "data_hapMap" was a mistake, sorry I copied the wrong object name it >>>> should have been "data_qn" which is quantiles normalized probe level >>>> data that I created like this: >>>> >>>> data_hapMap<- root.data(scheme.huex10stv2, >>>> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel .root") >>>> #read raw data >>>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles", >>>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* >>>> level >>>> >>>> So "data_qn" is the object I want to work with, based on your advice >>>> I've figured out that to access the expression levels in "data_qn" I >>>> need to use "attachInten()", then I can use "intensities()" to access >>>> the expression levels: >>>> >>>> treenames<- unlist(treeNames(data_qn)) >>>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the >>>> first two samples >>>> >>>>> head(intensity(data_qn)) >>>> >>>> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN >>>> 1 0 0 0.0000 0.000 >>>> 2 1 0 0.0000 0.000 >>>> 3 2 0 0.0000 0.000 >>>> 4 3 0 0.0000 0.000 >>>> 5 4 0 0.0000 0.000 >>>> 6 5 0 85.3523 121.733 >>>> >>>> Does the X column in the output above represent the probe ID? If so >>>> and I have a mapping of probeset IDs to corresponding probe IDs, it >>>> should be fairly straightforward to set probes that are not detected >>>> above background to zero? Perhaps there is a more straightforward way >>>> of doing this though? >>>> >>> >>> No, (X,Y) are the coordinates of the single probes on the exon array. You >>> can use function export.scheme(..,treetype="scm",..) to get the mapping >>> between (X,Y) and an internal PROBESET_ID, e.g.: >>> >>> UNIT_ID X Y ProbeLength Mask EXON_ID PROBESET_ID >>> 31 986 1674 25 512 31 31 >>> 31 1092 677 25 512 31 31 >>> 31 796 1862 25 512 31 31 >>> 31 917 193 25 512 31 31 >>> 31 341 1677 25 512 32 32 >>> 31 144 2250 25 512 32 32 >>> 31 689 262 25 512 32 32 >>> 31 579 1670 25 512 32 32 >>> >>> Then you can use export.scheme(..,treetype="pbs",..) to map PROBESET_ID >>> (=UNIT_ID) to the ProbesetID, e.g.: >>> >>> UNIT_ID ProbesetID NumCells NumAtoms NumSubunits >>> UnitType >>> 31 2315101 4 4 1 512 >>> 32 2315102 4 4 1 512 >>> >>> As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to the >>> Affymetrix ProbesetIDs 2315101 and 2315102, respectively. >>> >>> You could also use functions indexUnits(), and unitID2probesetID() or >>> unitID2transcriptID(), respectively. >>> >>> Best regards >>> Christian >>> >>> >>> >>>> >>>>> >>>>> I guess that "data_hapMap" contains the raw data. For these the slot >>>>> "data" >>>>> is empty to save memory. So you need to use either attachData() or >>>>> attachInten(). However since you are using exon arrays you may not have >>>>> enough RAM, so it would be better to use function export() or >>>>> export.data(), >>>>> or attach only a subset, see help ?attachData. See also vignette >>>>> xps.pdf >>>>> (chapter 2.3). >>>> >>>> >>>> I think RAM shouldn't be an issue if I attach the samples one at a >>>> time? (I actually have access to a machine with 32gigs RAM but ideally >>>> would like to get what I'm doing to run on a standard desktop, which >>>> is actually why I'm using XPS!). >>>> >>>>> >>>>> When you talk about "expression matrix", how did you create it? Maybe >>>>> you >>>>> could use function validExpr(), but w/o seeing your code it is hard to >>>>> tell. >>>>> For DABG there are functions pvalData() and presCall(), see the >>>>> examples >>>>> in >>>>> help ?dabg.call. >>>> >>>> >>>> Yes I've managed to use dabg.call() at probeset level and access the >>>> p-values using pvalData() alright. >>>> >>>> Thanks again for all of your help and patience! >>>> >>>> Kind Regards, >>>> >>>> Paul. >>>> >>>> >>>>> >>>>> Best regards >>>>> Christian >>>>> >>>>> >>>>> >>>>> On 2/6/12 4:33 PM, Paul Geeleher wrote: >>>>>> >>>>>> >>>>>> Hi Christian, >>>>>> >>>>>> Thanks for your quick and informative reply. >>>>>> >>>>>> I have re-run the analysis and saved the R objects as you suggest. The >>>>>> next thing I'm trying to do is to obtain the expression levels of the >>>>>> probes, but this doesn't seem to be working for me: >>>>>> >>>>>>> a<- validData(data_hapMap) >>>>>> >>>>>> >>>>>> Error in .local(object, ...) : slot "data" has no data >>>>>> >>>>>> Based on the documentation I think validData() is the correct >>>>>> function. >>>>>> >>>>>> I've also performed probeset level DABG and I'm trying to set >>>>>> individual probes which belong to probesets with DABG< .05 to 0 >>>>>> in >>>>>> the expression matrix. >>>>>> >>>>>> But it seems I can't see the expression matrix using validData(). >>>>>> Perhaps there is another function. Any ideas? >>>>>> >>>>>> Thank you again for your help with this, I'm very grateful! >>>>>> >>>>>> Paul. >>>>>> >>>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote: >>>>>>> >>>>>>> >>>>>>> Dear Paul, >>>>>>> >>>>>>> The functions root.data(), root.call() and root.expr() were created >>>>>>> to >>>>>>> allow you access to the corresponding root files just in case that >>>>>>> you >>>>>>> did not save your R session. >>>>>>> >>>>>>> In the cases where you compute expression levels stepwise, or only >>>>>>> part >>>>>>> of them such as normalize.quantiles(), as seems to be the matter in >>>>>>> your >>>>>>> case, there is no corresponding root.xxx() function to access the >>>>>>> root >>>>>>> file directly. In these cases you need to save your R session to have >>>>>>> continued access to the resulting root file. >>>>>>> >>>>>>> Please note that saving the R session is the usual case to have >>>>>>> access >>>>>>> to the root files. >>>>>>> >>>>>>> Best regards >>>>>>> Christian >>>>>>> >>>>>>> >>>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi Christian, >>>>>>>> >>>>>>>> Thanks for your quick reply. I check what kind of trees I have using >>>>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>>>>>> rather than "int", this is presumably because my analysis required >>>>>>>> no >>>>>>>> background correction step? >>>>>>>> >>>>>>>> So I then tried: >>>>>>>>> >>>>>>>>> >>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", >>>>>>>>> "cqu") >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> but that gives me a huge number of errors that look like this: >>>>>>>> >>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>>> Error: Could not get directory<preprocesset>. >>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>>> Error: Could not get directory<preprocesset>. >>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>>> Error: Could not get directory<preprocesset>. >>>>>>>> Error: Could not get tree<exportset>. >>>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : >>>>>>>> error in function ?ExportData? >>>>>>>> >>>>>>>> >>>>>>>> This file "exon_quantiles.root" definitely exists in the current >>>>>>>> working directory though... Thanks again for your help! >>>>>>>> >>>>>>>> Paul. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Dear Paul, >>>>>>>>> >>>>>>>>> Please have a look at the help ?root.expr. >>>>>>>>> >>>>>>>>> If I understand you correctly, you did only do quantile >>>>>>>>> normalization? >>>>>>>>> >>>>>>>>> To see the tree names in your file you should do: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> getTreeNames("exon_quantiles.root") >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> You will probably see trees with extension "int", see help >>>>>>>>> ?validTreetype. >>>>>>>>> >>>>>>>>> To load these trees you need to do: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", >>>>>>>>>> "int") >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Please let me know if this did solve your problem. >>>>>>>>> >>>>>>>>> Best regards >>>>>>>>> Christian >>>>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >>>>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a >>>>>>>>> e.m.a.i.l: cstrato at aon.at >>>>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy >>>>>>>>>> Exon >>>>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", >>>>>>>>>> but >>>>>>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>>>>>> scheme file I created for Affy exon arrays) I get the error below? >>>>>>>>>> I >>>>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps >>>>>>>>>> need >>>>>>>>>> a >>>>>>>>>> different "root scheme" file for this normalized data? >>>>>>>>>> Unfortunately, >>>>>>>>>> I haven't been able to find an answer. >>>>>>>>>> >>>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>>>>>> >>>>>>>>>> Hope someone can help, >>>>>>>>>> >>>>>>>>>> Paul. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> sessionInfo() >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> R version 2.11.0 (2010-04-22) >>>>>>>>>> x86_64-redhat-linux-gnu >>>>>>>>>> >>>>>>>>>> locale: >>>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>>>>>> >>>>>>>>>> attached base packages: >>>>>>>>>> [1] stats graphics grDevices utils datasets methods >>>>>>>>>> base >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >> >> >> > -- Paul Geeleher (PhD Student) School of Mathematics, Statistics and Applied Mathematics National University of Ireland Galway Ireland -- www.bioinformaticstutorials.com

ADD REPLY • link 13.2 years ago Paul Geeleher ★ 1.3k

0

Entering edit mode

Dear Paul, From your mail it is not quite clear to me what you mean: where do you want to attach which unitname? Please try the following: scheme.test3 <- root.scheme(paste(.path.package("xps"),"schemes/SchemeTest3.root",sep= "/")) data.test3 <- root.data(scheme.test3, paste(.path.package("xps"),"rootdata/DataTest3_cel.root",sep="/")) id <- indexUnits(data.test3, which="", unitID="*") id[id[,1]==34,] Do you mean this? Best regards Christian On 2/21/12 1:23 PM, Paul Geeleher wrote: > Hi Christian, > > Thanks again for your very helpful replies. I think using the C++ > macros might be the way I go once I get everything ironed out. > > I have what I think is one last question. When try to convert gene > symbol to internal ID: > > allInternalGeneIds[[i]]<- symbol2unitID(scheme.huex10stv2 , > symbol=geneSym, unittype="transcript", as.list =TRUE) > > I get the following message: > slot ?unitname? is empty, importing data from ?HuEx-1_0-st-v2.idx? ... > > Which leads me to believe there is a way to attach "unitname"? I've > been unable to find how to to this though? I think if you know if this > is possible it could speed up my pipeline quite a bit, as I have to > repeat this step for every gene! > > Thanks again, > > Paul. > > On 2/14/12, cstrato<cstrato at="" aon.at=""> wrote: >> Dear Paul, >> >> You could export the whole data as table using: >> >> export(data_qn, treename="*", treetype="cqu", varlist="fInten", >> outfile="data_cqu.txt",as.dataframe=FALSE) >> >> Then you have a huge text-file, which you can then use for your >> purposes. However, since you do not have enough memory to work with R >> you would need to either import only parts of the table into R, or use >> other languages such as perl for further analysis. >> >> For the DABG data you could use function export.call(). >> >> Alternatively, you could also work completely from within ROOT, using >> C++ macros. Examples how to use xps from within ROOT are shown in >> directory xps/examples/macro4XPS.C >> >> Best regards >> Christian >> >> >> On 2/14/12 1:54 PM, Paul Geeleher wrote: >>> Hi Christian, >>> >>> Thanks for your replies. >>> >>> As per your suggestions I've been been able to map between xy >>> co-ordinates, probe IDs and PROBESET_IDS. >>> >>> I've been able to extract the expression level for a given gene (for a >>> subset of samples) using this: >>> >>> id<- symbol2unitID(scheme.huex10stv2 , symbol="CDH11", >>> unittype="transcript", as.list =TRUE) >>> treenames<- unlist(treeNames(data_qn)) >>> data_qn<- attachInten(data_qn, treenames=treenames[1:3]) >>> xyIntens<- validData(data_qn, unitID=unlist(id)) >>> >>> With "validData()" is that it seems I can only get the expression >>> levels for samples which are "attached". This means I'd probably have >>> to attach each sample separately (which will take a long time) as I >>> don't have enough memory to attach all samples. >>> >>> I'm wondering if there is a way to extract this data without having to >>> attach the data? If something similar is possible for the dabg data >>> (at probeset level) that would also be a big help in speeding up my >>> pipeline (which will be complete if I can implement these last two >>> steps)! >>> >>> Thanks again for all your help its much appreciated, >>> >>> Paul. >>> >>> >>> On Tue, Feb 7, 2012 at 8:48 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>> Dear Paul, >>>> >>>> see replies below: >>>> >>>> >>>> On 2/7/12 1:52 PM, Paul Geeleher wrote: >>>>> >>>>> Hi Christian, apologies for the lack of clarity in my previous email, >>>>> I'll try to clear this up, see replies below: >>>>> >>>>> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>>>> >>>>>> Dear Paul, >>>>>> >>>>>> I am afraid that it is not quite clear to me what you want to do. >>>>>> >>>>>> What do you mean with "obtain the expression levels of the probes"? Do >>>>>> you >>>>>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your >>>>>> question >>>>>> about DABG? >>>>> >>>>> >>>>> I've read in and quantiles normalized probe level data for 176 arrays >>>>> and what I'm trying to do now is set every probe (individual probe, >>>>> not probeset) that is not detected above background to zero. But from >>>>> what I've read dabg.call() can only create p-values for probeset >>>>> level? So as compromise I'm now trying to set the expression of all >>>>> *individual probes* which belong to a *probeset* not detected above >>>>> background to zero. I hope that makes sense. >>>>> >>>> >>>> >>>> Did you read the exon array whitepapers which you can download from >>>> Affymetrix? >>>> >>>> Every probeset consists of 1-4 probes only, and every exon consists >>>> usually >>>> of 1-2 probesets. Each gene has a transcript_cluster_id, which consists >>>> of >>>> one or more probeset_ids. (You can see the mapping between ids using >>>> function export.scheme(..,treetype="anp",..) >>>> >>>> Since the smallest unit is the probeset, function dabg.call() will only >>>> work >>>> at the probeset (and transcript) level. If you set all probes of a >>>> probeset >>>> to zero you may loose an entire exon. >>>> >>>> >>>> >>>>>> >>>>>> How did you create "data_hapMap"? Maybe you could send me your complete >>>>>> code >>>>>> otherwise I am only able to guess. >>>>> >>>>> >>>>> "data_hapMap" was a mistake, sorry I copied the wrong object name it >>>>> should have been "data_qn" which is quantiles normalized probe level >>>>> data that I created like this: >>>>> >>>>> data_hapMap<- root.data(scheme.huex10stv2, >>>>> "/data2/paul/normalization_project/root_data/hapMap_root_data_ce l.root") >>>>> #read raw data >>>>> data_qn<- normalize.quantiles(data_hapMap, "exon_quantiles", >>>>> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe* >>>>> level >>>>> >>>>> So "data_qn" is the object I want to work with, based on your advice >>>>> I've figured out that to access the expression levels in "data_qn" I >>>>> need to use "attachInten()", then I can use "intensities()" to access >>>>> the expression levels: >>>>> >>>>> treenames<- unlist(treeNames(data_qn)) >>>>> data_qn<- attachInten(data_qn, treenames=treenames[1:2]) #attach the >>>>> first two samples >>>>> >>>>>> head(intensity(data_qn)) >>>>> >>>>> X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN >>>>> 1 0 0 0.0000 0.000 >>>>> 2 1 0 0.0000 0.000 >>>>> 3 2 0 0.0000 0.000 >>>>> 4 3 0 0.0000 0.000 >>>>> 5 4 0 0.0000 0.000 >>>>> 6 5 0 85.3523 121.733 >>>>> >>>>> Does the X column in the output above represent the probe ID? If so >>>>> and I have a mapping of probeset IDs to corresponding probe IDs, it >>>>> should be fairly straightforward to set probes that are not detected >>>>> above background to zero? Perhaps there is a more straightforward way >>>>> of doing this though? >>>>> >>>> >>>> No, (X,Y) are the coordinates of the single probes on the exon array. You >>>> can use function export.scheme(..,treetype="scm",..) to get the mapping >>>> between (X,Y) and an internal PROBESET_ID, e.g.: >>>> >>>> UNIT_ID X Y ProbeLength Mask EXON_ID PROBESET_ID >>>> 31 986 1674 25 512 31 31 >>>> 31 1092 677 25 512 31 31 >>>> 31 796 1862 25 512 31 31 >>>> 31 917 193 25 512 31 31 >>>> 31 341 1677 25 512 32 32 >>>> 31 144 2250 25 512 32 32 >>>> 31 689 262 25 512 32 32 >>>> 31 579 1670 25 512 32 32 >>>> >>>> Then you can use export.scheme(..,treetype="pbs",..) to map PROBESET_ID >>>> (=UNIT_ID) to the ProbesetID, e.g.: >>>> >>>> UNIT_ID ProbesetID NumCells NumAtoms NumSubunits >>>> UnitType >>>> 31 2315101 4 4 1 512 >>>> 32 2315102 4 4 1 512 >>>> >>>> As you see PROBESET_IDs 31 and 32 have each 4 probes and belong to the >>>> Affymetrix ProbesetIDs 2315101 and 2315102, respectively. >>>> >>>> You could also use functions indexUnits(), and unitID2probesetID() or >>>> unitID2transcriptID(), respectively. >>>> >>>> Best regards >>>> Christian >>>> >>>> >>>> >>>>> >>>>>> >>>>>> I guess that "data_hapMap" contains the raw data. For these the slot >>>>>> "data" >>>>>> is empty to save memory. So you need to use either attachData() or >>>>>> attachInten(). However since you are using exon arrays you may not have >>>>>> enough RAM, so it would be better to use function export() or >>>>>> export.data(), >>>>>> or attach only a subset, see help ?attachData. See also vignette >>>>>> xps.pdf >>>>>> (chapter 2.3). >>>>> >>>>> >>>>> I think RAM shouldn't be an issue if I attach the samples one at a >>>>> time? (I actually have access to a machine with 32gigs RAM but ideally >>>>> would like to get what I'm doing to run on a standard desktop, which >>>>> is actually why I'm using XPS!). >>>>> >>>>>> >>>>>> When you talk about "expression matrix", how did you create it? Maybe >>>>>> you >>>>>> could use function validExpr(), but w/o seeing your code it is hard to >>>>>> tell. >>>>>> For DABG there are functions pvalData() and presCall(), see the >>>>>> examples >>>>>> in >>>>>> help ?dabg.call. >>>>> >>>>> >>>>> Yes I've managed to use dabg.call() at probeset level and access the >>>>> p-values using pvalData() alright. >>>>> >>>>> Thanks again for all of your help and patience! >>>>> >>>>> Kind Regards, >>>>> >>>>> Paul. >>>>> >>>>> >>>>>> >>>>>> Best regards >>>>>> Christian >>>>>> >>>>>> >>>>>> >>>>>> On 2/6/12 4:33 PM, Paul Geeleher wrote: >>>>>>> >>>>>>> >>>>>>> Hi Christian, >>>>>>> >>>>>>> Thanks for your quick and informative reply. >>>>>>> >>>>>>> I have re-run the analysis and saved the R objects as you suggest. The >>>>>>> next thing I'm trying to do is to obtain the expression levels of the >>>>>>> probes, but this doesn't seem to be working for me: >>>>>>> >>>>>>>> a<- validData(data_hapMap) >>>>>>> >>>>>>> >>>>>>> Error in .local(object, ...) : slot "data" has no data >>>>>>> >>>>>>> Based on the documentation I think validData() is the correct >>>>>>> function. >>>>>>> >>>>>>> I've also performed probeset level DABG and I'm trying to set >>>>>>> individual probes which belong to probesets with DABG< .05 to 0 >>>>>>> in >>>>>>> the expression matrix. >>>>>>> >>>>>>> But it seems I can't see the expression matrix using validData(). >>>>>>> Perhaps there is another function. Any ideas? >>>>>>> >>>>>>> Thank you again for your help with this, I'm very grateful! >>>>>>> >>>>>>> Paul. >>>>>>> >>>>>>> On 2/2/12, cstrato<cstrato at="" aon.at=""> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Dear Paul, >>>>>>>> >>>>>>>> The functions root.data(), root.call() and root.expr() were created >>>>>>>> to >>>>>>>> allow you access to the corresponding root files just in case that >>>>>>>> you >>>>>>>> did not save your R session. >>>>>>>> >>>>>>>> In the cases where you compute expression levels stepwise, or only >>>>>>>> part >>>>>>>> of them such as normalize.quantiles(), as seems to be the matter in >>>>>>>> your >>>>>>>> case, there is no corresponding root.xxx() function to access the >>>>>>>> root >>>>>>>> file directly. In these cases you need to save your R session to have >>>>>>>> continued access to the resulting root file. >>>>>>>> >>>>>>>> Please note that saving the R session is the usual case to have >>>>>>>> access >>>>>>>> to the root files. >>>>>>>> >>>>>>>> Best regards >>>>>>>> Christian >>>>>>>> >>>>>>>> >>>>>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Christian, >>>>>>>>> >>>>>>>>> Thanks for your quick reply. I check what kind of trees I have using >>>>>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu" >>>>>>>>> rather than "int", this is presumably because my analysis required >>>>>>>>> no >>>>>>>>> background correction step? >>>>>>>>> >>>>>>>>> So I then tried: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", >>>>>>>>>> "cqu") >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> but that gives me a huge number of errors that look like this: >>>>>>>>> >>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>>>> Error: Could not get directory<preprocesset>. >>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>>>> Error: Could not get directory<preprocesset>. >>>>>>>>> Error in<tfile::cd>: Unknown directory PreprocesSet >>>>>>>>> Error: Could not get directory<preprocesset>. >>>>>>>>> Error: Could not get tree<exportset>. >>>>>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root", : >>>>>>>>> error in function ?ExportData? >>>>>>>>> >>>>>>>>> >>>>>>>>> This file "exon_quantiles.root" definitely exists in the current >>>>>>>>> working directory though... Thanks again for your help! >>>>>>>>> >>>>>>>>> Paul. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at="" aon.at=""> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Dear Paul, >>>>>>>>>> >>>>>>>>>> Please have a look at the help ?root.expr. >>>>>>>>>> >>>>>>>>>> If I understand you correctly, you did only do quantile >>>>>>>>>> normalization? >>>>>>>>>> >>>>>>>>>> To see the tree names in your file you should do: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> getTreeNames("exon_quantiles.root") >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> You will probably see trees with extension "int", see help >>>>>>>>>> ?validTreetype. >>>>>>>>>> >>>>>>>>>> To load these trees you need to do: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", >>>>>>>>>>> "int") >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Please let me know if this did solve your problem. >>>>>>>>>> >>>>>>>>>> Best regards >>>>>>>>>> Christian >>>>>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>>>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a >>>>>>>>>> V.i.e.n.n.a A.u.s.t.r.i.a >>>>>>>>>> e.m.a.i.l: cstrato at aon.at >>>>>>>>>> _._._._._._._._._._._._._._._._._._ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy >>>>>>>>>>> Exon >>>>>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", >>>>>>>>>>> but >>>>>>>>>>> if I try to load it the same was I'd load my raw data (using the >>>>>>>>>>> scheme file I created for Affy exon arrays) I get the error below? >>>>>>>>>>> I >>>>>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps >>>>>>>>>>> need >>>>>>>>>>> a >>>>>>>>>>> different "root scheme" file for this normalized data? >>>>>>>>>>> Unfortunately, >>>>>>>>>>> I haven't been able to find an answer. >>>>>>>>>>> >>>>>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root") >>>>>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root") >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero >>>>>>>>>>> >>>>>>>>>>> Hope someone can help, >>>>>>>>>>> >>>>>>>>>>> Paul. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> sessionInfo() >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> R version 2.11.0 (2010-04-22) >>>>>>>>>>> x86_64-redhat-linux-gnu >>>>>>>>>>> >>>>>>>>>>> locale: >>>>>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>>>>>>>> >>>>>>>>>>> attached base packages: >>>>>>>>>>> [1] stats graphics grDevices utils datasets methods >>>>>>>>>>> base >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >>> >> > >

ADD REPLY • link 13.2 years ago cstrato ★ 3.9k