Oligo package: paCalls question
1
0
Entering edit mode
@alison-ziesel-2031
Last seen 9.4 years ago
United States
Hello all, I?ve got very basic familiarity with the oligo package, but I?ve ran into a bit of trouble with the paCalls function. I?d like to restrict the probes in my eSet object to only those called present. Here?s what I?ve done thus far: >library(?oligo?) >library("pd.hugene.2.0.st?) >allchips <- read.celfiles(?my list of cel files?) >allchipspa <-paCalls(allchips, method=?PSDABG?, verbose=TRUE) >present <- rownames(allchipspa) >allchips_present <- exprs(allchips[present,]) Error in exprs(allchips[present, ]) : error in evaluating the argument 'object' in selecting a method for function 'exprs': Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds Clearly my last line doesn?t do what I want it to do: filter the object allchips versus the list of present probes in the object present. I suppose at this point I should also confirm that my object allchipspa is just the present probes! What?s the correct way to perform this filter step? I?d be grateful for any insight you may have. > sessionInfo() R version 3.1.1 (2014-07-10) Platform: x86_64-apple-darwin13.1.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] pd.hugene.2.0.st_3.8.1 RSQLite_0.11.4 DBI_0.2-7 oligo_1.28.2 [5] Biostrings_2.32.1 XVector_0.4.0 IRanges_1.22.10 Biobase_2.24.0 [9] oligoClasses_1.26.0 BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] affxparser_1.36.0 affyio_1.32.0 BiocInstaller_1.14.2 bit_1.1-12 [5] codetools_0.2-8 ff_2.2-13 foreach_1.4.2 GenomeInfoDb_1.0.2 [9] GenomicRanges_1.16.4 iterators_1.0.7 preprocessCore_1.26.1 splines_3.1.1 [13] stats4_3.1.1 tools_3.1.1 zlibbioc_1.10.0 alison Alison Ziesel Department of Ophthalmology, Emory University aziesel at emory.edu
oligo • 1.8k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 15 hours ago
United States
HI Alison, When you do paCalls with method = "PSDABG", you are getting P/A calls at the 'probeset' level (which in Affymetrix terms is the probe set region or PSR, which roughly corresponds to exon-level). But when you do rma with target = "core", you are summarizing data at the 'core' or transcript level (e.g., at this level all the PSRs that interrogate a given gene are summarized together). So you cannot use oligo to generate P/A calls at the transcript level. It's my understanding that you can do this with xps, but I am unfamiliar with that package. However, a quick search of the BioC list using 'stratowa pacalls' got as the first hit this message: https://stat.ethz.ch/pipermail/bioconductor/2014-August/060977.html Best, Jim On Thu, Sep 4, 2014 at 1:36 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: > Hello all, > > I?ve got very basic familiarity with the oligo package, but I?ve ran into > a bit of trouble with the paCalls function. I?d like to restrict the probes > in my eSet object to only those called present. Here?s what I?ve done thus > far: > > >library(?oligo?) > >library("pd.hugene.2.0.st?) > >allchips <- read.celfiles(?my list of cel files?) > > >allchipspa <-paCalls(allchips, method=?PSDABG?, verbose=TRUE) > >present <- rownames(allchipspa) > >allchips_present <- exprs(allchips[present,]) > > Error in exprs(allchips[present, ]) : > error in evaluating the argument 'object' in selecting a method for > function 'exprs': Error in orig[[nm]][i, , ..., drop = drop] : subscript > out of bounds > > Clearly my last line doesn?t do what I want it to do: filter the object > allchips versus the list of present probes in the object present. I suppose > at this point I should also confirm that my object allchipspa is just the > present probes! What?s the correct way to perform this filter step? I?d be > grateful for any insight you may have. > > > sessionInfo() > R version 3.1.1 (2014-07-10) > Platform: x86_64-apple-darwin13.1.0 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] pd.hugene.2.0.st_3.8.1 RSQLite_0.11.4 DBI_0.2-7 > oligo_1.28.2 > [5] Biostrings_2.32.1 XVector_0.4.0 IRanges_1.22.10 > Biobase_2.24.0 > [9] oligoClasses_1.26.0 BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] affxparser_1.36.0 affyio_1.32.0 BiocInstaller_1.14.2 > bit_1.1-12 > [5] codetools_0.2-8 ff_2.2-13 foreach_1.4.2 > GenomeInfoDb_1.0.2 > [9] GenomicRanges_1.16.4 iterators_1.0.7 preprocessCore_1.26.1 > splines_3.1.1 > [13] stats4_3.1.1 tools_3.1.1 zlibbioc_1.10.0 > > alison > > Alison Ziesel > Department of Ophthalmology, Emory University > aziesel at emory.edu > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hello Jim, Thanks so much for the speedy reply, I appreciate it. At this point in my workflow I haven?t yet performed RMA. That sounds like mistake #1 on my part. I too had found that post by Dr. Stratowa but I confess I didn?t entirely understand it. I am willing to exclude absent probes at the probeset level rather than transcript level for this project. So something like: >library(?oligo?) >library(?pd.hugene.2.0.st?) >allchips <- read.celfiles(?my list of cel files?) >eset <- rma(allchips_present, target='core') >featureData(eset) <- getNetAffx(eset, 'transcript') >present <- paCalls(eset, ?DABG?) >index <- apply(present, 1, function(x) sum(x < 0.01) > 5) >eset_present <- eset[index,] If I?m understanding this correctly, the ?index? object should be those probes whose X (p-value?) is less than 0.01? Thank you again for your advice! alison On Sep 4, 2014, at 2:05 PM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > HI Alison, > > When you do paCalls with method = "PSDABG", you are getting P/A calls at the 'probeset' level (which in Affymetrix terms is the probe set region or PSR, which roughly corresponds to exon-level). But when you do rma with target = "core", you are summarizing data at the 'core' or transcript level (e.g., at this level all the PSRs that interrogate a given gene are summarized together). > > So you cannot use oligo to generate P/A calls at the transcript level. It's my understanding that you can do this with xps, but I am unfamiliar with that package. However, a quick search of the BioC list using 'stratowa pacalls' got as the first hit this message: https://stat.ethz.ch/pipermail/bioconductor/2014-August/060977.html > > Best, > > Jim > > > On Thu, Sep 4, 2014 at 1:36 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: > Hello all, > > I?ve got very basic familiarity with the oligo package, but I?ve ran into a bit of trouble with the paCalls function. I?d like to restrict the probes in my eSet object to only those called present. Here?s what I?ve done thus far: > > >library(?oligo?) > >library("pd.hugene.2.0.st?) > >allchips <- read.celfiles(?my list of cel files?) > > >allchipspa <-paCalls(allchips, method=?PSDABG?, verbose=TRUE) > >present <- rownames(allchipspa) > >allchips_present <- exprs(allchips[present,]) > > Error in exprs(allchips[present, ]) : > error in evaluating the argument 'object' in selecting a method for function 'exprs': Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds > > Clearly my last line doesn?t do what I want it to do: filter the object allchips versus the list of present probes in the object present. I suppose at this point I should also confirm that my object allchipspa is just the present probes! What?s the correct way to perform this filter step? I?d be grateful for any insight you may have. > > > sessionInfo() > R version 3.1.1 (2014-07-10) > Platform: x86_64-apple-darwin13.1.0 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] pd.hugene.2.0.st_3.8.1 RSQLite_0.11.4 DBI_0.2-7 oligo_1.28.2 > [5] Biostrings_2.32.1 XVector_0.4.0 IRanges_1.22.10 Biobase_2.24.0 > [9] oligoClasses_1.26.0 BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] affxparser_1.36.0 affyio_1.32.0 BiocInstaller_1.14.2 bit_1.1-12 > [5] codetools_0.2-8 ff_2.2-13 foreach_1.4.2 GenomeInfoDb_1.0.2 > [9] GenomicRanges_1.16.4 iterators_1.0.7 preprocessCore_1.26.1 splines_3.1.1 > [13] stats4_3.1.1 tools_3.1.1 zlibbioc_1.10.0 > > alison > > Alison Ziesel > Department of Ophthalmology, Emory University > aziesel at emory.edu > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Alison, On Thu, Sep 4, 2014 at 2:50 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: > Hello Jim, > > Thanks so much for the speedy reply, I appreciate it. > > At this point in my workflow I haven?t yet performed RMA. That sounds like > mistake #1 on my part. > > I too had found that post by Dr. Stratowa but I confess I didn?t entirely > understand it. I am willing to exclude absent probes at the probeset level > rather than transcript level for this project. So something like: > > >library(?oligo?) > >library(?pd.hugene.2.0.st?) > >allchips <- read.celfiles(?my list of cel files?) > >eset <- rma(allchips_present, target='core') > >featureData(eset) <- getNetAffx(eset, 'transcript') > > >present <- paCalls(eset, ?DABG?) > >index <- apply(present, 1, function(x) sum(x < 0.01) > 5) > >eset_present <- eset[index,] > Where did you get this code? It's not doing what you think. Please note that 'DABG' is computing P/A calls at a lower level than 'PSDABG' (e.g., DAGB is computing at the probe level, and PSDABG is computing at the probeset level), so this is even more wrong than what you were doing before. As an example, let's look at this using some Mouse Gene ST 2.0 arrays I have in hand: > dat <- read.celfiles(filenames = list.celfiles()) Loading required package: pd.mogene.2.0.st Loading required package: RSQLite Loading required package: DBI Platform design info loaded. Reading in : MM_2528.CEL Reading in : MM_2532.CEL Reading in : MM_2536.CEL Reading in : MM_2583.CEL Reading in : MM_2588.CEL Reading in : MM_2591.CEL > dim(paCalls(dat, "DABG")) Computing DABG calls... OK [1] 767405 6 > dim(paCalls(dat, "PSDABG")) Computing DABG calls... OK [1] 269751 6 > eset <- rma(dat) Background correcting Normalizing Calculating Expression > dim(eset) Features Samples 41345 6 So the 'present' matrix that you are using above has something like 700K rows, whereas your expression set has around 50K rows. Let's back up a bit and clarify what you are doing, and how these arrays work. At the most basic level there is the probe, which is a 25-mer intended to measure the expression of a particular gene. Why they are 25-mers is beyond the scope of this email, but note that this is a ridiculously short chunk of DNA to reliably hybridize to its matching sequence. Affy knows this as well, so they use a set of these probes to measure a given transcript (or portion thereof). The next level is known in Affy parlance as the probe set region (PSR), which is usually made up of 4 probes (at least on the Exon arrays), but for the Gene ST arrays a given PSR may only contain one or two probes (note that the Gene ST arrays came out after the Exon ST arrays, and contain the more reliable probes that they used on the Exon arrays. The PSR comes from the Exon array format, and was originally intended to measure an exon, or a portion thereof.). The PSR level is also known as the 'probeset' level, which was the original summarization level for the Exon arrays. Since there are many (many!) probesets on the Gene ST arrays that have only one or two probes, I would argue against summarizing at this level. The next level is the transcript level (or 'core' level in oligo). At this level, all probes that interrogate any portion of a gene are combined together, and then summarized, to give one single measure for the expression level of that gene. This is IMO the only trustworthy level of summarization for these arrays. When you do paCalls with method = 'DABG', what you are doing is comparing the intensity for each probe to some measure of background, and trying to decide if the given probe is binding transcript at a level that is arguably higher than background. If you use method = "PSDABG", then you are taking all the P/A calls from the 'DABG' level, combining them at the PSR (probeset) level, and seeing if *in aggregate* they appear to be binding transcript at a level that is arguably higher than background. In the oligo package, there is no way to compute P/A calls at the transcript (or core) level, so you cannot use P/A calls to filter probesets if you have summarized at the transcript level. What you have done with your code above is to compute P/A calls at the probe level, then made an index where you say at least five of the probes have to have a p < 0.01. This is fine, I suppose, but you cannot use that index vector for anything. In other words, the ExpressionSet called 'eset' contains the data after summarizing the original 700k probes into about 50k gene-level probesets. So you can't use the P/A calls for the individual probes to subset the ExpressionSet because there isn't a one-to-one correspondence between the two. If you are really wedded to the idea of using P/A calls to filter your data, you will have to switch to the xps package. Otherwise you will have to use some other way of filtering out low/unexpressed data. Best, Jim > > If I?m understanding this correctly, the ?index? object should be those > probes whose X (p-value?) is less than 0.01? > > Thank you again for your advice! > > alison > > On Sep 4, 2014, at 2:05 PM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > > HI Alison, > > When you do paCalls with method = "PSDABG", you are getting P/A calls at > the 'probeset' level (which in Affymetrix terms is the probe set region or > PSR, which roughly corresponds to exon-level). But when you do rma with > target = "core", you are summarizing data at the 'core' or transcript level > (e.g., at this level all the PSRs that interrogate a given gene are > summarized together). > > So you cannot use oligo to generate P/A calls at the transcript level. > It's my understanding that you can do this with xps, but I am unfamiliar > with that package. However, a quick search of the BioC list using 'stratowa > pacalls' got as the first hit this message: > https://stat.ethz.ch/pipermail/bioconductor/2014-August/060977.html > > Best, > > Jim > > > On Thu, Sep 4, 2014 at 1:36 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: > >> Hello all, >> >> I?ve got very basic familiarity with the oligo package, but I?ve ran into >> a bit of trouble with the paCalls function. I?d like to restrict the probes >> in my eSet object to only those called present. Here?s what I?ve done thus >> far: >> >> >library(?oligo?) >> >library("pd.hugene.2.0.st?) >> >allchips <- read.celfiles(?my list of cel files?) >> >> >allchipspa <-paCalls(allchips, method=?PSDABG?, verbose=TRUE) >> >present <- rownames(allchipspa) >> >allchips_present <- exprs(allchips[present,]) >> >> Error in exprs(allchips[present, ]) : >> error in evaluating the argument 'object' in selecting a method for >> function 'exprs': Error in orig[[nm]][i, , ..., drop = drop] : subscript >> out of bounds >> >> Clearly my last line doesn?t do what I want it to do: filter the object >> allchips versus the list of present probes in the object present. I suppose >> at this point I should also confirm that my object allchipspa is just the >> present probes! What?s the correct way to perform this filter step? I?d be >> grateful for any insight you may have. >> >> > sessionInfo() >> R version 3.1.1 (2014-07-10) >> Platform: x86_64-apple-darwin13.1.0 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] pd.hugene.2.0.st_3.8.1 RSQLite_0.11.4 DBI_0.2-7 >> oligo_1.28.2 >> [5] Biostrings_2.32.1 XVector_0.4.0 IRanges_1.22.10 >> Biobase_2.24.0 >> [9] oligoClasses_1.26.0 BiocGenerics_0.10.0 >> >> loaded via a namespace (and not attached): >> [1] affxparser_1.36.0 affyio_1.32.0 BiocInstaller_1.14.2 >> bit_1.1-12 >> [5] codetools_0.2-8 ff_2.2-13 foreach_1.4.2 >> GenomeInfoDb_1.0.2 >> [9] GenomicRanges_1.16.4 iterators_1.0.7 preprocessCore_1.26.1 >> splines_3.1.1 >> [13] stats4_3.1.1 tools_3.1.1 zlibbioc_1.10.0 >> >> alison >> >> Alison Ziesel >> Department of Ophthalmology, Emory University >> aziesel at emory.edu >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hello again Jim, I adapted that second segment of code (poorly) from the page you had linked earlier, the August 2014 commentary from Dr. Stratowa. I confess I was reaching a bit with it. Thank you for clarifying where I was going wrong; I was operating on some very poor assumptions, it seems. I have had some difficulty with getting xps running, but it did have both the P/A and I/NI call functions I was interested in using. I usually don?t employ either of those calls, but I have been given an unusual data set with very low quality RNA and high variability within groups, so I?m looking at every way I can to tamp down noise in the dataset. Thank you again for the quick and detailed responses! alison On Sep 4, 2014, at 3:41 PM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > Hi Alison, > > > On Thu, Sep 4, 2014 at 2:50 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: > Hello Jim, > > Thanks so much for the speedy reply, I appreciate it. > > At this point in my workflow I haven?t yet performed RMA. That sounds like mistake #1 on my part. > > I too had found that post by Dr. Stratowa but I confess I didn?t entirely understand it. I am willing to exclude absent probes at the probeset level rather than transcript level for this project. So something like: > > >library(?oligo?) > >library(?pd.hugene.2.0.st?) > >allchips <- read.celfiles(?my list of cel files?) > >eset <- rma(allchips_present, target='core') > >featureData(eset) <- getNetAffx(eset, 'transcript') > > >present <- paCalls(eset, ?DABG?) > >index <- apply(present, 1, function(x) sum(x < 0.01) > 5) > >eset_present <- eset[index,] > > Where did you get this code? It's not doing what you think. > > Please note that 'DABG' is computing P/A calls at a lower level than 'PSDABG' (e.g., DAGB is computing at the probe level, and PSDABG is computing at the probeset level), so this is even more wrong than what you were doing before. > > As an example, let's look at this using some Mouse Gene ST 2.0 arrays I have in hand: > > > dat <- read.celfiles(filenames = list.celfiles()) > Loading required package: pd.mogene.2.0.st > Loading required package: RSQLite > Loading required package: DBI > Platform design info loaded. > Reading in : MM_2528.CEL > Reading in : MM_2532.CEL > Reading in : MM_2536.CEL > Reading in : MM_2583.CEL > Reading in : MM_2588.CEL > Reading in : MM_2591.CEL > > dim(paCalls(dat, "DABG")) > Computing DABG calls... OK > [1] 767405 6 > > dim(paCalls(dat, "PSDABG")) > Computing DABG calls... OK > [1] 269751 6 > > eset <- rma(dat) > Background correcting > Normalizing > Calculating Expression > > dim(eset) > Features Samples > 41345 6 > > So the 'present' matrix that you are using above has something like 700K rows, whereas your expression set has around 50K rows. > > Let's back up a bit and clarify what you are doing, and how these arrays work. > > At the most basic level there is the probe, which is a 25-mer intended to measure the expression of a particular gene. Why they are 25-mers is beyond the scope of this email, but note that this is a ridiculously short chunk of DNA to reliably hybridize to its matching sequence. Affy knows this as well, so they use a set of these probes to measure a given transcript (or portion thereof). > > The next level is known in Affy parlance as the probe set region (PSR), which is usually made up of 4 probes (at least on the Exon arrays), but for the Gene ST arrays a given PSR may only contain one or two probes (note that the Gene ST arrays came out after the Exon ST arrays, and contain the more reliable probes that they used on the Exon arrays. The PSR comes from the Exon array format, and was originally intended to measure an exon, or a portion thereof.). > > The PSR level is also known as the 'probeset' level, which was the original summarization level for the Exon arrays. Since there are many (many!) probesets on the Gene ST arrays that have only one or two probes, I would argue against summarizing at this level. The next level is the transcript level (or 'core' level in oligo). At this level, all probes that interrogate any portion of a gene are combined together, and then summarized, to give one single measure for the expression level of that gene. This is IMO the only trustworthy level of summarization for these arrays. > > When you do paCalls with method = 'DABG', what you are doing is comparing the intensity for each probe to some measure of background, and trying to decide if the given probe is binding transcript at a level that is arguably higher than background. If you use method = "PSDABG", then you are taking all the P/A calls from the 'DABG' level, combining them at the PSR (probeset) level, and seeing if in aggregate they appear to be binding transcript at a level that is arguably higher than background. In the oligo package, there is no way to compute P/A calls at the transcript (or core) level, so you cannot use P/A calls to filter probesets if you have summarized at the transcript level. > > What you have done with your code above is to compute P/A calls at the probe level, then made an index where you say at least five of the probes have to have a p < 0.01. This is fine, I suppose, but you cannot use that index vector for anything. In other words, the ExpressionSet called 'eset' contains the data after summarizing the original 700k probes into about 50k gene-level probesets. So you can't use the P/A calls for the individual probes to subset the ExpressionSet because there isn't a one-to-one correspondence between the two. > > If you are really wedded to the idea of using P/A calls to filter your data, you will have to switch to the xps package. Otherwise you will have to use some other way of filtering out low/unexpressed data. > > Best, > > Jim > > > > > > > > > If I?m understanding this correctly, the ?index? object should be those probes whose X (p-value?) is less than 0.01? > > Thank you again for your advice! > > alison > > On Sep 4, 2014, at 2:05 PM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > >> HI Alison, >> >> When you do paCalls with method = "PSDABG", you are getting P/A calls at the 'probeset' level (which in Affymetrix terms is the probe set region or PSR, which roughly corresponds to exon-level). But when you do rma with target = "core", you are summarizing data at the 'core' or transcript level (e.g., at this level all the PSRs that interrogate a given gene are summarized together). >> >> So you cannot use oligo to generate P/A calls at the transcript level. It's my understanding that you can do this with xps, but I am unfamiliar with that package. However, a quick search of the BioC list using 'stratowa pacalls' got as the first hit this message: https://stat.ethz.ch/pipermail/bioconductor/2014-August/060977.html >> >> Best, >> >> Jim >> >> >> On Thu, Sep 4, 2014 at 1:36 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: >> Hello all, >> >> I?ve got very basic familiarity with the oligo package, but I?ve ran into a bit of trouble with the paCalls function. I?d like to restrict the probes in my eSet object to only those called present. Here?s what I?ve done thus far: >> >> >library(?oligo?) >> >library("pd.hugene.2.0.st?) >> >allchips <- read.celfiles(?my list of cel files?) >> >> >allchipspa <-paCalls(allchips, method=?PSDABG?, verbose=TRUE) >> >present <- rownames(allchipspa) >> >allchips_present <- exprs(allchips[present,]) >> >> Error in exprs(allchips[present, ]) : >> error in evaluating the argument 'object' in selecting a method for function 'exprs': Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds >> >> Clearly my last line doesn?t do what I want it to do: filter the object allchips versus the list of present probes in the object present. I suppose at this point I should also confirm that my object allchipspa is just the present probes! What?s the correct way to perform this filter step? I?d be grateful for any insight you may have. >> >> > sessionInfo() >> R version 3.1.1 (2014-07-10) >> Platform: x86_64-apple-darwin13.1.0 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] pd.hugene.2.0.st_3.8.1 RSQLite_0.11.4 DBI_0.2-7 oligo_1.28.2 >> [5] Biostrings_2.32.1 XVector_0.4.0 IRanges_1.22.10 Biobase_2.24.0 >> [9] oligoClasses_1.26.0 BiocGenerics_0.10.0 >> >> loaded via a namespace (and not attached): >> [1] affxparser_1.36.0 affyio_1.32.0 BiocInstaller_1.14.2 bit_1.1-12 >> [5] codetools_0.2-8 ff_2.2-13 foreach_1.4.2 GenomeInfoDb_1.0.2 >> [9] GenomicRanges_1.16.4 iterators_1.0.7 preprocessCore_1.26.1 splines_3.1.1 >> [13] stats4_3.1.1 tools_3.1.1 zlibbioc_1.10.0 >> >> alison >> >> Alison Ziesel >> Department of Ophthalmology, Emory University >> aziesel at emory.edu >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > > > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Dear Alison, Since you mention that you had some difficulty with getting xps running, could you please tell me what your problem was. From your sessionInfo() I see that you are using a Mac running x86_64-apple-darwin13.1.0. If this is Yosemite, as I assume, you know that it is not even available yet, and thus you need to compile root yourself and install xps from source. Best regards, Christian _._._._._._._._._._._._._._._._._._ C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a V.i.e.n.n.a A.u.s.t.r.i.a e.m.a.i.l: cstrato at aon.at _._._._._._._._._._._._._._._._._._ On 9/4/14 9:49 PM, Alison Ziesel wrote: > Hello again Jim, > > I adapted that second segment of code (poorly) from the page you had linked earlier, the August 2014 commentary from Dr. Stratowa. I confess I was reaching a bit with it. > > Thank you for clarifying where I was going wrong; I was operating on some very poor assumptions, it seems. I have had some difficulty with getting xps running, but it did have both the P/A and I/NI call functions I was interested in using. I usually don?t employ either of those calls, but I have been given an unusual data set with very low quality RNA and high variability within groups, so I?m looking at every way I can to tamp down noise in the dataset. > > Thank you again for the quick and detailed responses! > > alison > > > On Sep 4, 2014, at 3:41 PM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > >> Hi Alison, >> >> >> On Thu, Sep 4, 2014 at 2:50 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: >> Hello Jim, >> >> Thanks so much for the speedy reply, I appreciate it. >> >> At this point in my workflow I haven?t yet performed RMA. That sounds like mistake #1 on my part. >> >> I too had found that post by Dr. Stratowa but I confess I didn?t entirely understand it. I am willing to exclude absent probes at the probeset level rather than transcript level for this project. So something like: >> >>> library(?oligo?) >>> library(?pd.hugene.2.0.st?) >>> allchips <- read.celfiles(?my list of cel files?) >>> eset <- rma(allchips_present, target='core') >>> featureData(eset) <- getNetAffx(eset, 'transcript') >> >>> present <- paCalls(eset, ?DABG?) >>> index <- apply(present, 1, function(x) sum(x < 0.01) > 5) >>> eset_present <- eset[index,] >> >> Where did you get this code? It's not doing what you think. >> >> Please note that 'DABG' is computing P/A calls at a lower level than 'PSDABG' (e.g., DAGB is computing at the probe level, and PSDABG is computing at the probeset level), so this is even more wrong than what you were doing before. >> >> As an example, let's look at this using some Mouse Gene ST 2.0 arrays I have in hand: >> >>> dat <- read.celfiles(filenames = list.celfiles()) >> Loading required package: pd.mogene.2.0.st >> Loading required package: RSQLite >> Loading required package: DBI >> Platform design info loaded. >> Reading in : MM_2528.CEL >> Reading in : MM_2532.CEL >> Reading in : MM_2536.CEL >> Reading in : MM_2583.CEL >> Reading in : MM_2588.CEL >> Reading in : MM_2591.CEL >>> dim(paCalls(dat, "DABG")) >> Computing DABG calls... OK >> [1] 767405 6 >>> dim(paCalls(dat, "PSDABG")) >> Computing DABG calls... OK >> [1] 269751 6 >>> eset <- rma(dat) >> Background correcting >> Normalizing >> Calculating Expression >>> dim(eset) >> Features Samples >> 41345 6 >> >> So the 'present' matrix that you are using above has something like 700K rows, whereas your expression set has around 50K rows. >> >> Let's back up a bit and clarify what you are doing, and how these arrays work. >> >> At the most basic level there is the probe, which is a 25-mer intended to measure the expression of a particular gene. Why they are 25-mers is beyond the scope of this email, but note that this is a ridiculously short chunk of DNA to reliably hybridize to its matching sequence. Affy knows this as well, so they use a set of these probes to measure a given transcript (or portion thereof). >> >> The next level is known in Affy parlance as the probe set region (PSR), which is usually made up of 4 probes (at least on the Exon arrays), but for the Gene ST arrays a given PSR may only contain one or two probes (note that the Gene ST arrays came out after the Exon ST arrays, and contain the more reliable probes that they used on the Exon arrays. The PSR comes from the Exon array format, and was originally intended to measure an exon, or a portion thereof.). >> >> The PSR level is also known as the 'probeset' level, which was the original summarization level for the Exon arrays. Since there are many (many!) probesets on the Gene ST arrays that have only one or two probes, I would argue against summarizing at this level. The next level is the transcript level (or 'core' level in oligo). At this level, all probes that interrogate any portion of a gene are combined together, and then summarized, to give one single measure for the expression level of that gene. This is IMO the only trustworthy level of summarization for these arrays. >> >> When you do paCalls with method = 'DABG', what you are doing is comparing the intensity for each probe to some measure of background, and trying to decide if the given probe is binding transcript at a level that is arguably higher than background. If you use method = "PSDABG", then you are taking all the P/A calls from the 'DABG' level, combining them at the PSR (probeset) level, and seeing if in aggregate they appear to be binding transcript at a level that is arguably higher than background. In the oligo package, there is no way to compute P/A calls at the transcript (or core) level, so you cannot use P/A calls to filter probesets if you have summarized at the transcript level. >> >> What you have done with your code above is to compute P/A calls at the probe level, then made an index where you say at least five of the probes have to have a p < 0.01. This is fine, I suppose, but you cannot use that index vector for anything. In other words, the ExpressionSet called 'eset' contains the data after summarizing the original 700k probes into about 50k gene-level probesets. So you can't use the P/A calls for the individual probes to subset the ExpressionSet because there isn't a one-to-one correspondence between the two. >> >> If you are really wedded to the idea of using P/A calls to filter your data, you will have to switch to the xps package. Otherwise you will have to use some other way of filtering out low/unexpressed data. >> >> Best, >> >> Jim >> >> >> >> >> >> >> >> >> If I?m understanding this correctly, the ?index? object should be those probes whose X (p-value?) is less than 0.01? >> >> Thank you again for your advice! >> >> alison >> >> On Sep 4, 2014, at 2:05 PM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: >> >>> HI Alison, >>> >>> When you do paCalls with method = "PSDABG", you are getting P/A calls at the 'probeset' level (which in Affymetrix terms is the probe set region or PSR, which roughly corresponds to exon-level). But when you do rma with target = "core", you are summarizing data at the 'core' or transcript level (e.g., at this level all the PSRs that interrogate a given gene are summarized together). >>> >>> So you cannot use oligo to generate P/A calls at the transcript level. It's my understanding that you can do this with xps, but I am unfamiliar with that package. However, a quick search of the BioC list using 'stratowa pacalls' got as the first hit this message: https://stat.ethz.ch/pipermail/bioconductor/2014-August/060977.html >>> >>> Best, >>> >>> Jim >>> >>> >>> On Thu, Sep 4, 2014 at 1:36 PM, Alison Ziesel <aziesel at="" emory.edu=""> wrote: >>> Hello all, >>> >>> I?ve got very basic familiarity with the oligo package, but I?ve ran into a bit of trouble with the paCalls function. I?d like to restrict the probes in my eSet object to only those called present. Here?s what I?ve done thus far: >>> >>>> library(?oligo?) >>>> library("pd.hugene.2.0.st?) >>>> allchips <- read.celfiles(?my list of cel files?) >>> >>>> allchipspa <-paCalls(allchips, method=?PSDABG?, verbose=TRUE) >>>> present <- rownames(allchipspa) >>>> allchips_present <- exprs(allchips[present,]) >>> >>> Error in exprs(allchips[present, ]) : >>> error in evaluating the argument 'object' in selecting a method for function 'exprs': Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds >>> >>> Clearly my last line doesn?t do what I want it to do: filter the object allchips versus the list of present probes in the object present. I suppose at this point I should also confirm that my object allchipspa is just the present probes! What?s the correct way to perform this filter step? I?d be grateful for any insight you may have. >>> >>>> sessionInfo() >>> R version 3.1.1 (2014-07-10) >>> Platform: x86_64-apple-darwin13.1.0 (64-bit) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] pd.hugene.2.0.st_3.8.1 RSQLite_0.11.4 DBI_0.2-7 oligo_1.28.2 >>> [5] Biostrings_2.32.1 XVector_0.4.0 IRanges_1.22.10 Biobase_2.24.0 >>> [9] oligoClasses_1.26.0 BiocGenerics_0.10.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affxparser_1.36.0 affyio_1.32.0 BiocInstaller_1.14.2 bit_1.1-12 >>> [5] codetools_0.2-8 ff_2.2-13 foreach_1.4.2 GenomeInfoDb_1.0.2 >>> [9] GenomicRanges_1.16.4 iterators_1.0.7 preprocessCore_1.26.1 splines_3.1.1 >>> [13] stats4_3.1.1 tools_3.1.1 zlibbioc_1.10.0 >>> >>> alison >>> >>> Alison Ziesel >>> Department of Ophthalmology, Emory University >>> aziesel at emory.edu >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> >> >> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY

Login before adding your answer.

Traffic: 718 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6