filtering probes in affymetrix data
1
1
Entering edit mode
@sabet-julia-a-6404
Last seen 10.2 years ago
Hello all, I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: librarypd.mogene.2.0.st) eset <- rma(affyRaw) and added gene annotation and I am following the limma user's guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? Thank you! Julia Sabet [[alternative HTML version deleted]]
Annotation limma Annotation limma • 4.6k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States
Hi Julia, There are several different things you can do. I'll show you one possibility. First, note that there are multiple different control probes on this array that aren't intended to measure differential expression, and should be excluded. So first let's look at the possible types of probesets: > librarypd.mogene.2.0.st) > con <- dbpd.mogene.2.0.st) > dbGetQuery(con, "select * from type_dict;") type type_id 1 1 main 2 2 control->affx 3 3 control->chip 4 4 control->bgp->antigenomic 5 5 control->bgp->genomic 6 6 normgene->exon 7 7 normgene->intron 8 8 rescue->FLmRNA->unmapped 9 9 control->affx->bac_spike 10 10 oligo_spike_in 11 11 r1_bac_spike_at These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: > table(dbGetQuery(con, "select type from featureSet;")[,1]) 1 2 4 7 9 263551 18 23 5331 18 So we only have these probeset types: 1 1 main 2 2 control->affx 4 4 control->bgp->antigenomic 7 7 normgene->intron 9 9 control->affx->bac_spike And the 'main' probesets are those that we want to use for differential expression. Now one thing you could do is to say that the antigenomic probesets should give a good measure of background, as they are supposed to have sequences that don't exist in mice. So you could just extract those probesets, get some measure and use that as the lower limit of what you think is expressed or not. That's pretty naive, as a probe with higher GC content will have higher background than one with a lower GC content, but worrying about that is way beyond what I am prepared to go into. Now we can get the probeset IDs for the antigenomic probesets antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join featureSet on core_mps.fsetid=featureSet.fsetid where featureSet.type='4';") And then extract those probesets and get a summary statistic. bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, probs=0.95) Which will give us the 95th percentile of these background probes. You could then use the kOverA function in genefilter to filter out any probesets where all samples are below the background values. The idea being that you want to filter out any probesets unless k samples have expression levels >= A. So if you have 10 samples, where 5 are controls and 5 are treated, you would do something like minval <- max(bkg) ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- eset[ind,] You should also filter out all the non-main probesets. You can do that using getMainProbes() in the affycoretools package eset.filt <- getMainProbes(eset.filt) Best, Jim On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: > Hello all, > I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: > > librarypd.mogene.2.0.st) > eset <- rma(affyRaw) > > and added gene annotation and I am following the limma user's guide, which recommends removing "probes that appear not be > expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? > > Thank you! > Julia Sabet > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: eset.filt <- getMainProbes(eset.filt) This error resulted: Error: could not find function "getMainProbes" What should I do? Thanks! Julia -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 9:36 AM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data Hi Julia, There are several different things you can do. I'll show you one possibility. First, note that there are multiple different control probes on this array that aren't intended to measure differential expression, and should be excluded. So first let's look at the possible types of probesets: > librarypd.mogene.2.0.st) > con <- dbpd.mogene.2.0.st) > dbGetQuery(con, "select * from type_dict;") type type_id 1 1 main 2 2 control->affx 3 3 control->chip 4 4 control->bgp->antigenomic 5 5 control->bgp->genomic 6 6 normgene->exon 7 7 normgene->intron 8 8 rescue->FLmRNA->unmapped 9 9 control->affx->bac_spike 10 10 oligo_spike_in 11 11 r1_bac_spike_at These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: > table(dbGetQuery(con, "select type from featureSet;")[,1]) 1 2 4 7 9 263551 18 23 5331 18 So we only have these probeset types: 1 1 main 2 2 control->affx 4 4 control->bgp->antigenomic 7 7 normgene->intron 9 9 control->affx->bac_spike And the 'main' probesets are those that we want to use for differential expression. Now one thing you could do is to say that the antigenomic probesets should give a good measure of background, as they are supposed to have sequences that don't exist in mice. So you could just extract those probesets, get some measure and use that as the lower limit of what you think is expressed or not. That's pretty naive, as a probe with higher GC content will have higher background than one with a lower GC content, but worrying about that is way beyond what I am prepared to go into. Now we can get the probeset IDs for the antigenomic probesets antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join featureSet on core_mps.fsetid=featureSet.fsetid where featureSet.type='4';") And then extract those probesets and get a summary statistic. bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, probs=0.95) Which will give us the 95th percentile of these background probes. You could then use the kOverA function in genefilter to filter out any probesets where all samples are below the background values. The idea being that you want to filter out any probesets unless k samples have expression levels >= A. So if you have 10 samples, where 5 are controls and 5 are treated, you would do something like minval <- max(bkg) ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- eset[ind,] You should also filter out all the non-main probesets. You can do that using getMainProbes() in the affycoretools package eset.filt <- getMainProbes(eset.filt) Best, Jim On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: > Hello all, > I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: > > librarypd.mogene.2.0.st) > eset <- rma(affyRaw) > > and added gene annotation and I am following the limma user's guide, which recommends removing "probes that appear not be > expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? > > Thank you! > Julia Sabet > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Hi Julia, You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). Best, Jim On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: > Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: > eset.filt <- getMainProbes(eset.filt) > > This error resulted: > Error: could not find function "getMainProbes" > > What should I do? > Thanks! > Julia > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 9:36 AM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > Hi Julia, > > There are several different things you can do. I'll show you one possibility. > > First, note that there are multiple different control probes on this array that aren't intended to measure differential expression, and should be excluded. So first let's look at the possible types of > probesets: > >> librarypd.mogene.2.0.st) >> con <- dbpd.mogene.2.0.st) >> dbGetQuery(con, "select * from type_dict;") > type type_id > 1 1 main > 2 2 control->affx > 3 3 control->chip > 4 4 control->bgp->antigenomic > 5 5 control->bgp->genomic > 6 6 normgene->exon > 7 7 normgene->intron > 8 8 rescue->FLmRNA->unmapped > 9 9 control->affx->bac_spike > 10 10 oligo_spike_in > 11 11 r1_bac_spike_at > > These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: > > >> table(dbGetQuery(con, "select type from featureSet;")[,1]) > > 1 2 4 7 9 > 263551 18 23 5331 18 > > So we only have these probeset types: > > 1 1 main > 2 2 control->affx > 4 4 control->bgp->antigenomic > 7 7 normgene->intron > 9 9 control->affx->bac_spike > > And the 'main' probesets are those that we want to use for differential > expression. Now one thing you could do is to say that the antigenomic > probesets should give a good measure of background, as they are > supposed to have sequences that don't exist in mice. So you could just > extract those probesets, get some measure and use that as the lower > limit of what you think is expressed or not. That's pretty naive, as a > probe with higher GC content will have higher background than one with > a lower GC content, but worrying about that is way beyond what I am > prepared to go into. > > Now we can get the probeset IDs for the antigenomic probesets > > antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join > featureSet on core_mps.fsetid=featureSet.fsetid where > featureSet.type='4';") > > And then extract those probesets and get a summary statistic. > > bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > probs=0.95) > > Which will give us the 95th percentile of these background probes. You > could then use the kOverA function in genefilter to filter out any > probesets where all samples are below the background values. The idea > being that you want to filter out any probesets unless k samples have > expression levels >= A. So if you have 10 samples, where 5 are controls > and 5 are treated, you would do something like > > minval <- max(bkg) > ind <- genefilter(eset, filterfun(kOverA(5, minval))) > eset.filt <- eset[ind,] > > You should also filter out all the non-main probesets. You can do that > using getMainProbes() in the affycoretools package > > eset.filt <- getMainProbes(eset.filt) > > Best, > > Jim > > > > > On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >> Hello all, >> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >> >> librarypd.mogene.2.0.st) >> eset <- rma(affyRaw) >> >> and added gene annotation and I am following the limma user's guide, which recommends removing "probes that appear not be >> expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >> >> Thank you! >> Julia Sabet >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: librarypd.mogene.2.0.st) > con <- dbpd.mogene.2.0.st) > dbGetQuery(con, "select * from type_dict;") type type_id 1 1 main 2 2 control->affx 3 3 control->chip 4 4 control->bgp->antigenomic 5 5 control->bgp->genomic 6 6 normgene->exon 7 7 normgene->intron 8 8 rescue->FLmRNA->unmapped 9 9 control->affx->bac_spike 10 10 oligo_spike_in 11 11 r1_bac_spike_at > table(dbGetQuery(con, "select type from featureSet;")[,1]) 1 2 4 7 9 263551 18 23 5331 18 > antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join + featureSet on core_mps.fsetid=featureSet.fsetid where + featureSet.type='4';") > bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, + probs=0.95) > library(genefilter) > minval <- max(bkg) > ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" > ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- eset[ind,] Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" > ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" > Here is my sessionInfo() output: R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 > I appreciate your help... Julia -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 12:08 PM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data Hi Julia, You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). Best, Jim On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: > Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: > eset.filt <- getMainProbes(eset.filt) > > This error resulted: > Error: could not find function "getMainProbes" > > What should I do? > Thanks! > Julia > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 9:36 AM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > Hi Julia, > > There are several different things you can do. I'll show you one possibility. > > First, note that there are multiple different control probes on this > array that aren't intended to measure differential expression, and > should be excluded. So first let's look at the possible types of > probesets: > >> librarypd.mogene.2.0.st) >> con <- dbpd.mogene.2.0.st) >> dbGetQuery(con, "select * from type_dict;") > type type_id > 1 1 main > 2 2 control->affx > 3 3 control->chip > 4 4 control->bgp->antigenomic > 5 5 control->bgp->genomic > 6 6 normgene->exon > 7 7 normgene->intron > 8 8 rescue->FLmRNA->unmapped > 9 9 control->affx->bac_spike > 10 10 oligo_spike_in > 11 11 r1_bac_spike_at > > These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: > > >> table(dbGetQuery(con, "select type from featureSet;")[,1]) > > 1 2 4 7 9 > 263551 18 23 5331 18 > > So we only have these probeset types: > > 1 1 main > 2 2 control->affx > 4 4 control->bgp->antigenomic > 7 7 normgene->intron > 9 9 control->affx->bac_spike > > And the 'main' probesets are those that we want to use for > differential expression. Now one thing you could do is to say that the > antigenomic probesets should give a good measure of background, as > they are supposed to have sequences that don't exist in mice. So you > could just extract those probesets, get some measure and use that as > the lower limit of what you think is expressed or not. That's pretty > naive, as a probe with higher GC content will have higher background > than one with a lower GC content, but worrying about that is way > beyond what I am prepared to go into. > > Now we can get the probeset IDs for the antigenomic probesets > > antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join > featureSet on core_mps.fsetid=featureSet.fsetid where > featureSet.type='4';") > > And then extract those probesets and get a summary statistic. > > bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > probs=0.95) > > Which will give us the 95th percentile of these background probes. You > could then use the kOverA function in genefilter to filter out any > probesets where all samples are below the background values. The idea > being that you want to filter out any probesets unless k samples have > expression levels >= A. So if you have 10 samples, where 5 are > controls and 5 are treated, you would do something like > > minval <- max(bkg) > ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- > eset[ind,] > > You should also filter out all the non-main probesets. You can do that > using getMainProbes() in the affycoretools package > > eset.filt <- getMainProbes(eset.filt) > > Best, > > Jim > > > > > On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >> Hello all, >> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >> >> librarypd.mogene.2.0.st) >> eset <- rma(affyRaw) >> >> and added gene annotation and I am following the limma user's guide, >> which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >> >> Thank you! >> Julia Sabet >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
On Thu, Feb 13, 2014 at 3:23 PM, Sabet, Julia A <julia.sabet@tufts.edu>wrote: > Thank you Jim. I think my R version is up to date and I am making sure to > use "library()". I started the whole thing over and now I have this new > error message, at an earlier step: > > librarypd.mogene.2.0.st) > > con <- dbpd.mogene.2.0.st) > > dbGetQuery(con, "select * from type_dict;") > type type_id > 1 1 main > 2 2 control->affx > 3 3 control->chip > 4 4 control->bgp->antigenomic > 5 5 control->bgp->genomic > 6 6 normgene->exon > 7 7 normgene->intron > 8 8 rescue->FLmRNA->unmapped > 9 9 control->affx->bac_spike > 10 10 oligo_spike_in > 11 11 r1_bac_spike_at > > table(dbGetQuery(con, "select type from featureSet;")[,1]) > > 1 2 4 7 9 > 263551 18 23 5331 18 > > antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join > + featureSet on core_mps.fsetid=featureSet.fsetid where > + featureSet.type='4';") > > bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > + probs=0.95) > > library(genefilter) > > minval <- max(bkg) > > ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, > minval))) eset.filt" > > ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- > eset[ind,] > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, > minval))) eset.filt" > > ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, > minval))) eset.filt" > > > Hi, Julia. These errors are due to the fact that the command you typed is not syntactically correct. It looks like you might be cutting-and-pasting two lines and they are ending up together. My guess is that you meant to do something like: ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt = eset[ind,] Hope that helps. Sean > > Here is my sessionInfo() output: > > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] BiocInstaller_1.12.0 genefilter_1.44.0 > mogene20sttranscriptcluster.db_2.13.0 > [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 > pd.mogene.2.0.st_2.12.0 > [7] RSQLite_0.11.4 DBI_0.2-7 > oligo_1.26.1 > [10] Biostrings_2.30.1 XVector_0.2.0 > IRanges_1.20.6 > [13] Biobase_2.22.0 oligoClasses_1.24.0 > BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 > bit_1.1-11 codetools_0.2-8 ff_2.2-12 > [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 > preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 > [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 > xtable_1.7-1 zlibbioc_1.8.0 > > > > I appreciate your help... > Julia > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon@uw.edu] > Sent: Thursday, February 13, 2014 12:08 PM > To: Sabet, Julia A > Cc: bioconductor@r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > Hi Julia, > > You should always include the output from sessionInfo() with any > questions, so we can see what versions you are running, and what you have > loaded. > > My guess is you are using an old version of R, prior to the introduction > of that function, or you forgot to do library(affycoretools). > > Best, > > Jim > > On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: > > Thank you so much, Jim. I did everything you recommended and everything > seemed to be working and then I installed the affycoretools package and > when I did: > > eset.filt <- getMainProbes(eset.filt) > > > > This error resulted: > > Error: could not find function "getMainProbes" > > > > What should I do? > > Thanks! > > Julia > > > > > > -----Original Message----- > > From: James W. MacDonald [mailto:jmacdon@uw.edu] > > Sent: Thursday, February 13, 2014 9:36 AM > > To: Sabet, Julia A > > Cc: bioconductor@r-project.org > > Subject: Re: [BioC] filtering probes in affymetrix data > > > > Hi Julia, > > > > There are several different things you can do. I'll show you one > possibility. > > > > First, note that there are multiple different control probes on this > > array that aren't intended to measure differential expression, and > > should be excluded. So first let's look at the possible types of > > probesets: > > > >> librarypd.mogene.2.0.st) > >> con <- dbpd.mogene.2.0.st) > >> dbGetQuery(con, "select * from type_dict;") > > type type_id > > 1 1 main > > 2 2 control->affx > > 3 3 control->chip > > 4 4 control->bgp->antigenomic > > 5 5 control->bgp->genomic > > 6 6 normgene->exon > > 7 7 normgene->intron > > 8 8 rescue->FLmRNA->unmapped > > 9 9 control->affx->bac_spike > > 10 10 oligo_spike_in > > 11 11 r1_bac_spike_at > > > > These are all the possible types of probesets, but we don't have all of > them on this array. To see which ones we do have we can do: > > > > > >> table(dbGetQuery(con, "select type from featureSet;")[,1]) > > > > 1 2 4 7 9 > > 263551 18 23 5331 18 > > > > So we only have these probeset types: > > > > 1 1 main > > 2 2 control->affx > > 4 4 control->bgp->antigenomic > > 7 7 normgene->intron > > 9 9 control->affx->bac_spike > > > > And the 'main' probesets are those that we want to use for > > differential expression. Now one thing you could do is to say that the > > antigenomic probesets should give a good measure of background, as > > they are supposed to have sequences that don't exist in mice. So you > > could just extract those probesets, get some measure and use that as > > the lower limit of what you think is expressed or not. That's pretty > > naive, as a probe with higher GC content will have higher background > > than one with a lower GC content, but worrying about that is way > > beyond what I am prepared to go into. > > > > Now we can get the probeset IDs for the antigenomic probesets > > > > antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join > > featureSet on core_mps.fsetid=featureSet.fsetid where > > featureSet.type='4';") > > > > And then extract those probesets and get a summary statistic. > > > > bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > > probs=0.95) > > > > Which will give us the 95th percentile of these background probes. You > > could then use the kOverA function in genefilter to filter out any > > probesets where all samples are below the background values. The idea > > being that you want to filter out any probesets unless k samples have > > expression levels >= A. So if you have 10 samples, where 5 are > > controls and 5 are treated, you would do something like > > > > minval <- max(bkg) > > ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- > > eset[ind,] > > > > You should also filter out all the non-main probesets. You can do that > > using getMainProbes() in the affycoretools package > > > > eset.filt <- getMainProbes(eset.filt) > > > > Best, > > > > Jim > > > > > > > > > > On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: > >> Hello all, > >> I am totally new to R/Bioconductor and have begun processing data from > my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: > >> > >> librarypd.mogene.2.0.st) > >> eset <- rma(affyRaw) > >> > >> and added gene annotation and I am following the limma user's guide, > >> which recommends removing "probes that appear not be expressed in any > of the experimental conditions." I have read on previous posts that > filtering may not be necessary. Should I filter, and if so, how? Using > what code? > >> > >> Thank you! > >> Julia Sabet > >> > >> [[alternative HTML version deleted]] > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > > James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Julia, On 2/13/2014 3:23 PM, Sabet, Julia A wrote: > Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: > > librarypd.mogene.2.0.st) >> con <- dbpd.mogene.2.0.st) >> dbGetQuery(con, "select * from type_dict;") > type type_id > 1 1 main > 2 2 control->affx > 3 3 control->chip > 4 4 control->bgp->antigenomic > 5 5 control->bgp->genomic > 6 6 normgene->exon > 7 7 normgene->intron > 8 8 rescue->FLmRNA->unmapped > 9 9 control->affx->bac_spike > 10 10 oligo_spike_in > 11 11 r1_bac_spike_at >> table(dbGetQuery(con, "select type from featureSet;")[,1]) > 1 2 4 7 9 > 263551 18 23 5331 18 >> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join > + featureSet on core_mps.fsetid=featureSet.fsetid where > + featureSet.type='4';") >> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > + probs=0.95) >> library(genefilter) >> minval <- max(bkg) >> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- The above line has a bit extra at the end that R doesn't like. > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" And this is your hint. Error messages are your friends. Best, Jim >> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- eset[ind,] > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" > Here is my sessionInfo() output: > > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 > [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 > [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 > [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 > [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 > [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 > [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 > I appreciate your help... > Julia > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 12:08 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > Hi Julia, > > You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. > > My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). > > Best, > > Jim > > On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >> eset.filt <- getMainProbes(eset.filt) >> >> This error resulted: >> Error: could not find function "getMainProbes" >> >> What should I do? >> Thanks! >> Julia >> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 9:36 AM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> Hi Julia, >> >> There are several different things you can do. I'll show you one possibility. >> >> First, note that there are multiple different control probes on this >> array that aren't intended to measure differential expression, and >> should be excluded. So first let's look at the possible types of >> probesets: >> >>> librarypd.mogene.2.0.st) >>> con <- dbpd.mogene.2.0.st) >>> dbGetQuery(con, "select * from type_dict;") >> type type_id >> 1 1 main >> 2 2 control->affx >> 3 3 control->chip >> 4 4 control->bgp->antigenomic >> 5 5 control->bgp->genomic >> 6 6 normgene->exon >> 7 7 normgene->intron >> 8 8 rescue->FLmRNA->unmapped >> 9 9 control->affx->bac_spike >> 10 10 oligo_spike_in >> 11 11 r1_bac_spike_at >> >> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >> >> >>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >> 1 2 4 7 9 >> 263551 18 23 5331 18 >> >> So we only have these probeset types: >> >> 1 1 main >> 2 2 control->affx >> 4 4 control->bgp->antigenomic >> 7 7 normgene->intron >> 9 9 control->affx->bac_spike >> >> And the 'main' probesets are those that we want to use for >> differential expression. Now one thing you could do is to say that the >> antigenomic probesets should give a good measure of background, as >> they are supposed to have sequences that don't exist in mice. So you >> could just extract those probesets, get some measure and use that as >> the lower limit of what you think is expressed or not. That's pretty >> naive, as a probe with higher GC content will have higher background >> than one with a lower GC content, but worrying about that is way >> beyond what I am prepared to go into. >> >> Now we can get the probeset IDs for the antigenomic probesets >> >> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join >> featureSet on core_mps.fsetid=featureSet.fsetid where >> featureSet.type='4';") >> >> And then extract those probesets and get a summary statistic. >> >> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >> probs=0.95) >> >> Which will give us the 95th percentile of these background probes. You >> could then use the kOverA function in genefilter to filter out any >> probesets where all samples are below the background values. The idea >> being that you want to filter out any probesets unless k samples have >> expression levels >= A. So if you have 10 samples, where 5 are >> controls and 5 are treated, you would do something like >> >> minval <- max(bkg) >> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >> eset[ind,] >> >> You should also filter out all the non-main probesets. You can do that >> using getMainProbes() in the affycoretools package >> >> eset.filt <- getMainProbes(eset.filt) >> >> Best, >> >> Jim >> >> >> >> >> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>> Hello all, >>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>> >>> librarypd.mogene.2.0.st) >>> eset <- rma(affyRaw) >>> >>> and added gene annotation and I am following the limma user's guide, >>> which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>> >>> Thank you! >>> Julia Sabet >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: > eset.filt <- getMainProbes(eset.filt) Error in orig[[nm]][i, , ..., drop = drop] : (subscript) logical subscript too long -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 3:44 PM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data Hi Julia, On 2/13/2014 3:23 PM, Sabet, Julia A wrote: > Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: > > librarypd.mogene.2.0.st) >> con <- dbpd.mogene.2.0.st) >> dbGetQuery(con, "select * from type_dict;") > type type_id > 1 1 main > 2 2 control->affx > 3 3 control->chip > 4 4 control->bgp->antigenomic > 5 5 control->bgp->genomic > 6 6 normgene->exon > 7 7 normgene->intron > 8 8 rescue->FLmRNA->unmapped > 9 9 control->affx->bac_spike > 10 10 oligo_spike_in > 11 11 r1_bac_spike_at >> table(dbGetQuery(con, "select type from featureSet;")[,1]) > 1 2 4 7 9 > 263551 18 23 5331 18 >> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >> join > + featureSet on core_mps.fsetid=featureSet.fsetid where > + featureSet.type='4';") >> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > + probs=0.95) >> library(genefilter) >> minval <- max(bkg) >> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- The above line has a bit extra at the end that R doesn't like. > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" And this is your hint. Error messages are your friends. Best, Jim >> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >> eset[ind,] > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- > Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" > Here is my sessionInfo() output: > > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United States.1252 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 > [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 > [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 > [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 > [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 > [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 > [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 > I appreciate your help... > Julia > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 12:08 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > Hi Julia, > > You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. > > My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). > > Best, > > Jim > > On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >> eset.filt <- getMainProbes(eset.filt) >> >> This error resulted: >> Error: could not find function "getMainProbes" >> >> What should I do? >> Thanks! >> Julia >> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 9:36 AM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> Hi Julia, >> >> There are several different things you can do. I'll show you one possibility. >> >> First, note that there are multiple different control probes on this >> array that aren't intended to measure differential expression, and >> should be excluded. So first let's look at the possible types of >> probesets: >> >>> librarypd.mogene.2.0.st) >>> con <- dbpd.mogene.2.0.st) >>> dbGetQuery(con, "select * from type_dict;") >> type type_id >> 1 1 main >> 2 2 control->affx >> 3 3 control->chip >> 4 4 control->bgp->antigenomic >> 5 5 control->bgp->genomic >> 6 6 normgene->exon >> 7 7 normgene->intron >> 8 8 rescue->FLmRNA->unmapped >> 9 9 control->affx->bac_spike >> 10 10 oligo_spike_in >> 11 11 r1_bac_spike_at >> >> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >> >> >>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >> 1 2 4 7 9 >> 263551 18 23 5331 18 >> >> So we only have these probeset types: >> >> 1 1 main >> 2 2 control->affx >> 4 4 control->bgp->antigenomic >> 7 7 normgene->intron >> 9 9 control->affx->bac_spike >> >> And the 'main' probesets are those that we want to use for >> differential expression. Now one thing you could do is to say that >> the antigenomic probesets should give a good measure of background, >> as they are supposed to have sequences that don't exist in mice. So >> you could just extract those probesets, get some measure and use that >> as the lower limit of what you think is expressed or not. That's >> pretty naive, as a probe with higher GC content will have higher >> background than one with a lower GC content, but worrying about that >> is way beyond what I am prepared to go into. >> >> Now we can get the probeset IDs for the antigenomic probesets >> >> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >> join featureSet on core_mps.fsetid=featureSet.fsetid where >> featureSet.type='4';") >> >> And then extract those probesets and get a summary statistic. >> >> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >> probs=0.95) >> >> Which will give us the 95th percentile of these background probes. >> You could then use the kOverA function in genefilter to filter out >> any probesets where all samples are below the background values. The >> idea being that you want to filter out any probesets unless k samples >> have expression levels >= A. So if you have 10 samples, where 5 are >> controls and 5 are treated, you would do something like >> >> minval <- max(bkg) >> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >> eset[ind,] >> >> You should also filter out all the non-main probesets. You can do >> that using getMainProbes() in the affycoretools package >> >> eset.filt <- getMainProbes(eset.filt) >> >> Best, >> >> Jim >> >> >> >> >> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>> Hello all, >>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>> >>> librarypd.mogene.2.0.st) >>> eset <- rma(affyRaw) >>> >>> and added gene annotation and I am following the limma user's guide, >>> which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>> >>> Thank you! >>> Julia Sabet >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
What do you get when you run traceback() right after that error? Jim On 2/13/2014 3:51 PM, Sabet, Julia A wrote: > Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: > >> eset.filt <- getMainProbes(eset.filt) > Error in orig[[nm]][i, , ..., drop = drop] : > (subscript) logical subscript too long > > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 3:44 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > Hi Julia, > > On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >> >> librarypd.mogene.2.0.st) >>> con <- dbpd.mogene.2.0.st) >>> dbGetQuery(con, "select * from type_dict;") >> type type_id >> 1 1 main >> 2 2 control->affx >> 3 3 control->chip >> 4 4 control->bgp->antigenomic >> 5 5 control->bgp->genomic >> 6 6 normgene->exon >> 7 7 normgene->intron >> 8 8 rescue->FLmRNA->unmapped >> 9 9 control->affx->bac_spike >> 10 10 oligo_spike_in >> 11 11 r1_bac_spike_at >>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >> 1 2 4 7 9 >> 263551 18 23 5331 18 >>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>> join >> + featureSet on core_mps.fsetid=featureSet.fsetid where >> + featureSet.type='4';") >>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >> + probs=0.95) >>> library(genefilter) >>> minval <- max(bkg) >>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- > The above line has a bit extra at the end that R doesn't like. > >> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" > And this is your hint. Error messages are your friends. > > Best, > > Jim > > >>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>> eset[ind,] >> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >> Here is my sessionInfo() output: >> >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >> I appreciate your help... >> Julia >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 12:08 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> Hi Julia, >> >> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >> >> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >> >> Best, >> >> Jim >> >> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>> eset.filt <- getMainProbes(eset.filt) >>> >>> This error resulted: >>> Error: could not find function "getMainProbes" >>> >>> What should I do? >>> Thanks! >>> Julia >>> >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 9:36 AM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> Hi Julia, >>> >>> There are several different things you can do. I'll show you one possibility. >>> >>> First, note that there are multiple different control probes on this >>> array that aren't intended to measure differential expression, and >>> should be excluded. So first let's look at the possible types of >>> probesets: >>> >>>> librarypd.mogene.2.0.st) >>>> con <- dbpd.mogene.2.0.st) >>>> dbGetQuery(con, "select * from type_dict;") >>> type type_id >>> 1 1 main >>> 2 2 control->affx >>> 3 3 control->chip >>> 4 4 control->bgp->antigenomic >>> 5 5 control->bgp->genomic >>> 6 6 normgene->exon >>> 7 7 normgene->intron >>> 8 8 rescue->FLmRNA->unmapped >>> 9 9 control->affx->bac_spike >>> 10 10 oligo_spike_in >>> 11 11 r1_bac_spike_at >>> >>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>> >>> >>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>> 1 2 4 7 9 >>> 263551 18 23 5331 18 >>> >>> So we only have these probeset types: >>> >>> 1 1 main >>> 2 2 control->affx >>> 4 4 control->bgp->antigenomic >>> 7 7 normgene->intron >>> 9 9 control->affx->bac_spike >>> >>> And the 'main' probesets are those that we want to use for >>> differential expression. Now one thing you could do is to say that >>> the antigenomic probesets should give a good measure of background, >>> as they are supposed to have sequences that don't exist in mice. So >>> you could just extract those probesets, get some measure and use that >>> as the lower limit of what you think is expressed or not. That's >>> pretty naive, as a probe with higher GC content will have higher >>> background than one with a lower GC content, but worrying about that >>> is way beyond what I am prepared to go into. >>> >>> Now we can get the probeset IDs for the antigenomic probesets >>> >>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>> featureSet.type='4';") >>> >>> And then extract those probesets and get a summary statistic. >>> >>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>> probs=0.95) >>> >>> Which will give us the 95th percentile of these background probes. >>> You could then use the kOverA function in genefilter to filter out >>> any probesets where all samples are below the background values. The >>> idea being that you want to filter out any probesets unless k samples >>> have expression levels >= A. So if you have 10 samples, where 5 are >>> controls and 5 are treated, you would do something like >>> >>> minval <- max(bkg) >>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>> eset[ind,] >>> >>> You should also filter out all the non-main probesets. You can do >>> that using getMainProbes() in the affycoretools package >>> >>> eset.filt <- getMainProbes(eset.filt) >>> >>> Best, >>> >>> Jim >>> >>> >>> >>> >>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>> Hello all, >>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>> >>>> librarypd.mogene.2.0.st) >>>> eset <- rma(affyRaw) >>>> >>>> and added gene annotation and I am following the limma user's guide, >>>> which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>> >>>> Thank you! >>>> Julia Sabet >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
I get this: > traceback() 3: input[ind, ] 2: input[ind, ] 1: getMainProbes(eset.filt) > -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 3:57 PM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data What do you get when you run traceback() right after that error? Jim On 2/13/2014 3:51 PM, Sabet, Julia A wrote: > Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: > >> eset.filt <- getMainProbes(eset.filt) > Error in orig[[nm]][i, , ..., drop = drop] : > (subscript) logical subscript too long > > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 3:44 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > Hi Julia, > > On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >> >> librarypd.mogene.2.0.st) >>> con <- dbpd.mogene.2.0.st) >>> dbGetQuery(con, "select * from type_dict;") >> type type_id >> 1 1 main >> 2 2 control->affx >> 3 3 control->chip >> 4 4 control->bgp->antigenomic >> 5 5 control->bgp->genomic >> 6 6 normgene->exon >> 7 7 normgene->intron >> 8 8 rescue->FLmRNA->unmapped >> 9 9 control->affx->bac_spike >> 10 10 oligo_spike_in >> 11 11 r1_bac_spike_at >>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >> 1 2 4 7 9 >> 263551 18 23 5331 18 >>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>> join >> + featureSet on core_mps.fsetid=featureSet.fsetid where >> + featureSet.type='4';") >>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >> + probs=0.95) >>> library(genefilter) >>> minval <- max(bkg) >>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- > The above line has a bit extra at the end that R doesn't like. > >> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" > And this is your hint. Error messages are your friends. > > Best, > > Jim > > >>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>> eset[ind,] >> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >> Here is my sessionInfo() output: >> >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-w64-mingw32/x64 (64-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >> I appreciate your help... >> Julia >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 12:08 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> Hi Julia, >> >> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >> >> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >> >> Best, >> >> Jim >> >> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>> eset.filt <- getMainProbes(eset.filt) >>> >>> This error resulted: >>> Error: could not find function "getMainProbes" >>> >>> What should I do? >>> Thanks! >>> Julia >>> >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 9:36 AM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> Hi Julia, >>> >>> There are several different things you can do. I'll show you one possibility. >>> >>> First, note that there are multiple different control probes on this >>> array that aren't intended to measure differential expression, and >>> should be excluded. So first let's look at the possible types of >>> probesets: >>> >>>> librarypd.mogene.2.0.st) >>>> con <- dbpd.mogene.2.0.st) >>>> dbGetQuery(con, "select * from type_dict;") >>> type type_id >>> 1 1 main >>> 2 2 control->affx >>> 3 3 control->chip >>> 4 4 control->bgp->antigenomic >>> 5 5 control->bgp->genomic >>> 6 6 normgene->exon >>> 7 7 normgene->intron >>> 8 8 rescue->FLmRNA->unmapped >>> 9 9 control->affx->bac_spike >>> 10 10 oligo_spike_in >>> 11 11 r1_bac_spike_at >>> >>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>> >>> >>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>> 1 2 4 7 9 >>> 263551 18 23 5331 18 >>> >>> So we only have these probeset types: >>> >>> 1 1 main >>> 2 2 control->affx >>> 4 4 control->bgp->antigenomic >>> 7 7 normgene->intron >>> 9 9 control->affx->bac_spike >>> >>> And the 'main' probesets are those that we want to use for >>> differential expression. Now one thing you could do is to say that >>> the antigenomic probesets should give a good measure of background, >>> as they are supposed to have sequences that don't exist in mice. So >>> you could just extract those probesets, get some measure and use >>> that as the lower limit of what you think is expressed or not. >>> That's pretty naive, as a probe with higher GC content will have >>> higher background than one with a lower GC content, but worrying >>> about that is way beyond what I am prepared to go into. >>> >>> Now we can get the probeset IDs for the antigenomic probesets >>> >>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>> featureSet.type='4';") >>> >>> And then extract those probesets and get a summary statistic. >>> >>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>> probs=0.95) >>> >>> Which will give us the 95th percentile of these background probes. >>> You could then use the kOverA function in genefilter to filter out >>> any probesets where all samples are below the background values. The >>> idea being that you want to filter out any probesets unless k >>> samples have expression levels >= A. So if you have 10 samples, >>> where 5 are controls and 5 are treated, you would do something like >>> >>> minval <- max(bkg) >>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>> eset[ind,] >>> >>> You should also filter out all the non-main probesets. You can do >>> that using getMainProbes() in the affycoretools package >>> >>> eset.filt <- getMainProbes(eset.filt) >>> >>> Best, >>> >>> Jim >>> >>> >>> >>> >>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>> Hello all, >>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>> >>>> librarypd.mogene.2.0.st) >>>> eset <- rma(affyRaw) >>>> >>>> and added gene annotation and I am following the limma user's >>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>> >>>> Thank you! >>>> Julia Sabet >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
OK, what do you get if you type getMainProbes at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. Jim On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: > I get this: > >> traceback() > 3: input[ind, ] > 2: input[ind, ] > 1: getMainProbes(eset.filt) >> > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 3:57 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > What do you get when you run > > traceback() > > right after that error? > > Jim > > > On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >> >>> eset.filt <- getMainProbes(eset.filt) >> Error in orig[[nm]][i, , ..., drop = drop] : >> (subscript) logical subscript too long >> >> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 3:44 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> Hi Julia, >> >> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>> >>> librarypd.mogene.2.0.st) >>>> con <- dbpd.mogene.2.0.st) >>>> dbGetQuery(con, "select * from type_dict;") >>> type type_id >>> 1 1 main >>> 2 2 control->affx >>> 3 3 control->chip >>> 4 4 control->bgp->antigenomic >>> 5 5 control->bgp->genomic >>> 6 6 normgene->exon >>> 7 7 normgene->intron >>> 8 8 rescue->FLmRNA->unmapped >>> 9 9 control->affx->bac_spike >>> 10 10 oligo_spike_in >>> 11 11 r1_bac_spike_at >>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>> 1 2 4 7 9 >>> 263551 18 23 5331 18 >>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>> join >>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>> + featureSet.type='4';") >>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>> + probs=0.95) >>>> library(genefilter) >>>> minval <- max(bkg) >>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >> The above line has a bit extra at the end that R doesn't like. >> >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >> And this is your hint. Error messages are your friends. >> >> Best, >> >> Jim >> >> >>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>>> eset[ind,] >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>> Here is my sessionInfo() output: >>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>> I appreciate your help... >>> Julia >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 12:08 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> Hi Julia, >>> >>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>> >>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>> >>> Best, >>> >>> Jim >>> >>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>> eset.filt <- getMainProbes(eset.filt) >>>> >>>> This error resulted: >>>> Error: could not find function "getMainProbes" >>>> >>>> What should I do? >>>> Thanks! >>>> Julia >>>> >>>> >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 9:36 AM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> Hi Julia, >>>> >>>> There are several different things you can do. I'll show you one possibility. >>>> >>>> First, note that there are multiple different control probes on this >>>> array that aren't intended to measure differential expression, and >>>> should be excluded. So first let's look at the possible types of >>>> probesets: >>>> >>>>> librarypd.mogene.2.0.st) >>>>> con <- dbpd.mogene.2.0.st) >>>>> dbGetQuery(con, "select * from type_dict;") >>>> type type_id >>>> 1 1 main >>>> 2 2 control->affx >>>> 3 3 control->chip >>>> 4 4 control->bgp->antigenomic >>>> 5 5 control->bgp->genomic >>>> 6 6 normgene->exon >>>> 7 7 normgene->intron >>>> 8 8 rescue->FLmRNA->unmapped >>>> 9 9 control->affx->bac_spike >>>> 10 10 oligo_spike_in >>>> 11 11 r1_bac_spike_at >>>> >>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>> >>>> >>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>> 1 2 4 7 9 >>>> 263551 18 23 5331 18 >>>> >>>> So we only have these probeset types: >>>> >>>> 1 1 main >>>> 2 2 control->affx >>>> 4 4 control->bgp->antigenomic >>>> 7 7 normgene->intron >>>> 9 9 control->affx->bac_spike >>>> >>>> And the 'main' probesets are those that we want to use for >>>> differential expression. Now one thing you could do is to say that >>>> the antigenomic probesets should give a good measure of background, >>>> as they are supposed to have sequences that don't exist in mice. So >>>> you could just extract those probesets, get some measure and use >>>> that as the lower limit of what you think is expressed or not. >>>> That's pretty naive, as a probe with higher GC content will have >>>> higher background than one with a lower GC content, but worrying >>>> about that is way beyond what I am prepared to go into. >>>> >>>> Now we can get the probeset IDs for the antigenomic probesets >>>> >>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>> featureSet.type='4';") >>>> >>>> And then extract those probesets and get a summary statistic. >>>> >>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>> probs=0.95) >>>> >>>> Which will give us the 95th percentile of these background probes. >>>> You could then use the kOverA function in genefilter to filter out >>>> any probesets where all samples are below the background values. The >>>> idea being that you want to filter out any probesets unless k >>>> samples have expression levels >= A. So if you have 10 samples, >>>> where 5 are controls and 5 are treated, you would do something like >>>> >>>> minval <- max(bkg) >>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>> eset[ind,] >>>> >>>> You should also filter out all the non-main probesets. You can do >>>> that using getMainProbes() in the affycoretools package >>>> >>>> eset.filt <- getMainProbes(eset.filt) >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>> >>>> >>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>> Hello all, >>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>> >>>>> librarypd.mogene.2.0.st) >>>>> eset <- rma(affyRaw) >>>>> >>>>> and added gene annotation and I am following the limma user's >>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>> >>>>> Thank you! >>>>> Julia Sabet >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
I get this: function (input) { if (is(input, "ExpressionSet")) { pdinfo <- annotation(input) if (length(grep("^pd", pdinfo)) != 1) stop(paste("The file", pdinfo, "does not appear to have been processed using", "the oligo package.\nIn this case the argument to this function should", "be the name of the correct pd.info package (e.g., pd.hugene.1.0.st.v1.\n"), call. = FALSE) } else { if (is.character(input)) pdinfo <- input else if (!is.character(input)) stop(paste("The input argument for this function should either be an ExpressionSet", "that was generated using oligo, or the name of the pd.info package", "that corresponds to your data.\n"), call. = FALSE) } require(pdinfo, character.only = TRUE) con <- db(get(pdinfo)) types <- dbGetQuery(con, paste("select distinct meta_fsetid, type from featureSet inner join core_mps", "on featureSet.fsetid=core_mps.fsetid;")) dbDisconnect(con) if (is(input, "ExpressionSet")) { types <- types[match(featureNames(eset), types[, 1]), ] ind <- types[, 2] %in% 1 return(input[ind, ]) } else return(types) } <environment: namespace:affycoretools=""> > -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 4:05 PM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data OK, what do you get if you type getMainProbes at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. Jim On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: > I get this: > >> traceback() > 3: input[ind, ] > 2: input[ind, ] > 1: getMainProbes(eset.filt) >> > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 3:57 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > What do you get when you run > > traceback() > > right after that error? > > Jim > > > On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >> >>> eset.filt <- getMainProbes(eset.filt) >> Error in orig[[nm]][i, , ..., drop = drop] : >> (subscript) logical subscript too long >> >> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 3:44 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> Hi Julia, >> >> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>> >>> librarypd.mogene.2.0.st) >>>> con <- dbpd.mogene.2.0.st) >>>> dbGetQuery(con, "select * from type_dict;") >>> type type_id >>> 1 1 main >>> 2 2 control->affx >>> 3 3 control->chip >>> 4 4 control->bgp->antigenomic >>> 5 5 control->bgp->genomic >>> 6 6 normgene->exon >>> 7 7 normgene->intron >>> 8 8 rescue->FLmRNA->unmapped >>> 9 9 control->affx->bac_spike >>> 10 10 oligo_spike_in >>> 11 11 r1_bac_spike_at >>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>> 1 2 4 7 9 >>> 263551 18 23 5331 18 >>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>> join >>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>> + featureSet.type='4';") >>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>> + probs=0.95) >>>> library(genefilter) >>>> minval <- max(bkg) >>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >> The above line has a bit extra at the end that R doesn't like. >> >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >> And this is your hint. Error messages are your friends. >> >> Best, >> >> Jim >> >> >>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>>> eset[ind,] >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>> Here is my sessionInfo() output: >>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>> I appreciate your help... >>> Julia >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 12:08 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> Hi Julia, >>> >>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>> >>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>> >>> Best, >>> >>> Jim >>> >>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>> eset.filt <- getMainProbes(eset.filt) >>>> >>>> This error resulted: >>>> Error: could not find function "getMainProbes" >>>> >>>> What should I do? >>>> Thanks! >>>> Julia >>>> >>>> >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 9:36 AM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> Hi Julia, >>>> >>>> There are several different things you can do. I'll show you one possibility. >>>> >>>> First, note that there are multiple different control probes on >>>> this array that aren't intended to measure differential expression, >>>> and should be excluded. So first let's look at the possible types >>>> of >>>> probesets: >>>> >>>>> librarypd.mogene.2.0.st) >>>>> con <- dbpd.mogene.2.0.st) >>>>> dbGetQuery(con, "select * from type_dict;") >>>> type type_id >>>> 1 1 main >>>> 2 2 control->affx >>>> 3 3 control->chip >>>> 4 4 control->bgp->antigenomic >>>> 5 5 control->bgp->genomic >>>> 6 6 normgene->exon >>>> 7 7 normgene->intron >>>> 8 8 rescue->FLmRNA->unmapped >>>> 9 9 control->affx->bac_spike >>>> 10 10 oligo_spike_in >>>> 11 11 r1_bac_spike_at >>>> >>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>> >>>> >>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>> 1 2 4 7 9 >>>> 263551 18 23 5331 18 >>>> >>>> So we only have these probeset types: >>>> >>>> 1 1 main >>>> 2 2 control->affx >>>> 4 4 control->bgp->antigenomic >>>> 7 7 normgene->intron >>>> 9 9 control->affx->bac_spike >>>> >>>> And the 'main' probesets are those that we want to use for >>>> differential expression. Now one thing you could do is to say that >>>> the antigenomic probesets should give a good measure of background, >>>> as they are supposed to have sequences that don't exist in mice. So >>>> you could just extract those probesets, get some measure and use >>>> that as the lower limit of what you think is expressed or not. >>>> That's pretty naive, as a probe with higher GC content will have >>>> higher background than one with a lower GC content, but worrying >>>> about that is way beyond what I am prepared to go into. >>>> >>>> Now we can get the probeset IDs for the antigenomic probesets >>>> >>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>> featureSet.type='4';") >>>> >>>> And then extract those probesets and get a summary statistic. >>>> >>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>> probs=0.95) >>>> >>>> Which will give us the 95th percentile of these background probes. >>>> You could then use the kOverA function in genefilter to filter out >>>> any probesets where all samples are below the background values. >>>> The idea being that you want to filter out any probesets unless k >>>> samples have expression levels >= A. So if you have 10 samples, >>>> where 5 are controls and 5 are treated, you would do something like >>>> >>>> minval <- max(bkg) >>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>> eset[ind,] >>>> >>>> You should also filter out all the non-main probesets. You can do >>>> that using getMainProbes() in the affycoretools package >>>> >>>> eset.filt <- getMainProbes(eset.filt) >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>> >>>> >>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>> Hello all, >>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>> >>>>> librarypd.mogene.2.0.st) >>>>> eset <- rma(affyRaw) >>>>> >>>>> and added gene annotation and I am following the limma user's >>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>> >>>>> Thank you! >>>>> Julia Sabet >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Could it be because I did the kOverA function twice, since the first time I put a 5 and I meant to put a 12? I thought it would just get overrided but maybe it didn't? antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join + featureSet on core_mps.fsetid=featureSet.fsetid where + featureSet.type='4';") > bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, + probs=0.95) > minval <- max(bkg) > ind <- genefilter(eset, filterfun(kOverA(5, minval))) > eset.filt <- eset[ind,] > ind <- genefilter(eset, filterfun(kOverA(12, minval))) > eset.filt <- eset[ind,] > library(affycoretools) eset.filt <- getMainProbes(eset.filt) Error in orig[[nm]][i, , ..., drop = drop] : (subscript) logical subscript too long -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 4:05 PM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data OK, what do you get if you type getMainProbes at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. Jim On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: > I get this: > >> traceback() > 3: input[ind, ] > 2: input[ind, ] > 1: getMainProbes(eset.filt) >> > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 3:57 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > What do you get when you run > > traceback() > > right after that error? > > Jim > > > On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >> >>> eset.filt <- getMainProbes(eset.filt) >> Error in orig[[nm]][i, , ..., drop = drop] : >> (subscript) logical subscript too long >> >> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 3:44 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> Hi Julia, >> >> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>> >>> librarypd.mogene.2.0.st) >>>> con <- dbpd.mogene.2.0.st) >>>> dbGetQuery(con, "select * from type_dict;") >>> type type_id >>> 1 1 main >>> 2 2 control->affx >>> 3 3 control->chip >>> 4 4 control->bgp->antigenomic >>> 5 5 control->bgp->genomic >>> 6 6 normgene->exon >>> 7 7 normgene->intron >>> 8 8 rescue->FLmRNA->unmapped >>> 9 9 control->affx->bac_spike >>> 10 10 oligo_spike_in >>> 11 11 r1_bac_spike_at >>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>> 1 2 4 7 9 >>> 263551 18 23 5331 18 >>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>> join >>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>> + featureSet.type='4';") >>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>> + probs=0.95) >>>> library(genefilter) >>>> minval <- max(bkg) >>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >> The above line has a bit extra at the end that R doesn't like. >> >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >> And this is your hint. Error messages are your friends. >> >> Best, >> >> Jim >> >> >>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>>> eset[ind,] >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>> Here is my sessionInfo() output: >>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>> I appreciate your help... >>> Julia >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 12:08 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> Hi Julia, >>> >>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>> >>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>> >>> Best, >>> >>> Jim >>> >>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>> eset.filt <- getMainProbes(eset.filt) >>>> >>>> This error resulted: >>>> Error: could not find function "getMainProbes" >>>> >>>> What should I do? >>>> Thanks! >>>> Julia >>>> >>>> >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 9:36 AM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> Hi Julia, >>>> >>>> There are several different things you can do. I'll show you one possibility. >>>> >>>> First, note that there are multiple different control probes on >>>> this array that aren't intended to measure differential expression, >>>> and should be excluded. So first let's look at the possible types >>>> of >>>> probesets: >>>> >>>>> librarypd.mogene.2.0.st) >>>>> con <- dbpd.mogene.2.0.st) >>>>> dbGetQuery(con, "select * from type_dict;") >>>> type type_id >>>> 1 1 main >>>> 2 2 control->affx >>>> 3 3 control->chip >>>> 4 4 control->bgp->antigenomic >>>> 5 5 control->bgp->genomic >>>> 6 6 normgene->exon >>>> 7 7 normgene->intron >>>> 8 8 rescue->FLmRNA->unmapped >>>> 9 9 control->affx->bac_spike >>>> 10 10 oligo_spike_in >>>> 11 11 r1_bac_spike_at >>>> >>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>> >>>> >>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>> 1 2 4 7 9 >>>> 263551 18 23 5331 18 >>>> >>>> So we only have these probeset types: >>>> >>>> 1 1 main >>>> 2 2 control->affx >>>> 4 4 control->bgp->antigenomic >>>> 7 7 normgene->intron >>>> 9 9 control->affx->bac_spike >>>> >>>> And the 'main' probesets are those that we want to use for >>>> differential expression. Now one thing you could do is to say that >>>> the antigenomic probesets should give a good measure of background, >>>> as they are supposed to have sequences that don't exist in mice. So >>>> you could just extract those probesets, get some measure and use >>>> that as the lower limit of what you think is expressed or not. >>>> That's pretty naive, as a probe with higher GC content will have >>>> higher background than one with a lower GC content, but worrying >>>> about that is way beyond what I am prepared to go into. >>>> >>>> Now we can get the probeset IDs for the antigenomic probesets >>>> >>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>> featureSet.type='4';") >>>> >>>> And then extract those probesets and get a summary statistic. >>>> >>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>> probs=0.95) >>>> >>>> Which will give us the 95th percentile of these background probes. >>>> You could then use the kOverA function in genefilter to filter out >>>> any probesets where all samples are below the background values. >>>> The idea being that you want to filter out any probesets unless k >>>> samples have expression levels >= A. So if you have 10 samples, >>>> where 5 are controls and 5 are treated, you would do something like >>>> >>>> minval <- max(bkg) >>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>> eset[ind,] >>>> >>>> You should also filter out all the non-main probesets. You can do >>>> that using getMainProbes() in the affycoretools package >>>> >>>> eset.filt <- getMainProbes(eset.filt) >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>> >>>> >>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>> Hello all, >>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>> >>>>> librarypd.mogene.2.0.st) >>>>> eset <- rma(affyRaw) >>>>> >>>>> and added gene annotation and I am following the limma user's >>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>> >>>>> Thank you! >>>>> Julia Sabet >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
How many probesets do you have left after the genefilter step (e.g., what does dim(eset.filt) give you)? On Thursday, February 13, 2014 4:19:11 PM, Sabet, Julia A wrote: > Could it be because I did the kOverA function twice, since the first time I put a 5 and I meant to put a 12? I thought it would just get overrided but maybe it didn't? > > antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join > + featureSet on core_mps.fsetid=featureSet.fsetid where > + featureSet.type='4';") >> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > + probs=0.95) >> minval <- max(bkg) >> ind <- genefilter(eset, filterfun(kOverA(5, minval))) >> eset.filt <- eset[ind,] >> ind <- genefilter(eset, filterfun(kOverA(12, minval))) >> eset.filt <- eset[ind,] >> library(affycoretools) > eset.filt <- getMainProbes(eset.filt) > Error in orig[[nm]][i, , ..., drop = drop] : > (subscript) logical subscript too long > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 4:05 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > OK, what do you get if you type > > getMainProbes > > at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. > > Jim > > > > On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: >> I get this: >> >>> traceback() >> 3: input[ind, ] >> 2: input[ind, ] >> 1: getMainProbes(eset.filt) >>> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 3:57 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> What do you get when you run >> >> traceback() >> >> right after that error? >> >> Jim >> >> >> On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >>> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >>> >>>> eset.filt <- getMainProbes(eset.filt) >>> Error in orig[[nm]][i, , ..., drop = drop] : >>> (subscript) logical subscript too long >>> >>> >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 3:44 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> Hi Julia, >>> >>> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>>> >>>> librarypd.mogene.2.0.st) >>>>> con <- dbpd.mogene.2.0.st) >>>>> dbGetQuery(con, "select * from type_dict;") >>>> type type_id >>>> 1 1 main >>>> 2 2 control->affx >>>> 3 3 control->chip >>>> 4 4 control->bgp->antigenomic >>>> 5 5 control->bgp->genomic >>>> 6 6 normgene->exon >>>> 7 7 normgene->intron >>>> 8 8 rescue->FLmRNA->unmapped >>>> 9 9 control->affx->bac_spike >>>> 10 10 oligo_spike_in >>>> 11 11 r1_bac_spike_at >>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>> 1 2 4 7 9 >>>> 263551 18 23 5331 18 >>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>> join >>>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>>> + featureSet.type='4';") >>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>> + probs=0.95) >>>>> library(genefilter) >>>>> minval <- max(bkg) >>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>> The above line has a bit extra at the end that R doesn't like. >>> >>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >>> And this is your hint. Error messages are your friends. >>> >>> Best, >>> >>> Jim >>> >>> >>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>>>> eset[ind,] >>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <- >>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>> Here is my sessionInfo() output: >>>> >>>> R version 3.0.2 (2013-09-25) >>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>> >>>> locale: >>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>>> I appreciate your help... >>>> Julia >>>> >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 12:08 PM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> Hi Julia, >>>> >>>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>>> >>>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>>> eset.filt <- getMainProbes(eset.filt) >>>>> >>>>> This error resulted: >>>>> Error: could not find function "getMainProbes" >>>>> >>>>> What should I do? >>>>> Thanks! >>>>> Julia >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>> Sent: Thursday, February 13, 2014 9:36 AM >>>>> To: Sabet, Julia A >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>> >>>>> Hi Julia, >>>>> >>>>> There are several different things you can do. I'll show you one possibility. >>>>> >>>>> First, note that there are multiple different control probes on >>>>> this array that aren't intended to measure differential expression, >>>>> and should be excluded. So first let's look at the possible types >>>>> of >>>>> probesets: >>>>> >>>>>> librarypd.mogene.2.0.st) >>>>>> con <- dbpd.mogene.2.0.st) >>>>>> dbGetQuery(con, "select * from type_dict;") >>>>> type type_id >>>>> 1 1 main >>>>> 2 2 control->affx >>>>> 3 3 control->chip >>>>> 4 4 control->bgp->antigenomic >>>>> 5 5 control->bgp->genomic >>>>> 6 6 normgene->exon >>>>> 7 7 normgene->intron >>>>> 8 8 rescue->FLmRNA->unmapped >>>>> 9 9 control->affx->bac_spike >>>>> 10 10 oligo_spike_in >>>>> 11 11 r1_bac_spike_at >>>>> >>>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>>> >>>>> >>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>> 1 2 4 7 9 >>>>> 263551 18 23 5331 18 >>>>> >>>>> So we only have these probeset types: >>>>> >>>>> 1 1 main >>>>> 2 2 control->affx >>>>> 4 4 control->bgp->antigenomic >>>>> 7 7 normgene->intron >>>>> 9 9 control->affx->bac_spike >>>>> >>>>> And the 'main' probesets are those that we want to use for >>>>> differential expression. Now one thing you could do is to say that >>>>> the antigenomic probesets should give a good measure of background, >>>>> as they are supposed to have sequences that don't exist in mice. So >>>>> you could just extract those probesets, get some measure and use >>>>> that as the lower limit of what you think is expressed or not. >>>>> That's pretty naive, as a probe with higher GC content will have >>>>> higher background than one with a lower GC content, but worrying >>>>> about that is way beyond what I am prepared to go into. >>>>> >>>>> Now we can get the probeset IDs for the antigenomic probesets >>>>> >>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>>> featureSet.type='4';") >>>>> >>>>> And then extract those probesets and get a summary statistic. >>>>> >>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>> probs=0.95) >>>>> >>>>> Which will give us the 95th percentile of these background probes. >>>>> You could then use the kOverA function in genefilter to filter out >>>>> any probesets where all samples are below the background values. >>>>> The idea being that you want to filter out any probesets unless k >>>>> samples have expression levels >= A. So if you have 10 samples, >>>>> where 5 are controls and 5 are treated, you would do something like >>>>> >>>>> minval <- max(bkg) >>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>>> eset[ind,] >>>>> >>>>> You should also filter out all the non-main probesets. You can do >>>>> that using getMainProbes() in the affycoretools package >>>>> >>>>> eset.filt <- getMainProbes(eset.filt) >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> >>>>> >>>>> >>>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>>> Hello all, >>>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>>> >>>>>> librarypd.mogene.2.0.st) >>>>>> eset <- rma(affyRaw) >>>>>> >>>>>> and added gene annotation and I am following the limma user's >>>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>>> >>>>>> Thank you! >>>>>> Julia Sabet >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> -- >>>>> James W. MacDonald, M.S. >>>>> Biostatistician >>>>> University of Washington >>>>> Environmental and Occupational Health Sciences >>>>> 4225 Roosevelt Way NE, # 100 >>>>> Seattle WA 98105-6099 >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
It gives me this: Features Samples 2426 36 -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 5:47 PM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data How many probesets do you have left after the genefilter step (e.g., what does dim(eset.filt) give you)? On Thursday, February 13, 2014 4:19:11 PM, Sabet, Julia A wrote: > Could it be because I did the kOverA function twice, since the first time I put a 5 and I meant to put a 12? I thought it would just get overrided but maybe it didn't? > > antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join > + featureSet on core_mps.fsetid=featureSet.fsetid where > + featureSet.type='4';") >> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, > + probs=0.95) >> minval <- max(bkg) >> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >> eset[ind,] ind <- genefilter(eset, filterfun(kOverA(12, minval))) >> eset.filt <- eset[ind,] >> library(affycoretools) > eset.filt <- getMainProbes(eset.filt) > Error in orig[[nm]][i, , ..., drop = drop] : > (subscript) logical subscript too long > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 4:05 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > OK, what do you get if you type > > getMainProbes > > at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. > > Jim > > > > On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: >> I get this: >> >>> traceback() >> 3: input[ind, ] >> 2: input[ind, ] >> 1: getMainProbes(eset.filt) >>> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 3:57 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> What do you get when you run >> >> traceback() >> >> right after that error? >> >> Jim >> >> >> On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >>> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >>> >>>> eset.filt <- getMainProbes(eset.filt) >>> Error in orig[[nm]][i, , ..., drop = drop] : >>> (subscript) logical subscript too long >>> >>> >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 3:44 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> Hi Julia, >>> >>> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>>> >>>> librarypd.mogene.2.0.st) >>>>> con <- dbpd.mogene.2.0.st) >>>>> dbGetQuery(con, "select * from type_dict;") >>>> type type_id >>>> 1 1 main >>>> 2 2 control->affx >>>> 3 3 control->chip >>>> 4 4 control->bgp->antigenomic >>>> 5 5 control->bgp->genomic >>>> 6 6 normgene->exon >>>> 7 7 normgene->intron >>>> 8 8 rescue->FLmRNA->unmapped >>>> 9 9 control->affx->bac_spike >>>> 10 10 oligo_spike_in >>>> 11 11 r1_bac_spike_at >>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>> 1 2 4 7 9 >>>> 263551 18 23 5331 18 >>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>> join >>>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>>> + featureSet.type='4';") >>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>> + probs=0.95) >>>>> library(genefilter) >>>>> minval <- max(bkg) >>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>> The above line has a bit extra at the end that R doesn't like. >>> >>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >>> And this is your hint. Error messages are your friends. >>> >>> Best, >>> >>> Jim >>> >>> >>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>> <- eset[ind,] >>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>> <- >>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>> Here is my sessionInfo() output: >>>> >>>> R version 3.0.2 (2013-09-25) >>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>> >>>> locale: >>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>>> I appreciate your help... >>>> Julia >>>> >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 12:08 PM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> Hi Julia, >>>> >>>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>>> >>>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>>> eset.filt <- getMainProbes(eset.filt) >>>>> >>>>> This error resulted: >>>>> Error: could not find function "getMainProbes" >>>>> >>>>> What should I do? >>>>> Thanks! >>>>> Julia >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>> Sent: Thursday, February 13, 2014 9:36 AM >>>>> To: Sabet, Julia A >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>> >>>>> Hi Julia, >>>>> >>>>> There are several different things you can do. I'll show you one possibility. >>>>> >>>>> First, note that there are multiple different control probes on >>>>> this array that aren't intended to measure differential >>>>> expression, and should be excluded. So first let's look at the >>>>> possible types of >>>>> probesets: >>>>> >>>>>> librarypd.mogene.2.0.st) >>>>>> con <- dbpd.mogene.2.0.st) >>>>>> dbGetQuery(con, "select * from type_dict;") >>>>> type type_id >>>>> 1 1 main >>>>> 2 2 control->affx >>>>> 3 3 control->chip >>>>> 4 4 control->bgp->antigenomic >>>>> 5 5 control->bgp->genomic >>>>> 6 6 normgene->exon >>>>> 7 7 normgene->intron >>>>> 8 8 rescue->FLmRNA->unmapped >>>>> 9 9 control->affx->bac_spike >>>>> 10 10 oligo_spike_in >>>>> 11 11 r1_bac_spike_at >>>>> >>>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>>> >>>>> >>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>> 1 2 4 7 9 >>>>> 263551 18 23 5331 18 >>>>> >>>>> So we only have these probeset types: >>>>> >>>>> 1 1 main >>>>> 2 2 control->affx >>>>> 4 4 control->bgp->antigenomic >>>>> 7 7 normgene->intron >>>>> 9 9 control->affx->bac_spike >>>>> >>>>> And the 'main' probesets are those that we want to use for >>>>> differential expression. Now one thing you could do is to say that >>>>> the antigenomic probesets should give a good measure of >>>>> background, as they are supposed to have sequences that don't >>>>> exist in mice. So you could just extract those probesets, get some >>>>> measure and use that as the lower limit of what you think is expressed or not. >>>>> That's pretty naive, as a probe with higher GC content will have >>>>> higher background than one with a lower GC content, but worrying >>>>> about that is way beyond what I am prepared to go into. >>>>> >>>>> Now we can get the probeset IDs for the antigenomic probesets >>>>> >>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>>> featureSet.type='4';") >>>>> >>>>> And then extract those probesets and get a summary statistic. >>>>> >>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>> probs=0.95) >>>>> >>>>> Which will give us the 95th percentile of these background probes. >>>>> You could then use the kOverA function in genefilter to filter out >>>>> any probesets where all samples are below the background values. >>>>> The idea being that you want to filter out any probesets unless k >>>>> samples have expression levels >= A. So if you have 10 samples, >>>>> where 5 are controls and 5 are treated, you would do something >>>>> like >>>>> >>>>> minval <- max(bkg) >>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>>> eset[ind,] >>>>> >>>>> You should also filter out all the non-main probesets. You can do >>>>> that using getMainProbes() in the affycoretools package >>>>> >>>>> eset.filt <- getMainProbes(eset.filt) >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> >>>>> >>>>> >>>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>>> Hello all, >>>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>>> >>>>>> librarypd.mogene.2.0.st) >>>>>> eset <- rma(affyRaw) >>>>>> >>>>>> and added gene annotation and I am following the limma user's >>>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>>> >>>>>> Thank you! >>>>>> Julia Sabet >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> -- >>>>> James W. MacDonald, M.S. >>>>> Biostatistician >>>>> University of Washington >>>>> Environmental and Occupational Health Sciences >>>>> 4225 Roosevelt Way NE, # 100 >>>>> Seattle WA 98105-6099 >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
OK, well that's probably excessive. I sort of wonder about Affy and their control probes. I originally wrote getMainProbes because the intronic controls were always popping up in lists of differentially expressed probesets. I wonder if these antigenomic probes are actually measuring something. You might be better served to just filter out the controls right after running rma(), and then use something less conservative like filt <- filterfun(kOverA(12, 6) or you could look at the other options in genefilter like varFilter(). Best, Jim On Thursday, February 13, 2014 5:48:59 PM, Sabet, Julia A wrote: > It gives me this: > Features Samples > 2426 36 > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 5:47 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > How many probesets do you have left after the genefilter step (e.g., what does dim(eset.filt) give you)? > > On Thursday, February 13, 2014 4:19:11 PM, Sabet, Julia A wrote: >> Could it be because I did the kOverA function twice, since the first time I put a 5 and I meant to put a 12? I thought it would just get overrided but maybe it didn't? >> >> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner join >> + featureSet on core_mps.fsetid=featureSet.fsetid where >> + featureSet.type='4';") >>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >> + probs=0.95) >>> minval <- max(bkg) >>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>> eset[ind,] ind <- genefilter(eset, filterfun(kOverA(12, minval))) >>> eset.filt <- eset[ind,] >>> library(affycoretools) >> eset.filt <- getMainProbes(eset.filt) >> Error in orig[[nm]][i, , ..., drop = drop] : >> (subscript) logical subscript too long >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 4:05 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> OK, what do you get if you type >> >> getMainProbes >> >> at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. >> >> Jim >> >> >> >> On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: >>> I get this: >>> >>>> traceback() >>> 3: input[ind, ] >>> 2: input[ind, ] >>> 1: getMainProbes(eset.filt) >>>> >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 3:57 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> What do you get when you run >>> >>> traceback() >>> >>> right after that error? >>> >>> Jim >>> >>> >>> On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >>>> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >>>> >>>>> eset.filt <- getMainProbes(eset.filt) >>>> Error in orig[[nm]][i, , ..., drop = drop] : >>>> (subscript) logical subscript too long >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 3:44 PM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> Hi Julia, >>>> >>>> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>>>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>>>> >>>>> librarypd.mogene.2.0.st) >>>>>> con <- dbpd.mogene.2.0.st) >>>>>> dbGetQuery(con, "select * from type_dict;") >>>>> type type_id >>>>> 1 1 main >>>>> 2 2 control->affx >>>>> 3 3 control->chip >>>>> 4 4 control->bgp->antigenomic >>>>> 5 5 control->bgp->genomic >>>>> 6 6 normgene->exon >>>>> 7 7 normgene->intron >>>>> 8 8 rescue->FLmRNA->unmapped >>>>> 9 9 control->affx->bac_spike >>>>> 10 10 oligo_spike_in >>>>> 11 11 r1_bac_spike_at >>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>> 1 2 4 7 9 >>>>> 263551 18 23 5331 18 >>>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>>> join >>>>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>>>> + featureSet.type='4';") >>>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>> + probs=0.95) >>>>>> library(genefilter) >>>>>> minval <- max(bkg) >>>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>> The above line has a bit extra at the end that R doesn't like. >>>> >>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >>>> And this is your hint. Error messages are your friends. >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>>> <- eset[ind,] >>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>>> <- >>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>> Here is my sessionInfo() output: >>>>> >>>>> R version 3.0.2 (2013-09-25) >>>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>>> >>>>> locale: >>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>>>> >>>>> attached base packages: >>>>> [1] parallel stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>>>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>>>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>>>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>>>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>>>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>>>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>>>> I appreciate your help... >>>>> Julia >>>>> >>>>> -----Original Message----- >>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>> Sent: Thursday, February 13, 2014 12:08 PM >>>>> To: Sabet, Julia A >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>> >>>>> Hi Julia, >>>>> >>>>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>>>> >>>>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>>>> eset.filt <- getMainProbes(eset.filt) >>>>>> >>>>>> This error resulted: >>>>>> Error: could not find function "getMainProbes" >>>>>> >>>>>> What should I do? >>>>>> Thanks! >>>>>> Julia >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>>> Sent: Thursday, February 13, 2014 9:36 AM >>>>>> To: Sabet, Julia A >>>>>> Cc: bioconductor at r-project.org >>>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>>> >>>>>> Hi Julia, >>>>>> >>>>>> There are several different things you can do. I'll show you one possibility. >>>>>> >>>>>> First, note that there are multiple different control probes on >>>>>> this array that aren't intended to measure differential >>>>>> expression, and should be excluded. So first let's look at the >>>>>> possible types of >>>>>> probesets: >>>>>> >>>>>>> librarypd.mogene.2.0.st) >>>>>>> con <- dbpd.mogene.2.0.st) >>>>>>> dbGetQuery(con, "select * from type_dict;") >>>>>> type type_id >>>>>> 1 1 main >>>>>> 2 2 control->affx >>>>>> 3 3 control->chip >>>>>> 4 4 control->bgp->antigenomic >>>>>> 5 5 control->bgp->genomic >>>>>> 6 6 normgene->exon >>>>>> 7 7 normgene->intron >>>>>> 8 8 rescue->FLmRNA->unmapped >>>>>> 9 9 control->affx->bac_spike >>>>>> 10 10 oligo_spike_in >>>>>> 11 11 r1_bac_spike_at >>>>>> >>>>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>>>> >>>>>> >>>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>>> 1 2 4 7 9 >>>>>> 263551 18 23 5331 18 >>>>>> >>>>>> So we only have these probeset types: >>>>>> >>>>>> 1 1 main >>>>>> 2 2 control->affx >>>>>> 4 4 control->bgp->antigenomic >>>>>> 7 7 normgene->intron >>>>>> 9 9 control->affx->bac_spike >>>>>> >>>>>> And the 'main' probesets are those that we want to use for >>>>>> differential expression. Now one thing you could do is to say that >>>>>> the antigenomic probesets should give a good measure of >>>>>> background, as they are supposed to have sequences that don't >>>>>> exist in mice. So you could just extract those probesets, get some >>>>>> measure and use that as the lower limit of what you think is expressed or not. >>>>>> That's pretty naive, as a probe with higher GC content will have >>>>>> higher background than one with a lower GC content, but worrying >>>>>> about that is way beyond what I am prepared to go into. >>>>>> >>>>>> Now we can get the probeset IDs for the antigenomic probesets >>>>>> >>>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>>>> featureSet.type='4';") >>>>>> >>>>>> And then extract those probesets and get a summary statistic. >>>>>> >>>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>>> probs=0.95) >>>>>> >>>>>> Which will give us the 95th percentile of these background probes. >>>>>> You could then use the kOverA function in genefilter to filter out >>>>>> any probesets where all samples are below the background values. >>>>>> The idea being that you want to filter out any probesets unless k >>>>>> samples have expression levels >= A. So if you have 10 samples, >>>>>> where 5 are controls and 5 are treated, you would do something >>>>>> like >>>>>> >>>>>> minval <- max(bkg) >>>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>>>> eset[ind,] >>>>>> >>>>>> You should also filter out all the non-main probesets. You can do >>>>>> that using getMainProbes() in the affycoretools package >>>>>> >>>>>> eset.filt <- getMainProbes(eset.filt) >>>>>> >>>>>> Best, >>>>>> >>>>>> Jim >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>>>> Hello all, >>>>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>>>> >>>>>>> librarypd.mogene.2.0.st) >>>>>>> eset <- rma(affyRaw) >>>>>>> >>>>>>> and added gene annotation and I am following the limma user's >>>>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>>>> >>>>>>> Thank you! >>>>>>> Julia Sabet >>>>>>> >>>>>>> [[alternative HTML version deleted]] >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at r-project.org >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> -- >>>>>> James W. MacDonald, M.S. >>>>>> Biostatistician >>>>>> University of Washington >>>>>> Environmental and Occupational Health Sciences >>>>>> 4225 Roosevelt Way NE, # 100 >>>>>> Seattle WA 98105-6099 >>>>> -- >>>>> James W. MacDonald, M.S. >>>>> Biostatistician >>>>> University of Washington >>>>> Environmental and Occupational Health Sciences >>>>> 4225 Roosevelt Way NE, # 100 >>>>> Seattle WA 98105-6099 >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>>> >>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Thank you. I couldn't figure out how to filter out controls. What would the code be to filter out both controls and background using kOverA as you suggested? Thanks Julia -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: Thursday, February 13, 2014 6:21 PM To: Sabet, Julia A Cc: bioconductor at r-project.org Subject: Re: [BioC] filtering probes in affymetrix data OK, well that's probably excessive. I sort of wonder about Affy and their control probes. I originally wrote getMainProbes because the intronic controls were always popping up in lists of differentially expressed probesets. I wonder if these antigenomic probes are actually measuring something. You might be better served to just filter out the controls right after running rma(), and then use something less conservative like filt <- filterfun(kOverA(12, 6) or you could look at the other options in genefilter like varFilter(). Best, Jim On Thursday, February 13, 2014 5:48:59 PM, Sabet, Julia A wrote: > It gives me this: > Features Samples > 2426 36 > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 5:47 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > How many probesets do you have left after the genefilter step (e.g., what does dim(eset.filt) give you)? > > On Thursday, February 13, 2014 4:19:11 PM, Sabet, Julia A wrote: >> Could it be because I did the kOverA function twice, since the first time I put a 5 and I meant to put a 12? I thought it would just get overrided but maybe it didn't? >> >> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >> join >> + featureSet on core_mps.fsetid=featureSet.fsetid where >> + featureSet.type='4';") >>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >> + probs=0.95) >>> minval <- max(bkg) >>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>> eset[ind,] ind <- genefilter(eset, filterfun(kOverA(12, minval))) >>> eset.filt <- eset[ind,] >>> library(affycoretools) >> eset.filt <- getMainProbes(eset.filt) Error in orig[[nm]][i, , ..., >> drop = drop] : >> (subscript) logical subscript too long >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 4:05 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> OK, what do you get if you type >> >> getMainProbes >> >> at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. >> >> Jim >> >> >> >> On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: >>> I get this: >>> >>>> traceback() >>> 3: input[ind, ] >>> 2: input[ind, ] >>> 1: getMainProbes(eset.filt) >>>> >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 3:57 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> What do you get when you run >>> >>> traceback() >>> >>> right after that error? >>> >>> Jim >>> >>> >>> On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >>>> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >>>> >>>>> eset.filt <- getMainProbes(eset.filt) >>>> Error in orig[[nm]][i, , ..., drop = drop] : >>>> (subscript) logical subscript too long >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 3:44 PM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> Hi Julia, >>>> >>>> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>>>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>>>> >>>>> librarypd.mogene.2.0.st) >>>>>> con <- dbpd.mogene.2.0.st) >>>>>> dbGetQuery(con, "select * from type_dict;") >>>>> type type_id >>>>> 1 1 main >>>>> 2 2 control->affx >>>>> 3 3 control->chip >>>>> 4 4 control->bgp->antigenomic >>>>> 5 5 control->bgp->genomic >>>>> 6 6 normgene->exon >>>>> 7 7 normgene->intron >>>>> 8 8 rescue->FLmRNA->unmapped >>>>> 9 9 control->affx->bac_spike >>>>> 10 10 oligo_spike_in >>>>> 11 11 r1_bac_spike_at >>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>> 1 2 4 7 9 >>>>> 263551 18 23 5331 18 >>>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>>> join >>>>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>>>> + featureSet.type='4';") >>>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>> + probs=0.95) >>>>>> library(genefilter) >>>>>> minval <- max(bkg) >>>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt >>>>>> <- >>>> The above line has a bit extra at the end that R doesn't like. >>>> >>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >>>> And this is your hint. Error messages are your friends. >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>>> <- eset[ind,] >>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>>> <- >>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>> Here is my sessionInfo() output: >>>>> >>>>> R version 3.0.2 (2013-09-25) >>>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>>> >>>>> locale: >>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>>>> >>>>> attached base packages: >>>>> [1] parallel stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>>>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>>>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>>>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>>>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>>>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>>>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>>>> I appreciate your help... >>>>> Julia >>>>> >>>>> -----Original Message----- >>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>> Sent: Thursday, February 13, 2014 12:08 PM >>>>> To: Sabet, Julia A >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>> >>>>> Hi Julia, >>>>> >>>>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>>>> >>>>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>>>> eset.filt <- getMainProbes(eset.filt) >>>>>> >>>>>> This error resulted: >>>>>> Error: could not find function "getMainProbes" >>>>>> >>>>>> What should I do? >>>>>> Thanks! >>>>>> Julia >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>>> Sent: Thursday, February 13, 2014 9:36 AM >>>>>> To: Sabet, Julia A >>>>>> Cc: bioconductor at r-project.org >>>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>>> >>>>>> Hi Julia, >>>>>> >>>>>> There are several different things you can do. I'll show you one possibility. >>>>>> >>>>>> First, note that there are multiple different control probes on >>>>>> this array that aren't intended to measure differential >>>>>> expression, and should be excluded. So first let's look at the >>>>>> possible types of >>>>>> probesets: >>>>>> >>>>>>> librarypd.mogene.2.0.st) >>>>>>> con <- dbpd.mogene.2.0.st) >>>>>>> dbGetQuery(con, "select * from type_dict;") >>>>>> type type_id >>>>>> 1 1 main >>>>>> 2 2 control->affx >>>>>> 3 3 control->chip >>>>>> 4 4 control->bgp->antigenomic >>>>>> 5 5 control->bgp->genomic >>>>>> 6 6 normgene->exon >>>>>> 7 7 normgene->intron >>>>>> 8 8 rescue->FLmRNA->unmapped >>>>>> 9 9 control->affx->bac_spike >>>>>> 10 10 oligo_spike_in >>>>>> 11 11 r1_bac_spike_at >>>>>> >>>>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>>>> >>>>>> >>>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>>> 1 2 4 7 9 >>>>>> 263551 18 23 5331 18 >>>>>> >>>>>> So we only have these probeset types: >>>>>> >>>>>> 1 1 main >>>>>> 2 2 control->affx >>>>>> 4 4 control->bgp->antigenomic >>>>>> 7 7 normgene->intron >>>>>> 9 9 control->affx->bac_spike >>>>>> >>>>>> And the 'main' probesets are those that we want to use for >>>>>> differential expression. Now one thing you could do is to say >>>>>> that the antigenomic probesets should give a good measure of >>>>>> background, as they are supposed to have sequences that don't >>>>>> exist in mice. So you could just extract those probesets, get >>>>>> some measure and use that as the lower limit of what you think is expressed or not. >>>>>> That's pretty naive, as a probe with higher GC content will have >>>>>> higher background than one with a lower GC content, but worrying >>>>>> about that is way beyond what I am prepared to go into. >>>>>> >>>>>> Now we can get the probeset IDs for the antigenomic probesets >>>>>> >>>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>>>> featureSet.type='4';") >>>>>> >>>>>> And then extract those probesets and get a summary statistic. >>>>>> >>>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>>> probs=0.95) >>>>>> >>>>>> Which will give us the 95th percentile of these background probes. >>>>>> You could then use the kOverA function in genefilter to filter >>>>>> out any probesets where all samples are below the background values. >>>>>> The idea being that you want to filter out any probesets unless k >>>>>> samples have expression levels >= A. So if you have 10 samples, >>>>>> where 5 are controls and 5 are treated, you would do something >>>>>> like >>>>>> >>>>>> minval <- max(bkg) >>>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt >>>>>> <- eset[ind,] >>>>>> >>>>>> You should also filter out all the non-main probesets. You can do >>>>>> that using getMainProbes() in the affycoretools package >>>>>> >>>>>> eset.filt <- getMainProbes(eset.filt) >>>>>> >>>>>> Best, >>>>>> >>>>>> Jim >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>>>> Hello all, >>>>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>>>> >>>>>>> librarypd.mogene.2.0.st) >>>>>>> eset <- rma(affyRaw) >>>>>>> >>>>>>> and added gene annotation and I am following the limma user's >>>>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>>>> >>>>>>> Thank you! >>>>>>> Julia Sabet >>>>>>> >>>>>>> [[alternative HTML version deleted]] >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at r-project.org >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conducto >>>>>>> r >>>>>> -- >>>>>> James W. MacDonald, M.S. >>>>>> Biostatistician >>>>>> University of Washington >>>>>> Environmental and Occupational Health Sciences >>>>>> 4225 Roosevelt Way NE, # 100 >>>>>> Seattle WA 98105-6099 >>>>> -- >>>>> James W. MacDonald, M.S. >>>>> Biostatistician >>>>> University of Washington >>>>> Environmental and Occupational Health Sciences >>>>> 4225 Roosevelt Way NE, # 100 >>>>> Seattle WA 98105-6099 >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>>> >>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Hi Julia, Here is an example. Here I am using a different array, and I only have three samples per group, but otherwise the steps are the same. > eset <- rma(dat) > eset.filt <- getMainProbes(eset) > library(genefilter) > ind <- genefilter(eset.filt, filterfun(kOverA(3, 6))) > eset.dblfilt <- eset.filt[ind,] > dim(eset) Features Samples 70523 12 > dim(eset.filt) Features Samples 68993 12 > dim(eset.dblfilt) Features Samples 13445 12 Best, Jim On 2/19/2014 3:13 PM, Sabet, Julia A wrote: > Thank you. I couldn't figure out how to filter out controls. What would the code be to filter out both controls and background using kOverA as you suggested? > Thanks > Julia > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: Thursday, February 13, 2014 6:21 PM > To: Sabet, Julia A > Cc: bioconductor at r-project.org > Subject: Re: [BioC] filtering probes in affymetrix data > > OK, well that's probably excessive. I sort of wonder about Affy and their control probes. I originally wrote getMainProbes because the intronic controls were always popping up in lists of differentially expressed probesets. I wonder if these antigenomic probes are actually measuring something. > > You might be better served to just filter out the controls right after running rma(), and then use something less conservative like > > filt <- filterfun(kOverA(12, 6) > > or you could look at the other options in genefilter like varFilter(). > > Best, > > Jim > > On Thursday, February 13, 2014 5:48:59 PM, Sabet, Julia A wrote: >> It gives me this: >> Features Samples >> 2426 36 >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: Thursday, February 13, 2014 5:47 PM >> To: Sabet, Julia A >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] filtering probes in affymetrix data >> >> How many probesets do you have left after the genefilter step (e.g., what does dim(eset.filt) give you)? >> >> On Thursday, February 13, 2014 4:19:11 PM, Sabet, Julia A wrote: >>> Could it be because I did the kOverA function twice, since the first time I put a 5 and I meant to put a 12? I thought it would just get overrided but maybe it didn't? >>> >>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>> join >>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>> + featureSet.type='4';") >>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>> + probs=0.95) >>>> minval <- max(bkg) >>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <- >>>> eset[ind,] ind <- genefilter(eset, filterfun(kOverA(12, minval))) >>>> eset.filt <- eset[ind,] >>>> library(affycoretools) >>> eset.filt <- getMainProbes(eset.filt) Error in orig[[nm]][i, , ..., >>> drop = drop] : >>> (subscript) logical subscript too long >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: Thursday, February 13, 2014 4:05 PM >>> To: Sabet, Julia A >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] filtering probes in affymetrix data >>> >>> OK, what do you get if you type >>> >>> getMainProbes >>> >>> at an R prompt? It shouldn't be subsetting the ExpressionSet twice like that. >>> >>> Jim >>> >>> >>> >>> On Thursday, February 13, 2014 3:59:38 PM, Sabet, Julia A wrote: >>>> I get this: >>>> >>>>> traceback() >>>> 3: input[ind, ] >>>> 2: input[ind, ] >>>> 1: getMainProbes(eset.filt) >>>> -----Original Message----- >>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>> Sent: Thursday, February 13, 2014 3:57 PM >>>> To: Sabet, Julia A >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>> >>>> What do you get when you run >>>> >>>> traceback() >>>> >>>> right after that error? >>>> >>>> Jim >>>> >>>> >>>> On 2/13/2014 3:51 PM, Sabet, Julia A wrote: >>>>> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message: >>>>> >>>>>> eset.filt <- getMainProbes(eset.filt) >>>>> Error in orig[[nm]][i, , ..., drop = drop] : >>>>> (subscript) logical subscript too long >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>> Sent: Thursday, February 13, 2014 3:44 PM >>>>> To: Sabet, Julia A >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>> >>>>> Hi Julia, >>>>> >>>>> On 2/13/2014 3:23 PM, Sabet, Julia A wrote: >>>>>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step: >>>>>> >>>>>> librarypd.mogene.2.0.st) >>>>>>> con <- dbpd.mogene.2.0.st) >>>>>>> dbGetQuery(con, "select * from type_dict;") >>>>>> type type_id >>>>>> 1 1 main >>>>>> 2 2 control->affx >>>>>> 3 3 control->chip >>>>>> 4 4 control->bgp->antigenomic >>>>>> 5 5 control->bgp->genomic >>>>>> 6 6 normgene->exon >>>>>> 7 7 normgene->intron >>>>>> 8 8 rescue->FLmRNA->unmapped >>>>>> 9 9 control->affx->bac_spike >>>>>> 10 10 oligo_spike_in >>>>>> 11 11 r1_bac_spike_at >>>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>>> 1 2 4 7 9 >>>>>> 263551 18 23 5331 18 >>>>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>>>> join >>>>>> + featureSet on core_mps.fsetid=featureSet.fsetid where >>>>>> + featureSet.type='4';") >>>>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>>> + probs=0.95) >>>>>>> library(genefilter) >>>>>>> minval <- max(bkg) >>>>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt >>>>>>> <- >>>>> The above line has a bit extra at the end that R doesn't like. >>>>> >>>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt" >>>>> And this is your hint. Error messages are your friends. >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> >>>>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>>>> <- eset[ind,] >>>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt >>>>>>> <- >>>>>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt" >>>>>> Here is my sessionInfo() output: >>>>>> >>>>>> R version 3.0.2 (2013-09-25) >>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>>>> >>>>>> locale: >>>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 >>>>>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252 >>>>>> >>>>>> attached base packages: >>>>>> [1] parallel stats graphics grDevices utils datasets methods base >>>>>> >>>>>> other attached packages: >>>>>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0 >>>>>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0 >>>>>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1 >>>>>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 >>>>>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0 >>>>>> >>>>>> loaded via a namespace (and not attached): >>>>>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12 >>>>>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2 >>>>>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0 >>>>>> I appreciate your help... >>>>>> Julia >>>>>> >>>>>> -----Original Message----- >>>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>>> Sent: Thursday, February 13, 2014 12:08 PM >>>>>> To: Sabet, Julia A >>>>>> Cc: bioconductor at r-project.org >>>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>>> >>>>>> Hi Julia, >>>>>> >>>>>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded. >>>>>> >>>>>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools). >>>>>> >>>>>> Best, >>>>>> >>>>>> Jim >>>>>> >>>>>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote: >>>>>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did: >>>>>>> eset.filt <- getMainProbes(eset.filt) >>>>>>> >>>>>>> This error resulted: >>>>>>> Error: could not find function "getMainProbes" >>>>>>> >>>>>>> What should I do? >>>>>>> Thanks! >>>>>>> Julia >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>>>>>> Sent: Thursday, February 13, 2014 9:36 AM >>>>>>> To: Sabet, Julia A >>>>>>> Cc: bioconductor at r-project.org >>>>>>> Subject: Re: [BioC] filtering probes in affymetrix data >>>>>>> >>>>>>> Hi Julia, >>>>>>> >>>>>>> There are several different things you can do. I'll show you one possibility. >>>>>>> >>>>>>> First, note that there are multiple different control probes on >>>>>>> this array that aren't intended to measure differential >>>>>>> expression, and should be excluded. So first let's look at the >>>>>>> possible types of >>>>>>> probesets: >>>>>>> >>>>>>>> librarypd.mogene.2.0.st) >>>>>>>> con <- dbpd.mogene.2.0.st) >>>>>>>> dbGetQuery(con, "select * from type_dict;") >>>>>>> type type_id >>>>>>> 1 1 main >>>>>>> 2 2 control->affx >>>>>>> 3 3 control->chip >>>>>>> 4 4 control->bgp->antigenomic >>>>>>> 5 5 control->bgp->genomic >>>>>>> 6 6 normgene->exon >>>>>>> 7 7 normgene->intron >>>>>>> 8 8 rescue->FLmRNA->unmapped >>>>>>> 9 9 control->affx->bac_spike >>>>>>> 10 10 oligo_spike_in >>>>>>> 11 11 r1_bac_spike_at >>>>>>> >>>>>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do: >>>>>>> >>>>>>> >>>>>>>> table(dbGetQuery(con, "select type from featureSet;")[,1]) >>>>>>> 1 2 4 7 9 >>>>>>> 263551 18 23 5331 18 >>>>>>> >>>>>>> So we only have these probeset types: >>>>>>> >>>>>>> 1 1 main >>>>>>> 2 2 control->affx >>>>>>> 4 4 control->bgp->antigenomic >>>>>>> 7 7 normgene->intron >>>>>>> 9 9 control->affx->bac_spike >>>>>>> >>>>>>> And the 'main' probesets are those that we want to use for >>>>>>> differential expression. Now one thing you could do is to say >>>>>>> that the antigenomic probesets should give a good measure of >>>>>>> background, as they are supposed to have sequences that don't >>>>>>> exist in mice. So you could just extract those probesets, get >>>>>>> some measure and use that as the lower limit of what you think is expressed or not. >>>>>>> That's pretty naive, as a probe with higher GC content will have >>>>>>> higher background than one with a lower GC content, but worrying >>>>>>> about that is way beyond what I am prepared to go into. >>>>>>> >>>>>>> Now we can get the probeset IDs for the antigenomic probesets >>>>>>> >>>>>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner >>>>>>> join featureSet on core_mps.fsetid=featureSet.fsetid where >>>>>>> featureSet.type='4';") >>>>>>> >>>>>>> And then extract those probesets and get a summary statistic. >>>>>>> >>>>>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile, >>>>>>> probs=0.95) >>>>>>> >>>>>>> Which will give us the 95th percentile of these background probes. >>>>>>> You could then use the kOverA function in genefilter to filter >>>>>>> out any probesets where all samples are below the background values. >>>>>>> The idea being that you want to filter out any probesets unless k >>>>>>> samples have expression levels >= A. So if you have 10 samples, >>>>>>> where 5 are controls and 5 are treated, you would do something >>>>>>> like >>>>>>> >>>>>>> minval <- max(bkg) >>>>>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt >>>>>>> <- eset[ind,] >>>>>>> >>>>>>> You should also filter out all the non-main probesets. You can do >>>>>>> that using getMainProbes() in the affycoretools package >>>>>>> >>>>>>> eset.filt <- getMainProbes(eset.filt) >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Jim >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote: >>>>>>>> Hello all, >>>>>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this: >>>>>>>> >>>>>>>> librarypd.mogene.2.0.st) >>>>>>>> eset <- rma(affyRaw) >>>>>>>> >>>>>>>> and added gene annotation and I am following the limma user's >>>>>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code? >>>>>>>> >>>>>>>> Thank you! >>>>>>>> Julia Sabet >>>>>>>> >>>>>>>> [[alternative HTML version deleted]] >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioconductor mailing list >>>>>>>> Bioconductor at r-project.org >>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>> Search the archives: >>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conducto >>>>>>>> r >>>>>>> -- >>>>>>> James W. MacDonald, M.S. >>>>>>> Biostatistician >>>>>>> University of Washington >>>>>>> Environmental and Occupational Health Sciences >>>>>>> 4225 Roosevelt Way NE, # 100 >>>>>>> Seattle WA 98105-6099 >>>>>> -- >>>>>> James W. MacDonald, M.S. >>>>>> Biostatistician >>>>>> University of Washington >>>>>> Environmental and Occupational Health Sciences >>>>>> 4225 Roosevelt Way NE, # 100 >>>>>> Seattle WA 98105-6099 >>>>> -- >>>>> James W. MacDonald, M.S. >>>>> Biostatistician >>>>> University of Washington >>>>> Environmental and Occupational Health Sciences >>>>> 4225 Roosevelt Way NE, # 100 >>>>> Seattle WA 98105-6099 >>>>> >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY

Login before adding your answer.

Traffic: 443 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6