kOverA(k, A=100, na.rm=TRUE)

0

Entering edit mode

Roberts, Raymond ▴ 30

@roberts-raymond-2877

Last seen 10.7 years ago

sessionInfo() > sessionInfo() R version 2.7.0 (2008-04-22) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] grid splines stats graphics grDevices datasets [7] tools utils methods base other attached packages: [1] RColorBrewer_1.0-2 goProfiles_1.2.0 [3] aroma.affymetrix_0.9.3 aroma.apd_0.1.3 [5] R.huge_0.1.5 affxparser_1.12.2 [7] aroma.core_0.9.3 sfit_0.1.5 [9] aroma.light_1.8.1 digest_0.3.1 [11] matrixStats_0.1.2 R.rsp_0.3.4 [13] R.cache_0.1.7 farms_1.3 [15] MASS_7.2-41 RMySQL_0.6-0 [17] gplots_2.6.0 gdata_2.4.2 [19] gtools_2.5.0 affycoretools_1.12.0 [21] annaffy_1.12.1 KEGG.db_2.2.0 [23] gcrma_2.12.1 matchprobes_1.12.0 [25] biomaRt_1.14.0 RCurl_0.9-3 [27] GOstats_2.6.0 Category_2.6.0 [29] genefilter_1.20.0 survival_2.34-1 [31] RBGL_1.16.0 annotate_1.18.0 [33] xtable_1.5-2 GO.db_2.2.0 [35] AnnotationDbi_1.2.1 RSQLite_0.6-8 [37] DBI_0.2-4 graph_1.18.1 [39] limma_2.14.5 affy_1.18.2 [41] preprocessCore_1.2.0 affyio_1.8.0 [43] rJava_0.5-1 Biobase_2.0.1 [45] R.utils_1.0.2 R.oo_1.4.3 [47] R.methodsS3_1.0.1 loaded via a namespace (and not attached): [1] cluster_1.11.10 XML_1.94-0.1 There are only 78 shared between the two outputs. kOverA(2,7) produces 361 genes, kOverA(2,6) produces 121 genes. This seems like a major problem that I encountered in a completely different script, which was not mine so I just scrapped using it. There are no apparent similarities between the genes being included in both output sets. Is there another function I can use that will do the same filtering but may be more reliable? Ray Lovelace Respiratory Research Institute -----Original Message----- From: Robert Gentleman [mailto:rgentlem@fhcrc.org] Sent: Monday, June 30, 2008 2:27 PM To: Roberts, Raymond Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] kOverA(k, A=100, na.rm=TRUE) Hi, That behavior seems somewhat unusual. Could you perhaps see which genes are selected for kOverA(2,7) and which for kOverA(2,6)? And then look at the expression values to see what might be going on.. Also please include the output of sessionInfo, as requested in the posting guide. Robert Roberts, Raymond wrote: > Dear List, > > > > I'm using genefilter and Limma to analyze the human exon array. I did > my preprocessing with aroma.affymetrix and then loaded it into limma and > used genefilter to isolate the significant genes. At least that is what > I think I am doing. Here is my script starting at the function I have > the question about > > > > > > ##Filter for differentialy expressed genes > > f1<-kOverA(2,7) > > ffun<-filterfun(f1) > > which<-genefilter(exprs(RMASet), ffun) > > rma_no<-summary(which) > > rmafiltset<-RMASet[which,] > > > > > > > > grps <- paste(pData(rmafiltset)$hours) > > lev <- c("time0", "time3", "time6", "time12", "time24", "time48") > > f <- factor(grps, levels = lev) > > design <- model.matrix(~ 0 + f) > > colnames(design) <- lev > > rmafit <- lmFit(rmafiltset, design) > > > > > > > > contrast.matrix <- makeContrasts("time3-time0", "time6-time3", > "time12-time6", "time24-time12", "time48-time24", "time6-time0", > "time12-time0", "time24-time0", "time48-time0", levels=design) > > rmafit2 <- contrasts.fit(rmafit, contrast.matrix) > > rmafit3 <- eBayes(rmafit2) > > > > > > pval<-0.05 > > l2foldch<- 0.58496250072 > > > > rmaresults<-decideTests(rmafit3, p.value=pval, lfc=l2foldch) > > > > > > rmalist1 <- rmaresults[,1]!=0 > > rmalist2 <- rmaresults[,2]!=0 > > rmalist3 <- rmaresults[,3]!=0 > > rmalist4 <- rmaresults[,4]!=0 > > rmalist5 <- rmaresults[,5]!=0 > > rmalist6 <- rmaresults[,6]!=0 > > rmalist7 <- rmaresults[,7]!=0 > > rmalist8 <- rmaresults[,8]!=0 > > rmalist9 <- rmaresults[,9]!=0 > > > > rmagroup1 <- rmafiltset[rmalist1,] > > rmagroup2 <- rmafiltset[rmalist2,] > > rmagroup3 <- rmafiltset[rmalist3,] > > rmagroup4 <- rmafiltset[rmalist4,] > > rmagroup5 <- rmafiltset[rmalist5,] > > rmagroup6 <- rmafiltset[rmalist6,] > > rmagroup7 <- rmafiltset[rmalist7,] > > rmagroup8 <- rmafiltset[rmalist8,] > > rmagroup9 <- rmafiltset[rmalist9,] > > > > rmaglist1 <- featureNames(rmagroup1) > > rmaglist2 <- featureNames(rmagroup2) > > rmaglist3 <- featureNames(rmagroup3) > > rmaglist4 <- featureNames(rmagroup4) > > rmaglist5 <- featureNames(rmagroup5) > > rmaglist6 <- featureNames(rmagroup6) > > rmaglist7 <- featureNames(rmagroup7) > > rmaglist8 <- featureNames(rmagroup8) > > rmaglist9 <- featureNames(rmagroup9) > > > > rmaunion <- union(rmaglist1, union(rmaglist2, union(rmaglist3, > union(rmaglist4, union(rmaglist5, union(rmaglist6, union(rmaglist7, > union(rmaglist8, rmaglist9)))))))) > > > > rmaunionset <- rmafiltset[rmaunion,] > > > > plotdata2 <- exprs(rmaunionset) > > colnames(plotdata2) <- paste(pData(RMASet)$label) > > > > > > I go on to create a heatmap and html output of significant genes. My > problem is that when I change kOverA(2, 7) to, say, kOverA(2, 6), I get > fewer genes, I thought this should increase the number of genes I'm > seeing in the final output. If I increase from 7 to 8 I still get a > shorter list. It seems like 7 is the value that generates the maximum > output. If anyone could explain why this is happening I would really > appreciate it. If I should post more information please just let me > know. > > > > > > Ray Roberts > > > > Lovelace Respiratory Research Institute > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 rgentlem at fhcrc.org

GO Preprocessing Cancer genefilter limma GO Preprocessing Cancer genefilter limma • 1.1k views

ADD COMMENT • link updated 16.8 years ago by rgentleman ★ 5.5k • written 16.8 years ago by Roberts, Raymond ▴ 30

0

Entering edit mode

rgentleman ★ 5.5k

@rgentleman-7725

Last seen 10.0 years ago

United States

Hi, Roberts, Raymond wrote: > sessionInfo() >> sessionInfo() > R version 2.7.0 (2008-04-22) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] grid splines stats graphics grDevices datasets > [7] tools utils methods base > > other attached packages: > [1] RColorBrewer_1.0-2 goProfiles_1.2.0 > [3] aroma.affymetrix_0.9.3 aroma.apd_0.1.3 > [5] R.huge_0.1.5 affxparser_1.12.2 > [7] aroma.core_0.9.3 sfit_0.1.5 > [9] aroma.light_1.8.1 digest_0.3.1 > [11] matrixStats_0.1.2 R.rsp_0.3.4 > [13] R.cache_0.1.7 farms_1.3 > [15] MASS_7.2-41 RMySQL_0.6-0 > [17] gplots_2.6.0 gdata_2.4.2 > [19] gtools_2.5.0 affycoretools_1.12.0 > [21] annaffy_1.12.1 KEGG.db_2.2.0 > [23] gcrma_2.12.1 matchprobes_1.12.0 > [25] biomaRt_1.14.0 RCurl_0.9-3 > [27] GOstats_2.6.0 Category_2.6.0 > [29] genefilter_1.20.0 survival_2.34-1 > [31] RBGL_1.16.0 annotate_1.18.0 > [33] xtable_1.5-2 GO.db_2.2.0 > [35] AnnotationDbi_1.2.1 RSQLite_0.6-8 > [37] DBI_0.2-4 graph_1.18.1 > [39] limma_2.14.5 affy_1.18.2 > [41] preprocessCore_1.2.0 affyio_1.8.0 > [43] rJava_0.5-1 Biobase_2.0.1 > [45] R.utils_1.0.2 R.oo_1.4.3 > [47] R.methodsS3_1.0.1 > > loaded via a namespace (and not attached): > [1] cluster_1.11.10 XML_1.94-0.1 > > Please show the code and the output - clearly there is a problem where there should not be one, and by not giving us the whole story you do make it rather hard to diagnose. > > There are only 78 shared between the two outputs. kOverA(2,7) produces > 361 genes, kOverA(2,6) produces 121 genes. This seems like a major > problem that I encountered in a completely different script, which was > not mine so I just scrapped using it. So all kOverA is doing is asking whether at least k of the arrays are larger than A. In your case k is 2 and once it is 7, another time 6. The code for kOverA is so simple as to suggest there is no problem there function (k, A = 100, na.rm = TRUE) { function(x) { if (na.rm) x <- x[!is.na(x)] sum(x > A) >= k } } this is applied row-wise to your expression matrix. Suppose your ExpressionSet is called myExpr then you can mimic kOverA as follows kO6 = apply(exprs(myExpr), 1, function(x) sum(x > 6) >= 2) kO7 = apply(exprs(myExpr), 1, function(x) sum(x > 7) >= 2) so perhaps you could run both of those and tell us what happens? and how they compare with the genes you selected - and please do show enough of the code so that someone other than yourself might be able to find the source of the error Robert > > There are no apparent similarities between the genes being included in > both output sets. Is there another function I can use that will do the > same filtering but may be more reliable? > > > Ray > > Lovelace Respiratory Research Institute > > > > > > > > -----Original Message----- > From: Robert Gentleman [mailto:rgentlem at fhcrc.org] > Sent: Monday, June 30, 2008 2:27 PM > To: Roberts, Raymond > Cc: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] kOverA(k, A=100, na.rm=TRUE) > > Hi, > That behavior seems somewhat unusual. > Could you perhaps see which genes are selected for > kOverA(2,7) and which for kOverA(2,6)? And then look at the expression > values to see what might be going on.. > > Also please include the output of sessionInfo, as requested in the > posting guide. > > Robert > > > Roberts, Raymond wrote: >> Dear List, >> >> >> >> I'm using genefilter and Limma to analyze the human exon array. I did >> my preprocessing with aroma.affymetrix and then loaded it into limma > and >> used genefilter to isolate the significant genes. At least that is > what >> I think I am doing. Here is my script starting at the function I have >> the question about >> >> >> >> >> >> ##Filter for differentialy expressed genes >> >> f1<-kOverA(2,7) >> >> ffun<-filterfun(f1) >> >> which<-genefilter(exprs(RMASet), ffun) >> >> rma_no<-summary(which) >> >> rmafiltset<-RMASet[which,] >> >> >> >> >> >> >> >> grps <- paste(pData(rmafiltset)$hours) >> >> lev <- c("time0", "time3", "time6", "time12", "time24", "time48") >> >> f <- factor(grps, levels = lev) >> >> design <- model.matrix(~ 0 + f) >> >> colnames(design) <- lev >> >> rmafit <- lmFit(rmafiltset, design) >> >> >> >> >> >> >> >> contrast.matrix <- makeContrasts("time3-time0", "time6-time3", >> "time12-time6", "time24-time12", "time48-time24", "time6-time0", >> "time12-time0", "time24-time0", "time48-time0", levels=design) >> >> rmafit2 <- contrasts.fit(rmafit, contrast.matrix) >> >> rmafit3 <- eBayes(rmafit2) >> >> >> >> >> >> pval<-0.05 >> >> l2foldch<- 0.58496250072 >> >> >> >> rmaresults<-decideTests(rmafit3, p.value=pval, lfc=l2foldch) >> >> >> >> >> >> rmalist1 <- rmaresults[,1]!=0 >> >> rmalist2 <- rmaresults[,2]!=0 >> >> rmalist3 <- rmaresults[,3]!=0 >> >> rmalist4 <- rmaresults[,4]!=0 >> >> rmalist5 <- rmaresults[,5]!=0 >> >> rmalist6 <- rmaresults[,6]!=0 >> >> rmalist7 <- rmaresults[,7]!=0 >> >> rmalist8 <- rmaresults[,8]!=0 >> >> rmalist9 <- rmaresults[,9]!=0 >> >> >> >> rmagroup1 <- rmafiltset[rmalist1,] >> >> rmagroup2 <- rmafiltset[rmalist2,] >> >> rmagroup3 <- rmafiltset[rmalist3,] >> >> rmagroup4 <- rmafiltset[rmalist4,] >> >> rmagroup5 <- rmafiltset[rmalist5,] >> >> rmagroup6 <- rmafiltset[rmalist6,] >> >> rmagroup7 <- rmafiltset[rmalist7,] >> >> rmagroup8 <- rmafiltset[rmalist8,] >> >> rmagroup9 <- rmafiltset[rmalist9,] >> >> >> >> rmaglist1 <- featureNames(rmagroup1) >> >> rmaglist2 <- featureNames(rmagroup2) >> >> rmaglist3 <- featureNames(rmagroup3) >> >> rmaglist4 <- featureNames(rmagroup4) >> >> rmaglist5 <- featureNames(rmagroup5) >> >> rmaglist6 <- featureNames(rmagroup6) >> >> rmaglist7 <- featureNames(rmagroup7) >> >> rmaglist8 <- featureNames(rmagroup8) >> >> rmaglist9 <- featureNames(rmagroup9) >> >> >> >> rmaunion <- union(rmaglist1, union(rmaglist2, union(rmaglist3, >> union(rmaglist4, union(rmaglist5, union(rmaglist6, union(rmaglist7, >> union(rmaglist8, rmaglist9)))))))) >> >> >> >> rmaunionset <- rmafiltset[rmaunion,] >> >> >> >> plotdata2 <- exprs(rmaunionset) >> >> colnames(plotdata2) <- paste(pData(RMASet)$label) >> >> >> >> >> >> I go on to create a heatmap and html output of significant genes. My >> problem is that when I change kOverA(2, 7) to, say, kOverA(2, 6), I > get >> fewer genes, I thought this should increase the number of genes I'm >> seeing in the final output. If I increase from 7 to 8 I still get a >> shorter list. It seems like 7 is the value that generates the maximum >> output. If anyone could explain why this is happening I would really >> appreciate it. If I should post more information please just let me >> know. >> >> >> >> >> >> Ray Roberts >> >> >> >> Lovelace Respiratory Research Institute >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 rgentlem at fhcrc.org

ADD COMMENT • link 16.8 years ago rgentleman ★ 5.5k

Login before adding your answer.