genefilter displaying the expression set
dhaarini s ▴ 70
@dhaarini-s-3305
Hi all!

I am new to R and Bioconductor. I have a dataset of 22283 genes and 190 samples. Because the data set is so large, I want to filter out some irrelevant genes. I tried the "genefilter" package of BioC, but as far as I understand, it only reports whether each gene satisfies the filter condition, by marking it as TRUE or FALSE. This is how I proceeded:

> library(genefilter)
> f1 <- kOverA(5, 10)
> flist <- filterfun(f1)
> ans <- genefilter(tumor, flist)

(The object "tumor" contains my expression dataset.) The output is something like this:

"x"
"1007_s_at" TRUE
"1053_at" FALSE
"117_at" FALSE
"121_at" FALSE
"200001_at" TRUE
"200002_at" TRUE
..........................

But I would like to know whether genefilter can return an expression set containing the filtered genes and their expression values for the samples. Please help me out!

Thanks in advance.

Regards,
Dhaarini
@martin-morgan-1513
United States
Hi Dhaarini --

Just use the logical vector returned by genefilter to subset tumor, e.g.,

  filteredTumor <- tumor[genefilter(tumor, flist),]

You can think of 'tumor' as a matrix, with rows being features and columns samples. You're selecting the rows of 'tumor' that satisfy the filter criteria.

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
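The subsetting step can be tried on a toy matrix. The following is only a minimal sketch: the expression values are made up, the probe IDs are borrowed from the question, and the base R expression rowSums(tumor > A) >= k is used as a stand-in for kOverA(k, A) so that the example runs without genefilter installed. With genefilter loaded, the logical vector would instead come from genefilter(tumor, flist) as in the question.

```r
# Toy stand-in for 'tumor': 6 probes (rows) x 8 samples (columns).
# The values are invented for illustration only.
probes <- c("1007_s_at", "1053_at", "117_at", "121_at", "200001_at", "200002_at")
tumor <- rbind(
  c(12, 11, 13, 12, 11, 14,  9,  8),  # 6 values > 10
  c( 2,  3,  2, 12, 11,  4,  3,  2),  # 2 values > 10
  c( 1,  1,  2,  1,  3,  2,  1,  1),  # 0 values > 10
  c(11, 12, 13, 14,  2,  3,  4,  5),  # 4 values > 10
  c(15, 15, 15, 15, 15, 15, 15, 15),  # 8 values > 10
  c(11, 11, 11, 11, 11,  1,  1,  1)   # 5 values > 10
)
dimnames(tumor) <- list(probes, paste0("sample", 1:8))

# Assumed base R equivalent of kOverA(5, 10): TRUE when at least k values
# in a row are larger than A.
k <- 5
A <- 10
ans <- rowSums(tumor > A) >= k

# Number of probes that pass the filter.
n.pass <- sum(ans)

# Keep only the rows flagged TRUE; drop = FALSE keeps the result a matrix
# even if a single row survives.
tumor.filt <- tumor[ans, , drop = FALSE]
```

Here tumor.filt holds the expression values of the probes that passed, with all sample columns intact. If 'tumor' is an ExpressionSet rather than a plain matrix, the same tumor[ans, ] indexing should select the corresponding features, in line with the advice above.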
@james-w-macdonald-5106
United States
Hi Dhaarini,

  filtered.tumor <- tumor[ans,]

should give you what you want.

Best,

Jim

James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
United States
Hi Dhaarini,

What is in Peder's code but is NOT in the genefilter vignette is what you should do with the output of genefilter(), which is a logical vector with one element per gene. You can use this vector to subset your expression data object like so:

> ans <- genefilter(tumor, flist)
> sum(ans)  # tells you how many genes pass your filter
> tumor.filt <- tumor[ans,]  # subsetting your expression object by the TRUE/FALSE vector

IMO, the vignette, genefilter/doc/howtogenefilter.pdf, should also give an example of how to use the output of genefilter() to subset your expression object (hint, hint Biocore Team c/o BioC user list, the maintainer of genefilter).

Cheers,
Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
Peder Worning ▴ 100
@peder-worning-3209
Hi Dhaarini,

Filtering genes is a delicate but important matter, and you can filter them on low values and on variance. Expression values that do not change across your samples are not very informative.

I have written my own function that uses the genefilter package and combines filters on low values, NAs, low variation and range. I use it, but I always try it with different parameters to see what happens to my data.
I am working with microRNA arrays and my data are on a log scale, but the principles should be the same.

Here is the code (beware of line breaks introduced by Outlook):

  library(genefilter)

  Data.filter <- function(e.matrix, kk = as.integer(ncol(e.matrix)/8), aa = 7,
                          na = 5, var = 0.1, er = 300) {
    # This function takes an expression matrix with genes in rows and
    # samples in columns.
    # It filters out genes that do not meet the criteria:
    #   kk  minimal number of values > aa
    #   na  maximum number of NAs
    #   var minimal variance of values
    #   er  minimal range of 2^values
    # drop = FALSE keeps a matrix even if only one gene remains.
    keep <- genefilter(e.matrix, kOverA(k = kk, A = aa, na.rm = TRUE))
    e.matrix.f <- e.matrix[keep, , drop = FALSE]
    nna <- apply(e.matrix.f, 1, function(x) sum(is.na(x)))
    e.matrix.f <- e.matrix.f[nna <= na, , drop = FALSE]
    # stats::var is spelled out because the argument 'var' has the same name.
    rvar <- apply(e.matrix.f, 1, function(x) stats::var(x, na.rm = TRUE))
    e.matrix.f <- e.matrix.f[rvar >= var, , drop = FALSE]
    exp.range <- apply(e.matrix.f, 1, function(x) {
      2^max(x, na.rm = TRUE) - 2^min(x, na.rm = TRUE)
    })
    e.matrix.f <- e.matrix.f[exp.range > er, , drop = FALSE]
    e.matrix.f
  }

Good luck
Peder

Best regards

Peder Worning, Ph.D.
Senior Scientist, Biomarker Discovery
Exiqon A/S
dhaarini s ▴ 70
@dhaarini-s-3305
Hi all!

Thank you very much. I was able to sort it out. The kOverA function keeps the genes whose values exceed the threshold that we specify. Is there any function that filters genes whose values lie within a range that we specify?

Thanks in advance.

Regards,
Dhaarini.
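The thread does not name a built-in range filter, but a custom filter function is one way to approach this. The following is only a sketch under stated assumptions: kInRange is a hypothetical helper (it is not part of genefilter), written in the same closure style as kOverA, and the toy matrix and thresholds are invented for illustration. It is applied with base R's apply() so that it runs without genefilter; since filterfun() accepts ordinary functions that return a single TRUE/FALSE per row, the same helper could presumably be passed to filterfun() and genefilter() as well.

```r
# Hypothetical helper, not part of genefilter: returns a filter function that
# is TRUE when at least k values of a row lie within [lower, upper].
kInRange <- function(k, lower, upper, na.rm = TRUE) {
  function(x) {
    if (na.rm) x <- x[!is.na(x)]
    sum(x >= lower & x <= upper) >= k
  }
}

# Toy expression matrix: 3 genes (rows) x 6 samples (columns), invented values.
tumor <- rbind(
  geneA = c(5, 6, 7, 6, 5, 20),    # 5 values within [4, 10]
  geneB = c(1, 2, 30, 40, 50, 60), # 0 values within [4, 10]
  geneC = c(8, 8, 8, 8, 8, 8)      # 6 values within [4, 10]
)

# Build the filter and evaluate it on every row.
f <- kInRange(k = 5, lower = 4, upper = 10)
ans <- apply(tumor, 1, f)

# Subset exactly as with the kOverA result.
tumor.filt <- tumor[ans, , drop = FALSE]
```

Whether "within the range" should mean at least k values, all values, or something else depends on the analysis; the condition inside kInRange is the place to change that.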