[R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY


Atul Kakrana ▴ 30

@atul-kakrana-5871

Last seen 10.6 years ago

Hello Martin and All,

I think my question was not clear, so I would like to rephrase it. I am analyzing Affymetrix data, and when several probe-sets map to the same gene I need to keep only one: the probe-set with the highest median expression across all samples. So, if 5 probe-sets correspond to the same gene, the one with the highest median expression across all samples should represent the expression of that gene. Since I am moving from probe-set-level to gene-level analysis, I was hoping there is already a function for this in 'affy' or 'limma'.

@Martin: I think you suggested the right solution even though my question was not clear. Could you please confirm that? Also, wouldn't it be better to perform this step after background correction and normalization? I am very confused at the moment.

    mydata <- ReadAffy()
    pData(mydata) <- read.table("phenodata", header = TRUE, row.names = 1, sep = "\t")
    esetRMA <- rma(mydata)
    # >>> perform probe-set reduction here >>>

I would really appreciate your suggestions on how and where I can select the probe-set with the highest median expression across all samples.

Thanks

AK

On 03-Apr-13 11:34 PM, Martin Morgan wrote:
> On 04/03/2013 03:17 PM, Atul Kakrana wrote:
>> Hello All,
>>
>> I need your help. I am analysing Affymetrix data and have to select the probe-set that has median expression among all the probe-sets for the same gene. This way I want to remove the redundancy by keeping the analysis at the single-gene-entry level. I am fully aware that it is not a nice thing to do, but I just have to do it.
>>
>> To do so, I came across the 'findLargest' function of the 'genefilter' package, but it is not well documented and I do not know how to use it. At this point I have:
>>
>>     esetRMA <- rma(mydata)
>>
>> Could anybody guide me on how to select a single probe-set with median expression from multiple probe-sets corresponding to a single gene and discard the others? Is there any other way to achieve this, i.e. other than using 'genefilter'?
>>
>> Genefilter package: http://www.bioconductor.org/packages/2.11/bioc/html/genefilter.html
>
> Hi Atul -- it's a Bioconductor package, so you might as well ask instead on the Bioconductor mailing list
>
> http://bioconductor.org/help/mailing-list/
>
> As a reproducible example, load the "ALL" sample ExpressionSet, Biobase and genefilter packages
>
>     library(Biobase)
>     library(ALL)
>     library(genefilter)
>
> The three arguments to findLargest are the names of the probe sets
>
>     featureNames(ALL)
>
> the test statistic
>
>     rowMedians(ALL)
>
> and the chip on which the ExpressionSet is based
>
>     annotation(ALL)
>
> So the variable
>
>     idx = findLargest(featureNames(ALL), rowMedians(ALL), annotation(ALL))
>
> identifies the probes and
>
>     ALL1 = ALL[idx,]
>
> gets you the data you're interested in.
>
> Again, follow-up questions should go to the Bioconductor mailing list.
>
> Martin
>
>> Thanks
>>
>> AK

--
Atul Kakrana
DBI, Delaware Technology Park
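To make the selection concrete, here is a minimal base-R sketch of the idea on a toy matrix. It is not the genefilter implementation: the probe-set names, the `probe2gene` mapping, and the helper `pick_top_median` are all hypothetical and invented for illustration. With real data the matrix would presumably come from `exprs(esetRMA)` and the mapping from the chip's annotation package.

```r
# Toy expression matrix: 5 probe sets (rows) x 4 samples (columns).
# All names and values are invented for illustration.
expr <- matrix(
  c(5.0, 5.2, 4.9, 5.1,   # ps1 -> GENE_A
    7.1, 6.9, 7.3, 7.0,   # ps2 -> GENE_A
    3.0, 3.1, 2.9, 3.2,   # ps3 -> GENE_B
    6.0, 6.2, 5.8, 6.1,   # ps4 -> GENE_A
    8.0, 8.1, 7.9, 8.2),  # ps5 -> GENE_C
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("ps", 1:5), paste0("s", 1:4))
)

# Hypothetical probe-set-to-gene mapping, named by probe set.
probe2gene <- c(ps1 = "GENE_A", ps2 = "GENE_A", ps3 = "GENE_B",
                ps4 = "GENE_A", ps5 = "GENE_C")

# For each gene, return the name of the probe set with the highest
# median expression across samples. Probe sets without a gene are dropped.
pick_top_median <- function(expr, probe2gene) {
  genes <- probe2gene[rownames(expr)]
  keep  <- !is.na(genes)
  med   <- apply(expr[keep, , drop = FALSE], 1, median)
  # Split the medians by gene, then take the name of the largest per group.
  vapply(split(med, genes[keep]),
         function(x) names(x)[which.max(x)],
         character(1))
}

idx <- pick_top_median(expr, probe2gene)
expr_gene <- expr[idx, , drop = FALSE]   # one row per gene
print(idx)
```

The result `idx` is a character vector of probe-set names, named by gene, so it can be used to subset the rows of the expression matrix in the same way `ALL[idx,]` is used in the reply quoted above.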

GO probe Biobase genefilter • 2.7k views

updated 12.1 years ago by Martin Morgan 25k • written 12.1 years ago by Atul Kakrana ▴ 30


Mete Civelek ▴ 180

@mete-civelek-4566

Last seen 10.6 years ago

Hi Atul,

Not sure if there is a function in affy or limma for what you are trying to do, but this might help: http://www.inside-r.org/packages/cran/WGCNA/docs/collapseRows

Mete
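One caveat when comparing approaches: the selection statistic matters. To my understanding, collapseRows by default (method = "MaxMean") picks the row with the highest mean per group, whereas the question asks for the highest median. The base-R sketch below does not call WGCNA; it uses invented numbers and hypothetical probe-set names only to show that the two criteria can disagree when one sample is an outlier.

```r
# Two hypothetical probe sets for the same gene, 5 samples each.
expr <- rbind(
  psA = c(6.0, 6.1, 5.9, 6.0, 6.2),    # consistently moderate
  psB = c(4.0, 4.1, 3.9, 4.0, 20.0)    # low, with one outlying sample
)

row_mean   <- rowMeans(expr)
row_median <- apply(expr, 1, median)

by_mean   <- names(which.max(row_mean))     # the outlier pulls the mean up
by_median <- names(which.max(row_median))   # the median is robust to it

cat("highest mean:  ", by_mean, "\n")
cat("highest median:", by_median, "\n")
```

Here the mean-based rule selects `psB` and the median-based rule selects `psA`, so it is worth checking which criterion a given function applies before relying on it.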

written 12.1 years ago by Mete Civelek ▴ 180


Martin Morgan 25k

@martin-morgan-1513

Last seen 12 weeks ago

United States

On 04/03/2013 09:17 PM, Atul Kakrana wrote:
> @Martin: I think you suggested me the right solution even when I was not clear with my question. Could you please confirm that? Also, wouldn't it be better to perform this step after bg correction, normalization? I am very confused at this moment.

Yes, I did suggest the solution you were looking for. Generally, you'd like to do these sorts of manipulations after normalization, etc.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center

written 12.1 years ago by Martin Morgan 25k
