Summarizing two-channel data (RGList, MAList) for limma analysis

Stephen Turner ▴ 290

Hello.

I have 4 Agilent two-channel arrays that I read in using read.maimages(). I've done normalization and background subtraction. How do I now summarize the probe information (62976 probes) to gene-level expression values (39430 Entrez RNAs, 16251 lincRNAs)? I normally do this using rma() or gcrma() from the affy package when I have Affymetrix data.

Thanks,
Stephen

Normalization probe affy • 1.9k views

updated 13.2 years ago by Daniel Aaen Hansen • written 13.2 years ago by Stephen Turner

Sean Davis 21k

Hi, Stephen.

You can use an average value, but for long-oligo arrays like Agilent, people have often used the probe measurements directly. You can use the genefilter package to remove probes that do not vary across samples, which reduces some of the redundancy. This increases power to detect differential expression by reducing the number of tests that enter the multiple-testing correction. If you feel a strong need to summarize, an average is probably not too bad an approach, assuming that the probes for the same gene are correlated with each other (and many will be).

Sean

(Reply to Stephen Turner's message of Wed, Jan 11, 2012, 4:45 PM.)
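The filtering idea above can be sketched in a few lines of R. This is only an illustration on simulated data: the matrix `M` stands in for the normalized log-ratios (for example `MA$M`), and the IQR measure and the 0.5 quantile cutoff are arbitrary assumptions, not recommendations from the thread. The genefilter package offers this kind of nonspecific filtering for ExpressionSet objects (for example its varFilter function); base R is used here so that the sketch runs without extra packages.

```r
# Simulated log-ratio matrix: 1000 probes x 4 arrays (stand-in for MA$M).
set.seed(1)
M <- matrix(rnorm(1000 * 4, sd = 0.2), nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), paste0("array", 1:4)))

# Make the first 100 probes vary strongly across arrays.
M[1:100, ] <- M[1:100, ] + matrix(rnorm(100 * 4, sd = 2), nrow = 100)

# Per-probe spread across arrays, measured by the interquartile range.
probe_iqr <- apply(M, 1, IQR, na.rm = TRUE)

# Keep the more variable half of the probes (the cutoff is an assumption).
keep <- probe_iqr > quantile(probe_iqr, 0.5)
M_filtered <- M[keep, , drop = FALSE]

dim(M_filtered)
```

Because the filter looks only at overall variability and not at group labels, it does not use the information that the later differential-expression test relies on.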

13.2 years ago • Sean Davis

Samuel Wuest ▴ 330

Hi Stephen,

One option is simply to average multiple probes that match the same transcript, using the avereps() function from the limma package. It takes an ID argument in which you specify the transcript (or perhaps even locus) names, so that probes with the same ID are averaged. If you have duplicate probes (technical replicates, that is, the same probe sequences), then avedups() will do the job. There might be other options too.

As far as I know, rma() and gcrma() for Affy arrays are wrappers around a combination of steps: background correction, quantile normalization, and median-polish summarization of the 11 probes. Since you have already done the first two steps with your arrays, those are no longer necessary. I would also guess that you often have only 1-2 probes per transcript, so median polish would not really be an option anyway.

Hope this helps,
Sam
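As a minimal sketch of the avereps() approach, the example below averages rows of a toy matrix that share an ID. The matrix `M` and the vector `transcript_id` are made up for illustration. The commented line at the end shows how the same call might look on an MAList; the annotation column name `SystematicName` is an assumption about what read.maimages() stored for these arrays, so check `names(MA$genes)` first.

```r
library(limma)

# Toy log-ratio matrix: 5 probes x 4 arrays (stand-in for MA$M).
M <- matrix(1:20, nrow = 5,
            dimnames = list(paste0("probe", 1:5), paste0("array", 1:4)))

# Hypothetical transcript assignment for each probe.
transcript_id <- c("tA", "tA", "tB", "tC", "tC")

# Average all probes that share a transcript ID, array by array.
M_avg <- avereps(M, ID = transcript_id)

M_avg

# On a normalized MAList the call has the same shape (column name assumed):
# MA_avg <- avereps(MA, ID = MA$genes$SystematicName)
```

Rows with a unique ID pass through unchanged, so the result has one row per distinct ID.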

13.2 years ago • Samuel Wuest

Wolfgang Huber ★ 13k

Dear Stephen,

Hasn't the array vendor already provided you with some guidance on this? If there are multiple probes with different sequences supposedly targeting the same gene, I think you need to assess (in some automated way) the alignment of the probes to the genome and to the gene model in order to see which of them is the 'better' one.

Best wishes,
Wolfgang

13.2 years ago • Wolfgang Huber

Daniel Aaen Hansen ▴ 90

Dear Stephen,

I have a similar situation, and my approach has been to use avereps() from the limma package to average repeated probes. Then I use the biomaRt package to retrieve Ensembl's mapping for the probes, since this is updated with every new Ensembl release. That should ensure a more up-to-date mapping to gene level. If you need one expression value for each gene, you could then take the median or mean over the probes that map to the same gene.

If you want to convert the MA values back to RG values, you can use the RG.MA() function from the limma package; whether you need to depends on how you are going to use your data. I think the most common choice is to use the MA values for downstream analysis, but I would also like to hear other opinions on this.

Best,
Daniel
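The last two steps can be sketched as follows. The probe-to-gene assignment `gene_id` is hypothetical and written by hand here; in practice it would come from an annotation lookup such as the biomaRt query described above. The toy `M` values and the constant `A` matrix are likewise made up for illustration.

```r
library(limma)

# Toy log-ratios for 6 probes on 2 arrays (stand-in for averaged MA$M).
M <- cbind(array1 = c(1, 3, 2, 0, 4, 8),
           array2 = c(2, 4, 6, 1, 1, 7))

# Hypothetical probe-to-gene mapping, one entry per probe.
gene_id <- c("g1", "g1", "g2", "g3", "g3", "g3")

# One value per gene and array: median over the probes mapped to that gene.
gene_M <- apply(M, 2, function(x) tapply(x, gene_id, median))

# Matching toy average log-intensities, needed to build an MAList.
gene_A <- matrix(10, nrow = nrow(gene_M), ncol = ncol(gene_M),
                 dimnames = dimnames(gene_M))

# Convert M/A values back to red and green intensities.
MA <- new("MAList", list(M = gene_M, A = gene_A))
RG <- RG.MA(MA)

RG$R
RG$G
```

RG.MA() inverts the usual definitions, so R equals 2^(A + M/2) and G equals 2^(A - M/2) on the unlogged scale.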

13.2 years ago • Daniel Aaen Hansen
