Importing RMA processed data into limma for eBayes

0

Entering edit mode

Matthew Thornton ▴ 380

@matthew-thornton-5564

Last seen 14 days ago

USA, Los Angeles, USC

Dear List Patrons, I am processing some microarray data obtained with Affymetrix Mouse Gene 2.0 ST chips. Using the xps package, I have performed step-wise RMA processing - Background correction, Quantiles Normalization, and Median Polish summarization. The output of xps is a tab-delimited text file with 33,960 non-log base 2 transformed expression measures, the UnitName (an 8 digit number starting at 17550607) and a Unit ID (starts at 0 but isn't continuous). What I would like to do is import this data into the limma package, from my output text file. I know that I can make an expressionset and use this expressionset directly after making a design matrix, via the instructions here: http://permalink.gmane.org/gmane.science.biology.informatics.conductor /50591 I would like to add the gene names into the expressionset so that I can make volcano plots and graphs with limma, in accord with the limma users guide. I know how to access the gene names and relate them to the UnitName code copied from here: (http://biowhat.ucsd.edu/homer/basicTutorial/affymetrix.html) library(mogene20sttranscriptcluster.db) librarypd.mogene.2.0.st) mat1 <- as.matrix(exprs_mp) post_rma <- ExpressionSet(assayData=mat1, annotation="mogene.2.0.st") my_frame <- data.frame(exprs(post_rma)) Annot <- data.frame(ACCNUM=sapply(contents(mogene20sttranscriptclusterACCNUM), paste, collapse=", "), SYMBOL=sapply(contents(mogene20sttranscriptclusterSYMBOL), paste, collapse=", "), DESC=sapply(contents(mogene20sttranscriptclusterGENENAME), paste, collapse=", ")) all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T) write.table(all,file="21Dec13_data.ann.txt",sep="\t") Is there a way to modify the above code so that I can add the gene names and gene information to my expresionset, in a way that makes them AS accessible to the limma package as they would be if I had used the affy package to process the data initially? Also, I would like to compare the output of VSN with RMA using eBayes in limma. I don't wish to import the raw intensities into limma, not because I couldn't use the same methods (vsn is in limma), only because I would have to start my data processing over. Any help or comments is greatly appreciated. Thank you. Sincerely, Matt Matthew E. Thornton Research Lab Specialist Saban Research Institute USC/Children?s Hospital Los Angeles 513X, Mail Stop 35 4661 W. Sunset Blvd. Los Angeles, CA 90027-6020 matthew.thornton at med.usc.edu

Microarray Normalization affy vsn limma PROcess xps Microarray Normalization affy vsn • 2.5k views

ADD COMMENT • link 11.3 years ago Matthew Thornton ▴ 380

0

Entering edit mode

Matthew Thornton ▴ 380

@matthew-thornton-5564

Last seen 14 days ago

USA, Los Angeles, USC

Dear List Patrons, I am processing some microarray data obtained with Affymetrix Mouse Gene 2.0 ST chips. Using the xps package, I have performed step-wise RMA processing - Background correction, Quantiles Normalization, and Median Polish summarization. The output of xps is a tab-delimited text file with 33,960 non-log base 2 transformed expression measures, the UnitName (an 8 digit number starting at 17550607) and a Unit ID (starts at 0 but isn't continuous). What I would like to do is import this data into the limma package, from my output text file. I know that I can make an expressionset and use this expressionset directly after making a design matrix, via the instructions here: http://permalink.gmane.org/gmane.science.biology.informatics.conductor /50591 I would like to add the gene names into the expressionset so that I can make volcano plots and graphs with limma, in accord with the limma users guide. I know how to access the gene names and relate them to the UnitName code copied from here: (http://biowhat.ucsd.edu/homer/basicTutorial/affymetrix.html) library(mogene20sttranscriptcluster.db) librarypd.mogene.2.0.st) mat1 <- as.matrix(exprs_mp) post_rma <- ExpressionSet(assayData=mat1, annotation="mogene.2.0.st") my_frame <- data.frame(exprs(post_rma)) Annot <- data.frame(ACCNUM=sapply(contents(mogene20sttranscriptclusterACCNUM), paste, collapse=", "), SYMBOL=sapply(contents(mogene20sttranscriptclusterSYMBOL), paste, collapse=", "), DESC=sapply(contents(mogene20sttranscriptclusterGENENAME), paste, collapse=", ")) all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T) write.table(all,file="21Dec13_data.ann.txt",sep="\t") Is there a way to modify the above code so that I can add the gene names and gene information to my expresionset, in a way that makes them AS accessible to the limma package as they would be if I had used the affy package to process the data initially? Also, I would like to compare the output of VSN with RMA using eBayes in limma. I don't wish to import the raw intensities into limma, not because I couldn't use the same methods (vsn is in limma), only because I would have to start my data processing over. Any help or comments is greatly appreciated. Thank you. Sincerely, Matt Matthew E. Thornton Research Lab Specialist Saban Research Institute USC/Children?s Hospital Los Angeles 513X, Mail Stop 35 4661 W. Sunset Blvd. Los Angeles, CA 90027-6020 matthew.thornton at med.usc.edu ________________________________________ From: bioconductor-bounces@r-project.org [bioconductor- bounces@r-project.org] on behalf of bioconductor-request@r-project.org [bioconductor-request@r-project.org] Sent: Monday, December 23, 2013 3:00 AM To: bioconductor at r-project.org Subject: Bioconductor Digest, Vol 130, Issue 25 Send Bioconductor mailing list submissions to bioconductor at r-project.org To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/bioconductor or, via email, send a message with subject or body 'help' to bioconductor-request at r-project.org You can reach the person managing the list at bioconductor-owner at r-project.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..." Today's Topics: 1. Re: Complete variant toolbox: gmapR/VariantTools/VariantAnnotation (Thomas Girke) 2. DESeq / DESeq2 for translation efficiency in polysome profiling or ribsome profiling experiments (Gilgi [guest]) 3. forgeSeqFiles (Melo [guest]) 4. Multiple contrasts in DESeq2 (Hugo Varet) ---------------------------------------------------------------------- Message: 1 Date: Sun, 22 Dec 2013 09:36:40 -0800 From: Thomas Girke <thomas.girke@ucr.edu> To: Valerie Obenchain <vobencha at="" fhcrc.org=""> Cc: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch=""> Subject: Re: [BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation Message-ID: <20131222173640.GA4345 at Thomas-Girkes-MacBook-Pro.local> Content-Type: text/plain; charset="us-ascii" Sorry for not responding sooner. - I agree, providing the variant mapping position relative to transcripts will be the most useful one, at least for features that are part of transcripts. For others that are not, e.g. intron and intergenic features, one could report the variant mappings relative to the start position of these features. Thomas On Thu, Dec 19, 2013 at 03:08:19AM +0000, Valerie Obenchain wrote: > Hi, > > On 12/17/2013 09:40 AM, Robert Castelo wrote: > > hi Valerie cc Thomas, > > > > sorry for hijacking the thread, regarding the request made below.. > > > > On 12/09/2013 09:07 PM, Valerie Obenchain wrote: > > [...] > >> I could add a 'REFLOC' column to the otuput of locateVariants() that > >> would essentially be the "equivalent" to 'CDSLOC' from predictCoding(). > > > > for the purpose of ordering cDNA primers flanking variants which one may > > want to validate through sanger sequencing, it is useful to have at hand > > the position of the variant with respect to the beginning of the > > transcript (cDNA) where it has been observed, thus not just from the > > beginning of the CDS but from the beginning of the transcript. > > > > is this newer 'REFLOC' going to contain this position? if not, would it > > be possible to get also a column for that from the locateVariants() > > call? (e.g., TXLOC) > > Yes, I think it makes sense to have 'REFLOC' be the position in the > reference starting from the beginning of the transcript. Unless others > have different thoughts this is what I'll go ahead with. > > Valerie > > > > > > > > > > > thanks!! > > robert. > > > -- > Valerie Obenchain > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B155 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: vobencha at fhcrc.org > Phone: (206) 667-3158 > Fax: (206) 667-1319 ------------------------------ Message: 2 Date: Sun, 22 Dec 2013 23:37:06 -0800 (PST) From: "Gilgi [guest]" <guest@bioconductor.org> To: bioconductor at r-project.org, gilgi.friedlander at weizmann.ac.il Cc: DESeq Maintainer <sanders at="" fs.tum.de=""> Subject: [BioC] DESeq / DESeq2 for translation efficiency in polysome profiling or ribsome profiling experiments Message-ID: <20131223073706.DA826143685 at mamba.fhcrc.org> Hello, I wanted to ask whether it is possible to use DESeq for the analysis of translation efficiency in polysome profiling or ribsome profiling. In polysome profiling - polysomes are isolated (e.g. by sucrose gradients), and mRNA that were associated with the polysomes are then isolated and sequenced. In ribosome profiling, mRNA fragments that are protected by the ribosome are sequenced. When such experiments are conducted in parallel to total RNA-seq, one can ask questions regarding the translation efficiency. For example: if we have 4 samples: ribosome sequencing (without treatment), total sequencing (without treatment), sequencing (with treatment), total sequencing (with treatment), one can ask how does the treatment affects the protein synthesis rate (which is the ratio ribosome/total). I wanted to ask whether it is possible to define such a model in DESeq for such analysis, in order to answer how is this ratio affected by the treatment. Thanks a lot -- output of sessionInfo(): none -- Sent via the guest posting facility at bioconductor.org. ------------------------------ Message: 3 Date: Sun, 22 Dec 2013 23:47:30 -0800 (PST) From: "Melo [guest]" <guest@bioconductor.org> To: bioconductor at r-project.org, memoly101 at yahoo.com Subject: [BioC] forgeSeqFiles Message-ID: <20131223074730.1C4F514369A at mamba.fhcrc.org> Hello everyone, I am so new to R as well as Bioconductor but found them very helpful so I'm trying to use. Now I need to make a BSgenome pckg for my own organism. Therefore I have made a folder named seqs_srcdir which contains 17,000 files (one gene_sequence per file), but when I downloaded, and even unzipped BSgenome I kept getting the Error: could not find function "forgeSeqFiles". I know I need to make the seed files so I gave the command line: forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE) Error: could not find function "forgeSeqFiles". I would appreciate it if you could please advice, Melo -- output of sessionInfo(): forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa", seqs_srcdir="/Users/Me/Documents/Microarray", seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE) Error: could not find function "forgeSeqFiles". -- Sent via the guest posting facility at bioconductor.org. ------------------------------ Message: 4 Date: Mon, 23 Dec 2013 11:10:40 +0100 From: Hugo Varet <hugo.varet@pasteur.fr> To: bioconductor at r-project.org Subject: [BioC] Multiple contrasts in DESeq2 Message-ID: <52B80C20.5020009 at pasteur.fr> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Dear Michael Love and list members, I'm used to analyze RNA-seq data with your nice DESeq2 package (1.2.0) and I sometimes have to test complex null hypotheses using contrasts. For one of my project, I would like using multiple contrasts, i.e. testing for example (H0: beta1-beta2 = beta3-beta4 = 0) against (H1: beta1-beta2 != 0 and/or beta3-beta4 != 0), where the beta's are the regression coefficients of the models. To do so, I would like using a numeric matrix for the contrast argument in results(), but it only accepts a numeric vector. Hence, I was wondering if you planned to extend DESeq2 to the use of multiple contrasts? Best regards, Hugo -- Hugo Varet Plate-Forme Transcriptome et Epig?nome Institut Pasteur ------------------------------ _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor End of Bioconductor Digest, Vol 130, Issue 25 *********************************************

ADD COMMENT • link 11.3 years ago Matthew Thornton ▴ 380

Login before adding your answer.