Surrogate Variable Analysis
Jerry Cholo ▴ 190
@jerry-cholo-6218
Last seen 10.1 years ago
Hello,

I would like to remove batch effects from gene expression data using Surrogate Variable Analysis (SVA). When I looked at the sva package (http://www.bioconductor.org/packages/release/bioc/html/sva.html) and the "bladderbatch" data, I noticed that the 57 samples are assigned to 5 different batches. Could someone let me know how I should define these batches for my own data? My datasets include normal and disease samples, two different tissues, and two different chip arrays.

Thanks,
Jerry
Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.8 years ago
United States
Hi Jerry,

Batch information is often annotated in a data set. If it is not, one way to annotate batches is to find out when each sample was run and then check whether the run times cluster into distinct groups, which you could call batches.

Finally, the surrogate variable analysis approach with the sva() function takes as input the normalized data matrix and the corresponding information about the primary variables you care about, and attempts to recover the batches from the microarray data themselves.

I hope that helps.

Jeff

On Mon, Mar 17, 2014 at 9:00 PM, Jerry Cholo wrote:
> May someone let me know how I could define these
> batches for my own data?
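For anyone following along, here is a minimal sketch of the two suggestions above: grouping samples into batches by run date, and then calling sva() on a normalized matrix. The run dates, the 7-day gap threshold, the simulated expression matrix, and all variable names are assumptions made up for illustration; they are not from Jerry's data. The sva part only runs if the sva package is installed, and n.sv = 1 is fixed by hand here rather than estimated.

```r
# --- Part 1: derive batches from run dates (base R only) ---
# Hypothetical run dates for six samples.
run_date <- as.Date(c("2014-01-06", "2014-01-07", "2014-02-10",
                      "2014-01-06", "2014-02-11", "2014-03-03"))

# Sort by date and start a new batch whenever the gap to the previous
# run exceeds 7 days (the threshold is an arbitrary assumption).
ord <- order(run_date)
gap_days <- diff(as.numeric(run_date[ord]))
batch_sorted <- cumsum(c(1, gap_days > 7))

# Map the batch labels back to the original sample order.
batch <- integer(length(run_date))
batch[ord] <- batch_sorted
batch <- factor(batch)
print(table(batch))

# --- Part 2: let sva() estimate a surrogate variable ---
if (requireNamespace("sva", quietly = TRUE)) {
  set.seed(1)
  n <- 12
  condition <- factor(rep(c("normal", "disease"), each = n / 2),
                      levels = c("normal", "disease"))
  hidden <- rep(c(0, 1), times = n / 2)   # unannotated batch, simulated

  # Simulated "normalized" data: 1000 genes x 12 samples,
  # with the hidden batch shifting the first 300 genes.
  expr <- matrix(rnorm(1000 * n), nrow = 1000)
  expr[1:300, ] <- expr[1:300, ] + outer(rep(2, 300), hidden)

  mod  <- model.matrix(~ condition)                              # full model
  mod0 <- model.matrix(~ 1, data = data.frame(condition))        # null model

  svobj <- sva::sva(expr, mod, mod0, n.sv = 1)

  # The estimated surrogate variable can be appended to the design
  # for downstream modelling.
  design_sv <- cbind(mod, svobj$sv)
}
```

With real data you would replace the simulated matrix with your normalized expression values and build mod from the variables you actually care about (for example disease status and tissue).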
Hi Jerry,

As Jeff mentioned, "Batch information is often annotated in a data set." You mentioned 5 batches, so it seems you know which batch each sample is from. In that case, the function removeBatchEffect() in the limma package may be helpful. It is not intended to be used before linear modelling, though. For linear modelling, it is better to include the batch factor in the linear model, for example in the following way when the number of batch levels is large (in your case it is 5, that is, more than 3):

    dupcor <- duplicateCorrelation(data, design, block = batch)
    dupcor$consensus.correlation
    fit <- lmFit(data, design, block = batch, correlation = dupcor$consensus.correlation)

Hope this helps.

Di

----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

On Wednesday, March 19, 2014 2:13 PM, Jeff Leek wrote:
> Batch information is often annotated in a data set.
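As a complement to the blocking approach above, here is a minimal sketch of the other option mentioned: putting a known batch factor directly into the design matrix as a fixed effect, and keeping removeBatchEffect() for plots only. The simulated data, sample layout, and variable names are assumptions for illustration. The limma calls only run if limma is installed.

```r
set.seed(2)

# Hypothetical layout: 12 samples, 2 conditions, 3 known batches.
condition <- factor(rep(c("normal", "disease"), times = 6),
                    levels = c("normal", "disease"))
batch <- factor(rep(c("b1", "b2", "b3"), each = 4))

# Simulated normalized expression matrix: 500 genes x 12 samples.
expr <- matrix(rnorm(500 * 12), nrow = 500,
               dimnames = list(paste0("gene", 1:500), paste0("s", 1:12)))

# Batch enters the model as an additive fixed effect.
design <- model.matrix(~ condition + batch)
print(colnames(design))

if (requireNamespace("limma", quietly = TRUE)) {
  # Differential expression for disease vs normal, adjusted for batch.
  fit <- limma::eBayes(limma::lmFit(expr, design))
  top <- limma::topTable(fit, coef = "conditiondisease")

  # Batch-removed matrix intended for visualization (PCA, heatmaps) only;
  # the condition design is passed so that its effect is preserved.
  expr_plot <- limma::removeBatchEffect(
    expr, batch = batch, design = model.matrix(~ condition))
}
```

Note that this fixed-effect design is a different model from the duplicateCorrelation() code above, which treats batch as a blocking variable with a shared within-block correlation.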