Steps of removing batch effects and hidden variables

shirley zhang ★ 1.0k

@shirley-zhang-2038

Dear List,

For high-throughput experiments (microarray, RNASeq, etc.) with many batches of samples, we are routinely advised to apply ComBat, SVA, PCA, or PEER to remove batch effects and hidden variables before any downstream analysis. In terms of specific steps, I have listed the following three methods, each applied after normalization. Could anybody tell me which method is best, or offer other suggestions?

Method 1:
Step 1: remove outliers
Step 2: remove *batch effects*, if we know the exact batches
Step 3: apply SVA/PCA/PEER to remove *other hidden variables*

Method 2:
Step 1: remove outliers
Step 2: apply SVA/PCA/PEER to remove *batch effects and other hidden variables*

Method 3:
Step 1: directly apply SVA/PCA/PEER to remove *outliers, batch effects, and other hidden variables* in one step

Many thanks,
Shirley

RNASeq Normalization sva • 4.7k views

updated 10.4 years ago by Lucia Peixoto ▴ 330 • written 10.4 years ago by shirley zhang ★ 1.0k

Lucia Peixoto ▴ 330

@lucia-peixoto-4203

Hi Shirley,

I always have problems with hidden variables; it's the nature of the biology I work with. However, in my experience, there's no such thing as a routine way to remove batch effects. I caution against a "one size fits all" pipeline: every biological question tends to be unique because each has a different signal-to-noise ratio. As a general framework I do something like this:

- PCA/MDS is always a first step to see what you are dealing with. If your effect size is big enough, your replicates will likely cluster reasonably well on the first PC, and then you do not necessarily need to do anything, even if you have lots of samples and batches.
- Use a method to directly model the unwanted variance (outliers, batch effects, hidden variables, whatever you call it) without removing any samples. Even if your PCA shows that you have outliers, removing them comes at the cost of losing power, so I try not to. I rely on the method to subtract the unwanted variance while maintaining (most of) the variance due to the treatment of interest. This is of course labor intensive: it requires several iterations of variance removal and of checking PCA and p-value plots, and it benefits from good knowledge of the biology you expect (positive controls). The question of how much variance removal is enough but not too much is specific to the experiment.

Which method will work depends on the nature of the batch effect(s) and whether or not they are orthogonal to the treatment of interest. I use RUV (http://www.bioconductor.org/packages/devel/bioc/html/RUVSeq.html) because RUV performs well even when the batch effect and signal are correlated and you have no idea what the batch effects actually are. What works best for my samples is to rely on the biological replicates to identify unwanted variance (RUVs). You need a reasonable number of replicates to do this, and it will not work well when the replicates are very heterogeneous (e.g. cancer).
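The replicate-based idea described above (use biological replicates to find the unwanted variance, then take it out while keeping the treatment signal) can be sketched roughly as follows. This is a minimal NumPy illustration of the general principle, not the RUVSeq implementation or its API. It assumes a log-scale expression matrix with samples in rows; the function names `replicate_residuals` and `remove_unwanted`, the number of factors `k`, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np

def replicate_residuals(Y, groups):
    """Center each gene within each replicate group.

    Within a group the biology is assumed constant, so what remains
    is an estimate of unwanted variation plus noise.
    """
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    Yd = np.empty_like(Y)
    for g in np.unique(groups):
        idx = groups == g
        Yd[idx] = Y[idx] - Y[idx].mean(axis=0)
    return Yd

def remove_unwanted(Y, groups, k=1):
    """Estimate k unwanted factors from replicates and subtract them."""
    Y = np.asarray(Y, dtype=float)
    Yd = replicate_residuals(Y, groups)
    # Gene loadings of the top-k unwanted factors (rows are orthonormal).
    _, _, Vt = np.linalg.svd(Yd, full_matrices=False)
    alpha = Vt[:k]
    # Project the gene-centered data onto the loadings: one value per sample and factor.
    W = (Y - Y.mean(axis=0)) @ alpha.T
    return Y - W @ alpha, W

# Simulated data (assumption): 2 treatments x 4 replicates, 2 batches, 50 genes.
rng = np.random.default_rng(0)
treatment = np.array([0, 0, 0, 0, 1, 1, 1, 1])
batch = np.array([1, -1, 1, -1, 1, -1, 1, -1])
effect = np.zeros(50)
effect[:10] = 2.0  # treatment changes the first 10 genes
Y = (np.outer(treatment, effect)
     + np.outer(batch, rng.normal(size=50))      # hidden batch effect
     + rng.normal(scale=0.1, size=(8, 50)))      # measurement noise

corrected, W = remove_unwanted(Y, groups=treatment, k=1)
print("correlation of estimated factor with batch:",
      np.corrcoef(W[:, 0], batch)[0, 1])
```

The batch labels are never passed to `remove_unwanted`; they are only used afterwards to check that the estimated factor tracks the hidden batch. For differential expression, estimated factors like `W` would usually be added to the model as covariates, and a subtracted matrix like `corrected` is better suited to exploratory plots.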
Using PCA on the original expression matrix to model unwanted variance is another way; in this case I believe RUV will give you results similar to SVA, provided the batch effect(s) are not correlated with your treatment.

In the end, all this will only matter if removing batch effects allows you to see gene expression changes that you would not otherwise see, if those changes are reproducible in an independent experiment, and if they tell you something novel about the underlying biology. I find that doing a pathway analysis on the gene lists before and after batch effect removal can be useful.

Hope this helps,
Cheers,
Lucia

--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania
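The PCA checks mentioned in this answer (a first look at the data, then again after each round of variance removal) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not code from the thread: samples are rows of a log-scale matrix, and `pca_scores`, `separates_on_pc1`, and the simulated data are hypothetical names and values chosen for the example.

```python
import numpy as np

def pca_scores(Y, n_components=2):
    """Return sample scores and the fraction of variance explained per PC."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)  # center each gene across samples
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

def separates_on_pc1(scores, labels):
    """True if the two groups occupy non-overlapping ranges on the first PC.

    Assumes exactly two distinct labels.
    """
    labels = np.asarray(labels)
    a, b = (scores[labels == g, 0] for g in np.unique(labels))
    return bool(a.max() < b.min() or b.max() < a.min())

# Simulated data (assumption): 3 controls and 3 treated samples, 20 genes,
# with a large treatment effect on the first 10 genes.
rng = np.random.default_rng(0)
treatment = np.array([0, 0, 0, 1, 1, 1])
effect = np.zeros(20)
effect[:10] = 3.0
Y = np.outer(treatment, effect) + rng.normal(scale=0.1, size=(6, 20))

scores, explained = pca_scores(Y)
print("variance explained by PC1, PC2:", explained)
print("treatment groups separate on PC1:", separates_on_pc1(scores, treatment))
```

Running the same check on the matrix before and after a correction step gives a quick view of whether the treatment groups separate more cleanly once the unwanted variance is removed.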

10.4 years ago • Lucia Peixoto ▴ 330
