Dear List,
For high-throughput experiments (microarray, RNA-Seq, etc.) with many
batches of samples, we are routinely advised to apply ComBat, SVA, PCA,
or PEER to remove batch effects and hidden variables before any
downstream analysis. In terms of specific steps, I have listed the
following three methods, each applied after normalization. Could anybody
tell me which method is best, or offer other suggestions?
Method 1:
Step 1: remove outliers
Step 2: remove *batch effects* if we know the exact batches
Step 3: apply SVA/PCA/PEER to remove *other hidden variables*
Method 2:
Step 1: remove outliers
Step 2: apply SVA/PCA/PEER to remove *batch effects and other hidden
variables*
Method 3:
Step 1: directly apply SVA/PCA/PEER to remove *outliers, batch effects
and other hidden variables* in one step
Many thanks,
Shirley
Hi Shirley,
I always have problems with hidden variables; it's the nature of the
biology I work with. In my experience, however, there is no such thing
as a routine way to remove batch effects. I caution against a "one size
fits all" pipeline: every biological question tends to be unique because
each has a different signal-to-noise ratio. As a general framework I do
something like this:
- PCA/MDS is always the first step, to see what you are dealing with. If
  your effect size is big enough, your replicates will likely cluster
  reasonably well on the first PC, and then you do not necessarily need
  to do anything, even if you have lots of samples and batches.
- Use a method that directly models the unwanted variance (outliers,
  batch effects, hidden variables, whatever you call it) without
  removing any samples. Even if your PCA shows that you have outliers,
  removing them costs power, so I try not to. I rely on the method to
  subtract the unwanted variance while maintaining (most of) the
  variance due to the treatment of interest. This is of course labor
  intensive: it requires several iterations of variance removal,
  checking PCA and p-value plots after each, and it benefits from good
  knowledge of the biology you expect (positive controls). How much
  variance removal is enough but not too much is specific to the
  experiment.
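As a rough illustration of the PCA check described above, here is a
minimal Python sketch using only the standard library. The function
names and the toy expression values are made up for illustration, and
the power iteration is just a stand-in for a proper PCA routine (in R
one would normally call prcomp). It centers each gene, computes the
first principal component, and prints one score per sample so you can
see whether PC1 separates the treatment groups.

```python
import math

def center_rows(matrix):
    """Subtract each gene's mean across samples (rows = genes, columns = samples)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered

def normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def first_pc_scores(matrix, n_iter=200):
    """PC1 score for each sample, via power iteration on the
    sample-by-sample Gram matrix of the gene-centered data."""
    centered = center_rows(matrix)
    n = len(centered[0])
    gram = [[sum(row[i] * row[j] for row in centered) for j in range(n)]
            for i in range(n)]
    # Start from the Gram column with the largest diagonal entry, so the
    # starting vector is not orthogonal to the leading component.
    start = max(range(n), key=lambda i: gram[i][i])
    v = normalize(gram[start])
    for _ in range(n_iter):
        v = normalize([sum(gram[i][j] * v[j] for j in range(n))
                       for i in range(n)])
    eigenvalue = sum(v[i] * gram[i][j] * v[j]
                     for i in range(n) for j in range(n))
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]

# Made-up log-expression values: 4 genes x 6 samples.
# Samples 0-2 are controls, samples 3-5 are treated (hypothetical design).
toy = [
    [5.0, 5.2, 4.9, 9.1, 9.0, 8.8],  # gene up in treated
    [8.0, 8.1, 7.9, 5.0, 5.2, 4.9],  # gene down in treated
    [6.0, 6.1, 5.9, 6.0, 6.2, 5.9],  # unchanged gene
    [3.0, 3.2, 3.1, 3.1, 3.0, 3.2],  # unchanged gene
]
treated = [False, False, False, True, True, True]

scores = first_pc_scores(toy)
for is_treated, score in zip(treated, scores):
    print("treated" if is_treated else "control", round(score, 2))
```

In this toy case the treatment effect is large, so controls and treated
samples end up with PC1 scores of opposite sign. The overall sign of a
principal component is arbitrary; only the separation matters. With real
data the same check would be repeated after each round of variance
removal.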
Which method works depends on the nature of the batch effect(s) and on
whether they are orthogonal to the treatment of interest. I use RUV
(http://www.bioconductor.org/packages/devel/bioc/html/RUVSeq.html)
because it performs well even when the batch effect and the signal are
correlated and you have no idea what the batch effects actually are.
What works best for my samples is to rely on the biological replicates
to identify unwanted variance (RUVs). You need a reasonable number of
replicates to do this, and it will not work well when the replicates are
very heterogeneous (e.g. cancer). Using PCA on the original expression
matrix to model unwanted variance is another option; in that case I
believe RUV will give results similar to SVA, provided the batch
effect(s) are not correlated with your treatment.
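To make the replicate-based idea concrete, here is a heavily simplified
Python sketch. It is not the RUVSeq implementation: it assumes a single
unwanted factor, treats all genes as usable for estimating it, and works
on noise-free toy numbers that I made up. All function and variable
names are hypothetical. The logic is that differences between
replicates of the same condition cannot be due to the treatment, so the
dominant direction of those differences is taken as the unwanted
variation and projected out of every sample.

```python
import math

def normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def top_direction(rows, n_iter=100):
    """Leading direction in gene space of a list of per-sample vectors,
    found by power iteration on the gene-by-gene cross-product matrix."""
    n = len(rows[0])
    cross = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
             for i in range(n)]
    start = max(range(n), key=lambda i: cross[i][i])
    v = normalize(cross[start])
    for _ in range(n_iter):
        v = normalize([sum(cross[i][j] * v[j] for j in range(n))
                       for i in range(n)])
    return v

def remove_replicate_factor(samples, groups):
    """samples: one expression vector per sample (one value per gene).
    groups: replicate-group label for each sample.
    Returns the corrected samples and the estimated unwanted direction."""
    n_genes = len(samples[0])
    # Residuals of each sample from its replicate-group mean: by
    # construction these carry no treatment signal.
    residuals = []
    for label in sorted(set(groups)):
        members = [s for s, g in zip(samples, groups) if g == label]
        mean = [sum(m[k] for m in members) / len(members)
                for k in range(n_genes)]
        residuals.extend([[m[k] - mean[k] for k in range(n_genes)]
                          for m in members])
    direction = top_direction(residuals)
    # Score each gene-centered sample along that direction and subtract it.
    gene_mean = [sum(s[k] for s in samples) / len(samples)
                 for k in range(n_genes)]
    corrected = []
    for s in samples:
        score = sum((s[k] - gene_mean[k]) * direction[k]
                    for k in range(n_genes))
        corrected.append([s[k] - score * direction[k]
                          for k in range(n_genes)])
    return corrected, direction

# Made-up example: 4 genes, 3 control and 3 treated samples.
baseline = [5.0, 6.0, 7.0, 8.0]
treatment_effect = [2.0, -2.0, 0.0, 3.0]
unwanted_loading = [1.0, 1.0, -1.0, 0.0]   # gene-level batch signature
treated = [0, 0, 0, 1, 1, 1]
unwanted = [1.0, -1.0, 0.0, 1.0, -1.0, 0.0]  # per-sample batch strength
groups = ["ctrl"] * 3 + ["trt"] * 3

samples = [[baseline[k] + treatment_effect[k] * t + unwanted_loading[k] * u
            for k in range(4)]
           for t, u in zip(treated, unwanted)]

corrected, direction = remove_replicate_factor(samples, groups)
estimated_effect = [corrected[3][k] - corrected[0][k] for k in range(4)]
print([round(e, 3) for e in estimated_effect])
```

In this toy setup the unwanted signature happens to be orthogonal to the
treatment effect in gene space, so the treatment differences survive the
correction. Note that the projection removes everything along the
estimated direction, so any real signal that lies along it would be
removed as well, and heterogeneous replicates would contaminate the
estimate, which is the limitation mentioned above.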
In the end, all of this only matters if removing batch effects lets you
see gene expression changes that you would not see otherwise, if those
changes are reproducible in an independent experiment, and if they tell
you something novel about the underlying biology.
I find that doing a pathway analysis on the gene lists before and after
batch effect removal can be useful.
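One simple way to run that before/after comparison is an
over-representation test. The Python sketch below assumes a one-sided
hypergeometric test, which is only one of several ways to do pathway
analysis; the gene names, the pathway and both gene lists are invented
for illustration.

```python
from math import comb

def enrichment_p_value(gene_list, pathway, universe):
    """One-sided hypergeometric p-value: the probability of an overlap at
    least as large as the observed one when len(gene_list) genes are
    drawn at random from the universe."""
    universe = set(universe)
    genes = set(gene_list) & universe
    members = set(pathway) & universe
    n_universe, n_pathway, n_list = len(universe), len(members), len(genes)
    overlap = len(genes & members)
    tail = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_list - i)
               for i in range(overlap, min(n_pathway, n_list) + 1))
    return tail / comb(n_universe, n_list)

# Hypothetical universe of 20 measured genes and one 5-gene pathway.
universe = ["g%d" % i for i in range(1, 21)]
pathway = {"g1", "g2", "g3", "g4", "g5"}

# Hypothetical differentially expressed gene lists.
before_correction = {"g1", "g7", "g9", "g12", "g15"}
after_correction = {"g1", "g2", "g3", "g4", "g9"}

p_before = enrichment_p_value(before_correction, pathway, universe)
p_after = enrichment_p_value(after_correction, pathway, universe)
print("before:", p_before)
print("after:", p_after)
```

If a pathway you expect from the biology (a positive control) becomes
more strongly enriched after correction, that is some reassurance that
the correction removed noise and not signal.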
hope this helps
cheers
Lucia
On Fri, Aug 1, 2014 at 8:34 AM, shirley zhang <shirley0818@gmail.com> wrote:
> [...]
--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania
"Think boldly, don't be afraid of making mistakes, don't miss small
details, keep your eyes open, and be modest in everything except your
aims."
Albert Szent-Gyorgyi