Hi,
The lab I work with has used "whole genome" human arrays (~18,000
genes) for a couple of years, and I have helped with the analysis using
Limma. Now, due to costs, they are considering switching from
whole genome arrays to focused arrays with ~400 genes of interest
(selected from the whole-genome array results).
The obvious analysis problems with a focused array where most genes
are changing are:
1. LOESS normalization assumes most genes are not changing. If most of
the genes are expected to change, there is no basis to recenter the
data around zero. The response from the lab was that they would be
willing to include 100-150 genes that are not expected to change.
2. The B-statistic in Limma requires a parameter specifying what
fraction of genes are changing. The corresponding moderated
t-statistic uses the data from all genes to moderate the standard error
in the t calculation. Both of these could change dramatically if most
of the genes on the array are changing.
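To make concern 2 concrete, here is a minimal Python sketch of the two
per-gene formulas from Smyth (2004) that underlie limma's eBayes, with
the hyperparameters (prior variance s2_prior, prior degrees of freedom
d_prior, and v0) passed in rather than estimated. The function names
and all numbers are hypothetical. It shows two things. The moderated t
borrows from the other genes only through s2_prior and d_prior, which
limma estimates from the residual variances of all genes; with
d_prior = 0 it reduces to the ordinary t. The assumed proportion of
changing genes (eBayes's `proportion` argument, 0.01 by default) enters
B as an additive prior log-odds term, so with v0 held fixed it shifts
every gene's B by the same amount. As I understand it, limma also uses
that proportion when estimating v0, which this sketch does not attempt.

```python
import math


def moderated_t(coef, s2, d, stdev_unscaled, s2_prior, d_prior):
    """Moderated t for one gene (Smyth 2004), hyperparameters given.

    coef              -- estimated log-ratio for the contrast
    s2, d             -- the gene's residual variance and residual df
    stdev_unscaled    -- unscaled standard deviation of coef (from the design)
    s2_prior, d_prior -- prior variance s0^2 and prior df d0; in limma these
                         are estimated from the residual variances of ALL genes
    """
    # Posterior variance: df-weighted average of the gene's own and the prior.
    s2_post = (d_prior * s2_prior + d * s2) / (d_prior + d)
    return coef / (math.sqrt(s2_post) * stdev_unscaled)


def b_statistic(t_mod, d, stdev_unscaled, d_prior, v0, proportion):
    """Log-odds (B) that one gene is differentially expressed.

    v0         -- prior variance of the true log-ratio for changing genes,
                  in the same unscaled units as stdev_unscaled ** 2
    proportion -- assumed fraction of genes that are changing
    """
    df_total = d + d_prior
    r = (stdev_unscaled ** 2 + v0) / stdev_unscaled ** 2
    kernel = (1 + df_total) / 2 * math.log(
        (t_mod ** 2 + df_total) / (t_mod ** 2 / r + df_total))
    # The assumed proportion enters only through this additive prior-odds term.
    return math.log(proportion / (1 - proportion)) - 0.5 * math.log(r) + kernel


# Hypothetical numbers for a single gene.
t_mod = moderated_t(coef=1.0, s2=0.04, d=4, stdev_unscaled=0.5,
                    s2_prior=0.05, d_prior=4)
for p in (0.01, 0.6):
    print(p, round(b_statistic(t_mod, d=4, stdev_unscaled=0.5, d_prior=4,
                               v0=0.2, proportion=p), 2))
```

Only the prior-odds term differs between the two calls, so the two B
values differ by log(0.6/0.4) - log(0.01/0.99), about 5.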
My questions are:
1. Are my concerns valid, and are there ways around them? Are
there other analysis pitfalls in this scenario?
2. Can Limma handle situations where most of an array is expected to
change? What modifications, if any, need to be made to the Limma
analysis to account for this?
3. Alternatively, is there a more appropriate statistical package to
use in this case?
Thanks.
--
Mike
I would've rephrased the problem differently:
Given that you can't depend on the "typical" assumption of
"zero-expression", what features should you design in for
comparability?
The idea of housekeeping genes seems sensible in theory; in practice,
I'm not sure how to protect against inadvertent "discovery" (a
supposedly stable housekeeping gene that turns out to be changing).
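One crude safeguard against that kind of "discovery", offered only as
a sketch and not taken from the original posts: after normalization,
check each putative control gene's replicate log-ratios for a
consistent departure from zero with a one-sample t-statistic, and
review or drop the ones that stand out. The function name, the
threshold t_cut=3.0, and the input layout (controls in rows, arrays in
columns) are all hypothetical. With few replicates the t-statistic is
noisy, and because the controls jointly define the normalization, this
only catches individual outliers, not a shift shared by all of them.

```python
import numpy as np


def flag_unstable_controls(M_controls, t_cut=3.0):
    """Flag putative non-changing control genes whose log-ratios drift from 0.

    M_controls -- shape (n_controls, n_arrays): normalized log-ratios (M)
                  of each control gene on each replicate array
    t_cut      -- |t| above which a control is flagged (hypothetical default)
    Returns a boolean array, True where a control looks differentially expressed.
    """
    M = np.asarray(M_controls, dtype=float)
    n_arrays = M.shape[1]
    mean = M.mean(axis=1)
    se = M.std(axis=1, ddof=1) / np.sqrt(n_arrays)  # needs at least 2 arrays
    t = mean / se                                    # one-sample t against 0
    return np.abs(t) > t_cut


# Two hypothetical controls on four arrays: one stable, one consistently up.
M_controls = [[0.10, -0.10, 0.05, -0.05],
              [1.00, 1.10, 0.90, 1.00]]
print(flag_unstable_controls(M_controls))
```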
best,
-tony
--
"Commit early, commit often, and commit in a repository from which we
can easily roll back your mistakes" (AJR, 4Jan05).
A.J. Rossini
blindglobe@gmail.com
>Date: Tue, 7 Jun 2005 09:33:51 -0400
>From: Mike Schaffer <mschaff@bu.edu>
>Subject: [BioC] Limma analysis of focused arrays vs. whole genome arrays
>To: bioconductor@stat.math.ethz.ch
>
>Hi,
>
>The lab I work with has used "whole genome" human arrays (~18,000
>genes) for a couple of years, and I have helped with the analysis using
>Limma. Now, due to costs, they are considering switching from
>whole genome arrays to focused arrays with ~400 genes of interest
>(selected from the whole-genome array results).
>
>The obvious analysis problems with a focused array where most genes
>are changing are:
>
>1. LOESS normalization assumes most genes are not changing. If most of
>the genes are expected to change, there is no basis to recenter the
>data around zero. The response from the lab was that they would be
>willing to include 100-150 genes that are not expected to change.
>
>2. The B-statistic in Limma requires a parameter specifying what
>fraction of genes are changing. The corresponding moderated
>t-statistic uses the data from all genes to moderate the standard error
>in the t calculation. Both of these could change dramatically if most
>of the genes on the array are changing.
>
>
>My questions are:
>
>1. Are my concerns valid, and are there ways around them? Are
>there other analysis pitfalls in this scenario?
>
>2. Can Limma handle situations where most of an array is expected to
>change? What modifications, if any, need to be made to the Limma
>analysis to account for this?
To quote from the Limma User's Guide (page 15):
"In such a situation, the best strategy is to include on the arrays a
series of non-differentially expressed control spots, such as a
titration series of whole-library-pool spots, and to use the
up-weighting method discussed below. In the absence of such control
spots, normalization of boutique arrays requires specialist advice."
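The control-spot strategy in the quote can be sketched as follows. This
is a minimal numpy-only illustration, not limma's code: it uses a
home-made, non-robust lowess (tricube-weighted local linear regression)
and takes up-weighting to its extreme, giving control spots weight 1
and all other spots weight 0 when fitting the intensity trend. The data
are simulated and hypothetical: 150 non-changing controls (matching the
100-150 the lab offered), 250 genes shifted by M = 1, and a made-up
linear dye bias in A. Treating every spot as a control stands in for
ordinary loess normalization.

```python
import numpy as np


def local_linear_fit(x_fit, y_fit, x_eval, span=0.5):
    """Basic (non-robust) lowess: tricube-weighted local linear regression.

    For each point in x_eval, fits y ~ x using the nearest span * len(x_fit)
    points of (x_fit, y_fit) and returns the fitted value at that point.
    """
    x_fit = np.asarray(x_fit, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    k = min(len(x_fit), max(3, int(np.ceil(span * len(x_fit)))))
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(np.asarray(x_eval, dtype=float)):
        dist = np.abs(x_fit - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)         # window half-width
        w = (1 - np.clip(dist / h, 0, 1) ** 3) ** 3  # tricube weights
        # Weighted least squares for y = a + b * (x - x0); the fit at x0 is a.
        X = np.column_stack([np.ones_like(x_fit), x_fit - x0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y_fit * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted


def normalize_with_controls(M, A, is_control, span=0.5):
    """Subtract an intensity trend in M estimated from the control spots only."""
    M = np.asarray(M, dtype=float)
    A = np.asarray(A, dtype=float)
    ctrl = np.asarray(is_control, dtype=bool)
    return M - local_linear_fit(A[ctrl], M[ctrl], A, span)


# Simulated focused array (hypothetical): 150 controls, 250 genes up 2-fold.
rng = np.random.default_rng(0)
n_ctrl, n_de = 150, 250
A = rng.uniform(6, 14, n_ctrl + n_de)
is_control = np.r_[np.ones(n_ctrl, dtype=bool), np.zeros(n_de, dtype=bool)]
true_M = np.where(is_control, 0.0, 1.0)
M = true_M + 0.3 * (A - 10) + rng.normal(0, 0.1, A.size)  # made-up dye bias

M_ctrl_norm = normalize_with_controls(M, A, is_control)
M_all_norm = normalize_with_controls(M, A, np.ones(A.size, dtype=bool))

for label, Mn in (("controls only", M_ctrl_norm), ("all spots", M_all_norm)):
    print(f"{label:14s} controls: {Mn[is_control].mean():+.2f}  "
          f"changing: {Mn[~is_control].mean():+.2f}")
```

With the controls-only fit, the controls should come out near 0 and the
changing genes near the simulated shift of 1. The all-spots fit absorbs
most of the shift into the trend, pulling both groups down by roughly
the fraction of spots that are changing (here 250/400). Real controls
will not cover the intensity range as evenly as these simulated ones,
and the fitted trend is only trustworthy over the A range they span.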
>3. Alternatively, is there a more appropriate statistical package to
>use in this case?
I don't know of any other available methods. In my opinion, you have
to put down control spots: "house-keeping" genes if that is all you
can get, but preferably constructed spots as described above.
Gordon
>Thanks.
>
>--
>Mike