Eugene Bolotin wrote:
> Dear Bioconductor mailing list,
> I am analysing some tiling, chip-on-chip, two color (one is input and the
> other is chromatin), Agilent data. There are 10 arrays with ~44000
> features, scanned with GenePix, each that represent most of the human
> promoters of about 8kb regions, each with one biological replicate giving
> the grand total of 20 arrays. I am interested in getting high resolution
> peaks, hopefully with p-values.
If I understand you correctly, you have 10 different array designs, each
covering a portion of the genome? I will make that assumption below, so
correct me if I have misunderstood.

Look at the ACME or Ringo packages (both in the devel/bioc 2.0
repository). Both are geared toward NimbleGen arrays, but they offer
some methods for dealing with ChIP/chip data.
> I am trying to use Limma to normalize them
> using RMA. However all these arrays have different probes, so in the end I
> should end up with ~440,000 different probe values. However Limma treats
> these arrays as replicates and I only end up with 44,000 probes. How can I
> keep it from doing that?
You can't. If you want to use limma, you will need to load each set of
arrays that shares the same probes as a separate batch.
> Also, any suggestions about normalization methods
> would be greatly appreciated.
>
I would load the arrays separately, median center them, and scale each
set of arrays to have the same MAD (on the log2 scale). Because of the
strong correlation between probes along the chromosome, probe-specific
artifacts, etc. are much less harmful than for gene expression analyses.
Nonlinear normalization methods can attenuate real signal, so unless you
have a strong reason to use them, I would suggest not using them.
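
To make the centering and scaling step concrete, here is a minimal sketch in
plain Python. It is illustrative only, not a limma, ACME, or Ringo call. The
function names, the target MAD of 1.0, and the toy log2 ratios are all
assumptions made for the example. It also uses the raw MAD, without the
consistency constant that R's mad() applies by default; since every array is
scaled to a common value, the constant does not change the outcome.

```python
from statistics import median


def mad(values):
    """Raw median absolute deviation (no consistency constant)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def center_and_scale(log_ratios, target_mad=1.0):
    """Median center log2 ratios, then rescale them to a common MAD.

    target_mad is an arbitrary illustrative choice; any shared value works.
    """
    center = median(log_ratios)
    centered = [v - center for v in log_ratios]
    spread = mad(centered)
    if spread == 0:
        raise ValueError("MAD is zero; cannot rescale these values")
    factor = target_mad / spread
    return [v * factor for v in centered]


# Hypothetical log2(chromatin / input) ratios from two array designs.
array_a = [0.1, 0.5, -0.3, 2.0, 0.0]
array_b = [1.0, 3.0, -1.0, 9.0, 0.6]

norm_a = center_and_scale(array_a)
norm_b = center_and_scale(array_b)
```

After this step both sets have median 0 and the same MAD, so they can be
pooled without one design dominating because of its overall spread. The
transformation is linear, which is the point of the advice above.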
After loading the arrays, you will want to combine them, perhaps as a
data.frame. Then, order the probes by chromosome and by position within
each chromosome. Finally, you can take your combined data and form one of
the data structures required by ACME or Ringo.
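
The combine-and-order step can be sketched in the same way. This is again a
plain Python illustration under stated assumptions: probes are represented as
hypothetical (chromosome, position, value) tuples, chromosome names are
assumed to look like "chr1" or "chrX", and the helper names are invented for
the example. It does not build the actual ACME or Ringo input objects; see
the package documentation for those.

```python
def chromosome_key(name):
    """Sort key that puts chr2 before chr10, with named ones (X, Y) last."""
    label = name[3:] if name.lower().startswith("chr") else name
    if label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)


def combine_and_order(designs):
    """Pool probes from all array designs, then sort by chromosome and position."""
    combined = [probe for design in designs for probe in design]
    return sorted(combined, key=lambda p: (chromosome_key(p[0]), p[1]))


# Hypothetical normalized probes from two array designs.
design_1 = [("chr2", 500, 0.3), ("chr1", 1200, -0.1)]
design_2 = [("chr10", 50, 1.1), ("chr1", 300, 0.8), ("chrX", 10, 0.2)]

ordered = combine_and_order([design_1, design_2])
```

Sorting on a numeric key rather than on the raw chromosome string avoids the
common problem of "chr10" sorting ahead of "chr2".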
Sean