Hello all,
I know the general question of "should I summarize/average/etc. probes
that map to the same gene?" has been discussed many times before, but I
feel that the answer might be slightly different on the Illumina platform
(at least for the Mouse chip, which is the one I have been using).
For non-control probes, there simply is no advantage to using probe
summarized data relative to target summarized data, since you basically
have the same number of distinct sequences. Even though the probe names
have changed, and there appear to be ~70k of them, there are only ~46k
different probe sequences, which is almost exactly the number of
targets...
The numbers:
> length(as.list(lumiMouseV1TARGETID2NUID))
[1] 46116
> length(as.list(lumiMouseV1PROBEID2NUID))
[1] 70182
> length(unique(as.list(lumiMouseV1PROBEID2NUID)))
[1] 46120
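To make the counting argument concrete, here is a minimal sketch on toy data. The probe IDs and NUIDs below are made up; the list only stands in for what as.list(lumiMouseV1PROBEID2NUID) returns, and I am assuming each probe ID maps to exactly one NUID (so that unlist() keeps one entry per probe ID).

```r
# Toy stand-in for as.list(lumiMouseV1PROBEID2NUID): probe ID -> NUID.
# All identifiers here are hypothetical and only illustrate the counting.
probe2nuid <- list(
  probe_a = "NUID_1",
  probe_b = "NUID_1",  # same sequence as probe_a under a different probe ID
  probe_c = "NUID_2",
  probe_d = "NUID_3",
  probe_e = "NUID_3"   # same sequence as probe_d
)

# Flatten to a named character vector (assumes one NUID per probe ID).
nuids <- unlist(probe2nuid)

n_probe_ids <- length(nuids)          # number of probe IDs
n_sequences <- length(unique(nuids))  # number of distinct sequences (NUIDs)

# Count how many probe IDs point at each sequence.
ids_per_sequence <- table(nuids)

# Sequences that are reached from more than one probe ID.
shared <- names(ids_per_sequence)[ids_per_sequence > 1]

print(c(probe_ids = n_probe_ids, distinct_sequences = n_sequences))
print(split(names(nuids), nuids)[shared])
```

On the real annotation, the same table() call should show which NUIDs account for the gap between the ~70k probe IDs and the ~46k distinct sequences.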
Cheers,
Cei
sessionInfo()
R version 2.7.0 (2008-04-22)
i386-apple-darwin8.10.1
locale:
C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] lumiMouseV1_1.3.1 lumiMouseAll.db_1.2.0 AnnotationDbi_1.2.0
[4] RSQLite_0.6-8 DBI_0.2-4 lumi_1.6.0
[7] mgcv_1.3-30 affy_1.18.0 preprocessCore_1.2.0
[10] affyio_1.8.0 Biobase_2.0.0