Daniel Emden
Hi,
for my diploma thesis I need to merge several microarray datasets from
different platforms. I came across the virtualArray Bioconductor
package, which claims to do exactly what I need. The example from the
package vignette runs just fine. I computed the virtual array without
batch effect removal as follows, using the same datasets as in the
paper:
http://www.biomedcentral.com/1471-2105/14/75
library(GEOquery)      # getGEO()
library(Biobase)       # exprs(), pData()
library(virtualArray)  # virtualArrayExpressionSets()

# get sample data from the paper
GSE23402 <- getGEO("GSE23402")
GSE26428 <- getGEO("GSE26428")
GSE28688 <- getGEO("GSE28688")

# extract the ExpressionSets and reduce the data
GSE23402 <- GSE23402[[1]][, 1:24]
GSE26428 <- GSE26428[[1]]
GSE28688 <- GSE28688[[1]]

# merge via virtualArray (the datasets are passed by name)
virtArrays <- list()
virtArrays[["wBatchEffects"]] <- virtualArrayExpressionSets(
    all_expression_sets = c("GSE23402", "GSE26428", "GSE28688"),
    removeBatcheffect = FALSE)

# get the merged ExpressionSet
virtArray <- virtArrays[["wBatchEffects"]]

# quantiles before merge
quantile(exprs(GSE23402))
quantile(exprs(GSE26428))
quantile(exprs(GSE28688))

# quantiles after merge, first three samples of each batch
ind1 <- which(pData(virtArray)$Batch == "GSE23402")
ind2 <- which(pData(virtArray)$Batch == "GSE26428")
ind3 <- which(pData(virtArray)$Batch == "GSE28688")
quantile(exprs(virtArray)[, ind1[1:3]])
quantile(exprs(virtArray)[, ind2[1:3]])
quantile(exprs(virtArray)[, ind3[1:3]])
# output
# before merge
> quantile(exprs(GSE23402))
       0%       25%       50%       75%      100%
 3.330558  4.518535  5.883376  8.140574 14.777982
> quantile(exprs(GSE26428))
        0%        25%        50%        75%       100%
 0.8432676  2.4635710  5.6495232  8.1862528 14.8623320
> quantile(exprs(GSE28688))
       0%       25%       50%       75%      100%
 4.821043  5.613557  5.935004  7.473984 15.387470
# after merge
> quantile(exprs(virtArray)[,ind1])
       0%       25%       50%       75%      100%
 3.744348  5.217698  6.638584  8.421955 14.758915
> quantile(exprs(virtArray)[,ind2])
       0%       25%       50%       75%      100%
 3.744348  5.217698  6.638584  8.421955 14.758915
> quantile(exprs(virtArray)[,ind3])
       0%       25%       50%       75%      100%
 3.744348  5.217698  6.638584  8.421955 14.758915
As you can see, the quantiles of the first three samples from each
dataset are very different before the merge. After the merge they are
all identical.
Is that correct? Where is my mistake? This looks very strange to me.
I get the same result with batch effect removal.
The individual values in exprs(virtArray) differ between samples, so
how is it possible that the quantiles/boxplots are the same?
As far as I know, the data from getGEO are already normalized. Is
the virtualArrayExpressionSets function performing a second
normalization?
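
Identical quantiles combined with different individual values are what
quantile normalization across samples produces. The following is a
minimal base-R sketch on simulated data, under the assumption (not
confirmed from the virtualArray source) that a step of this kind runs
during the merge. quantile_normalize is a hypothetical helper written
for this illustration and is not part of virtualArray:

```r
# Hypothetical helper: plain quantile normalization of matrix columns.
# Not taken from virtualArray; ties are broken by order of appearance.
quantile_normalize <- function(x) {
  # mean of the k-th smallest value across all columns, for every k
  target <- rowMeans(apply(x, 2, sort))
  # replace each value by the target value that matches its within-column rank
  apply(x, 2, function(col) target[rank(col, ties.method = "first")])
}

set.seed(1)
# three toy "samples" with clearly different distributions
x <- cbind(a = rnorm(1000, mean = 6, sd = 2),
           b = rexp(1000, rate = 0.2),
           c = runif(1000, min = 4, max = 15))

apply(x, 2, quantile)        # quantiles differ between the columns
xn <- quantile_normalize(x)
apply(xn, 2, quantile)       # quantiles are identical in every column
all(xn[, "a"] == xn[, "b"])  # FALSE: the values still differ row by row
```

After such a step every column holds the same set of values in a
different order, so quantiles and boxplots coincide even though the
per-gene values do not.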
Thanks,
Daniel Emden
--
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] hgug4112a.db_2.9.0    hgu133plus2.db_2.9.0  org.Hs.eg.db_2.9.0
 [4] RSQLite_0.11.2        DBI_0.2-5             AnnotationDbi_1.22.1
 [7] BiocParallel_0.2.0    virtualArray_1.4.0    preprocessCore_1.22.0
[10] plyr_1.8              GEOquery_2.26.1       Biobase_2.20.0
[13] BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] affy_1.38.0          affyio_1.28.0        affyPLM_1.36.0
 [4] BiocInstaller_1.10.0 Biostrings_2.28.0    codetools_0.2-8
 [7] foreach_1.4.0        gcrma_2.32.0         grid_3.0.0
[10] IRanges_1.18.0       iterators_1.0.6      lattice_0.20-15
[13] outliers_0.14        quadprog_1.5-4       RCurl_1.95-4.1
[16] reshape2_1.2.2       splines_3.0.0        stats4_3.0.0
[19] stringr_0.6.2        tools_3.0.0          tseries_0.10-30
[22] XML_3.96-1.1         zlibbioc_1.6.0       zoo_1.7-10