Scott Ochsner
Dear BioC,
I would like to use simple correlation to assess the consistency
among seven independent expression array datasets. All datasets
are on the same platform, hgu133a.
In the materials and methods section from
http://cancerres.aacrjournals.org/cgi/content/full/67/21/10296#top
they state,
"To assess for consistency between the three studies, Pearson
correlation was computed pair-wise between the mean values of common
genes. The three studies showed significant positive pair-wise
correlation."
I'm having trouble following their statement. I don't have to worry
about common genes as all of the seven studies I'm looking at are on
the same platform.
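Read literally, the quoted procedure could be sketched as follows in base R. This is only one interpretation of the paper's sentence, not their actual code: the three studies are simulated, and `make_study`, `studies`, and `study_means` are hypothetical names introduced for illustration.

```r
# Simulated data only: three hypothetical studies with partly overlapping genes.
set.seed(1)
genes <- paste0("g", 1:100)
baseline <- setNames(rnorm(100, mean = 8, sd = 2), genes)  # shared per-gene level

# Build a genes x arrays matrix: shared baseline plus array-specific noise
make_study <- function(ids, n_arrays) {
  matrix(baseline[ids] + rnorm(length(ids) * n_arrays, sd = 0.5),
         nrow = length(ids), dimnames = list(ids, NULL))
}

studies <- list(
  A = make_study(genes[1:80], 4),
  B = make_study(genes[11:90], 5),
  C = make_study(genes[21:100], 3)
)

# Genes present in every study
common <- Reduce(intersect, lapply(studies, rownames))

# Mean expression of each common gene within each study (genes x studies)
study_means <- sapply(studies, function(m) rowMeans(m[common, , drop = FALSE]))

# Pairwise Pearson correlation between the studies
pairwise_cor <- cor(study_means, method = "pearson")
print(round(pairwise_cor, 3))
```

When all studies share one platform, the `intersect` step simply returns every probe set.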
I thought of doing something like the following:
#eset is your standard ExpressionSet object
#treatment is a vector describing which group each array belongs to.
#There are two groups, cont. and drug.
>avg<-function(eset,treatment){
+ tmp<-aggregate(t(exprs(eset)),by=list(treatment),mean)
+ rownames(tmp)<-tmp[,1]
+ t(tmp[,-1])
+ }
>groupAverage<-avg(eset,treatment)
> dim(groupAverage)
[1] 22277 14
> cor(groupAverage)
          c.d3529   c.d3834   c.d4006   c.d4025   c.d6800   c.d8540   c.d9936   e.d3529   e.d3834   e.d4006   e.d4025   e.d6800   e.d8540
c.d3529 1.0000000 0.9659532 0.7933771 0.7498652 0.8957816 0.8874096 0.9041292 0.9917589 0.9535454 0.7964003 0.7577108 0.8889499 0.8904473
c.d3834 0.9659532 1.0000000 0.8071949 etc....
Questions:
1. Since I'm expecting most of the probe sets on these arrays to not
change, shouldn't I expect high correlation even between the cont. and
drug groups? Or in other words, how informative is doing cor across
all of the probe sets?
2. How might I assess the significance of these correlations?
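
Both questions can be explored on simulated data. The sketch below is only an illustration under stated assumptions: the matrix `avgs` is made up (it is not the `groupAverage` object above), the 5% fraction of responding probes is arbitrary, and correlating drug-minus-control differences is one possible alternative rather than anything taken from the cited paper. `cor.test` is the base R test for a single Pearson correlation.

```r
# Simulated data only; all object names here are hypothetical.
set.seed(42)
n_probes <- 2000
baseline <- rnorm(n_probes, mean = 8, sd = 2)  # per-probe level shared by all arrays
effect <- numeric(n_probes)
effect[1:100] <- rnorm(100, sd = 2)            # assumed: 5% of probes respond to drug

# Hypothetical group averages (log2 scale) for two studies, s1 and s2
sim_group <- function(shift) baseline + shift + rnorm(n_probes, sd = 0.3)
avgs <- cbind(c.s1 = sim_group(0), e.s1 = sim_group(effect),
              c.s2 = sim_group(0), e.s2 = sim_group(effect))

# Question 1: correlation of raw averages is dominated by the shared baseline,
# so it is high even between control and drug groups
raw_cor <- cor(avgs)

# Drug-minus-control differences remove that shared baseline
diffs <- cbind(s1 = avgs[, "e.s1"] - avgs[, "c.s1"],
               s2 = avgs[, "e.s2"] - avgs[, "c.s2"])
diff_cor <- cor(diffs)[1, 2]

# Question 2: test whether one pairwise correlation differs from zero
ct <- cor.test(diffs[, "s1"], diffs[, "s2"], method = "pearson")

print(round(raw_cor, 3))
print(diff_cor)
print(ct$p.value)
```

With thousands of probe sets, even a weak correlation yields a very small p-value, so the test says whether the correlation is nonzero rather than whether it is large. It also treats probe sets as independent observations, which is unlikely to hold for expression data.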
> sessionInfo()
R version 2.7.0 (2008-04-22)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] splines tools stats graphics grDevices utils
datasets methods base
other attached packages:
[1] affycoretools_1.12.0 annaffy_1.12.1 KEGG.db_2.2.0
gcrma_2.12.1 matchprobes_1.12.0 biomaRt_1.14.0
[7] RCurl_0.9-3 GOstats_2.6.0 Category_2.6.0
RBGL_1.16.0 GO.db_2.2.0 graph_1.18.1
[13] limma_2.14.2 affy_1.18.1 preprocessCore_1.2.0
affyio_1.8.0 MLInterfaces_1.14.1 annotate_1.18.0
[19] xtable_1.5-2 AnnotationDbi_1.2.1 RSQLite_0.6-8
DBI_0.2-4 rda_1.0 rpart_3.1-41
[25] genefilter_1.20.0 survival_2.34-1 MASS_7.2-41
Biobase_2.0.1
loaded via a namespace (and not attached):
[1] class_7.2-41 cluster_1.11.10 XML_1.95-2
Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227