Dear all,
After some careful reading, I've settled for CAMERA as my favaorite gene set enrichment test.
However, in one of my datasets (condition 3 vs condition 1; 3 replicates, microarray data), I get a very confusing error:
I get virtually the same enriched gene sets (and direction!) when I perform the test on the first or on the second condition. When I look at the genes, they are also the same! How can this be possible?
Here's some snipets:
Mi data are in this data.frame:
colnames(final.table)
'data.frame': 22518 obs. of 9 variables:
$ Probe.Set.ID: Factor w/ 22518 levels "1415676_a_at",..: 1 2 3 4 5 6 7 8 9 10 ...
$ SYMBOL : Factor w/ 12142 levels "0610006L08Rik",..: 8523 3533 8506 1316 6878 1654 6973 8500 12091 6863 ...
$ Entrez_IDs : Factor w/ 12142 levels "100017","100019",..: 3274 7450 8495 2718 11955 1075 11391 3264 10576 11953 ...
$ Cond1_1.CEL : num 11.57 8.29 10.63 9.87 8.32 ...
$ Cond1_2.CEL : num 11.48 8.16 10.47 9.63 8.06 ...
$ Cond1_3.CEL : num 11.39 8.41 11.01 10.18 8.63 ...
$ Cond3_1.CEL : num 11.11 7.93 10.58 8.89 7.98 ...
$ Cond3_2.CEL : num 11.13 8.16 10.51 9.07 7.67 ...
$ Cond3_3.CEL : num 11.04 8.62 11.04 9.1 7.82 ...
And then I construct my test:
grp = c( "Cond1", "Cond3")
Group=as.factor(c(rep(grp[1], length(grep(grp[1], colnames(final.table),ignore.case = F ))),
rep(grp[2], length(grep(grp[2], colnames(final.table),ignore.case = F )))))
des = model.matrix(~0+Group)
colnames(des) <- levels(Group)
contr <- makeContrasts(Cond3-Cond1, levels = des)
str_split(rownames(final.table), pattern = "_") ->k
unlist(lapply(k,function(x) x[1]))->k
ids2indices(iset,k)-> indices
camera(final.table, indices, des, contrast=2)-> current.cam.up.Cond3
camera(final.table,, indices, des, contrast=1)-> current.cam.up.Cond1
Here I'm showing the first 6 gene sets, but you can trust me that they are pretty much the same ones...
current.cam.up.Cond3
NGenes Direction PValue FDR
HALLMARK_MYC_TARGETS_V1 452 Up 3.867097e-12 1.933549e-10
HALLMARK_OXIDATIVE_PHOSPHORYLATION 379 Up 1.868544e-10 4.671361e-09
HALLMARK_UNFOLDED_PROTEIN_RESPONSE 262 Up 1.813248e-05 3.022080e-04
HALLMARK_MTORC1_SIGNALING 492 Up 4.983641e-05 6.229551e-04
HALLMARK_TNFA_SIGNALING_VIA_NFKB 419 Up 4.852641e-04 4.852641e-03
HALLMARK_MYC_TARGETS_V2 138 Up 7.818134e-04 6.515112e-03
> head(res.fil.up.A)
NGenes Direction PValue FDR
HALLMARK_MYC_TARGETS_V1 452 Up 6.978126e-14 3.489063e-12
HALLMARK_OXIDATIVE_PHOSPHORYLATION 379 Up 1.145075e-10 2.862686e-09
HALLMARK_E2F_TARGETS 509 Up 1.303369e-08 2.172282e-07
HALLMARK_G2M_CHECKPOINT 592 Up 4.696686e-05 5.870858e-04
HALLMARK_UNFOLDED_PROTEIN_RESPONSE 262 Up 6.785470e-05 6.785470e-04
HALLMARK_MTORC1_SIGNALING 492 Up 2.174479e-04 1.812066e-03
I would expect that if the genes sets are significicant and the same in the two conditions, the direction should change..
Thanks a lot for helping solve this puzzle,
Natalia
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mouse4302cdf_2.18.0 genefilter_1.54.2 affyPLM_1.48.0 preprocessCore_1.34.0
[5] gcrma_2.44.0 affy_1.50.0 mouse4302.db_3.2.3 org.Mm.eg.db_3.3.0
[9] AnnotationDbi_1.34.4 rgl_0.97.0 dplyr_0.5.0 ggrepel_0.6.5
[13] ggplot2_2.2.1 RColorBrewer_1.1-2 stringr_1.1.0 knitr_1.15.1
[17] gplots_3.0.1 calibrate_1.7.2 MASS_7.3-45 FactoMineR_1.34
[21] oligo_1.36.1 Biostrings_2.40.2 XVector_0.12.1 IRanges_2.6.1
[25] S4Vectors_0.10.3 oligoClasses_1.34.0 limma_3.28.21 Biobase_2.32.0
[29] BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] jsonlite_1.2 splines_3.3.1 foreach_1.4.3 gtools_3.5.0
[5] shiny_1.0.0 assertthat_0.1 highr_0.6 RSQLite_1.1-2
[9] lattice_0.20-34 digest_0.6.11 GenomicRanges_1.24.3 colorspace_1.3-2
[13] Matrix_1.2-8 htmltools_0.3.5 httpuv_1.3.3 plyr_1.8.4
[17] XML_3.98-1.5 affxparser_1.44.0 zlibbioc_1.18.0 xtable_1.8-2
[21] scales_0.4.1 gdata_2.17.0 affyio_1.42.0 ff_2.2-13
[25] annotate_1.50.1 tibble_1.2 SummarizedExperiment_1.2.3 lazyeval_0.2.0
[29] survival_2.40-1 magrittr_1.5 mime_0.5 evaluate_0.10
[33] memoise_1.0.0 BiocInstaller_1.22.3 tools_3.3.1 munsell_0.4.3
[37] cluster_2.0.5 flashClust_1.01-2 GenomeInfoDb_1.8.7 caTools_1.17.1
[41] RCurl_1.95-4.8 grid_3.3.1 iterators_1.0.8 htmlwidgets_0.8
[45] leaps_3.0 labeling_0.3 bitops_1.0-6 gtable_0.2.0
[49] codetools_0.2-15 DBI_0.5-1 R6_2.2.0 bit_1.1-12
[53] KernSmooth_2.23-15 stringi_1.1.2 Rcpp_0.12.9 scatterplot3d_0.3-38
I was just trying a shortcut to typing my whole code. Which is here. And it works, my question still remains the same.
runTest <- function(iset,name, final.table, grp){
c(grep(grp[1], colnames(final.table),ignore.case = F ), grep(grp[2], colnames(final.table),ignore.case = F))->curr.ind
exp.mat.0<-final.table[,curr.ind]
dummy_ind <- seq(from=1, to =dim(exp.mat.0)[1])
rownames(exp.mat.0)<- paste0(final.table$Entrez_IDs,"_",dummy_ind) # because we have non-unique values
# remove NA rows
complete.cases(exp.mat.0)->a
exp.mat.0 <- exp.mat.0[a,]
Group=as.factor(c(rep(grp[1], length(grep(grp[1], colnames(final.table),ignore.case = F ))),
rep(grp[2], length(grep(grp[2], colnames(final.table),ignore.case = F )))))
des = model.matrix(~0+Group)
colnames(des) <- levels(Group)
contr <- makeContrasts(Cond3-Cond1, levels = des)
str_split(rownames(exp.mat.0), pattern = "_") ->k
unlist(lapply(k,function(x) x[1]))->k
ids2indices(iset,k)-> indices
camera(exp.mat.0, indices, des, contrast=2)-> current.cam #.up.Q
camera(exp.mat.0, indices, des, contrast=1)-> current.cam
entrez.ids<- lapply(indices, function(x) k[x])
symbol.ids <- lapply(entrez.ids, function(x) match(x, final.table$Entrez_IDs))
col.id <- which(names(final.table) == "SYMBOL")
symbol.names <- lapply(symbol.ids, function(x) paste(as.character(final.table[unlist(x),col.id]), collapse = ";"))
genes.cam <-vector()
match(rownames(current.cam),names(symbol.names))->b
genes.cam <-unlist(symbol.names[b])
cbind(current.cam, genes.cam)->current.camera.tot
res <-list( "camera"= current.camera.tot)
write.table(cbind(rownames(current.camera.tot),current.camera.tot), file=paste0(outdir,name,".",grp[2],"vs",grp[1],".camera.txt"), row.names = FALSE, sep = "\t", col.names =
c("geneSet",colnames(current.camera.tot)))
return(res)
}
load(paste0(gene.sets.dir,"mouse_c2_v5p1.rdata"))
load(paste0(gene.sets.dir,"mouse_H_v5p1.rdata"))
isets <- list()
isets[[1]] <- Mm.H
isets[[2]] <- Mm.c2
names(isets) <- c("H","C2")
# get enrichments
enrichment.H <- list()
enrichment.C2 <- list()
for (i in 1:dim(comb.all)[2]){
runTest(isets[[1]], "H", mat, c(comb.all[2*i-1],comb.all[2*i]))->enrichment.H[[i]]
#runTest(isets[[2]], "C2", mat, c(comb.all[2*i-1],comb.all[2*i]))->enrichment.C2[[i]]
names(enrichment.H)[[i]] <- paste0(comb.all[2*i],".vs.",comb.all[2*i-1])
#names(enrichment.C2)[[i]] <- paste0(comb.all[2*i-1],".vs.",comb.all[2*i])
}
My answer remains the same. Did you read it?
If I''ve previously defined my design:
> des
Mock_3h Z_3h
1 1 0
2 1 0
3 1 0
4 0 1
5 0 1
6 0 1
wouldn't this instruction:
camera(exp.mat.0, indices, des, contrast=2)
specifiy that I run my test for my condition Z_3h?
this is how I interpret the contrast parameter.
What James is saying is that since your design does not have an intercept term (ie. you are using a "cell means" model), you need to test a contrast among the columns of your design matrix by specifying a vector of coefficients which is as long as the number of columns of your design matrix, and not just a single column (index) into our design matrix.
In your case, this call:
is essentially testing something you don't really care to answer, ie. you are asking if the mean expression of the genes is different from zero in your "Z_3h" group.
You probably would rather ask if the mean expression is different between your two groups, in which case you need to pass the vector that specifies your contrast across the columns of your design (as mentioned in the docs), so something like:
Which is testing the difference in expression calculated by your
Z_3h - Mock_3h
groups, given the design you specified in your comment above.Putting it a third way: when James said camera knows nothing about your contrast matrix, he means that it doesn't know that the "2" in your
contrast=2
is referencing the second column of a contrast matrix you built somewhere else (again, beating a dead horse: that contrast matrix has as many rows as you have columns in your design matrix). You need to pass the vector that is the second column of your contrast matrix directly.Make sense?