Hi,
I have been trying to find differently express gene in GSE19864 between senescent and growing cells using their GEO raw data.
My first problem is that using reported normalization method (GC-rma) I can't get the same dataset as their normalized GEO dataset. There are 2 experiments in the same dataset wich in a PCA cluster in two groups.So I tried using batch effect / surrogate variables removal from sva R package without much success, my DE gene list is still different from their. Am i missing something ? cf my code is below
gse <- gse_gcrma expr <- exprs(gse) pheno <- droplevels(pData(gse)) mod <- model.matrix(~cell_status, data=pheno) mod0 <- model.matrix(~1, data=pheno) n.sv <- sva::num.sv(expr,mod,method="leek") svobj <- sva::sva(as.matrix(expr), mod, mod0, n.sv = n.sv) modSv = cbind(mod,svobj$sv) colnames(modSv) <- c(levels(pheno$cell_status), paste("Surrogate",seq_along(1:svobj$n.sv),sep = "")) contrast_mat <- makeContrasts(Senescent-Growing, levels=modSv) fit = lmFit(gse,modSv) fit2 <- contrasts.fit(fit, contrast_mat) fit2 <- eBayes(fit2) tT_gcrma_sva <- topTable(fit2, adjust="fdr", sort.by="B", number=50)
PS: which normalization would you recommend for affy array: rma, gc-rma or frma ?
Thank you Bernd for this explanation, you are right I was mistaken for the colnames part, I used ~0+cell_status before when creating the variable mod, but I email jeff asking some question about sva and he recommended that I removed the 0.
I tried your solution and it is working great, now at least some of the DE genes are similar with authors results, but I still don't really understand why it is only working when using an intercept for the SVA part (~cell_status).