Question

eBayes options and decideTests method

0

Entering edit mode

irene.roman • 0

@ireneroman-22635

Last seen 4.1 years ago

Dear Bioconductor volunteers,

I am Irene Román, postdoctoral researcher in epidemiology, and I am writing because I have some questions associated to an analysis I am currently doing.

I am analyzing biomarkers of demyelination in mice. In a first step, I am analyzing biomarkers in microglia. I have selected 2 GEOs and analyzed them together. I attach the most important code for the analysis. I am not including quality control (QC) figures to be more concise (although I can include them if needed), QC figures were ok after the steps that are shown in the code.

My questions are the following:

In the eBayes function there is one option to include an intensity-trend for the prior variance (trend) and another to robustify estimations against outlier sample variances (robust). I am inclined to put them both=TRUE as probably the prior variance is not constant and there could be some outliers. Although I have read the function and the limma package documentation in detail I cannot get a clear idea of which strategy is best regarding these 2 options. In my analysis, I found that the number of differentially expressed genes obtained depends on which option is selected (Table). In general, the trend=TRUE option increases the number of differentially expressed genes, the robust=TRUE option decreases them, and setting both to TRUE gives you a number between both (Table).
When I examine the differentially expressed genes with decideTests I can choose between 2 methods: separate and global. I would like to use global to adjust for the multiple comparisons undertaken in the 2 separate comparisons I am doing. As expected I get a different number of differentially expressed genes depending on the method I use (Table). In addition, for 1 of the 2 comparisons I only get differentially expressed genes when I select method=global. This was a bit surprising, as I would expect that a further multiple adjustment for the number of comparisons would reduce the number of differentially expressed genes. Then, when I do a topTable I can obtain the differentially expressed genes with the separate method, which seems to be the default, but I cannot select the genes for the global method (that I can see with the vennDiagram).

I would really appreciate if you could give me some insight in the use of the trend and robust options and help in obtaining the differentially expressed genes with the global method.

Thank you so much, sincerely,
Irene.

Summary of results by strategy used

library(affy)
library(affyPLM)
library(affyQCReport)
library(scatterplot3d)
library(gplots)
library(limma)
library(GEOquery)
library(annotate)
library(mogene10sttranscriptcluster.db)
library(oligo)
library(tidyr)
library(ReactomePA)
library(a4Base)
library(casper)
library(sva)
library(SummarizedExperiment)
library(dplyr)
library(topGO)
library(ReactomePA)
library(DOSE)
library(clusterProfiler)
library(TimeSeriesExperiment)
library(calibrate)
library(ggplot2)

workingDir<-"C:/Users/iroman/Documents/Master_Omics/Project"
setwd(workingDir)

GEO84<-getGEOSuppFiles("GSE84113")
GEO66<-getGEOSuppFiles("GSE66926") 

untar("GSE84113/GSE84113_RAW.tar", exdir = getwd())
untar("GSE66926/GSE66926_RAW.tar", exdir = getwd())

# reading cel files I am interested in
GEOFS84 <- read.celfiles(filenames = c("GSM2227385_25374.CEL.gz","GSM2227386_25375.CEL.gz",
                                       "GSM2227387_25376.CEL.gz","GSM2227388_25377.CEL.gz","GSM2227389_25378.CEL.gz", 
                                       "GSM2227396_25385.CEL.gz","GSM2227397_25386.CEL.gz","GSM2227398_25387.CEL.gz",
                                       "GSM2227399_25388.CEL.gz","GSM2227400_25389.CEL.gz"))
GEOFS66 <- read.celfiles(filenames = c("GSM1634433_23164.CEL.gz", "GSM1634434_23165.CEL.gz",
                                       "GSM1634436_23167.CEL.gz", "GSM1634437_23168.CEL.gz"))

# renaming samples
sampleNames(GEOFS84)<-gsub("GSM2227","",sampleNames(GEOFS84))
sampleNames(GEOFS84)<-gsub(".CEL.gz","",sampleNames(GEOFS84))
sampleNames(GEOFS66)<-gsub("GSM16344","",sampleNames(GEOFS66))
sampleNames(GEOFS66)<-gsub(".CEL.gz","",sampleNames(GEOFS66))

# normalization
GEOFS84.rma<-rma(GEOFS84,target="core")
GEOFS66.rma<-rma(GEOFS66,target="core")

### Sample combination and quantile normalization
GEOS_microglia<-combineTwoExpressionSet(GEOFS84.rma,GEOFS66.rma)
GEOS_microglia_quant = quantileNorm(GEOS_microglia)

### Removing experiment effect
pheno = pData(GEOS_microglia_quant)
pheno$batch<-c("1","1","1","1","1","1","1","1","1","1","2","2","2","2")
edata = exprs(GEOS_microglia_quant)
batch = pheno$batch
colnames(edata)<-paste(rownames(pheno),as.character(batch),sep="_")
modcombat = model.matrix(~1, data=pheno) 
combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat,par.prior=TRUE,prior.plots=FALSE)

### Non-specific filtering

## Filter based on intensity
# Calculation of row-wise medians
microg.medians <- rowMedians(combat_edata)
hist_medians <- hist(microg.medians)
quantile(microg.medians,c(0.3))

# Filter transcripts with lower intensity than 4 in at least 3 arrays (smallest experimental group)
med_threshold <- 4.14
samples_cutoff <- 3
idx_med_threshold <- apply(combat_edata,1,
                           function(x){
                             sum(x > med_threshold) >= samples_cutoff})
table(idx_med_threshold)
combat.filt <- subset(combat_edata,idx_med_threshold)

## Annotation of transcript clusters
combat.filt.eset <- ExpressionSet(combat.filt)
# getting annotation from the transcript clusters
annot.microg <- AnnotationDbi::select(mogene10sttranscriptcluster.db,
                                      keys=(featureNames(combat.filt.eset)),
                                      columns = c("SYMBOL","GENENAME","ENSEMBL"),
                                      keytype = "PROBEID")
annot.microg$SYMBOL <- as.factor(annot.microg$SYMBOL)
# filter out the probes that do not map to a gene
annot.microg.filt <- subset(annot.microg, !is.na(SYMBOL))

## Removing probes mapping to multiple gene symbols
annot.grouped <- group_by(annot.microg.filt, PROBEID)
annot.summ <- dplyr::summarize(annot.grouped, no_of_matches = n_distinct(SYMBOL))

# filter for probes with multiple matches
annot.microg.multfilt <- filter(annot.summ, no_of_matches >1)
# filter out probes with multiple matches from assay data
ids_to_exclude <- (featureNames(combat.filt.eset) %in% annot.microg.multfilt$PROBEID)
data.final <- subset(combat.filt.eset, !ids_to_exclude)
# filter out probes with multiple matches from feature data
fData(data.final)$PROBEID <- rownames(fData(data.final))
fData(data.final) <- left_join(fData(data.final),annot.microg.filt)

### DEGs
pData(data.final)$cond<-factor(c("WTcont","WTcont","WTcupri","WTcupri","WTcupri",
                                 "Tremcont","Tremcont","Tremcupri","Tremcupri","Tremcupri","WTcont","WTcupri","Tremcont","Tremcupri"))

cond.mic<-as.factor(pData(data.final)$cond)

design.mic<-model.matrix(~0+cond.mic)
rownames(design.mic)<-sampleNames(data.final)
colnames(design.mic)<-c("Tremcontrol","Tremcupri","WTcontrol","WTcupri")

fit.mic<-lmFit(exprs(data.final),design.mic)
contrast.matrix.mic<-makeContrasts(WTcontrol-WTcupri,Tremcontrol-Tremcupri,levels=design.mic)

fit2.mic<-contrasts.fit(fit.mic,contrast.matrix.mic)
fite.mic<-eBayes(fit2.mic, trend=TRUE)

resultsG <- decideTests(fite.mic,method="global",adjust.method="BH")
resultsS <- decideTests(fite.mic,method="separate",adjust.method="BH")
vennDiagram(resultsG)
vennDiagram(resultsS)

top.table_BH_WT<-topTable(fite.mic,coef=1,n=Inf,adjust.method="BH")
results.p0.05_BH_WT<-top.table_BH_WT[top.table_BH_WT$adj.P.Val<0.05,]

top.table_BH_Trem<-topTable(fite.mic,coef=2,n=Inf,adjust.method="BH")
results.p0.05_BH_Trem<-top.table_BH_Trem[top.table_BH_Trem$adj.P.Val<0.05,]

# sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252    

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.mogene.1.0.st.v1_3.14.1           DBI_1.0.0                           
 [3] RSQLite_2.1.1                        ggplot2_3.1.1                       
 [5] calibrate_1.7.5                      TimeSeriesExperiment_1.0.4          
 [7] clusterProfiler_3.10.1               DOSE_3.8.2                          
 [9] topGO_2.34.0                         SparseM_1.77                        
[11] graph_1.60.0                         dplyr_0.8.0.1                       
[13] sva_3.30.1                           mgcv_1.8-27                         
[15] nlme_3.1-137                         casper_2.16.1                       
[17] a4Base_1.30.0                        a4Core_1.30.0                       
[19] a4Preproc_1.30.0                     glmnet_2.0-16                       
[21] foreach_1.4.4                        Matrix_1.2-15                       
[23] multtest_2.38.0                      genefilter_1.64.0                   
[25] mpm_1.0-22                           KernSmooth_2.23-15                  
[27] MASS_7.3-51.1                        annaffy_1.54.0                      
[29] KEGG.db_3.2.3                        GO.db_3.7.0                         
[31] ReactomePA_1.26.0                    tidyr_0.8.3                         
[33] oligo_1.46.0                         Biostrings_2.50.2                   
[35] XVector_0.22.0                       oligoClasses_1.44.0                 
[37] mogene10sttranscriptcluster.db_8.7.0 org.Mm.eg.db_3.7.0                  
[39] annotate_1.60.0                      XML_3.98-1.19                       
[41] AnnotationDbi_1.44.0                 GEOquery_2.50.5                     
[43] limma_3.38.3                         gplots_3.0.1.1                      
[45] scatterplot3d_0.3-41                 affyQCReport_1.60.0                 
[47] lattice_0.20-38                      affyPLM_1.58.0                      
[49] preprocessCore_1.44.0                gcrma_2.54.0                        
[51] affy_1.60.0                          SummarizedExperiment_1.12.0         
[53] DelayedArray_0.8.0                   BiocParallel_1.16.6                 
[55] matrixStats_0.54.0                   Biobase_2.42.0                      
[57] GenomicRanges_1.34.0                 GenomeInfoDb_1.18.2                 
[59] IRanges_2.16.0                       S4Vectors_0.20.1                    
[61] BiocGenerics_0.28.0                 

loaded via a namespace (and not attached):
  [1] proto_1.0.0              tidyselect_0.2.5         htmlwidgets_1.3          munsell_0.5.0           
  [5] codetools_0.2-16         chron_2.3-53             withr_2.1.2              colorspace_1.4-1        
  [9] GOSemSim_2.8.0           knitr_1.22               rstudioapi_0.10          simpleaffy_2.58.0       
 [13] urltools_1.7.3           GenomeInfoDbData_1.2.0   polyclip_1.10-0          bit64_0.9-7             
 [17] farver_2.0.1             coda_0.19-3              xfun_0.6                 affxparser_1.54.0       
 [21] R6_2.4.0                 graphlayouts_0.5.0       VGAM_1.1-2               locfit_1.5-9.1          
 [25] bitops_1.0-6             fgsea_1.8.0              gridGraphics_0.4-1       assertthat_0.2.1        
 [29] scales_1.0.0             nnet_7.3-12              ggraph_2.0.0             enrichplot_1.2.0        
 [33] gtable_0.3.0             tidygraph_1.1.2          rlang_0.4.2              splines_3.5.3           
 [37] rtracklayer_1.42.2       lazyeval_0.2.2           acepack_1.4.1            europepmc_0.3           
 [41] checkmate_1.9.1          BiocManager_1.30.4       yaml_2.2.0               reshape2_1.4.3          
 [45] GenomicFeatures_1.34.3   backports_1.1.3          qvalue_2.14.1            Hmisc_4.2-0             
 [49] tools_3.5.3              ggplotify_0.0.4          affyio_1.52.0            ff_2.2-14               
 [53] RColorBrewer_1.1-2       proxy_0.4-23             dynamicTreeCut_1.63-1    ggridges_0.5.1          
 [57] gsubfn_0.7               Rcpp_1.0.1               plyr_1.8.4               base64enc_0.1-3         
 [61] progress_1.2.0           zlibbioc_1.28.0          purrr_0.3.2              RCurl_1.95-4.12         
 [65] prettyunits_1.0.2        rpart_4.1-13             sqldf_0.4-11             viridis_0.5.1           
 [69] cowplot_0.9.4            ggrepel_0.8.1            cluster_2.0.7-1          magrittr_1.5            
 [73] data.table_1.12.8        DO.db_2.9                triebeard_0.3.0          reactome.db_1.66.0      
 [77] hms_0.4.2                xtable_1.8-3             gaga_2.28.1              gridExtra_2.3           
 [81] compiler_3.5.3           biomaRt_2.38.0           tibble_2.1.1             crayon_1.3.4            
 [85] htmltools_0.4.0          Formula_1.2-3            geneplotter_1.60.0       tweenr_1.0.1            
 [89] rappdirs_0.3.1           readr_1.3.1              permute_0.9-5            gdata_2.18.0            
 [93] igraph_1.2.4.1           pkgconfig_2.0.2          rvcheck_0.1.7            GenomicAlignments_1.18.1
 [97] foreign_0.8-71           xml2_1.2.0               EBarrays_2.46.0          stringr_1.4.0           
[101] digest_0.6.18            vegan_2.5-4              fastmatch_1.1-0          htmlTable_1.13.1        
[105] edgeR_3.24.3             Rsamtools_1.34.1         gtools_3.8.1             graphite_1.28.2         
[109] jsonlite_1.6             viridisLite_0.3.0        pillar_1.3.1             httr_1.4.0              
[113] survival_2.43-3          glue_1.3.1               UpSetR_1.4.0             iterators_1.0.10        
[117] bit_1.1-14               ggforce_0.3.1            stringi_1.4.3            blob_1.1.1              
[121] DESeq2_1.22.2            latticeExtra_0.6-28      caTools_1.17.1.2         memoise_1.1.0

limma eBayes decideTests topTable • 1.7k views

ADD COMMENT • link updated 5.2 years ago by Gordon Smyth 52k • written 5.2 years ago by irene.roman • 0

0

Entering edit mode

The Table is in https://ibb.co/M2MYDXj Irene.

ADD REPLY • link 5.2 years ago irene.roman • 0

0

Entering edit mode

Do you have a question? I've read through your post, but there are no questions, only comments.

ADD REPLY • link 5.2 years ago Gordon Smyth 52k

0

Entering edit mode

Dear Gordon,

I have 2 questions:

in which situations are the trend and robust options of the eBayes function recommended?
how can I obtain the differentially expressed genes identified with the global method of the decideTests?

Thank you so much, Irene.

ADD REPLY • link 5.2 years ago irene.roman • 0

score 0 · Answer 1 · 2020-02-22

I can't give tell you how to analyse this particular data set, but it is not correct to use Combat and then feed the corrected values into R as if they were not corrected. If you correct for a covariate or batch effect, then the covariate or batch factor must be included in the limma linear model.

Also, a minor suggestion: why use lmFit(exprs(data.final),design.mic) instead of simply lmFit(data.final, design.mic)?

in which situations is trend recommended?

Whenever there is a strong trend in the residual variables. Try plotSA(fite.mic) and see.

in which situations is robust recommended?

Whenever there might be genes with outlier variances. Have a look at plotSA(fite.mic) and see. It usually does no harm to include robust=TRUE even if it is not necessary.

how can I obtain the differentially expressed genes identified with the global method of the decideTests?

write.fit(fite.mic, results=resultsG, adjust="BH", method="global", file="globalDEresults.txt")

Alternatively you can select the DE genes with a logical subsetting vector:

is.DE.WT <- as.logical(resultsG[,1])
is.DE.Trem <- as.logical(resultsG[,2])