Question

limma's mroast and romer

0

Entering edit mode

Ilario De Toma ▴ 10

@ilario-de-toma-5961

Last seen 7.1 years ago

Italy

Hi, I am puzzled by the apparently opposite results in mroast and romer functions from limma.

If I well understood the difference is that romer is optimized for a gsea analysis on dataset of gene sets and use sample label rotations, while mroast use rotations of genes and is suitable to test one or more gene sets.

In my case I have 22 samples (11 vs 11), and I want to test two custom gene sets: m1 (14 genes) and m2.

If I plot the running enrichment scores (y axis) along the overal list of genes ranked for the moderated t statistics, I can clearly see that, especially for the m1 genes, genes in the gene set distribute at the top of the list. While this agrees with the mroast direction that says it is up, romer "Up" p-values are high, while the "down" p-value are low.

Why this results are opposite?

Moreover mroast PropUp column indicates "1" for both M1 and M2 list, but in reality it should be 0.7857143 for M1 and 0.5833333 for M2.

> gsea.m1m2=mroast(gene.level, index=ids2indices(m1m2_list, rownames(gene.level)),design=design, array.weights=aw, nrot=999999)
> gsea.m1m2
   NGenes PropDown PropUp Direction PValue   FDR PValue.Mixed FDR.Mixed
M2     24        0      1        Up  1e-06 1e-06        1e-06     1e-06
M1     14        0      1        Up  1e-06 1e-06        1e-06     1e-06


> gsea.m1m2=romer(gene.level, index=ids2indices(m1m2_list, rownames(gene.level)),design=design, array.weights=aw)
> gsea.m1m2
   NGenes     Up   Down  Mixed
M1     14 0.8984 0.1017 0.9999
M2     24 0.9831 0.0170 0.9997

> m1m2_list
$M1
 [1] "15945" "20304" "20846" "17329" "21926" "12524" "18126" "20296" "27056" "16193" "16365" "16176" "83430" "20656"

$M2
 [1] "17534"  "19016"  "19224"  "21803"  "16181"  "17533"  "16153"  "12655"  "20295"  "20778"  "170786" "20852"  "11846" 
[14] "16600"  "57262"  "16178"  "56221"  "170776" "98396"  "19015"  "170779" "69142"  "93671"  "69165"

> gene.level[1:3,1:6]
          scr.healthy anti210.healthy scr.healthy anti210.healthy scr.healthy anti210.healthy
100008567    6.834766        7.008072    6.935282        6.818102    6.830925        6.676217
100017       6.847484        7.260983    6.894910        7.166771    7.249061        7.224644
100019       7.633976        7.522403    7.595522        7.609688    7.673701        7.553197


> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.04

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8    LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.18.4  limma_3.24.15         clusterProfiler_2.2.7 DOSE_2.6.6            org.Mm.eg.db_3.1.2   
 [6] pathview_1.8.0        org.Hs.eg.db_3.1.2    AnnotationDbi_1.30.1  GenomeInfoDb_1.4.3    IRanges_2.2.7        
[11] S4Vectors_0.6.6       KEGGgraph_1.26.0      graph_1.46.0          XML_3.98-1.3          scales_0.3.0         
[16] biomaRt_2.24.1        beadarray_2.18.0      ggplot2_1.0.1         Biobase_2.28.0        BiocGenerics_0.14.0  
[21] RSQLite_1.0.0         DBI_0.3.1            

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1          plyr_1.8.3           XVector_0.8.0        bitops_1.0-6         tools_3.2.2         
 [6] zlibbioc_1.14.0      base64_1.1           digest_0.6.8         gtable_0.1.2         png_0.1-7           
[11] igraph_1.0.1         Rgraphviz_2.12.0     proto_0.3-10         stringr_1.0.0        httr_1.0.0          
[16] Biostrings_2.36.4    grid_3.2.2           qvalue_2.0.0         R6_2.1.1             GOSemSim_1.26.0     
[21] BeadDataPackR_1.20.0 GO.db_3.1.2          reshape2_1.4.1       DO.db_2.9            magrittr_1.5        
[26] GenomicRanges_1.20.8 MASS_7.3-44          splines_3.2.2        KEGGREST_1.8.0       colorspace_1.2-6    
[31] stringi_0.5-5        RCurl_1.95-4.7       munsell_0.4.2        illuminaio_0.10.0

romer mroast limma • 3.0k views

ADD COMMENT • link updated 8.7 years ago by Gordon Smyth 50k • written 8.7 years ago by Ilario De Toma ▴ 10

1

Entering edit mode

This reversal between roast (a self-contained test) and romer (a competitive test) can only occur in special circumstances.

I have a suspicion of what the problem may be here. Please show me what your design matrix is.

Also, how exactly have you concluded that PropUp should be 0.786 for M1 and 0.583 for M2?

ADD REPLY • link updated 8.7 years ago by Steve Lianoglou ★ 13k • written 8.7 years ago by Gordon Smyth 50k

0

Entering edit mode

As regards the value for PropUp, if i have well understood stands for "proportion up", so simply by looking at the topTable of those genes I know that are 11 out of 14 for the M1 gene set and 14 our of 24 for M2.

Here is my design matrix, which is the same between the mroast and romer calls.

> design
   anti210.healthy scr.healthy
1                0           1
2                1           0
3                0           1
4                1           0
5                0           1
6                1           0
7                0           1
8                1           0
9                0           1
10               1           0
11               0           1
12               1           0
13               0           1
14               1           0
15               0           1
16               1           0
17               0           1
18               1           0
19               0           1
20               1           0
21               0           1
22               1           0
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$TS
[1] "contr.treatment"

ADD REPLY • link 8.7 years ago Ilario De Toma ▴ 10

2

Entering edit mode

I don't think you are testing what you think you are testing given the design you show (you have no intercept) and the contrast that is being used (you haven't specified one, so it defaults to the last column of your design matrix).

You can keep the same design, and specify contrast=c(1, -1) in your calls to romer and roast, which would be testing the (anti20 / scr) fold changes.

You said you compared the romer and roast results vs. what you got by looking at where the moderated t-stats landed on the distribution of the t-stats ... what was the (straight up limma for differential expression) code you used to calculate those?

ADD REPLY • link 8.7 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

You are right! I forgot the argument contrasts that I used for calling limma's fit. What a stupid mistake!

> contrasts
                 Contrasts
Levels            anti210.healthy - scr.healthy
  anti210.healthy                             1
  scr.healthy                                -1

Now results agree:

> gsea.m1m2
   NGenes     Up   Down Mixed
M1     14 0.1246 0.8755 0.4971
M2     24 0.3481 0.6520 0.8300
> gsea.m1m2.mroast
   NGenes   PropDown    PropUp Direction PValue   FDR PValue.Mixed FDR.Mixed
M1     14 0.07142857 0.4285714        Up 0.047 0.093        0.037    0.0370
M2     24 0.12500000 0.2500000        Up 0.468 0.468        0.034    0.0365

Anyway it sounds strange that even calculating the opposite contrasts results were diverging.

ADD REPLY • link 8.7 years ago Ilario De Toma ▴ 10

0

Entering edit mode

Good to see that you now have correct results.

The divergent results that you got originally are not at all surprising given the hypothesis you were testing -- see my answer below.

ADD REPLY • link 8.7 years ago Gordon Smyth 50k

score 2 · Answer 1 · 2015-09-30

Hello

If I well understood the difference is that romer is optimized for a gsea analysis on dataset of gene sets and use sample label rotations, while mroast use rotations of genes and is suitable to test one or more gene sets.

You've got your wires crossed here. They both calculate p-values by rotation (something akin to sample permutation, but not that exactly: cf. the Langsrud, 2005 and Wu et al, 2010 citations in the help pages for both functions) -- I believe the consensus on "rotating" (you probably mean "permuting") genes is that this is a no-no due to inherent intra-gene correlation structure among genes in a gene set that you would now be breaking (and would provide you with p-values that are too optimistic).

Anyway: one of the main differences between mroast and romer is the hypothesis under test. mroast is a "self-contained" test, which is testing (in so many words) if the logFC of the genes in the geneset is equal to zero for the contrast under test. romer is a competitive test, which is something like a rank sum test, ie: do the genes in the geneset show enrichment at the tails of the logFC distribution from all the genes under test.

You'll find that romer (and camera) will often be harder tests "to pass" when compared to roast.

score 2 · Answer 2 · 2015-10-02

When you get a result as strange as this, it is usually a sign that you've done something wrong.

In this case, you hadn't specified the contrast vector, and the default wasn't sensible for your design matrix.

The null hypothesis that was tested was that the mean expression (log2 CPM) in the second group was greater than zero. All genes in M1 and M2 have much higher expression than that, so roast gave significantly up p-values with PropUp=100%. On the other hand, the genes in M1 and M2 apparently have lower cpm values than an average gene, so romer gave down results.

By the way, you are not interpreting the PropUp and PropDown columns correctly. These columns only count genes that are up or down to a worthwhile extent, not all genes with positive or negative logFC. See the help page of roast() for the exact definition.