Question

How to apply KEGG enrichment analysis to the overlap of multiple contrasts in MArrayLM fit?

0

Entering edit mode

antoinefelden ▴ 10

@antoinefelden-16088

Last seen 6.6 years ago

I ran a DGE analysis with 4 different contrasts with lmFit() + contrasts.fit() in limma, and I'm interested in the overlap of the four different contrasts. I identified the genes that are indeed differentially expressed in all four contrasts, and coded that in tfit$genes$test ("yes" if differentially expressed, "no" if not). The MArrayLM object is pasted below.

I did manage to run a Kegg pathway enrichment analysis for each of the contrasts individually, but I'm after a way to run a single analysis for the set of DE genes in all contrasts. Is that feasible?

An object of class "MArrayLM"
$coefficients
   Contrasts
         ARvsCA     ARvsEU      ARvsAU     ARvsNZ
  1 -0.18067896 -0.2044603 -0.22881771 -0.1833862
  2 -0.04079345 -1.1859285 -0.39206763 -0.3653143
  3 -0.12733594  0.1763288 -0.07934863 -0.1252855
  4  0.07648264  0.6875827  0.13266024  0.3442508
  5  0.09678434  0.4514540  0.13137207  0.2943875
10387 more rows ...

$stdev.unscaled
   Contrasts
       ARvsCA    ARvsEU    ARvsAU    ARvsNZ
  1 0.1820675 0.2300422 0.1936519 0.1780754
  2 0.1499697 0.1868217 0.1574127 0.1458230
  3 0.1480576 0.1898012 0.1567854 0.1442379
  4 0.1883755 0.2695961 0.2030905 0.1910940
  5 0.1406236 0.1774499 0.1479352 0.1374007
10387 more rows ...

$sigma
[1] 1.2656682 1.3338325 0.5922579 1.9853202 0.9379173
10387 more elements ...

$df.residual
[1] 21 21 21 21 21
10387 more elements ...

$cov.coefficients
         Contrasts
Contrasts    ARvsCA    ARvsEU ARvsAU    ARvsNZ
   ARvsCA 0.3666667 0.2000000    0.2 0.2000000
   ARvsEU 0.2000000 0.5333333    0.2 0.2000000
   ARvsAU 0.2000000 0.2000000    0.4 0.2000000
   ARvsNZ 0.2000000 0.2000000    0.2 0.3428571

$rank
[1] 5

$genes
    gene_id line test
1 gene12245    1   no
2 gene12244    2   no
3 gene12247    3   no
4 gene12246    4   no
5 gene12241    5   no
10387 more rows ...

$Amean
       1        2        3        4        5 
4.749884 8.665232 5.995541 4.329715 6.534481 
10387 more elements ...

$method
[1] "ls"

$design
  AR_heads AU_heads CA_heads EU_heads NZ_heads
1        1        0        0        0        0
2        1        0        0        0        0
3        1        0        0        0        0
4        1        0        0        0        0
5        1        0        0        0        0
21 more rows ...

$contrasts
          Contrasts
Levels     ARvsCA ARvsEU ARvsAU ARvsNZ
  AR_heads      1      1      1      1
  AU_heads      0      0     -1      0
  CA_heads     -1      0      0      0
  EU_heads      0     -1      0      0
  NZ_heads      0      0      0     -1

$df.prior
[1] 2.659777

$s2.prior
[1] 0.6719199

$s2.post
[1] 1.4973681 1.6546414 0.3868724 3.5739382 0.8563319
10387 more elements ...

$df.total
[1] 23.65978 23.65978 23.65978 23.65978 23.65978
10387 more elements ...

$t
   Contrasts
        ARvsCA     ARvsEU     ARvsAU     ARvsNZ
  1 -0.1937939 -0.2378606 -0.3853473 -0.2105622
  2  0.0000000 -4.3627269 -1.2572034 -1.2144966
  3  0.0000000  0.3288759  0.0000000  0.0000000
  4  0.0000000  1.0792898  0.0000000  0.5722939
  5  0.0000000  1.9118965  0.0000000  1.2338677
10387 more rows ...

$p.value
   Contrasts
       ARvsCA       ARvsEU    ARvsAU    ARvsNZ
  1 0.5071515 0.5252303689 0.4194142 0.4945336
  2 0.8719553 0.0001139932 0.1181080 0.1248538
  3 0.5476869 0.3794963643 0.7396859 0.5572903
  4 0.8441133 0.2050344009 0.7492352 0.3837871
  5 0.6637349 0.0347921880 0.5483615 0.1158894
10387 more rows ...

$treat.lfc
[1] 0.1375035

limma kegg • 1.2k views

ADD COMMENT • link updated 6.5 years ago by Gordon Smyth 52k • written 6.5 years ago by antoinefelden ▴ 10

0

Entering edit mode

Yes, it's easy enough. But you need to have generally recognized gene IDs (usually Entrez Gene Ids) order to run kegga(). Your gene_ids don't seem to be Entrez Ids. Have you just anonymized them?

ADD REPLY • link 6.5 years ago Gordon Smyth 52k

0

Entering edit mode

Hi Gordon,

Sorry I reported the wrong MArrayLM fit. Before using it as input for kegga(), I did change the gene_id into RefSeq identifiers as below:

$genes gene_id line test 1 105678280 1 no 2 105678292 2 no 3 105678279 3 no 4 105678278 4 no 5 105678296 5 no 10387 more rows ...

ADD REPLY • link 6.5 years ago antoinefelden ▴ 10

score 1 · Accepted Answer · 2018-09-28

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 17 hours ago

WEHI, Melbourne, Australia

If your data is mouse and fit$genes$gene_id contains Entrez Gene Ids, then you could proceed like this:

results <- decideTests(fit)
MyGeneSet <- fit$genes$gene_id[ rowSums( results != 0 ) == 4 ]
k <- kegga(MyGeneSet, universe=fit$genes$gene_id, species="Mm")
topKEGG(k)