Main effect on multiple comparisons in limma
1
0
Entering edit mode
@035265c1
Last seen 10 months ago
Denmark

Dear limma experts,

I am running a limma analysis where I am testing for differential expression analysis and I have two variables I want to test. One is tissue and can be muscle or adipose_tissue, the other one is timepoint, which can be pre and post. When entering my contrasts, if I define my groupings as a combination of both variables (muscle_pre, muscle_post, AT_pre, AT_post), I understand that the main effect of timepoint should be written as:

(muscle_post + AT_post)/2 - (muscle_pre + AT_pre)/2

Nonetheless, if I only group my samples as pre and post (ingoring tissue) and use this contrast:

post - pre

The results differsubstantially. My first contrast gives many more significantly different genes. When using the multiple comparison approach I get 853 significantly different proteins, whereas when using the post - pre contrast I get 280.

Why is that?

Many thanks in advance!

Roger

First contrast:


metadata <- data.frame(
    "sample_id" = colnames(data_matrix),
    "grouping" = rep(c(
        "AT_post",
        "AT_pre",
        "muscle_post",
        "muscle_pre"
    ), 8),
    "subject_id" = c(
        rep("S01", 4),
        rep("S02", 4),
        rep("S03", 4),
        rep("S04", 4),
        rep("S05", 4),
        rep("S06", 4),
        rep("S07", 4),
        rep("S08", 4)
    )
)

design_matrix <- model.matrix(~ 0 + metadata$grouping + metadata$subject_id)

colnames(design_matrix) <- c(
    "AT_post",
    "AT_pre",
    "muscle_post",
    "muscle_pre",
    "S02",
    "S03",
    "S04",
    "S05",
    "S06",
    "S07",
    "S08"
)

# Define the contrasts:

contrast_matrix <- limma::makeContrasts(
    muscleVsAT = (muscle_pre + muscle_post) / 2 - (AT_pre + AT_post) / 2,
    PostVsPre = (muscle_post + AT_post) / 2 - (muscle_pre + AT_pre) / 2,
    baselinemuscleVsAT = muscle_pre - AT_pre,
    musclePostVsPre = muscle_post - muscle_pre,
    ATPostVsPre = AT_post - AT_pre,
    levels = colnames(design_matrix)
)


fit_1 <- limma::eBayes(
    limma::lmFit(
        data_matrix,
        design_matrix
    )
)

ebayes_fit <- limma::eBayes(
    limma::contrasts.fit(
        fit_1,
        contrast_matrix
    )
)

# Extracting results:

DE_results <- limma::topTable(
    ebayes_fit,
    number = Inf
)

length(DE_results |> 
           dplyr::filter(adj.P.Val < 0.05) |> 
           dplyr::pull(genes))

[1] 853

Second contrast:

metadata <- data.frame(
    "sample_id" = colnames(data_matrix),
    "grouping" = rep(c(
        "post",
        "pre"
    ), 16),
    "subject_id" = c(
        rep("S01", 4),
        rep("S02", 4),
        rep("S03", 4),
        rep("S04", 4),
        rep("S05", 4),
        rep("S06", 4),
        rep("S07", 4),
        rep("S08", 4)
    )
)

design_matrix <- model.matrix(~ 0 + metadata$grouping + metadata$subject_id)

colnames(design_matrix) <- c(
    "post",
    "pre",
    "S02",
    "S03",
    "S04",
    "S05",
    "S06",
    "S07",
    "S08"
)

# Define the contrasts:

contrast_matrix <- limma::makeContrasts(
    PostVsPre = post - pre,
    levels = colnames(design_matrix)
)


fit_1 <- limma::eBayes(
    limma::lmFit(
        data_matrix,
        design_matrix
    )
)

ebayes_fit <- limma::eBayes(
    limma::contrasts.fit(
        fit_1,
        contrast_matrix
    )
)

# Extracting results:

DE_results <- limma::topTable(
    ebayes_fit,
    number = Inf
)

length(DE_results |> 
           dplyr::filter(adj.P.Val < 0.05) |> 
           dplyr::pull(genes))

[1] 280

sessionInfo( )

R version 4.3.2 (2023-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_United Kingdom.utf8 [2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 [4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

time zone: Europe/Copenhagen tzcode source: internal

attached base packages: [1] stats graphics grDevices datasets utils methods
[7] base

other attached packages: [1] usethis_2.2.2

loaded via a namespace (and not attached): [1] bit_4.0.5 jsonlite_1.8.8 gtable_0.3.4
[4] limma_3.58.1 dplyr_1.1.4 compiler_4.3.2
[7] renv_1.0.3 crayon_1.5.2 Rcpp_1.0.12
[10] tidyselect_1.2.0 parallel_4.3.2 tidyr_1.3.1
[13] scales_1.3.0 yaml_2.3.8 fastmap_1.1.1
[16] statmod_1.5.0 here_1.0.1 ggplot2_3.4.4
[19] R6_2.5.1 labeling_0.4.3 pak_0.7.1
[22] generics_0.1.3 htmlwidgets_1.6.4 ggrepel_0.9.5
[25] tibble_3.2.1 munsell_0.5.0 rprojroot_2.0.4
[28] pillar_1.9.0 tzdb_0.4.0 rlang_1.1.3
[31] utf8_1.2.4 fs_1.6.3 lazyeval_0.2.2
[34] bit64_4.0.5 viridisLite_0.4.2 plotly_4.10.4
[37] cli_3.6.2 withr_3.0.0 magrittr_2.0.3
[40] crosstalk_1.2.1 digest_0.6.34 grid_4.3.2
[43] vroom_1.6.5 rstudioapi_0.15.0 lifecycle_1.0.4
[46] vctrs_0.6.5 glue_1.7.0 data.table_1.14.10 [49] farver_2.1.1 fansi_1.0.6 colorspace_2.1-0
[52] httr_1.4.7 purrr_1.0.2 ellipsis_0.3.2
[55] tools_4.3.2 pkgconfig_2.0.3 htmltools_0.5.7

```

limmaGUI lim MultipleComparison multip • 459 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 1 day ago
WEHI, Melbourne, Australia

The first analysis adjusts for differences between the two tissues (muscle and adipose) whereas the second doesn't. The first approach will therefore produce a lower residual variance and hence more DE genes.

To put the question around the other way, how could you reasonably expect to misrepresent the nature of your data, ignoring the differences between the two tissues, and still get good results?

By the way, I don't recommend main effects. In my opinion, conditional effects are more useful than main effects for genomic analyses, see for example: How does one test for interactions in EdgeR?

ADD COMMENT

Login before adding your answer.

Traffic: 597 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6