Hi everyone, and Prof. Gordon Smyth,
Please help me understand how best to view the two designs used for limma below. The objective was to pool higher/secondary-level groups, as well as first-level groups of samples, within the design to get DGE.
Given a design matrix and the logCPM mean-variance output from the voom() function, four people followed the logic of ordinary designs and added the 'higher/secondary-level' contrasts as below:
ct <- makeContrasts(g2v1 = (group2_dead + group2_alive) - (group1_dead + group1_alive),
  g2v1dead = group2_dead - group1_dead, g2v1alive = group2_alive - group1_alive,
  status = (group1_dead + group2_dead) - (group1_dead + group1_alive), levels = design)
b<-eBayes( contrasts.fit( lmFit(data, design), contrasts=ct))
summary(decideTests(b))
sessionInfo( )
My question is: does this approach give DE results with any meaningful interpretation, or should it be discarded completely in favour of dividing each sum by the number of groups, as below?
ct <- makeContrasts(g2v1 = (group2_dead + group2_alive)/2 - (group1_dead + group1_alive)/2,
  g2v1dead = group2_dead - group1_dead, g2v1alive = group2_alive - group1_alive,
  status = (group1_dead + group2_dead)/2 - (group1_dead + group1_alive)/2, levels = design)
b<-eBayes( contrasts.fit( lmFit(data, design), contrasts=ct))
summary(decideTests(b))
sessionInfo( )
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] GEOmetadb_1.52.0 RSQLite_2.2.7 GSA_1.03.1 sva_3.38.0
[5] BiocParallel_1.24.1 genefilter_1.72.1 mgcv_1.8-31 nlme_3.1-148
[9] oligo_1.54.1 Biostrings_2.58.0 XVector_0.30.0 IRanges_2.24.1
[13] S4Vectors_0.28.1 oligoClasses_1.52.0 affy_1.68.0 forcats_0.5.1
[17] stringr_1.4.0 dplyr_1.0.6 purrr_0.3.4 readr_1.4.0
[21] tidyr_1.1.3 tibble_3.1.1 ggplot2_3.3.5 tidyverse_1.3.1
[25] limma_3.46.0 GEOquery_2.58.0 Biobase_2.50.0 BiocGenerics_0.36.1
Many thanks for the reply, @James MacDonald!
Indeed it is probably unnecessary to do
g2v1=(group2_dead+group2_alive)
hence the question about interpretation vis-a-vis the concept of DE. Part of why I asked about interpretability is that a 'non-expert' was asking me whether the inputs are all sums of log data. I guess you are indicating that such a contrast is not interpretable.
The two different contrast matrices you give will yield identical lists of DE genes, p-values and FDRs. The only difference will be in the log-fold-changes, which will differ by a factor of 2 for the third contrast. As long as you know what the logFCs mean, both choices lead to the same conclusions, but I would always myself use the mean-mean contrast instead of sum-sum.
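For anyone who wants to check this on their own machine, here is a minimal sketch on simulated data. The expression matrix, the three-samples-per-group layout and the object names (y, grp, ct_sum, ct_mean) are made up for illustration and are not the poster's data; only the g2v1 contrast is compared. If the reasoning above holds, the moderated t-statistics and p-values should match, and the sum-sum logFCs should be twice the mean-mean ones.

```r
library(limma)

set.seed(1)

# Hypothetical layout: 4 groups x 3 samples, 200 simulated "genes"
grp <- factor(rep(c("group1_dead", "group1_alive",
                    "group2_dead", "group2_alive"), each = 3))
y <- matrix(rnorm(200 * 12), nrow = 200)

# Group-means design; columns are named after the factor levels
design <- model.matrix(~ 0 + grp)
colnames(design) <- levels(grp)

fit <- lmFit(y, design)

# Same comparison written as sum-sum and as mean-mean
ct_sum <- makeContrasts(
  g2v1 = (group2_dead + group2_alive) - (group1_dead + group1_alive),
  levels = design)
ct_mean <- makeContrasts(
  g2v1 = (group2_dead + group2_alive)/2 - (group1_dead + group1_alive)/2,
  levels = design)

b_sum  <- eBayes(contrasts.fit(fit, ct_sum))
b_mean <- eBayes(contrasts.fit(fit, ct_mean))

# Test statistics and p-values should agree between the two versions
same_t <- isTRUE(all.equal(b_sum$t, b_mean$t))
same_p <- isTRUE(all.equal(b_sum$p.value, b_mean$p.value))

# logFCs from the sum-sum contrast should be twice the mean-mean logFCs
double_lfc <- isTRUE(all.equal(b_sum$coefficients, 2 * b_mean$coefficients))

print(c(same_t = same_t, same_p = same_p, double_lfc = double_lfc))
```

Rescaling a contrast multiplies both the estimated coefficient and its standard error by the same constant, which is why the ratio (the t-statistic) is unaffected while the reported logFC changes scale.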