Hi all,
I'm doing metatranscriptomics differential expression analysis in R studio. I have 1-3 replicates per site and am trying to use edgeR. I want to compare Sites J and F, both when water is flowing (On) and not flowing (Off) with sites A-D, which are constant. However, I'm somewhat confused by my output and how to interpret this biologically or if my method of analysis was just incorrect... (Rounded numbers used below). How can OnvConstant and OffvConstant be similar to each other when BothvConstant, which include both On and Off samples in "Both," be different? I tried a few On/Off Site v Constant Site constrasts, like FvA, FvB, JvA. JvB, and these result in a large number of genes labeled as downregulated... While constant vs constant, like AvB, AvC, result in about a quarter up, half notsig, a quarter down. FvJ is mostly NotSig, with a few up and down.
fit.glm <- glmFit(d, design.mat)
glm.contrasts<-makeContrasts(
OnvSB=(J.On+F.On)-(A.Constant+B.Constant+C.Constant+D.Constant),
OffvSB=(J.Off+F.Off)-(A.Constant+B.Constant+C.Constant+D.Constant),
BothVConstant=(J.On+F.On+J.Off+F.Off)-(A.Constant+B.Constant+C.Constant+D.Constant),
levels=design.mat)
OnVConstant.glm <- glmLRT(fit.glm, contrast=glm.contrasts[,"OnVConstant.glmt"])
OnVConstant.glmDE <- decideTestsDGE(OnVConstant.glm, adjust.method = "fdr", p.value=0.1)
summary(OnVConstant.glmDE)
#output
Down 5
NotSig 300
Up 4600
#
OffVConstant.glm <- glmLRT(fit.glm, contrast=glm.contrasts[,"OffVConstant"])
OffVConstant.glmDE <- decideTestsDGE(OffVConstant.glm, adjust.method = "fdr", p.value=0.1)
summary(OffVConstant.glmDE)
#output:
Down 10
NotSig 695
Up 4200
#
BothVConstant.glm <- glmLRT(fit.glm, contrast=glm.contrasts[,"BothVConstant"])
BothVConstant.glmDE <- decideTestsDGE(BothVConstant.glm, adjust.method = "fdr", p.value=0.1)
summary(BothVConstant.glmDE)
#output:
Down 800
NotSig 1700
Up 2405
sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0
[3] dplyr_1.0.8 purrr_0.3.4
[5] readr_2.1.2 tidyr_1.2.0
[7] tibble_3.1.6 ggplot2_3.3.5
[9] tidyverse_1.3.1 viridis_0.6.2
[11] viridisLite_0.4.0 pheatmap_1.0.12
[13] argparse_2.1.6 ape_5.6-2
[15] gplots_3.1.3 ctc_1.64.0
[17] amap_0.8-18 DESeq2_1.30.1
[19] SummarizedExperiment_1.20.0 Biobase_2.50.0
[21] MatrixGenerics_1.2.1 matrixStats_0.61.0
[23] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
[25] IRanges_2.24.1 S4Vectors_0.28.1
[27] BiocGenerics_0.36.1 edgeR_3.32.1
[29] limma_3.46.0 BiocManager_1.30.16
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ellipsis_0.3.2
[3] XVector_0.30.0 fs_1.5.0
[5] rstudioapi_0.13 bit64_4.0.5
[7] lubridate_1.8.0 AnnotationDbi_1.52.0
[9] fansi_1.0.2 xml2_1.3.3
[11] codetools_0.2-16 splines_4.0.2
[13] cachem_1.0.6 geneplotter_1.68.0
[15] ade4_1.7-19 jsonlite_1.8.0
[17] phyloseq_1.34.0 broom_0.7.12
[19] annotate_1.68.0 cluster_2.1.0
[21] dbplyr_2.1.1 compiler_4.0.2
[23] httr_1.4.2 backports_1.4.1
[25] assertthat_0.2.1 Matrix_1.3-2
[27] fastmap_1.1.0 cli_3.1.0
[29] tools_4.0.2 igraph_1.2.11
[31] gtable_0.3.0 glue_1.5.0
[33] GenomeInfoDbData_1.2.4 reshape2_1.4.4
[35] Rcpp_1.0.8 cellranger_1.1.0
[37] vctrs_0.3.8 Biostrings_2.58.0
[39] zCompositions_1.4.0-1 rhdf5filters_1.2.1
[41] multtest_2.46.0 nlme_3.1-155
[43] iterators_1.0.14 rvest_1.0.2
[45] lifecycle_1.0.1 gtools_3.9.2
[47] XML_3.99-0.9 zlibbioc_1.36.0
[49] MASS_7.3-55 scales_1.1.1
[51] vroom_1.5.7 hms_1.1.1
[53] biomformat_1.18.0 rhdf5_2.34.0
[55] RColorBrewer_1.1-2 memoise_2.0.1
[57] gridExtra_2.3 NADA_1.6-1.1
[59] stringi_1.7.6 RSQLite_2.2.10
[61] genefilter_1.72.1 foreach_1.5.2
[63] permute_0.9-7 caTools_1.18.2
[65] BiocParallel_1.24.1 truncnorm_1.0-8
[67] rlang_1.0.2 pkgconfig_2.0.3
[69] bitops_1.0-7 lattice_0.20-41
[71] Rhdf5lib_1.12.1 bit_4.0.4
[73] tidyselect_1.1.2 plyr_1.8.6
[75] magrittr_2.0.1 R6_2.5.1
[77] generics_0.1.2 DelayedArray_0.16.3
[79] DBI_1.1.2 withr_2.5.0
[81] pillar_1.7.0 haven_2.4.3
[83] mgcv_1.8-39 survival_3.3-1
[85] RCurl_1.98-1.6 modelr_0.1.8
[87] crayon_1.5.0 KernSmooth_2.23-17
[89] utf8_1.2.2 tzdb_0.2.0
[91] ALDEx2_1.22.0 readxl_1.3.1
[93] locfit_1.5-9.5 grid_4.0.2
[95] data.table_1.14.2 blob_1.2.2
[97] vegan_2.5-7 reprex_2.0.1
[99] xtable_1.8-4 munsell_0.5.0
Thanks! The results m make more sense this way, and averages is what I was hoping to compare. Is it ok that each one has a different number of replicates (J.On has 3, F.On has 1, A.Constant has 2, etc)? I'm unsure how to control for that.
Yes, it is ok that different groups have different numbers of replicates. edgeR adjusts for that automatically. However you may get somewhat more significantly DE genes when comparing groups with more replicates because of the increased statistical power.