Hi
I was wondering what the best practice was to do pair-wise comparisons e.g. I want to compare control vs. treatment at followup, whilst taking individual gene expression at baseline into account. Is this correct?
#my dataset
Group Timepoint Participant_ID
1 Treatment Baseline 01
2 Treatment Followup 01
3 Treatment Baseline 02
4 Treatment Followup 02
5 Treatment Baseline 03
6 Treatment Followup 03
7 Treatment Baseline 04
8 Treatment Followup 04
9 Treatment Baseline 05
10 Treatment Followup 05
...
28 Control Followup 14
29 Control Baseline 15
30 Control Followup 15
31 Control Baseline 16
32 Control Followup 16
33 Control Baseline 17
34 Control Followup 17
35 Control Baseline 18
36 Control Followup 18
37 Control Baseline 19
38 Control Followup 19
#if you want to replicate it
sampleinfo <- data.frame(
Group = rep(rep(c("Treatment", "Control"), times = c(10, 9)), each = 2),
Timepoint = rep(c("Baseline", "Followup"), times = 19),
Participant_ID = rep(paste0(sprintf("%02d", 1:19)), each = 2)
)
sampleinfo$Participant_ID<-factor(sampleinfo$prefix)
sampleinfo$Group <- factor(sampleinfo$Group, levels = c("Control", "Treatment"))
sampleinfo$Timepoint <- factor(sampleinfo$Timepoint, levels = c("Baseline", "Followup"))
model<- as.formula(~ Participant_ID + Timepoint * Group)
ddsObj.raw <- DESeqDataSetFromTximport(txi = txi,
colData = sampleinfo,
design = model)
The issue when I try to do that, this error message comes up:
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
Am I right to think this is happening because one of the formula terms might be "redundant" (e.g. group is already explained by participant_ID)? I saw some online posts on "nesting" but I don't fully get how that works / how to proceed. I'm also not sure if the interaction term in the model is correct, are we expecting Treatment and Followup to have an interaction effect?
I'd appreciate any feedback or help as I'm new to all of this!
Thank you!
> sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggrepel_0.9.6 ggfortify_0.4.17 lubridate_1.9.4 forcats_1.0.0
[5] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
[9] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0
[13] DESeq2_1.44.0 SummarizedExperiment_1.34.0 Biobase_2.64.0 MatrixGenerics_1.16.0
[17] matrixStats_1.5.0 GenomicRanges_1.56.2 GenomeInfoDb_1.40.1 IRanges_2.38.1
[21] S4Vectors_0.42.1 BiocGenerics_0.50.0 tximport_1.32.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 lattice_0.22-6 tzdb_0.5.0 vctrs_0.6.5 tools_4.4.3
[6] generics_0.1.3 parallel_4.4.3 pkgconfig_2.0.3 Matrix_1.7-3 lifecycle_1.0.4
[11] GenomeInfoDbData_1.2.12 farver_2.1.2 compiler_4.4.3 munsell_0.5.1 codetools_0.2-20
[16] pillar_1.10.1 crayon_1.5.3 BiocParallel_1.38.0 DelayedArray_0.30.1 abind_1.4-8
[21] tidyselect_1.2.1 locfit_1.5-9.12 stringi_1.8.7 labeling_0.4.3 grid_4.4.3
[26] colorspace_2.1-1 cli_3.6.4 SparseArray_1.4.8 magrittr_2.0.3 S4Arrays_1.4.1
[31] withr_3.0.2 scales_1.3.0 UCSC.utils_1.0.0 bit64_4.6.0-1 timechange_0.3.0
[36] XVector_0.44.0 httr_1.4.7 bit_4.6.0 gridExtra_2.3 hms_1.1.3
[41] rlang_1.1.5 Rcpp_1.0.14 glue_1.8.0 rstudioapi_0.17.1 vroom_1.6.5
[46] jsonlite_2.0.0 R6_2.6.1 zlibbioc_1.50.0
It's exactly this, no? https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups
Thanks that's great!! I've followed the instructions from the link you sent, is my code correct? By correct I mean, is my code comparing followup between the two groups taking into account individual differences at baseline?
Follow up question, is this the correct way to extract results?