DESeq - help with pair wise comparisons
0
0
Entering edit mode
bd2000 ▴ 30
@5d657c1d
Last seen 3 hours ago
United Kingdom

Hi

I was wondering what the best practice was to do pair-wise comparisons e.g. I want to compare control vs. treatment at followup, whilst taking individual gene expression at baseline into account. Is this correct?


#my dataset 

       Group Timepoint Participant_ID
1  Treatment  Baseline             01
2  Treatment  Followup             01
3  Treatment  Baseline             02
4  Treatment  Followup             02
5  Treatment  Baseline             03
6  Treatment  Followup             03
7  Treatment  Baseline             04
8  Treatment  Followup             04
9  Treatment  Baseline             05
10 Treatment  Followup             05
...
28   Control  Followup             14
29   Control  Baseline             15
30   Control  Followup             15
31   Control  Baseline             16
32   Control  Followup             16
33   Control  Baseline             17
34   Control  Followup             17
35   Control  Baseline             18
36   Control  Followup             18
37   Control  Baseline             19
38   Control  Followup             19

#if you want to replicate it
sampleinfo <- data.frame(
  Group = rep(rep(c("Treatment", "Control"), times = c(10, 9)), each = 2),
  Timepoint = rep(c("Baseline", "Followup"), times = 19),
  Participant_ID = rep(paste0(sprintf("%02d", 1:19)), each = 2)
)

sampleinfo$Participant_ID<-factor(sampleinfo$prefix)
sampleinfo$Group <- factor(sampleinfo$Group, levels = c("Control", "Treatment")) 
sampleinfo$Timepoint <- factor(sampleinfo$Timepoint, levels = c("Baseline", "Followup")) 

model<- as.formula(~ Participant_ID + Timepoint * Group)

ddsObj.raw <- DESeqDataSetFromTximport(txi = txi,
                                       colData = sampleinfo,
                                       design = model)

The issue when I try to do that, this error message comes up:


Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

Am I right to think this is happening because one of the formula terms might be "redundant" (e.g. group is already explained by participant_ID)? I saw some online posts on "nesting" but I don't fully get how that works / how to proceed. I'm also not sure if the interaction term in the model is correct, are we expecting Treatment and Followup to have an interaction effect?

I'd appreciate any feedback or help as I'm new to all of this!

Thank you!

> sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggrepel_0.9.6               ggfortify_0.4.17            lubridate_1.9.4             forcats_1.0.0              
 [5] stringr_1.5.1               dplyr_1.1.4                 purrr_1.0.4                 readr_2.1.5                
 [9] tidyr_1.3.1                 tibble_3.2.1                ggplot2_3.5.1               tidyverse_2.0.0            
[13] DESeq2_1.44.0               SummarizedExperiment_1.34.0 Biobase_2.64.0              MatrixGenerics_1.16.0      
[17] matrixStats_1.5.0           GenomicRanges_1.56.2        GenomeInfoDb_1.40.1         IRanges_2.38.1             
[21] S4Vectors_0.42.1            BiocGenerics_0.50.0         tximport_1.32.0            

loaded via a namespace (and not attached):
 [1] gtable_0.3.6            lattice_0.22-6          tzdb_0.5.0              vctrs_0.6.5             tools_4.4.3            
 [6] generics_0.1.3          parallel_4.4.3          pkgconfig_2.0.3         Matrix_1.7-3            lifecycle_1.0.4        
[11] GenomeInfoDbData_1.2.12 farver_2.1.2            compiler_4.4.3          munsell_0.5.1           codetools_0.2-20       
[16] pillar_1.10.1           crayon_1.5.3            BiocParallel_1.38.0     DelayedArray_0.30.1     abind_1.4-8            
[21] tidyselect_1.2.1        locfit_1.5-9.12         stringi_1.8.7           labeling_0.4.3          grid_4.4.3             
[26] colorspace_2.1-1        cli_3.6.4               SparseArray_1.4.8       magrittr_2.0.3          S4Arrays_1.4.1         
[31] withr_3.0.2             scales_1.3.0            UCSC.utils_1.0.0        bit64_4.6.0-1           timechange_0.3.0       
[36] XVector_0.44.0          httr_1.4.7              bit_4.6.0               gridExtra_2.3           hms_1.1.3              
[41] rlang_1.1.5             Rcpp_1.0.14             glue_1.8.0              rstudioapi_0.17.1       vroom_1.6.5            
[46] jsonlite_2.0.0          R6_2.6.1                zlibbioc_1.50.0
DESeq2 • 230 views
ADD COMMENT
0
Entering edit mode

Thanks that's great!! I've followed the instructions from the link you sent, is my code correct? By correct I mean, is my code comparing followup between the two groups taking into account individual differences at baseline?

sampleinfo <- sampleinfo %>%
  group_by(Group) %>%
  mutate(Participant.n = as.integer(factor(Participant_ID)))

sampleinfo$Participant.n <- factor(sampleinfo$Participant.n) 

simple.model<- as.formula(~ Participant.n + Timepoint) #for now 

ddsObj.raw <- DESeqDataSetFromTximport(txi = txi,
                                       colData = sampleinfo,
                                       design = simple.model)

model_matrix<-model.matrix(~ Group + Group:Participant.n + Group:Timepoint, sampleinfo)
correct_model_matrix <- model_matrix[, !colnames(model_matrix) %in% "GroupTreatment:Participant.n10"] #I have unbalanced groups 

design(ddsObj.raw) <- correct_model_matrix

ddsObj <- DESeq(ddsObj.raw)

results <- results(ddsObj, contrast=list("GroupTreatment.Followup", "GroupControl.Followup"))

#upregulated
sum(results$log2FoldChange > 0 & 
      results$padj <0.05, na.rm = TRUE)

#downregulated
sum(results$log2FoldChange < 0 & 
      results$padj < 0.05, na.rm = TRUE)
ADD REPLY
0
Entering edit mode

Follow up question, is this the correct way to extract results?

results <- results(ddsObj, list(c("GroupControl.Followup", "GroupTreatment.Followup")), alpha = 0.05 )
ADD REPLY

Login before adding your answer.

Traffic: 913 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6