Dear Bioconductor community,
I have time-course Nanostring data and despite having looked at the vignette and several examples I can't figure out how to make the proper design or if I can use Deseq2 at all for my problem. I have 2 groups: Con and Bio, that had blood drawn in 4-time points (0h, 0.5h, 3h, 8h). The problem here is that the Con group has only 2 subjects, whereas the Bio has 3. So every time I try to make the matrix design I get "Not a Full Rank". Looking at the vignette, this is happening due to the difference in samples. I have tried to follow the "Levels without samples" in the vignette, but still can't figure out the design. Is there a way to run this in Deseq2?
countData= read.table("Counts.txt", header=T,row.names=1)
colData= read.table ("Groups.txt", header=T, row.names = 1, stringsAsFactors = TRUE)
as.data.frame(colData)
Groups Time Ind Ind.n
X20201013_30102498530821.01_B1_01 bio 0h 1 1
X20201013_30102498530821.01_B10_04 bio 8h 1 1
X20201013_30102498530821.01_B4_02 bio 0.5h 1 1
X20201013_30102498530821.01_B7_03 bio 3h 1 1
X20201013_30102498530821.01_C3_07 con 0h 4 2
X20201013_30102498530821.01_C5_08 con 0.5h 4 2
X20201013_30102498530821.01_C7_09 con 3h 4 2
X20201013_30102498530821.01_C9_10 con 8h 4 2
X20201014_30102498540821.01_B11_04 bio 8h 2 1
X20201014_30102498540821.01_B2_01 bio 0h 2 1
X20201014_30102498540821.01_B5_02 bio 0.5h 2 1
X20201014_30102498540821.01_B8_03 bio 3h 2 1
X20201014_30102498540821.01_C10_10 con 8h 5 2
X20201014_30102498540821.01_C4_07 con 0h 5 2
X20201014_30102498540821.01_C6_08 con 0.5h 5 2
X20201014_30102498540821.01_C8_09 con 3h 5 2
X20201016_30102498550821.01_B12_04 bio 8h 3 1
X20201016_30102498550821.01_B3_01 bio 0h 3 1
X20201016_30102498550821.01_B6_02 bio 0.5h 3 1
X20201016_30102498550821.01_B9_03 bio 3h 3 1
m1<-model.matrix(~Groups + Time + Groups:Ind.n + Groups:Time, colData)
colnames(m1)
unname(m1)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)
m0<-model.matrix(~ Groups + Time + Groups:Ind.n, colData)
dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData, design= m1)
dds <- DESeq(dds, full= m1, reduced= m0, test="LRT")
> sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.28.1 SummarizedExperiment_1.18.2 DelayedArray_0.14.1 matrixStats_0.57.0
[5] Biobase_2.48.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.2 IRanges_2.22.2
[9] S4Vectors_0.26.1 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] genefilter_1.70.0 tidyselect_1.1.0 locfit_1.5-9.4 purrr_0.3.4 splines_4.0.2
[6] lattice_0.20-41 colorspace_1.4-1 vctrs_0.3.4 generics_0.1.0 blob_1.2.1
[11] survival_3.2-7 XML_3.99-0.5 rlang_0.4.7 pillar_1.4.6 glue_1.4.2
[16] DBI_1.1.0 BiocParallel_1.22.0 bit64_4.0.5 RColorBrewer_1.1-2 GenomeInfoDbData_1.2.3
[21] lifecycle_0.2.0 zlibbioc_1.34.0 munsell_0.5.0 gtable_0.3.0 memoise_1.1.0
[26] geneplotter_1.66.0 AnnotationDbi_1.50.3 Rcpp_1.0.5 xtable_1.8-4 scales_1.1.1
[31] annotate_1.66.0 XVector_0.28.0 bit_4.0.4 ggplot2_3.3.2 digest_0.6.25
[36] dplyr_1.0.2 grid_4.0.2 tools_4.0.2 bitops_1.0-6 magrittr_1.5
[41] RCurl_1.98-1.2 RSQLite_2.2.0 tibble_3.0.3 crayon_1.3.4 pkgconfig_2.0.3
[46] ellipsis_0.3.1 Matrix_1.2-18 rstudioapi_0.11 R6_2.5.0 compiler_4.0.2
Thank you, Michael, but I wasn't clear. Sorry about that. Doing the procedure described above I still get the "Not Full Rank" message. I don't know what I'm doing wrong, could you help me to clarify that?
Looks like there are two errors. First, if I understand what you are trying to do, your model matrix is wrong, because your Ind.n column is wrong. It should probably look like this:
And it's still 'wrong' inasmuch as the reference level is 0.5 hours rather than 0 hours, but I leave that to the OP to fix.
Thank you so much, James. I see what you mean about the Ind.n in the matrix and that was exactly my problem. I appreciate it.
I’d suggest to collaborate with a statistician on choosing the right design and construction of the model matrix that can be estimated with your data.