Hi,
I'm a fairly new DESeq2 user and was hoping to get some help with how to correctly account for technical replicates in my dataset. Here is some example data:
suppressPackageStartupMessages(library("DESeq2"))
coldata <- data.frame(
'group' = factor(c(rep('group1', 4), rep('group2', 4))),
'sample' = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
'tech_rep' = factor(rep(c('rep1', 'rep2'), 4)),
row.names = paste0(rep('samp', 8), 1:8)
)
coldata
group sample tech_rep
samp1 group1 1 rep1
samp2 group1 1 rep2
samp3 group1 2 rep1
samp4 group1 2 rep2
samp5 group2 3 rep1
samp6 group2 3 rep2
samp7 group2 4 rep1
samp8 group2 4 rep2
cts <- data.frame(matrix(round(runif(n=8*50, min=0, max=20)), nrow=50))
colnames(cts) <- rownames(coldata)
We are interested in DEGs between groups 1 and 2, and we would like to account for the fact that each sample has two technical replicates. Initially, I thought the right way to do this was:
dds <- DESeqDataSetFromMatrix(
countData = cts,
colData = coldata,
design = ~tech_rep + group
)
which does not result in any error message, but then I realized that this design might suggest that all the "rep1" are from the same batch, and all "rep2" from another batch, which isn't true. I instead tried
dds <- DESeqDataSetFromMatrix(
countData = cts,
colData = coldata,
design = ~sample + tech_rep + group
)
converting counts to integer mode
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
but this doesn't work. I have read the vignette section "Group-specific condition effects, individuals nested within groups" and tried to apply it to my own data.
#make a simple object in which to replace design matrix later
dds <- DESeqDataSetFromMatrix(
countData = cts,
colData = coldata,
design = ~group
)
dds$samp.n <- factor(rep(rep(1:2,each=2),2))
dds@design <- ~group + group:samp.n + group:tech_rep
dds <- DESeq(dds)
results(dds)
log2 fold change (MLE): groupgroup2.tech reprep2
Wald test p-value: groupgroup2.tech reprep2
DataFrame with 50 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
1 8.64649 0.114671 1.01386 0.113103 0.90994904 0.968031
2 11.35877 -0.612314 0.98719 -0.620259 0.53508701 0.968031
3 9.09778 0.184718 0.98820 0.186923 0.85172066 0.968031
4 11.67832 -0.497979 0.95027 -0.524040 0.60025100 0.968031
5 7.12506 -5.162355 1.95858 -2.635770 0.00839465 0.419610
... ... ... ... ... ... ...
46 5.66848 0.657364 1.65819 0.396434 0.691785 0.968031
47 10.78412 0.627660 1.18219 0.530930 0.595467 0.968031
48 6.25440 0.870858 1.62538 0.535787 0.592106 0.968031
49 10.59046 0.369202 1.10770 0.333305 0.738904 0.968031
50 6.28831 1.329147 1.50087 0.885587 0.375840 0.942738
Is this correct? When I try it on my actual data, I don't seem to get any padj < 0.05, whereas I get a number of genes if I simply ignore the technical replicates and treat all samples as biological replicates.
I would be very grateful for any help with this.
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 13.3.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.36.0 SummarizedExperiment_1.26.1 Biobase_2.56.0 MatrixGenerics_1.8.1
[5] matrixStats_0.62.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4 IRanges_2.30.1
[9] S4Vectors_0.34.0 BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] KEGGREST_1.36.3 genefilter_1.78.0 locfit_1.5-9.6 tidyselect_1.2.0 splines_4.2.0
[6] lattice_0.20-45 generics_0.1.3 colorspace_2.0-3 vctrs_0.5.1 utf8_1.2.2
[11] blob_1.2.3 XML_3.99-0.11 survival_3.3-1 rlang_1.0.6 pillar_1.8.1
[16] glue_1.6.2 DBI_1.1.3 BiocParallel_1.30.4 bit64_4.0.5 RColorBrewer_1.1-3
[21] GenomeInfoDbData_1.2.8 lifecycle_1.0.3 zlibbioc_1.42.0 Biostrings_2.64.1 munsell_0.5.0
[26] gtable_0.3.1 codetools_0.2-18 memoise_2.0.1 geneplotter_1.74.0 fastmap_1.1.0
[31] parallel_4.2.0 fansi_1.0.3 AnnotationDbi_1.58.0 Rcpp_1.0.9 xtable_1.8-4
[36] scales_1.2.1 cachem_1.0.6 DelayedArray_0.22.0 annotate_1.74.0 XVector_0.36.0
[41] bit_4.0.4 ggplot2_3.4.0 png_0.1-7 dplyr_1.0.10 grid_4.2.0
[46] cli_3.4.1 tools_4.2.0 bitops_1.0-7 magrittr_2.0.3 RCurl_1.98-1.9
[51] RSQLite_2.2.18 tibble_3.1.8 pkgconfig_2.0.3 crayon_1.5.2 Matrix_1.5-1
[56] assertthat_0.2.1 httr_1.4.4 rstudioapi_0.14 R6_2.5.1 compiler_4.2.0