I have a question about how DESeq2 handles design models which include a parameter that only applies to a single sample in the dataset. For context, we are using DESeq2 to capture differential chromatin accessibility due to age differences (adult vs juvenile) using CUT&RUN data from the venom glands of 5 rattlesnakes, including 1 adult male, 1 adult female, and 3 juvenile males. We tested the differential accessibility of peaks which were called using MACS2.
We initially tested how the DESeq2 design model behaved both with and without sex as a confounding parameter for age (see below), and found that including sex resulted in slightly fewer significant peaks, which we interpreted as being more conservative. We are currently responding to a reviewer that requested justification for including sex in our analyses. Can you please elaborate on 1.) how DESeq2 handles the lack of replicates by including sex in this case, and 2.) if our decision to include sex is justified technically and/or biologically?
(more context) -- This differential accessibility analysis using CUT&RUN data was performed alongside a primary RNA-seq differential expression analysis which we have much more robust sampling for (n=18), as well as a second differential accessibility test using ATAC-seq data, again with better sampling (n=8). Both the RNA-seq and ATAC-seq tests included venom gland samples collected from the same individuals used for CUT&RUN. Comparing PCA plots for the RNA-seq and ATAC-seq tests identified sex as an important contributor to both expression and accessibility variation, leading to our decision to account for sex when isolating the effects of age in all three analyses.
#Corresponding DESeq2 code:
# (1) model design with sex=
dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = directory,
design = ~ sex + age)
#choose keep peaks with count > 10 following tutorial
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
#set reference levels for factors
>dds$age <- relevel(dds$age, ref = "juvenile")
#Run Analysis
dds <- DESeq(dds)
########### which resulted in this summary output:
out of 94563 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 850, 0.9%
LFC < 0 (down) : 578, 0.61%
outliers [1] : 0, 0%
low counts [2] : 11001, 12%
(mean count < 24)
#
#
#
#
#
# Compared to running the model without sex (age only).....:
# (2) model design without sex=
dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = directory,
design = ~ age)
########### which resulted in this summary output:
out of 94563 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 2511, 2.7%
LFC < 0 (down) : 2494, 2.6%
outliers [1] : 0, 0%
low counts [2] : 1834, 1.9%
(mean count < 5)
sessionInfo( )
I can provide more info including the CUT&RUN dispersion plots, PCA plots, and MA-plots comparing with and without sex outputs if needed. Thank you for your time!
-Mike
You have 5 samples total. You might not even have enough power to see difference between ages. You don't have enough to see age and sex. I don't think your use of sex is justified here.