Question

DESeq2 - nested samples (individuals) - time course - conditions

0

Entering edit mode

gaio.transposon • 0

@gaiotransposon-22414

Last seen 5.4 years ago

Hi,

I am using DESeq2, but unsure whether or not I am using the right design. I could not find here anything similar.

subject: A, B, C, D, E, F cohort: ctrl, treatment time: 0, 1

As you can see subjects belong to either cohort (ctrl or treatment) and each subject is sampled at multiple time points.

Is there a way in DESeq2 to "block" for subjects? In my case (my data) base counts can vary substantially among subjects, and I am trying to find genes that behave differently in time, depending on the treatment (cohort).

Q1: Would this design work for me?

design(dds) <- ~ time + cohort + time:cohort
dds <- DESeq(dds, test="LRT", reduced = ~ time)

Q2: How can I implement the "blocking" of subjects? If you are going to suggest duplicateCorr from voom/limma, at what point of the DESeq2 analysis should I do that?

Q3: Do I really need to exclude samples when cohorts are unbalanced (e.g. 30 subjects vs 24 ?) It would be better if DESeq2 could do this, for instance by randomly excluding 6 subjects and re-running this analysis multiple times, rather than me manually discarding data from 6 samples... not sure if possible..?

extra Q : Would I also be able to find genes going up/down the same direction in both cohorts, when the slope is higher in one cohort than in the other? would those also be detected as DE?

Thank you for taking the time! Dany

deseq2 nested de • 1.3k views

ADD COMMENT • link updated 5.4 years ago by Michael Love 43k • written 5.4 years ago by gaio.transposon • 0

score 0 · Answer 1 · 2019-11-21

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 4 hours ago

United States

This is answered in the vignette, see “individual nested within groups”.

ADD COMMENT • link 5.4 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

I did try to apply the example from the vignette, but my output is not making sense when I plot it. For instance, I am getting genes that are DE at the same time point, while I am looking for genes that behave different in time between cohorts. This is my code so far:

m3 <- model.matrix(~ cohort + cohort:ind.n + cohort:date, new_coldata)
m4 <- model.matrix(~ date, new_coldata)

dds3 <- DESeqDataSetFromMatrix(countData = dds2,
                          colData = new_coldata,
                          design = m3)

dds3$cohort <- relevel(dds3$cohort, ref = "Control")

cts <- counts(dds3)
geoMeans2 <- apply(cts, 1, function(row) if (all(row == 0)) 0 else exp(sum(log(row[row != 0]))/length(row)))
dds3 <- estimateSizeFactors(dds3, geoMeans=geoMeans2)
eff <- DESeq(dds, "LRT", full = m1, reduced = m2)

this is what I get with resultsNames(eff): [1] "Intercept" "cohortNeomycin" "cohortControl.ind.n2" "cohortNeomycin.ind.n2" [5] "cohortControl.ind.n3" "cohortNeomycin.ind.n3" "cohortControl.ind.n4" "cohortNeomycin.ind.n4" [9] "cohortControl.ind.n5" "cohortNeomycin.ind.n5" "cohortControl.ind.n6" "cohortNeomycin.ind.n6" [13] "cohortControl.ind.n7" "cohortNeomycin.ind.n7" "cohortControl.ind.n8" "cohortNeomycin.ind.n8" [17] "cohortControl.ind.n9" "cohortNeomycin.ind.n9" "cohortControl.ind.n10" "cohortNeomycin.ind.n10" [21] "cohortControl.ind.n11" "cohortNeomycin.ind.n11" "cohortControl.ind.n12" "cohortNeomycin.ind.n12" [25] "cohortControl.ind.n13" "cohortNeomycin.ind.n13" "cohortControl.ind.n14" "cohortNeomycin.ind.n14" [29] "cohortControl.ind.n15" "cohortNeomycin.ind.n15" "cohortControl.ind.n16" "cohortNeomycin.ind.n16" [33] "cohortControl.ind.n17" "cohortNeomycin.ind.n17" "cohortControl.ind.n18" "cohortNeomycin.ind.n18" [37] "cohortControl.ind.n19" "cohortNeomycin.ind.n19" "cohortControl.ind.n20" "cohortNeomycin.ind.n20" [41] "cohortControl.ind.n21" "cohortNeomycin.ind.n21" "cohortControl.ind.n22" "cohortNeomycin.ind.n22" [45] "cohortControl.ind.n23" "cohortNeomycin.ind.n23" "cohortControl.datet2" "cohortNeomycin.datet2"

eff3 <- DESeq(dds3, "LRT", full = m3, reduced = m4)

Could you please tell me if the analysis above is correct to the scope of finding DE genes between cohorts? Then at least I would know if the problem is the setup or if it lies within my data.

thank you

ADD REPLY • link 5.4 years ago gaio.transposon • 0

0

Entering edit mode

I would recommend speaking to a statistician about how to formulate your hypotheses.

Above the LRT you are using (also read the vignette section on the LRT), is asking for which genes are the cohort, individual, or cohort-specific date effects not equal to zero. I'm sure that's actually a lot of genes, and not the ones you are interested in. This is a really important part of the analysis, maybe more important than e.g. how the sequencing was done, so it's worth a conversation with a statistician or someone who is familiar with linear models and testing.

ADD REPLY • link 5.4 years ago Michael Love 43k