Question

Help with design matrix dealing with 3 variables (DESeq2)

martin.weihrauch ▴ 20

I have an RNA-Seq dataset with 3 variables and 45 samples:

muscle (A or B)

genotype (WT or TG)

region (1 or 2)

So I want to compare e.g. WT/A/1 against WT/A/2, but also all other possible comparisons.

So far I've used the following design:

library(DESeq2)

dds <- DESeqDataSetFromMatrix(counttable, colData = coldata, design = ~ genotype + muscle + region)
dds <- dds[rowSums(counts(dds)) > 1, ] # Keeps genes with more than 1 read in total across all samples
dds <- DESeq(dds) # Runs DESeq2

dds2 <- dds # Copy dds to dds2
# Combine the three variables into one factor; "_" keeps level names limited to letters, numbers and "_"
dds2$group <- factor(paste(dds2$genotype, dds2$muscle, dds2$region, sep = "_"))
design(dds2) <- ~ group
dds2 <- DESeq(dds2)

resultsNames(dds2) # Lists the fitted coefficients
levels(dds2$group) # Lists all groups; any two of them can then be contrasted

res_WT_A_1_vs_WT_A_2 <- results(dds2, contrast = c("group", "WT_A_1", "WT_A_2"))

However, if I run the whole analysis on only the one set of conditions I'd like to compare, my differential expression list looks quite different. For example, if I run featureCounts only on the .bam files that correspond to WT/A for regions 1 and 2, I can easily contrast WT/A/1 vs. WT/A/2, but the result is quite different from the one I get with the design shown above.

Am I doing this correctly in general, and which method should I use to interpret my RNA-Seq dataset: the ~ group design, or doing the comparisons one by one with a simple design (e.g. design = ~ region) using only the files for one condition?

Thanks.

deseq2 rna-seq • 3.6k views

updated 7.7 years ago by Michael Love 43k • written 7.7 years ago by martin.weihrauch ▴ 20

Can you describe in more detail what "region" refers to?

If you consider only muscle type and genotype, this design would work well: design = ~ genotype + muscle + muscle:genotype

It gives you the genotype effect within each muscle type, and the interaction term tests whether that effect differs between the two muscles.
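To see what the suggested interaction design estimates, here is a minimal sketch in base R. The sample sheet is hypothetical (2 replicates per genotype and muscle combination, with WT and A assumed as reference levels); only the formula comes from the comment above.

```r
# Hypothetical sample sheet: 2 genotypes x 2 muscles, 2 replicates each
coldata <- expand.grid(
  replicate = 1:2,
  muscle = c("A", "B"),
  genotype = c("WT", "TG")
)
# Assumption: WT and A are the reference levels
coldata$muscle <- factor(coldata$muscle, levels = c("A", "B"))
coldata$genotype <- factor(coldata$genotype, levels = c("WT", "TG"))

# Same formula that would be passed as `design` to DESeq2
mm <- model.matrix(~ genotype + muscle + muscle:genotype, data = coldata)

# Four coefficients are fitted per gene:
#   intercept            : WT in muscle A
#   genotype main effect : TG vs WT in muscle A
#   muscle main effect   : B vs A in WT
#   interaction          : how the TG vs WT effect differs in muscle B
print(colnames(mm))
print(mm)
```

With these reference levels, the genotype main effect applies to muscle A only; the genotype effect in muscle B is the main effect plus the interaction term.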

Bio_Ram • 7.7 years ago

The region is the most important part, as it is laser-capture microdissected muscle tissue.

Ideally I would compare the two muscle types at the region of interest vs. the control region. Genotype is somewhat secondary for now, but I would still like to see whether it brings about changes in the region of interest and between the muscles.

martin.weihrauch ▴ 20 • 7.7 years ago

score 0 · Answer 1 · 2017-08-05

Michael Love 43k

There doesn't seem to be any reason to run DESeq() twice here. The results from the first run are wiped out by the second. The first line with DESeq() can be deleted.

"the result is quite different [running altogether or as separate groups]"

This is actually quite a common question, and it's listed as a FAQ in the DESeq2 vignette.

The general recommendation is to run with all the samples together, but I have more specific advice in the vignette.
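As a rough illustration of the two approaches discussed in this thread, the sketch below runs both on simulated data. Everything about the data is an assumption made for the example: the counts come from DESeq2's makeExampleDESeqDataSet(), and the genotype/muscle/region annotation (2 replicates per combination) is invented. It is not the original dataset and not a recommendation beyond what the vignette says.

```r
library(DESeq2)

set.seed(1)
# Simulated counts for illustration only: 1000 genes x 16 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)

# Hypothetical annotation: 2 replicates per genotype/muscle/region combination
dds$genotype <- factor(rep(c("WT", "TG"), each = 8), levels = c("WT", "TG"))
dds$muscle <- factor(rep(rep(c("A", "B"), each = 4), times = 2))
dds$region <- factor(rep(rep(c("1", "2"), each = 2), times = 4))
dds$group <- factor(paste(dds$genotype, dds$muscle, dds$region, sep = "_"))

# Approach 2 input: keep only the WT / muscle A samples (taken before any fitting)
dds_sub <- dds[, dds$genotype == "WT" & dds$muscle == "A"]
design(dds_sub) <- ~ region

# Approach 1: fit all samples once, then extract the contrast of interest.
# Dispersions are estimated from all 16 samples.
design(dds) <- ~ group
dds <- DESeq(dds)
res_all <- results(dds, contrast = c("group", "WT_A_1", "WT_A_2"))

# Approach 2: fit only the 4 samples involved in the comparison.
# Dispersions are estimated from those 4 samples alone.
dds_sub <- DESeq(dds_sub)
res_sub <- results(dds_sub, contrast = c("region", "1", "2"))

# Compare how many genes pass the same adjusted p-value cutoff in each approach
print(sum(res_all$padj < 0.1, na.rm = TRUE))
print(sum(res_sub$padj < 0.1, na.rm = TRUE))
```

The two result tables test the same pair of groups but rest on different dispersion estimates, which is one reason the gene lists can differ between the two approaches.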

Michael Love 43k • 7.7 years ago

Dear Michael,

Thanks for your advice. I'll read the vignette more carefully.

martin.weihrauch ▴ 20 • 7.7 years ago