Hi,
I am doing DE analysis on three breed groups (FS, TX, F1) with two diet conditions (C, F, where C is the normal diet and F is the additional diet). I want to find DE genes between breeds (FS vs TX, FS vs F1 and TX vs F1) both with diet effect and without diet effect. In addition, I want to find that are differentially expressed within breed due to diet (FSc vs FSf, TXc vs TXf, F1c vs F1f). I designed the experiment like this:
sampleFiles<-list.files(directory, pattern = "*_counts") sampleinfo<-read.csv("SampleTable_mRNA_all.csv", header=TRUE) row.names(sampleinfo) sample<-sampleinfo$Sample breed<-sampleinfo$Breed diet<-sampleinfo$Diet id<-sampleinfo$SampleID sampletable<-data.frame(sampleName=sample, fileName=sampleFiles,breed=breed, diet=diet, id=id) group<-factor(paste(breed, diet, sep=".")) sampletable<-cbind(sampletable, group=group) ddsMat<-DESeqDataSetFromHTSeqCount(sampleTable = sampletable, directory = directory, design=~group) ddsMatcol<-collapseReplicates(ddsMat, sampletable$id) dds<-DESeq(ddsMatcol) res<-results(dds)
Then I made comparisions like this:
For example, within-breed diet effect on TX:
dTX_Diet<-results(dds, contrast=c("group", "TX.F", "TX.C"))
dTX_Diet_sig<-subset(dTX_Diet, padj< 0.05)
similarly, for identifying DE genes between FS and TX due to diet effect
dFS_TX<-results(dds, contrast=list(c("groupFS.F", "groupTX.C"), c("groupFS.C", "groupTX.F"))) dFS_TX_sig<-subset(dFS_TX, padj<0.05)
Could anybody please tell me if I would rather need to use LRT method?
In addition, I am interested to find DE genes in FS vs TX, FS vs F1 and TX vs F1 without including diet factor, in this case, how can I exclude the diet effect or the only way is to compare is that between FSc vs TXc and so on?
Kisun
Thanks a lot Mike. I am still confused with the later design... I found around 200 genes that are differentially expressed between FS and TX, can't there be some genes which were differentially expressed due to diet? One of the breed is responsive to diet while the other is not. I have 10 samples (4 normal diet + 6 flushing diet) from FS and 11 samples (5 normal diet + 6 flushing diet) from TX. Within breed diet effect showed that FS is not responsive to diet (i.e 0 de genes) whereas TX is responsive to diet (many de genes). Can I interpret results like this? Let's say,
FS vs TX (without considering diet effect) = 200
FS vs TX (taking diet as second factor) = 40
In order to find DE genes between breeds, I will simply take out those genes that were present in 40 from 200. It might not be the case that all 40 are also present in 200 but in a way it will be 200 - 40. Do you think this is a reasonable way to find DE genes between breeds that has nothing to do with diet conditions?
Hello Mike, I got the results but unable to figure out what the results are really trying to tell. The first comparision shows that there are very few differences between breeds due to diet differences and also the within-breed comparision in FS shows there is not diet effect in FS. However, FS.F vs TX.F shows that quite many genes are differentially expressed which included 117 genes upregulated in FS. I would be grateful if you could help me explain this:
FSvsTX(diet) = (FS.F - FS.C) - (TX.F - TX.C) = 35 (4up, 31down)
FS(diet) = FS.F - FS.C = 0
TX(diet) = TX.F - TX.C = 117 (71up, 46down)
Similarly,
FSvsTX(Flu) = FS.F - TX.F = 621 (117 up, 504 down)
FSvsTX(Con) = FS.C - TX.C = 3 (1up, 2 down)
Thanks!
I'd suggest you speak to someone locally for further assistance in interpreting results.
While I'm available as a software developer for supporting the software (if there are bugs, or if the documentation is not clear), and additionally to explain how to set up design formula and which contrasts can be used for various comparisons, the interpretations of results must be left to the analyst.
Hi Mike, finally I have been able to understand those results. Would it be possible to make a PCA plot for the subsets such as FS.F and TX.F? I am particularly interested to see the clustering for contrasts with many DE genes.
Yes just subset the dds using column indexing:
And then continue with the PCA plot instructions in the vignette or workflow.
Thanks! I slightly modified to include more than one groups.
Hi Mike,
Has there been any change/update in DESeq2 recently because of which I am unable to set up the contrast I mentioned in my question?
dFS_TX<-results(dds, contrast=list(c("groupFS.F", "groupTX.C"), c("groupFS.C", "groupTX.F")))
It was working fine until about a month ago and suddenly I get error
Error in checkContrast(contrast, resNames) :
all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)
Do you have any idea why I am getting this error or any solution to this? Thanks a lot. Kisun
hi,
Yes, we made an update to the default DESeq() steps regarding the way the coefficients are estimated. Any big change like this we put into the NEWS file, which you can find under Documentation on the package landing page, and I wrote a support post about it:
New function lfcShrink() in DESeq2
The short answer is: you can reproduce your old pipeline by running
DESeq(dds, betaPrior=TRUE)
.The longer answer is: for this comparison: whether the F/C ratio is different across FS and TX, I would probably set up an interaction model (where group is FS or TX and condition is F or C). It's in theory equivalent to your old code, although slightly different formulation and so don't expect identical results table, but this would be the way to do it using the new steps (edit: this last line won't work with the current release...)
Error in designAndArgChecker(object, betaPrior) : betaPrior=FALSE should be used for designs with interactions
colData looks like this:
I am using
R version 3.4.0 (2017-04-21) and DESeq2_1.16.1
Do you know why I am getting this error?
Thanks for this report. Sorry, I forgot that I hadn't implemented use of lfcShrink on interaction terms in the current release. I hope to have this in the next release by October.
So then your options are (1) to use your existing code but just add betaPrior=TRUE when you call DESeq(), or (2) to omit the lfcShrink() line in the above code, and so the tests, p-values are fine but the LFC are the maximum likelihood estimates, not the shrunken ones.
I will go with first option for now. :)