Question

DEA with 2 controls

albertoStrempo • 0

@albertostrempo-8810

Last seen 9.4 years ago

Switzerland

Dear all,

I'm trying to perform a particular kind of differential expression analysis. I'm using the DESeq2 package, because I think that its fold-change moderation is very important for the analysis, but I could also switch to other packages if needed.

I have three different biological conditions, say A, B and C, each with replicates. What I would like to perform is an analysis that returns genes whose expression differs between B and both A and C (in the same direction). The kind of fold-change and significance I would like to get in the end is something like an average of B vs A and B vs C.

But I wouldn't like to perform two pairwise comparisons: B vs A, B vs C, because I consider it important to rank the results and I wouldn't know how to weight the significance and fold-changes from the two analyses.

I also wouldn't like to simply treat A and C as the same condition, because A and C are quite different from each other and I think that would be a problem for the dispersion estimation.

I read the DESeq2 and edgeR documentation, but I couldn't come up with a suitable method to get the results I'm interested in. Do you have any suggestions?

Thanks!

DESeq2 • 1.2k views

ADD COMMENT • link updated 9.4 years ago by Michael Love 43k • written 9.4 years ago by albertoStrempo • 0

So ... trying to summarize what you are asking for more succinctly: are you interested in finding genes that are differentially expressed between B and the average of A and C, i.e. B / mean(A,C)?

ADD REPLY • link 9.4 years ago Steve Lianoglou ★ 13k

Sorry, I was not very concise. I don't exactly want to get the mean of A and C. I can give you a few examples:

gene   A    B    C
gene1  100  200  150
gene2  150  200  150
gene3  100  200  200

From this example, I would like to get gene1 as the most significant, followed by gene2, with gene3 last. Notice that the fold change between B and the average of A and C is the same for gene2 and gene3, but gene2 should be much more significant. I was thinking that possibly the best way to get this would be to take the geometric mean of the two fold changes (B vs A, B vs C). What do you think?

I basically want to get DEGs that are significant both between BvsA and BvsC, allowing different fold-changes. The magnitude of these two fold-changes should help me rank all the genes, even when one is close to 0, like for gene3.
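To make the arithmetic in this example concrete, here is a minimal R sketch using only the three rows from the table above (the variable names are mine, for illustration only). The geometric mean of two fold changes is the same as 2 raised to the average of the two log2 fold changes. On these numbers it puts gene1 first but ranks gene3 above gene2, so a fold-change summary by itself would not reproduce the ordering described above; that part would have to come from the significance of the two tests.

```r
# Toy values copied from the example table (no replicates, illustration only)
counts <- data.frame(
  gene = c("gene1", "gene2", "gene3"),
  A = c(100, 150, 100),
  B = c(200, 200, 200),
  C = c(150, 150, 200)
)

# log2 fold changes for the two pairwise comparisons
lfc_BA <- log2(counts$B / counts$A)
lfc_BC <- log2(counts$B / counts$C)

# Fold change of B against the plain average of A and C
fc_vs_mean <- counts$B / rowMeans(counts[, c("A", "C")])

# Geometric mean of the two fold changes = 2^(mean of the two log2 fold changes)
geo_mean_fc <- 2^((lfc_BA + lfc_BC) / 2)

summary_table <- data.frame(gene = counts$gene, lfc_BA, lfc_BC,
                            fc_vs_mean, geo_mean_fc)

# Rank genes by the geometric-mean fold change, largest first
ranked <- summary_table[order(-summary_table$geo_mean_fc), ]
print(ranked)
```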

ADD REPLY • link 9.4 years ago albertoStrempo • 0

oops, answer below...

ADD REPLY • link 9.4 years ago Michael Love 43k

score 3 · Accepted Answer · 2015-09-15

Michael Love 43k

@mikelove

Last seen 17 hours ago

United States

This has been asked on the forum before, but I can't find my answer by searching the forum either.

I don't know of a way you can test this with one contrast. What I recommended before is to perform two comparisons and combine the p-values (e.g. Fisher's method). You can then manually adjust the combined p-values for multiple testing via p.adjust(pvalue, method="BH").
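A minimal sketch of this suggestion, assuming you already have one vector of p-values per comparison, aligned by gene. The numbers below are hypothetical placeholders, not real results; in a DESeq2 workflow they would typically be the pvalue columns from two results() calls, one per contrast. Keep in mind that Fisher's method assumes independent tests, which holds only approximately here because both comparisons reuse the B samples, and that it ignores the direction of change, so the signs of the two log2 fold changes would still need to be checked separately.

```r
# Fisher's method for two p-values per gene:
# -2 * (log(p1) + log(p2)) follows a chi-squared distribution with 4 df
# under the null hypothesis, if the two tests are independent.
fisher_combine <- function(p1, p2) {
  stat <- -2 * (log(p1) + log(p2))
  pchisq(stat, df = 4, lower.tail = FALSE)
}

# Hypothetical p-values for three genes (placeholders, not real data)
p_BA <- c(1e-6, 0.01, 1e-8)  # B vs A
p_BC <- c(1e-4, 0.01, 0.9)   # B vs C

p_comb <- fisher_combine(p_BA, p_BC)

# Benjamini-Hochberg adjustment of the combined p-values
padj <- p.adjust(p_comb, method = "BH")

print(data.frame(p_BA, p_BC, p_comb, padj))
```

The third gene shows one limitation of this approach: a very small p-value in one comparison can still give a small combined p-value even when the other comparison shows no evidence at all.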

ADD COMMENT • link 9.4 years ago Michael Love 43k

Thank you very much Michael.

I think I read in some of your comments that the moderated log2FC estimations provided by DESeq2 could also be used in meta-analyses. Is it right? In my case, I would still compare log2FC within the same dataset, so many of the possible confounding factors for RNA-seq would not be relevant (same read length, same strandedness, same paired/non-paired, same batch effects...).

ADD REPLY • link 9.4 years ago albertoStrempo • 0

Yes, in general, moderated LFC are useful for meta-analysis, for example, a scatterplot of moderated LFC from two experiments.

I'm not sure exactly what you have in mind though. Can you be more explicit about what you are planning?

ADD REPLY • link 9.4 years ago Michael Love 43k

So the point is that I would also like to get a fold-change out of this analysis. This is because I want to apply some linear models to the results, in order to predict, at least partially, the differential expression of a gene based on some of its properties. I find it easier to use log2 FC for this, since they are usually normally distributed. Therefore I was thinking of combining the log2 FC of the two analyses by computing a geometric mean of the two. Would that sound reasonable to you?

ADD REPLY • link 9.4 years ago albertoStrempo • 0

You can average the two fold changes, but I guess again I'm missing the point of doing this. You have two dimensions of information (imagine a scatterplot of the two comparisons B vs A and B vs C), and you can imagine a grid with 9 squares superimposed on this plot (up/up, up/no-DE, up/down, etc.). Can you squash the information in this 2D plot into one dimension without losing information? No. You can think through some examples: If you have the LFC of B vs A equal to x, and B vs C equal to -x, then you get 0 from the average.
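A small R sketch of this point, using made-up log2 fold changes for four genes. The cutoff of 1 used to label a gene "up", "down" or "no-DE" is an assumption for illustration only; in practice the labels would come from the adjusted p-values of each comparison.

```r
# Hypothetical log2 fold changes for four genes (placeholders, not real data)
lfc_BA <- c(2,  2, 0, -1.5)  # B vs A
lfc_BC <- c(2, -2, 1, -1.5)  # B vs C

# Label each comparison as up / down / no-DE using an assumed LFC cutoff
classify <- function(lfc, cutoff = 1) {
  ifelse(lfc >= cutoff, "up", ifelse(lfc <= -cutoff, "down", "no-DE"))
}

# One of the 9 grid cells per gene, e.g. "up/up" or "up/down"
grid_cell <- paste(classify(lfc_BA), classify(lfc_BC), sep = "/")

# The one-dimensional summary: average of the two log2 fold changes
avg_lfc <- (lfc_BA + lfc_BC) / 2

print(data.frame(lfc_BA, lfc_BC, grid_cell, avg_lfc))
```

The second gene (x and -x) averages to exactly 0 even though it is strongly changed in both comparisons, which is the information lost by collapsing the two dimensions into one.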

ADD REPLY • link 9.4 years ago Michael Love 43k

Yes, squashing these two quantities into one is definitely a simplification, but I think I really need it for further comparisons. Thank you very much for your answers, Michael. Your support is invaluable.

ADD REPLY • link 9.4 years ago albertoStrempo • 0