I am working with edgeR to perform some differential expression work. EdgeR has been a dream to work with and I have quickly reached the point where I am trying to ask some more complicated questions.
My dataset consists of the following:
I have a human cell line, which is either infected or uninfected with a pathogen. I also have two different drug treatments (A and B for simplicity). Therefore I have 6 conditions (with triplicate reps). I have successfully made the simple comparisons like: what genes are changing when uninfected cells are treated with drug A, and what genes are changing when cells are infected with the pathogen?
My more complicated questions take the form:
What gene changes are induced when infected cells are treated with drug A vs when uninfected cells are treated with drug A?
Basically, the hypothesis is that the drugs affect the pathogens, causing them to act differently and we want to find the host gene changes caused by the changed pathogen behavior.
Here is what my samples/targets look like:
Pathogen Treatment Group No_Pathogen-No_drug-rep1 negative negative negative.negative No_Pathogen-No_drug-rep2 negative negative negative.negative No_Pathogen-No_drug-rep3 negative negative negative.negative No_Pathogen-Drug_A-rep1 negative Drug_A negative.Drug_A No_Pathogen-Drug_A-rep2 negative Drug_A negative.Drug_A No_Pathogen-Drug_A-rep3 negative Drug_A negative.Drug_A No_Pathogen-Drug_B-rep1 negative Drug_B negative.Drug_B No_Pathogen-Drug_B-rep2 negative Drug_B negative.Drug_B No_Pathogen-Drug_B-rep3 negative Drug_B negative.Drug_B Pathogen-No_drug-rep1 positive negative positive.negative Pathogen-No_drug-rep2 positive negative positive.negative Pathogen-No_drug-rep3 positive negative positive.negative Pathogen-Drug_A-rep1 positive Drug_A positive.Drug_A Pathogen-Drug_A-rep2 positive Drug_A positive.Drug_A Pathogen-Drug_A-rep3 positive Drug_A positive.Drug_A Pathogen-Drug_B-rep1 positive Drug_B positive.Drug_B Pathogen-Drug_B-rep2 positive Drug_B positive.Drug_B Pathogen-Drug_B-rep3 positive Drug_B positive.Drug_B
I am using the makeContrasts method to set up the comparisons. I am wondering specifically about something like this contrast:
Pathogen_induced_changes_with_drug_A = (positive.DrugA) - (negative.DrugA - negative.negative)
My intention with this contrast is to have the baseline be changes in pathogen negative cells when DrugA is added, and then compare that to changes in pathogen positive cells with DrugA to find the gene changes I mentioned above.
I am wondering if I shouldn't do this instead:
Pathogen_induced_changes_with_drug_A = (positive.DrugA - positive.negative) - (negative.DrugA - negative.negative)
To also remove changes which come from simply the presence of the pathogen.
I would really appreciate any tips / pointers on this.
Thanks!
Hello Ryan,
Thanks for this response. I am not specifying my design using the
~0 + Group
format. I am going to just include my whole R script. I am not opposed to digging into the results manually but I would like to clarify if what I am doing initially is even correct.For A_vs_B comparisons, are those valid (and just require some cross checking) or should I only make simple comparisons and merge everything manually?
If I am not making sense, I apologise.
Thanks again for all your help
You have made a mistake here:
Your design has an intercept as the first column and fold changes as the other 5, but you are naming them as if they all had the same meaning. (Look at your design matrix and note that the first column looks different from the rest.) This is causing you to misinterpret the meaning of the coefficients and is the reason for your confusion. I strongly recommend you use a no-intercept design (i.e.
~0 + Group)
as I described earlier. If you use a no-intercept model, the column names you are assigning to the design matrix will match the meaning of those columns and the contrasts you have formulated in your code should all be correct. (The edgeR user's guide has a section on formulating a model with no intercept.)Furthermore, the last two contrasts in your code, labeled "Complicated", are not actually interaction contrasts. For example, note that the
- negative.negative
part cancels out of both terms in the first contrasts, leaving you with justnegative.A - negative.B
. The same is true for the next one. An interaction contrast involves 4 distinct groups, not 3. (The contrasts you have written are still valid, since the extra parts just cancel out.)Also, unrelated to your contrast formulation, your choice to use common dispersion is a bit odd, unless you're working with a very small number of genes. If you're working with RNA-seq or similar genome-scale data, I would recommend you use
estimateDisp
,glmQLFit
, andglmQLFTest
instead to run the standard quasi-likelihood method.