I am trying to use edgeR to analyse differential gene expression from my RNA-seq dataset. The experimental design is as follows: I had 3 different treatment conditions i.e. 1 factor with 3 different levels. And there are 3 replicate RNA samples (counts libraries) for each of the 3 treatments i.e. 9 samples in total. I have made a DGEList with the counts for the 9 samples. I have filtered to remove features with metatags and low expression. I have also made a design matrix giving the information that there are 3 groups with 3 samples assigned to each group. The design matrix (called design) looks like this:
Group1 Group2 Group3 Sample1 1 0 0 Sample2 1 0 0 Sample3 1 0 0 Sample4 0 1 0 Sample5 0 1 0 Sample6 0 1 0 Sample7 0 0 1 Sample8 0 0 1 Sample9 0 0 1 I have also estimated dispersion using the following DG <- estimateDisp(DG, design, robust=TRUE)
where DG is my DGEList and design is my design matrix. Because my experimental design is very simple I don't think I need to use a GLM approach to analyse differential expression and I only want to do pair-wise comparisons i.e. find DE between group 1 and group 2 and between group 1 and group 3. So I think I want to use the edgeR classic approach. However, when I try the following code:
> et <- exactTest(DG, pair=c("group1","group2"))
I get the following error:
At least one element of given pair is not a group. Groups are: 1
Can someone please help explain where I am going wrong? I think that my DGEList is not getting the information from my data matrix (design) or perhaps my data matrix is over-complicated for my experimental design and I need to somehow include the group information in my DGEList. At the moment if I print my DGEList the $samples table looks like this:
$samples files group lib.size norm.factors counts3D7_1-1 counts3D7_1-1 1 13251744 2.0632398 counts3D7_2-1 counts3D7_2-1 1 13809955 0.9912349 counts3D7_3-1 counts3D7_3-1 1 12328705 1.3071140 counts3D7_1-2 counts3D7_1-2 1 12605616 0.6537380 counts3D7_2-2 counts3D7_2-2 1 13392599 0.6290064
(with not all rows printed) so I think that the correct group information isn't there. Any help is greatly appreciated.
Many thanks for your helpful suggestions. I am now calculating differential expression using the GLM approach. One further question: you say that supplying
design
toestimateDisp
will use the GLM approach for estimating the dispersions. Does this mean that I don't need to go back and useqlmQLFit
to estimate dispersions prior to calculating differential expression usingglmQLFTest
?You still need to use both of them.
estimateDisp
estimates the negative binomial dispersions, whileglmQLFit
estimates the quasi-likelihood dispersions. The book chapter has more details on this:http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_19