Good day.
I have two cultivar tree samples at different stages collected from area N and Z. The trees at area Z is 12-year-old while the trees at area Z is 6-year-old. Below is the description of my samples.
files cv stage area age description
./M1_St5.count.txt M 5st N 12 stage5
./M2_St5.count.txt M 5st N 12 stage5
./M3_St5.count.txt M 5st N 12 stage5
./P1_St5.count.txt P 5st Z 6 stage5
./P4_St5.count.txt P 5st Z 6 stage5
./P5_St5.count.txt P 5st Z 6 stage5
./M1_St6.count.txt M 6st N 12 stage6
./M2_St6.count.txt M 6st N 12 stage6
./M3_St6.count.txt M 6st N 12 stage6
./P1_St6.count.txt P 6st Z 6 stage6
./P4_St6.count.txt P 6st Z 6 stage6
./P5_St6.count.txt P 6st Z 6 stage6
I defined the factors like:
> stage <- factor(nt$sample$stage)
> area <- factor(nt$sample$area)
> age <- factor(nt$sample$age, level=c("12","6"))
> cv <- factor(nt$sample$cv)
I want to learn genes that response to area effect together with stage factor at FDR < 0.05; hence, I described the experiment setup like below and continued with common, trended, tagwise dispersions estimation. I can proceed with my test with no error message reported.
> nt.model2 <- model.matrix(~area+stage)
> nt.model2
(Intercept) areaZ stageSt6
1 1 0 0
2 1 0 0
3 1 0 0
4 1 1 0
5 1 1 0
6 1 1 0
7 1 0 1
8 1 0 1
9 1 0 1
10 1 1 1
11 1 1 1
12 1 1 1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$area
[1] "contr.treatment"
> nt.m2 <- estimateGLMCommonDisp(nt, nt.model2, verbose=T)
Disp = 0.165 , BCV = 0.407
> nt.m2 <- estimateGLMTrendedDisp(nt.m2, nt.model2)
> nt.m2 <- estimateGLMTagwiseDisp(nt.m2, nt.model2)
> nt.m2.areaStage <- glmLRT(nt.m2.fit)
> summary(decideTestsDGE(nt.m2.areaStage, adjust.method="BH", p.value=0.05))
However, when I want to learn genes that response to area effect together with age factor with the experiment setup below, I received an error message when I continued with dispersion estimation.
> nt.model3 <- model.matrix(~area+age)
> nt.model3
(Intercept) areaZ age6
1 1 0 0
2 1 0 0
3 1 0 0
4 1 1 1
5 1 1 1
6 1 1 1
7 1 0 0
8 1 0 0
9 1 0 0
10 1 1 1
11 1 1 1
12 1 1 1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$area
[1] "contr.treatment"
attr(,"contrasts")$age
[1] "contr.treatment"
> nt.m3 <- estimateGLMCommonDisp(nt, nt.model3, verbose=T)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
age6
Could the community please help in this case?
Thanks for your time and experience sharing.
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.20.9 limma_3.34.9
Dear Prof. Gordon, thanks for the feedback. Can you please share with me the right way to test for the genes that respond to area effect together with the age factor? Thank you for your time.
area and age are completely confounded with one another in your experiment. It is obviously impossible to separate the effects of the two factors.
I see. That's pitty then. Thanks for your prompt reply.
I have edited my answer. Just treat your experiment as having four groups.
All right, good to know that. Thanks for your suggestion and your time 🙂