Dear Bioconductor community,
I'm analyzing some RNA-seq data, with the following design.
> targets
animal genotype treatment
1 1 0 0
2 1 0 1
3 2 0 0
4 2 0 1
5 3 1 0
6 3 1 1
7 4 1 0
8 4 1 1
I'm interested in finding genes that
1) respond to treatment in genotype0
2) respond to treatment in genotype1
3) how the genotype affects gene expression in treatment0
4) how the genotype affects gene expression in treatment1
5) Do genes respond differently on treatment in genotyp0 and genotype1?
After studying quite a lot the edgeR manual and asking also in bioconductor support site some time ago, I think of the following ideas (please tell me what is correct and if you have other suggestions).
1) it seems that it is a paired-samples design (corresponding to 3.4.1 section of edgeR manual).
However, I cannot exclude just the effect of animal because it is a linear combination with the genotype. Thus, then, I follow the advices of section 3.5 (comparisons within and between samples). This seems to fit nicely.
Thus, I form the model matrix:
design = model.matrix(~genotype + genotype:animal + genotype:treatment)
This results to a model matrix with the following headers
(Intercept)
genotype1
genotype0:animal1
genotype1:animal1
genotype0:treatment1
genotype1:treatment1
question1: what is the meaning of the intercept here? When I am comparing for a certain coefficient do I compare against the intercept?
and to get the effect of treatment in genotype0, I ask for:
lrt <- glmLRT(fit, coef="genotype0:treatment1")
This seems correct to me, assuming that the intercept means genotype0:treatment0 (is that right?).
question2: why with this way I have taken care of the effect of the animal?
In a similar way I find the effect of treatment in genotype1, i.e. lrt <- glmLRT(fit, coef="genotype1:treatment1")
2) If I need to find the genes that respond differently with treatment in genotype1 vs genotype0, then I guess that I need to form contrasts. Will this be something like c(0,0,0,0,-1,1) ?
3) The aforementioned model matrix does not help me with the effects of genotypes on treatments. Thus, I form another model.matrix aka
design2 <- model.matrix(~ treatment + treatment:animal + treatment:genotype )
question3: is it correct to analyze the same data with 2 independent design matrices? or this can lead to some bias (multiple testing?)
4) Then, I thought that perhaps it would be better to put all these together in one model matrix as:
design.together <-model.matrix( ~ genotype + treatment + genotype:treatment + treatment:animal + genotype:animal)
To do the comparisons, I formed contrasts (as suggested here http://seqanswers.com/forums/showpost.php?p=153077&postcount=13) as follows:
cntr1 <- colMeans(m[genotype==1 & treatment == 0,]) - colMeans(m[genotype == 0 & treatment == 0,])
this results to:
cntr1
(Intercept) genotype1 treatment1
0.0 1.0 0.0
genotype1:treatment1 treatment0:animal1 treatment1:animal1
0.0 0.0 0.0
genotype1:animal1
0.5
question4: should the 0.5 term be there in genotype1:animal1? is this approach correct? if yes, is it equivalent to the previous design?
Thanks a lot, and sorry for all these questions