I am seeing non-negligible differences in the number of differentially expressed genes depending on whether the linear regression is performed on a design with or without an intercept, even for a factor whose coefficient should not depend on the intercept.
Specifically, I have a design matrix with 13 cancer types and the presence of mutation A. Each sample belongs to exactly one cancer type, but it may or may not carry mutation A. So the design matrix, without an intercept, is:
> design.noI <- model.matrix(~0+ct+mutA)
where ct is an nSamples x 13 binary indicator matrix and mutA is an nSamples x 1 binary vector. The version with the intercept is simply:
> design.wI <- model.matrix(~ct+mutA)
where one of the columns of ct is dropped and replaced by the intercept. The two design matrices span the same column space, so the coefficients estimated under one parameterization can be obtained from the other with simple algebra. In particular, the coefficient for the factor mutA should be identical in both scenarios, and so should its statistical significance.
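As a sanity check of that algebra, here is a minimal sketch on simulated data (toy names, and 3 cancer types instead of 13) showing that lm() returns the same mutA coefficient under both parameterizations:

> set.seed(1)
> ct <- factor(rep(LETTERS[1:3], each = 10))  # 3 toy cancer types
> mutA <- rbinom(30, 1, 0.5)                  # toy mutation indicator
> yvec <- rnorm(30) + 0.5 * mutA              # toy expression values
> coef(lm(yvec ~ 0 + ct + mutA))["mutA"]      # fit without intercept
> coef(lm(yvec ~ ct + mutA))["mutA"]          # fit with intercept: same value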
When I run the voom pipeline, however, the results are slightly different:
> v <- voom(y, design.noI, plot = TRUE)
> fit <- lmFit(v, design.noI)
> eB.noI <- eBayes(fit)
> v <- voom(y, design.wI, plot = TRUE)
> fit <- lmFit(v, design.wI)
> eB.wI <- eBayes(fit)
As predicted, the weights returned by voom and the coefficients for mutA are identical in both cases (up to negligible rounding error). However, when decideTests is run on the two fits, the number of DE genes for factor mutA differs:
> adjPFactor.noI <- decideTests(eB.noI, method = "global", adjust.method = "BH", p.value = minP, lfc = minFC)
> adjPFactor.wI <- decideTests(eB.wI, method = "global", adjust.method = "BH", p.value = minP, lfc = minFC)
> table(adjPFactor.wI[,"mutA"])

   -1     0     1
   47 18861    48

> table(adjPFactor.noI[,"mutA"])

   -1     0     1
   59 18841    56
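A quick check (a sketch, using the eBayes fits above) suggests that the unadjusted mutA p-values themselves agree, so the discrepancy must enter after the moderated t-tests:

> # raw (unadjusted) mutA p-values should match up to numerical tolerance
> all.equal(eB.noI$p.value[,"mutA"], eB.wI$p.value[,"mutA"])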
Where does this slight difference come from?
Thanks!
I see, the problem lies purely in the adjustment of the p-values: with method="global", decideTests pools the p-values of all coefficients in the design before applying BH, so the adjustment is sensitive to which coefficients the parameterization includes. In the no-intercept design the 13 ct coefficients test whether each cancer-type mean is zero, whereas with an intercept 12 of them test differences between cancer types; the pooled p-value sets therefore differ, and so do the adjusted mutA p-values. Thanks!
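To make the mechanism concrete, here is a minimal sketch with made-up p-values. BH is applied to all coefficients' p-values pooled together; since the ct coefficients test different hypotheses in the two parameterizations, the pooled set changes, and with it the adjusted mutA p-values:

> p.mutA <- c(0.001, 0.01, 0.04)             # toy raw p-values for mutA
> p.means <- c(0.002, 0.03)                  # toy p-values for ct under ~0+ct
> p.diffs <- c(0.20, 0.90)                   # toy p-values for ct under ~ct
> p.adjust(c(p.mutA, p.means), method = "BH")[1:3]
> p.adjust(c(p.mutA, p.diffs), method = "BH")[1:3]
> # identical raw mutA p-values, different adjusted values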