If you're looking for the genes that are dimorphic based on sex, you've just found them. If you follow through with your "standard" limma dge pipeline, you'll get the genes differentially expressed across the sexes of your samples, ie.
design <- model.matrix(~Sex, data = pData(eset))
fit <- lmFit(eset, design)
fit <- eBayes(fit)
res <- topTable(fit, n = Inf)
The p-values and logFCs reported in the res
data.frame are those for male vs female, the order of which is determined by the levels(eset$Sex)
. Just look at the 2nd column name of your design
matrix. Positive logFC's will be those that are more highly expressed in that group vs the intercept (the other sex).
Still, if you really want the residuals matrix, you could do:
resids <- limma::removeBatchEffect(eset, batch = eset$Sex)
Caveat emptor: this should work "just fine" assuming the data in exprs(eset)
is appropriate (ie. on log2 scale and more-or-less normally distributed)
Thanks! I did find the sex-DEG. And also need the residuals. I wasn't sure whether using removeBatchEffect to regress sex out, would use the same linear model as lmFit.
I was wondering if there is an equivalent of resid() applied on the lm() output.
Well, I'll be.
It does look like there is a
limma::residuals.MArrayLM
function defined. Tryresiduals(fit, eset)
and see how it compares toresids
above ...Yes that's right. Using
removeBatchEffect
is not the same as it does not subtract out the intercept.I have another question about subtracting the intercept. How does limma calculate the logFC for an observation were the mean value of Test is -0.072431315 and Reference is 0.06725765?. I cannot do log2(T/R) in this case. Thanks
These values (-0.07 and -0.06) are already in log space, no? If not, where are you getting these numbers from?
Anyway, assuming we're talking about numbers already in log-space, this shakes out to just being T - R ...
The values -0.07 and 0.06 are the residuals for Test and Reference respectively. As I work with the residuals the average expression of any gene across all samples is zero. I al puzzled on how to calculate logFC in that case.
Yes, by definition, for any design matrix that can be (re)formulated with an intercept.
If you only have residuals, you can't calculate the log-fold change between sexes, because you've explicitly thrown out that information. If you want to find DE genes, just follow the standard limma workflow. There's no need to mess around with residuals.