Hello, Bioconductor, I'm working lately with data from small-RNA seq in an organism without small RNA annotation. I would like to perform DE analysis on Differentially Expresed Regions and I would like to ask about the creation of the model matrix. Following the example of the vignette :
Build models
mod <- model.matrix(~ pheno$group + pheno$gender)
mod0 <- model.matrix(~ pheno$gender)
I only have pheno$group
, does that mean I just follow the "main" instructions from the limma vignette?
getF <- function(fit, fit0, theData) {
rss1 <- rowSums((fitted(fit) - theData)^2)
df1 <- ncol(fit$coef)
rss0 <- rowSums((fitted(fit0) - theData)^2)
df0 <- ncol(fit0$coef)
fstat <- ((rss0 - rss1) / (df1 - df0)) / (rss1 / (ncol(theData) - df1))
f_pval <- pf(fstat, df1 - df0, ncol(theData) - df1, lower.tail = FALSE)
fout <- cbind(fstat, df1 - 1, ncol(theData) - df1, f_pval)
colnames(fout)[2:3] <- c("df1", "df0")
fout <- data.frame(fout)
return(fout)
}
If I want to perform the single base-level F-statistics, instead of using the:
models <- makeModels(sampleDepths,
testvars = pheno$group,
adjustvars = pheno[, c("gender")]
)
Do I put the testvars
only?
Thank you in advance!
Konstantinos
Dear Leonardo,
in the section "6.1.2 Find DERs with limma" of the derfinder vignette it is explained how to perform an analysis in the case that the experimental design has a covariate (
pheno[, c("gender")]
)I wanted to know if in a simpler design (such as in the example with only the
pheno$group
) the limma approach would be as the one in the 9.2 or 9.3 of the "limma users guide":9.3 Several Groups
For example, a design I would like to test is:
Does that mean I have to follow this:
Or should I find a way to apply the
getF()
proposed in the derfinder vignette?Yes, that was what I was searching for:
I wasn't completely sure if using the
sampleDepths
and thepheno$group
would be sufficient to get results without the input of adjustvars.Hi,
Thanks for the clarifications. The single base-level F-statistics approach is quite slow, and well, I would recommend that you use the limma-based approach instead. With
limma
you can then use whatever model is best for your question and your question really becomes a modeling question instead of aderfinder
one.I see that
targets$group
has three options:Ctrl
,C1
andC2
. If you use themodel.matrix(~0 + targets$group)
approach, you are then modeling without an intercept. If you usemodel.matrix(~ targets$group)
one of your groups will become the reference. You can visually inspect this withExploreModelMatrix
with:versus
That visual inspection will help you understand what are the coefficients that you are obtaining and how to interpret them.
If you are interested in the difference between
C1
andCtrl
, as well as the difference ofC2
vsCtrl
, then you might want to usemodel.matrix(~targets$group)
where you set uptargets$group
to be afactor
withlevels(targets$group)
to bec("Ctrl", "C1", "C2")
, then get the t-statistics for the 2nd coefficient (C1 vs Ctrl) and the 3rd coefficient (C2 vs Ctrl) by runninglimma:topTable(coef = 2)
andlimma::topTable(coef = 3)
on the sameeBayes(lmFit())
object.Or you can use the
model.matrix(0 ~ targets$group)
model that has no intercept and use theconstrasts.fit()
function like they explain in thelimma
docs.You might be able to get more specific modeling support by asking the
limma
authors.Best, Leo
Thank you much for the elucidation, it is clear now and I will proceed as you suggested!
Kind regards, Konstantinos