Question

derfinder model.matrix with limma or single base-level F-statistics analysis

0

Entering edit mode

Konstantinos Yeles ▴ 80

@konstantinos-yeles-8961

Last seen 16 months ago

Italy

Hello, Bioconductor, I'm working lately with data from small-RNA seq in an organism without small RNA annotation. I would like to perform DE analysis on Differentially Expresed Regions and I would like to ask about the creation of the model matrix. Following the example of the vignette :

Build models

mod <- model.matrix(~ pheno$group + pheno$gender)
mod0 <- model.matrix(~ pheno$gender)

I only have pheno$group, does that mean I just follow the "main" instructions from the limma vignette?

getF <- function(fit, fit0, theData) {
    rss1 <- rowSums((fitted(fit) - theData)^2)
    df1 <- ncol(fit$coef)
    rss0 <- rowSums((fitted(fit0) - theData)^2)
    df0 <- ncol(fit0$coef)
    fstat <- ((rss0 - rss1) / (df1 - df0)) / (rss1 / (ncol(theData) - df1))
    f_pval <- pf(fstat, df1 - df0, ncol(theData) - df1, lower.tail = FALSE)
    fout <- cbind(fstat, df1 - 1, ncol(theData) - df1, f_pval)
    colnames(fout)[2:3] <- c("df1", "df0")
    fout <- data.frame(fout)
    return(fout)
}

If I want to perform the single base-level F-statistics, instead of using the:

models <- makeModels(sampleDepths,
    testvars = pheno$group,
    adjustvars = pheno[, c("gender")]
)

Do I put the testvars only?

Thank you in advance!

Konstantinos

derfinder • 1.5k views

ADD COMMENT • link 3.7 years ago Konstantinos Yeles ▴ 80

score 2 · Accepted Answer · 2021-08-01

2

Entering edit mode

Leonardo Collado Torres ★ 1.1k

@lcolladotor

Last seen 9 weeks ago

United States

Hi,

Can you provide an example that breaks? The following R code works for me:

library("derfinder")
set.seed(20210801)
pheno <- data.frame(
    group = sample(c("A", "B"), 10, replace = TRUE),
    gender = sample(c("M", "F"), 10, replace = TRUE)
)

sampleDepths <- runif(10, 1e7, 1e8)

models <- makeModels(sampleDepths,
    testvars = pheno$group,
    adjustvars = pheno[, c("gender")]
)

as well as

models_noadj <- makeModels(sampleDepths,
    testvars = pheno$group,
    adjustvars = NULL
)

which I think is the situation you are describing of not adjusting for any covariates (adjustment variables). You can see how the models change with http://bioconductor.org/packages/ExploreModelMatrix/. Here are how the models look:

> sapply(models, colnames)
$mod
[1] "(Intercept)"  "testvarsB"    "sampleDepths" "adjustVar1M"

$mod0
[1] "(Intercept)"  "sampleDepths" "adjustVar1M"

> sapply(models_noadj, colnames)
$mod
[1] "(Intercept)"  "testvarsB"    "sampleDepths"

$mod0
[1] "(Intercept)"  "sampleDepths"

Package info:

> packageVersion("derfinder")
[1] ‘1.26.0’

Best, Leo

ADD COMMENT • link 3.7 years ago Leonardo Collado Torres ★ 1.1k

1

Entering edit mode

Dear Leonardo,

in the section "6.1.2 Find DERs with limma" of the derfinder vignette it is explained how to perform an analysis in the case that the experimental design has a covariate (pheno[, c("gender")])

I wanted to know if in a simpler design (such as in the example with only the pheno$group) the limma approach would be as the one in the 9.2 or 9.3 of the "limma users guide":

9.3 Several Groups

f <- factor(targets$Target, levels=c("RNA1","RNA2","RNA3"))
design <- model.matrix(~0+f)
colnames(design) <- c("RNA1","RNA2","RNA3")
fit <- lmFit(eset, design)
contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

For example, a design I would like to test is:

transformedCov <- log2(coverageMatrix + 32)

targets <- data.frame(files=c("Control_1", "Control_2", "Cell_1", "Cell_2", "No_Cell_1", "No_Cell_2"), group=as.factor(c("Ctrl","Ctrl", "C1","C1","C2","C2")))
design <- model.matrix(~0 + targets$group)
colnames(design) <- c("C1","C2","Ctrl")

Does that mean I have to follow this:

fit <- lmFit(transformedCov , design = design)
contrast.matrix <- makeContrasts(C1-Ctrl, C2-Ctrl, C1-C2, levels=design)
fit_2<- contrasts.fit(fit , contrast.matrix)
fit_2 <- eBayes(fit_2, robust = TRUE)

Or should I find a way to apply the getF() proposed in the derfinder vignette?

About the single base-level F-statistics

Yes, that was what I was searching for:

as well as
models_noadj <- makeModels(sampleDepths,
    testvars = pheno$group,
    adjustvars = NULL )
which I think is the situation you are describing of not adjusting for any covariates (adjustment variables). You can see how the models change with http://bioconductor.org/packages/ExploreModelMatrix/. Here are how the models look:

I wasn't completely sure if using the sampleDepths and the pheno$group would be sufficient to get results without the input of adjustvars.

ADD REPLY • link 3.7 years ago Konstantinos Yeles ▴ 80

1

Entering edit mode

Hi,

Thanks for the clarifications. The single base-level F-statistics approach is quite slow, and well, I would recommend that you use the limma-based approach instead. With limma you can then use whatever model is best for your question and your question really becomes a modeling question instead of a derfinder one.

I see that targets$group has three options: Ctrl, C1 and C2. If you use the model.matrix(~0 + targets$group) approach, you are then modeling without an intercept. If you use model.matrix(~ targets$group) one of your groups will become the reference. You can visually inspect this with ExploreModelMatrix with:

app <- ExploreModelMatrix(
  sampleData = targets,
  designFormula = ~ 0 + group
)
shiny::runApp(app)

versus

app <- ExploreModelMatrix(
  sampleData = targets,
  designFormula = ~ group
)
shiny::runApp(app)

That visual inspection will help you understand what are the coefficients that you are obtaining and how to interpret them.

If you are interested in the difference between C1 and Ctrl, as well as the difference of C2 vs Ctrl, then you might want to use model.matrix(~targets$group) where you set up targets$group to be a factor with levels(targets$group) to be c("Ctrl", "C1", "C2"), then get the t-statistics for the 2nd coefficient (C1 vs Ctrl) and the 3rd coefficient (C2 vs Ctrl) by running limma:topTable(coef = 2) and limma::topTable(coef = 3) on the same eBayes(lmFit()) object.

Or you can use the model.matrix(0 ~ targets$group) model that has no intercept and use the constrasts.fit() function like they explain in the limma docs.

You might be able to get more specific modeling support by asking the limma authors.

Best, Leo

ADD REPLY • link 3.7 years ago Leonardo Collado Torres ★ 1.1k

0

Entering edit mode

Thank you much for the elucidation, it is clear now and I will proceed as you suggested!

Kind regards, Konstantinos

ADD REPLY • link 3.7 years ago Konstantinos Yeles ▴ 80