Question

Limma analysis with multiple variables

0

Will • 0

@will-23665

Last seen 3.7 years ago

Italy

Hi, I'm doing an RNA-seq analysis using limma, and I have a lot of samples. I'm trying to build the design matrix and then call lmFit(). But when I call lmFit() it returns

coefficient not estimable

referring to the last coefficient in the design matrix. Some rows of the sample table contained NA values, so I deleted them before building the design matrix. This is the code:

delete = rownames(x$samples)[!complete.cases(x$samples)]
x$samples = x$samples[!rownames(x$samples) %in% delete,]
x$counts = x$counts[,!colnames(x$counts) %in% delete]

design <- model.matrix(~0+group+gender+y+age, data=x$samples)
colnames(design) <- gsub("group", "", colnames(design))
v <- voom(x, design)
vfit <- lmFit(v, design)

where x is a DGE object. y is a numeric variable whose values are almost all different. So when I create the design matrix, it contains a lot of columns because the variable y has a lot of levels. So maybe I should create a new variable y2 that divides the values of y into a few categories using the ifelse() function.

limma model.matrix de design rnaseq • 2.5k views

ADD COMMENT • link updated 4.7 years ago by Gordon Smyth 52k • written 4.7 years ago by Will • 0

0

I'm curious as to your naming conventions of your factors, because:

"y" is often reserved for an outcome variable, but here you have it as a covariate. Of course there is nothing intrinsically wrong with this, it just is odd to see.
"group" is often reserved for labelling the combinations of experimental factors that are present in your samples, and would then appear alone in your design formula, viz: design <- model.matrix(~0+group).

If you share a little more about your experiment, such as the values that "group" and "y" in fact take on and what your outcome variable is, it may lend some context to aid in better understanding your underlying issue.

ADD REPLY • link 4.7 years ago Malcolm Cook ★ 1.6k

score 0 · Answer 1 · 2020-06-10

0

Gordon Smyth 52k

@gordon-smyth

Last seen 11 minutes ago

WEHI, Melbourne, Australia

I think you need to clarify whether y is a numeric covariate or a factor. If it is numeric, then it should produce only one column in the design matrix. A numeric variable should not be declared as a factor, unless you want to group it into a small number of distinct levels.
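To illustrate the difference, here is a minimal sketch using base R only. The toy `samples` data frame and its values are hypothetical stand-ins for `x$samples`, not the original poster's data. It compares the number of design columns with the matrix rank, since a design with more columns than its rank is the kind that leads to a "coefficient not estimable" message.

```r
# Hypothetical toy data: 8 samples, 2 groups, and a covariate y
# whose values are all distinct (as described in the question).
samples <- data.frame(
  group = factor(rep(c("A", "B"), each = 4)),
  y     = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7, 3.9, 2.6)
)

# y kept numeric: it contributes a single column (a slope).
design_num <- model.matrix(~ 0 + group + y, data = samples)

# y coerced to a factor: one indicator column per level beyond the first.
design_fac <- model.matrix(~ 0 + group + as.factor(y), data = samples)

# Compare the number of columns with the rank of each design.
n_num    <- ncol(design_num)
n_fac    <- ncol(design_fac)
rank_num <- qr(design_num)$rank
rank_fac <- qr(design_fac)$rank

print(c(columns = n_num, rank = rank_num))  # full rank
print(c(columns = n_fac, rank = rank_fac))  # more columns than rank
```

In this toy case the factor version has more columns than there are samples, so at least one coefficient cannot be estimated.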

BTW, if x is a DGEList you could do the subsetting by:

j <- complete.cases(x$samples)
x <- x[,j]

ADD COMMENT • link 4.7 years ago Gordon Smyth 52k

0

y is a numeric covariate, but I used as.factor() to convert it to a factor. Are you suggesting that I pass it to model.matrix() as a numeric vector, without using as.factor()? I don't want to group it into levels.

ADD REPLY • link 4.7 years ago Will • 0

1

If you treat y as numeric, normally you will be modelling expression as a linear function of y. If you treat it as a factor, you are modelling each distinct value of y as having a completely distinct effect regardless of how close each value is to the other values. For example, if the distinct values of y were 3, 3.01, and 10, you would be treating these 3 values as entirely distinct, ignoring that 3.01 happens to be very close to 3 and far from 10. That is probably not desirable.

If neither of these analysis strategies is what you are looking for, it is also possible to model expression as a smooth but non-linear function of y. See the Limma User's Guide Section 9.6.2: "Many time points" for how to do this.
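As a rough sketch of the smooth non-linear option, a natural cubic spline basis for y can be built with `ns()` from the `splines` package and placed in the design matrix in place of y itself. The toy `y` and `group` vectors and the choice of `df = 3` are assumptions for illustration only; the User's Guide section is the authoritative description of the approach.

```r
library(splines)

# Hypothetical toy covariate and grouping factor for 12 samples.
y     <- seq(1, 20, length.out = 12)
group <- factor(rep(c("A", "B"), times = 6))

# Natural cubic spline basis for y; df = 3 is an arbitrary illustrative choice.
X <- ns(y, df = 3)

# The spline columns replace the single linear y column in the design.
design <- model.matrix(~ 0 + group + X)

dim(design)
colnames(design)
```

The spline coefficients are not individually interpretable, so they would normally be tested together rather than one at a time.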

ADD REPLY • link 4.7 years ago Ryan C. Thompson ★ 7.9k

0

OK, thanks. So in model.matrix() I can combine some variables as numeric and others as factors?

ADD REPLY • link 4.7 years ago Will • 0

1

As I said in my answer above, you must not use as.factor() on a numeric covariate. That will produce meaningless results. Factors are only for categorical variables. You cannot combine a numeric variable as both numeric and as a factor. You must not make it a factor at all. Just include it in the model as a numeric variable.
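A minimal sketch of such a design, using the formula from the question, might look like the following. The `samples` data frame and all of its values are hypothetical; the only point is that group and gender are factors while y and age stay numeric.

```r
# Hypothetical sample table standing in for x$samples.
samples <- data.frame(
  group  = factor(c("ctrl", "ctrl", "ctrl", "ctrl", "trt", "trt", "trt", "trt")),
  gender = factor(c("F", "M", "F", "M", "F", "M", "F", "M")),
  y      = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7, 3.9, 2.6),
  age    = c(34, 51, 47, 29, 62, 45, 38, 55)
)

# Check that the covariates really are numeric before building the design.
stopifnot(is.numeric(samples$y), is.numeric(samples$age))

# Factors expand to indicator columns; each numeric covariate gives one column.
design <- model.matrix(~ 0 + group + gender + y + age, data = samples)
colnames(design) <- gsub("group", "", colnames(design))

colnames(design)
```

With a real DGEList, this design would then be passed to voom() and lmFit() as in the original code.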

ADD REPLY • link 4.7 years ago Gordon Smyth 52k

0

OK, but I was referring to the other variables: y as numeric, with group and gender as factors.

ADD REPLY • link 4.7 years ago Will • 0