Hi, I'm running an RNA-seq analysis using limma, and I have a lot of samples. I'm trying to build the design matrix and then call lmFit(). But when I call lmFit() it returns
coefficient not estimable
referring to the last coefficient in the design matrix. Some rows of the sample table had NA values, so I deleted them before building the design matrix. This is the code:
delete = rownames(x$samples)[!complete.cases(x$samples)]
x$samples = x$samples[!rownames(x$samples) %in% delete,]
x$counts = x$counts[,!colnames(x$counts) %in% delete]
design <- model.matrix(~0+group+gender+y+age, data=x$samples)
colnames(design) <- gsub("group", "", colnames(design))
v <- voom(x, design)
vfit <- lmFit(v, design)
where x is a DGEList object. y is a numeric variable in which almost every value is different. So when I create the design matrix, it contains a lot of columns because the variable y has a lot of levels. So maybe I should create a new variable y2 that divides the values of y into a few categories using the ifelse() function.
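One thing that may be worth checking first: a truly numeric covariate contributes a single column to the design matrix, so many columns for y suggests it is stored as character or factor. The sketch below uses an invented toy sample table (the values and the names `samples`, `design_bad` and `design_ok` are assumptions, not taken from the real data) to show the difference, and compares the number of columns with the matrix rank as a rough estimability check.

```r
# Hypothetical toy sample table; all values are invented for illustration.
samples <- data.frame(
  group  = factor(c("A", "A", "A", "B", "B", "B")),
  gender = factor(c("F", "M", "F", "M", "F", "M")),
  y      = c("1.2", "3.4", "2.2", "5.1", "4.8", "0.7"),  # numbers stored as text
  age    = c(34, 51, 47, 29, 62, 40),
  stringsAsFactors = FALSE
)

# With y as character, model.matrix() treats it as a factor:
# one dummy column per level (minus the reference level).
design_bad <- model.matrix(~0 + group + gender + y + age, data = samples)

# Going through as.character() first also works if y is a factor.
samples$y <- as.numeric(as.character(samples$y))
design_ok <- model.matrix(~0 + group + gender + y + age, data = samples)

# If the rank is smaller than the number of columns,
# some coefficients cannot be estimated.
rank_bad <- qr(design_bad)$rank
rank_ok  <- qr(design_ok)$rank

print(c(columns = ncol(design_bad), rank = rank_bad))
print(c(columns = ncol(design_ok),  rank = rank_ok))
```

On the real data, `class(x$samples$y)` shows how y is stored. If y genuinely has to be categorical, base R's `cut()` is usually more convenient than nested `ifelse()` calls.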
I'm curious about the naming conventions you use for your factors, because:
"y" is often reserved for an outcome variable, but here you have it as a covariate. Of course there is nothing intrinsically wrong with this, it just is odd to see.
"group" is often reserved for labelling the combinations of experimental factors that are present in your samples, and would then appear alone in your design formula, viz:
design <- model.matrix(~0+group)
If you share a little more about your experiment, such as the values that "group" and "y" actually take and what your outcome variable is, it may give some context that helps in understanding the underlying issue.
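To illustrate the "group" convention mentioned above, here is a minimal sketch. The factors `treatment` and `timepoint` are hypothetical and only stand in for whatever experimental factors are actually present; they are not taken from the question.

```r
# Hypothetical experimental factors; names and values are invented.
treatment <- factor(c("ctrl", "ctrl", "drug", "drug", "ctrl", "drug"))
timepoint <- factor(c("t0", "t1", "t0", "t1", "t0", "t1"))

# One level per combination of factors that actually occurs in the samples.
group <- interaction(treatment, timepoint, sep = "_", drop = TRUE)

# "group" then appears alone in the design formula.
design <- model.matrix(~0 + group)
colnames(design) <- gsub("group", "", colnames(design))

print(design)
```

With this layout each sample belongs to exactly one column, so every coefficient is the mean log-expression of one combination, and comparisons between combinations can be set up as contrasts.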