This isn't the place to be asking questions about the lme4 package, as that's a CRAN package, not a Bioconductor package. However, being able to read an error message and diagnose your issue from it is an invaluable skill. So let's see what it says and try to decipher it. You got an error message that said this:
Error message: Error in model.frame.default(drop.unused.levels = TRUE, formula = adj.m ~ : variable lengths differ (found for 'pheno$condition')
adj.m dimensions = 586229 46
pheno dimensions = 46 15
It says variable lengths differ and then provides the dimensions of the two data sets. One has over 500k rows and 46 columns and the other has 46 rows and 15 columns. You have the same number of columns in adj.m as you do rows in pheno. And the error says your variable lengths differ. What does that imply?
Also, when you are fitting your model, on the right hand side you are using individual columns from your pheno object, yet on the left hand side you are using an entire matrix that has over 500K rows! Does that seem like a thing that could possibly work? Is it more likely that you have to fit each CpG value individually?
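To make the mismatch concrete, here is a minimal base R sketch. The toy objects only borrow the names adj.m and pheno from the question; the sizes, values, and the fit_one helper are made up for illustration, and it uses plain lm rather than lmer just to show the shape problem.

```r
set.seed(1)

# Toy stand-ins (hypothetical data): 5 CpGs in rows, 6 samples in columns,
# and one row of phenotype data per sample.
adj.m <- matrix(rnorm(5 * 6), nrow = 5,
                dimnames = list(paste0("cg", 1:5), paste0("s", 1:6)))
pheno <- data.frame(condition = factor(rep(c("ctrl", "case"), times = 3)),
                    pairing   = factor(rep(1:3, each = 2)))

# Whole matrix on the left-hand side: the response has 5 rows, but
# pheno$condition has 6 values, so building the model frame fails.
msg <- tryCatch(lm(adj.m ~ pheno$condition),
                error = function(e) conditionMessage(e))
print(msg)

# One CpG at a time: each row of adj.m is a length-6 response that lines up
# with the 6 rows of pheno.
fit_one <- function(y) coef(lm(y ~ condition, data = pheno))
coefs <- t(apply(adj.m, 1, fit_one))  # one row of coefficients per CpG
print(dim(coefs))
```

The same reasoning applies at full scale: 46 phenotype rows can only be matched against a response of length 46, which is one row of the M-value matrix, not all of it.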
And getting back to Bioconductor, I would imagine you used minfi to preprocess these data. That package also includes facilities to make comparisons (by CpG as well as by genomic region). Is there a particular reason to ignore that and use lme4 instead? You are just fitting a random intercept for each subject (which I infer from the 'pairing' in your pheno object). If you have complete observations for all subjects, you can simply block on subject instead of fitting a random intercept. Or if you don't have complete observations, you could use the limma package to fit the model using generalized least squares.
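As a rough sketch of those two limma routes, assuming a toy M-value matrix and a pheno data frame with 'condition' and 'pairing' columns (all names and data here are made up for illustration, not taken from your objects):

```r
library(limma)  # duplicateCorrelation() also needs the statmod package installed

set.seed(1)

# Hypothetical data: 100 CpGs, 6 samples from 3 paired subjects.
M <- matrix(rnorm(100 * 6), nrow = 100,
            dimnames = list(paste0("cg", 1:100), paste0("s", 1:6)))
pheno <- data.frame(
  condition = factor(rep(c("ctrl", "case"), times = 3),
                     levels = c("ctrl", "case")),
  pairing   = factor(rep(1:3, each = 2))
)

# Route 1 (complete pairs): block on subject as a fixed effect.
design_blocked <- model.matrix(~ pairing + condition, data = pheno)
fit1 <- eBayes(lmFit(M, design_blocked))
tt1 <- topTable(fit1, coef = "conditioncase")

# Route 2 (incomplete pairs): estimate a consensus within-subject correlation
# and let lmFit use it in a generalized least squares fit.
design <- model.matrix(~ condition, data = pheno)
corfit <- duplicateCorrelation(M, design, block = pheno$pairing)
fit2 <- eBayes(lmFit(M, design, block = pheno$pairing,
                     correlation = corfit$consensus.correlation))
tt2 <- topTable(fit2, coef = "conditioncase")
```

Note that the whole matrix goes in at once, with CpGs in rows and samples in columns, and no per-CpG loop is needed.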
When replying to an answer, please use the ADD COMMENT button, not the ADD ANSWER button (you are not adding an answer after all!).
You can provide limma the entire matrix of M-values because it is natively designed to deal with high-throughput data. On the other hand, lme4 is not meant to do that, so you have to make adjustments for that fact. In other words, you have to transpose the data and then feed one column at a time to lmer. Or you could not transpose and feed one row at a time. Your choice. You could also use the variancePartition package, which uses lmer under the hood, and is meant to understand that high-throughput data are normally transposed as compared to conventional statistics.
I don't know what 'the data seems to disappear when I transpose the dataset' means. In my experience that's not a thing.
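If you do stay with lmer, the row-at-a-time approach could look something like this sketch. The toy matrix, the pheno columns, and the fit_cpg helper are all hypothetical; with random toy data lmer may print 'boundary (singular) fit' messages, which are harmless here.

```r
library(lme4)

set.seed(1)

# Hypothetical data: 4 CpGs in rows, 8 samples from 4 paired subjects in columns.
M <- matrix(rnorm(4 * 8), nrow = 4,
            dimnames = list(paste0("cg", 1:4), paste0("s", 1:8)))
pheno <- data.frame(
  condition = factor(rep(c("ctrl", "case"), times = 4),
                     levels = c("ctrl", "case")),
  pairing   = factor(rep(1:4, each = 2))
)

# Fit one CpG: attach that row's M-values to the phenotype data, fit a
# random intercept per subject, and return the fixed effect for condition.
fit_cpg <- function(y) {
  d <- cbind(pheno, y = y)
  fit <- lmer(y ~ condition + (1 | pairing), data = d)
  fixef(fit)[["conditioncase"]]
}

# Rows of M are CpGs, so apply over rows without transposing.
effects <- apply(M, 1, fit_cpg)
print(effects)
```

At 500K+ CpGs this loop will be slow, which is part of why a package built for high-throughput data is the easier route.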
I am interested in what assumptions for the limma package you and your supervisor disagree with. The limma package is the preeminent package for analyzing high-throughput data and is the top non-infrastructure package in all of Bioconductor, so you have a decidedly non-consensus viewpoint on the subject.