Our data set contains 10 samples from three different conditions. (We also have other time points, but for now we are only interested in what happens at the beginning.) The metadata is as follows:
sampleName condition time urea weight
C20 CTRL0h 0h 8.0 8.0
C22 CTRL0h 0h 8.0 8.0
C24 CTRL0h 0h 8.0 8.0
HP11 HP0h 0h 2.0 5.0
HP12 HP0h 0h 4.0 5.0
HP14 HP0h 0h 5.0 7.0
CR4Wo1 CR4W0h 0h 2.0 3.0
CR4Wo2 CR4W0h 0h 2.0 2.0
CR4Wo3 CR4W0h 0h 2.0 1.0
CR4Wo4 CR4W0h 0h 2.0 2.0
We would like to apply a linear (mixed) model to the data set to understand how the two factors urea and weight affect gene expression. But I'm not sure whether this is possible at all with the samples we have, since I don't have a complete set of combinations of the two factors I would like to analyze. Do I need more samples to be able to do that?
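One way to check whether urea and weight can be estimated at all from these 10 samples is to inspect the design matrix directly. The sketch below assumes both are treated as numeric covariates in an additive model (no interaction). In that case a complete grid of combinations is not required; what matters is that the design matrix has full column rank, how many residual degrees of freedom remain, and how strongly the two covariates are correlated. The helper name check_design is hypothetical.

```python
import numpy as np

# Metadata copied from the table above (time is 0h for every sample).
condition = ["CTRL0h"] * 3 + ["HP0h"] * 3 + ["CR4W0h"] * 4
urea = np.array([8.0, 8.0, 8.0, 2.0, 4.0, 5.0, 2.0, 2.0, 2.0, 2.0])
weight = np.array([8.0, 8.0, 8.0, 5.0, 5.0, 7.0, 3.0, 2.0, 1.0, 2.0])


def check_design(X, label):
    """Print and return the rank and residual degrees of freedom of X."""
    n, p = X.shape
    rank = np.linalg.matrix_rank(X)
    print(f"{label}: {p} columns, rank {rank}, residual df {n - rank}")
    return rank, n - rank


ones = np.ones(urea.size)

# Additive model: intercept + urea + weight.
X1 = np.column_stack([ones, urea, weight])
rank1, df1 = check_design(X1, "~ urea + weight")

# Adding condition as well (CTRL0h is the reference level).
is_hp = np.array([c == "HP0h" for c in condition], dtype=float)
is_cr = np.array([c == "CR4W0h" for c in condition], dtype=float)
X2 = np.column_stack([ones, is_hp, is_cr, urea, weight])
rank2, df2 = check_design(X2, "~ condition + urea + weight")

# Collinearity between the two covariates; with two predictors the
# variance inflation factor is 1 / (1 - r^2).
r = np.corrcoef(urea, weight)[0, 1]
vif = 1.0 / (1.0 - r**2)
print(f"corr(urea, weight) = {r:.2f}, VIF = {vif:.1f}")
```

Both designs are full rank, so the coefficients are estimable, but urea and weight are strongly correlated in this table, so their separate effects will be estimated with wide uncertainty.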
I would appreciate any ideas or help on how I can (if at all) apply such an analysis to the data I have.
Thanks in advance,
Assa
If what you want to know is the proportion of variance (of gene expression) explained by the covariates, then you could try the variancePartition package. You can also use its dream function, which is a linear mixed model extension of limma-voom, if you want to do repeated-measures differential expression analysis later on. Remember to use ddf = "Kenward-Roger" for the small-sample adjustment in your DE analysis.
What kind of experiment did your samples come from? If you have minimal intra-group variation (as in mouse studies or cell cultures), then perhaps you have (barely) enough samples.
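To make "proportion of variance explained" concrete, here is a rough Python sketch of the idea for a single gene, using only fixed effects and ordinary least squares. It is not variancePartition's code (that package is written in R and fits a model per gene, including random effects); the expression values are simulated for illustration only, and the function name variance_fractions is hypothetical.

```python
import numpy as np

# Covariates from the metadata table in the question.
urea = np.array([8.0, 8.0, 8.0, 2.0, 4.0, 5.0, 2.0, 2.0, 2.0, 2.0])
weight = np.array([8.0, 8.0, 8.0, 5.0, 5.0, 7.0, 3.0, 2.0, 1.0, 2.0])

# Hypothetical log-expression for ONE gene, simulated (not real data).
rng = np.random.default_rng(0)
y = 5.0 + 0.3 * urea + 0.1 * weight + rng.normal(scale=0.5, size=urea.size)


def variance_fractions(y, covariates):
    """Fit y ~ 1 + covariates by least squares, then return the variance of
    each covariate's fitted component and of the residuals, normalized so
    that all fractions sum to 1."""
    names = list(covariates)
    X = np.column_stack([np.ones(y.size)] + [covariates[k] for k in names])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Variance of x_k * beta_k for each covariate k (skip the intercept).
    comps = {k: np.var(covariates[k] * beta[i + 1]) for i, k in enumerate(names)}
    comps["Residuals"] = np.var(resid)
    total = sum(comps.values())
    return {k: v / total for k, v in comps.items()}


fractions = variance_fractions(y, {"urea": urea, "weight": weight})
for name, frac in fractions.items():
    print(f"{name:>9}: {frac:.2f}")
```

Because urea and weight move together in this table (both are 8.0 in every CTRL0h sample and low in the CR4W0h samples), the split of explained variance between the two should be read with caution.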
Correct me if I’m wrong, but I think the duplicateCorrelation function in limma is equivalent to adding a random effect.
Yes, indeed. However, limma estimates the random effect globally (i.e. one consensus correlation shared by all genes), whereas dream estimates the random effect separately for each gene. So if there is a lot of inter-individual variation in gene expression, IMHO dream will call fewer false positives.
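The difference between one global correlation and gene-wise estimates can be sketched on simulated repeated-measures data. The sketch assumes a simple one-way ANOVA estimator for each gene's intra-subject correlation (limma and dream use likelihood-based fits instead) and pools the per-gene values the way, to my understanding, limma's duplicateCorrelation forms its consensus: a trimmed mean (trim = 0.15) on the atanh scale. The layout (200 genes, 6 subjects with 3 replicates each) and the helper names anova_icc and trimmed_mean are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_subjects, n_reps = 200, 6, 3  # hypothetical layout

# Each simulated gene gets its own true intra-subject correlation.
true_icc = rng.uniform(0.05, 0.9, size=n_genes)


def simulate_gene(icc):
    """Subject random effect plus residual noise, total variance 1."""
    subject = rng.normal(scale=np.sqrt(icc), size=(n_subjects, 1))
    noise = rng.normal(scale=np.sqrt(1.0 - icc), size=(n_subjects, n_reps))
    return subject + noise  # shape (n_subjects, n_reps)


def anova_icc(y):
    """One-way ANOVA estimate of the intra-subject correlation."""
    k, m = y.shape
    subj_means = y.mean(axis=1)
    msb = m * np.sum((subj_means - y.mean()) ** 2) / (k - 1)
    msw = np.sum((y - subj_means[:, None]) ** 2) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def trimmed_mean(x, trim=0.15):
    """Mean after dropping floor(n * trim) values from each end."""
    x = np.sort(x)
    cut = int(np.floor(x.size * trim))
    return x[cut:x.size - cut].mean()


# Gene-wise estimates (what a per-gene model works with).
per_gene = np.array([anova_icc(simulate_gene(rho)) for rho in true_icc])

# One consensus value for all genes, averaged on the atanh scale.
consensus = np.tanh(trimmed_mean(np.arctanh(np.clip(per_gene, -0.99, 0.99))))

print(f"consensus correlation used for every gene: {consensus:.2f}")
print(f"per-gene estimates range: {per_gene.min():.2f} to {per_gene.max():.2f}")
```

With only a handful of subjects the per-gene estimates are themselves noisy, so the printed range mixes real gene-to-gene differences with sampling noise.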