Question

Unequal sample pools

Jenny G • 0

@jenny-g-16528

Germany

Good morning,

I have data generated from CSF samples from mice. CSF volume was very low, in some cases too low for analysis. In those cases, two CSF samples (from the same treatment group) were pooled. However, where the CSF volume allowed, samples were left unpooled. I know which samples are pools and which are single-source. The data are not actually gene expression data: it's a proteomics panel, with semi-quantitative protein expression values (log2 scaled). I'd like to use limma for this analysis, but I'm not sure how to deal with the pooling.

I'm struggling to think how to handle this. Obviously individual-level covariates can't be used for the pooled samples, which should be fine as it's a well-controlled study (all mice are the same strain, one sex, same age etc.). But is the between-sample variance going to reflect the true population variance? Do I need to account for this somehow? Is there anything I can do, or should I just analyze the data as usual and highlight this when explaining the results?

Thanks, Jen

limma pooling • 1.0k views

3.6 years ago • updated 3.5 years ago

score 1 · Answer 1 · 2021-10-05

We use genetically identical inbred mice in our experiments, and we find that the number of mice pooled into a sample has little or no effect on the precision of the log-expression values. The total amount of protein in a sample is more likely to affect precision than the number of mice pooled to obtain that sample.

If you want limma to check whether the 2-mouse pools are more or less precise than the 1-mouse samples, you can run arrayWeights with var.group=n.mice, where n.mice is a vector specifying the number of mice in each sample. Running lmFit etc. with the estimated sample weights will then allow for any difference in precision. However, as I say, I have not found the number of mice to be a good predictor of variability.
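
A minimal sketch of that workflow might look like the following. The data are simulated and the sample layout (two treatment groups of six samples, with a mix of single-mouse samples and 2-mouse pools) is purely hypothetical; only arrayWeights, lmFit, eBayes and topTable are actual limma functions. With real data, y would be the matrix of log2 protein values (proteins in rows, samples in columns).

```r
library(limma)

set.seed(1)

# Hypothetical layout: 2 treatment groups, 6 samples each (an assumption)
group  <- factor(rep(c("control", "treated"), each = 6))
# Number of mice contributing to each sample: 1 = single mouse, 2 = pool
n.mice <- factor(c(1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2))

# Simulated log2 values standing in for the proteomics panel
n_proteins <- 200
y <- matrix(rnorm(n_proteins * length(group), mean = 10, sd = 0.5),
            nrow = n_proteins,
            dimnames = list(paste0("protein", seq_len(n_proteins)),
                            paste0("sample", seq_along(group))))

design <- model.matrix(~ group)

# Estimate one precision weight per level of n.mice
aw <- arrayWeights(y, design, var.group = n.mice)

# Compare the estimated weights of single-mouse samples and 2-mouse pools
print(tapply(aw, n.mice, mean))

# Fit the linear model using the estimated sample weights
fit <- lmFit(y, design, weights = aw)
fit <- eBayes(fit)
top <- topTable(fit, coef = "grouptreated")
print(top)
```

If the two weights printed by tapply are close to each other, the pooled and unpooled samples are about equally precise and the weights will make little difference to the results.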