Question

Complicated unbalanced block DOE with partially paired samples and analysis

0

ML18 ▴ 10

@ml18-15258

I have a one-factor experiment (5 levels of disease staging). We were given whatever samples we could get, so even though the design is rather unbalanced, there was not much we could do about it.

These are the numbers of samples per stage:

Stage	Total
0	43
1	32
2	44
3	36
4	52

Block size could be 5-10 (samples can be processed in batches of that size), so the blocks will also be rather unbalanced. Further, about 60 samples come from about 25 patients (some patients have 2 samples from 2 separate visits, some have 3). Moreover, some paired samples are from the same stage, while others are from different stages (the disease progressed between visits).

We are interested in finding markers of disease progression. What would be the best way to set up this design? Should I handle the imbalance by filling the empty spots with NA and assigning samples randomly? A block size of 10 would be convenient, and I would put paired samples in the same block. However, assigning stages to blocks would then have to account for some stages being over-represented because of the paired samples. The worst case is when 3 samples from the same patient have the same stage and end up in the same block, which would not be good. Should I just discard one of those samples?

Would limma still be good to analyze this? I guess if DOE is done well it should be fine? Thanks for any suggestion!

limma complex-design paired design • 1.5k views

Updated 6.9 years ago by Aaron Lun ★ 28k • written 6.9 years ago by ML18 ▴ 10

score 2 · Answer 1 · 2018-03-20

2

Aaron Lun ★ 28k

@alun

There is no need to do crazy stuff like adding fake NA samples. There is no need to discard samples. limma will handle unbalanced designs just fine. In fact, your table suggests that the number of samples is actually quite comparable across stages. I wouldn't consider a design to be strongly unbalanced unless we were looking at order of magnitude differences in sample sizes.

While I don't really understand the blocking setup, I would hope that the experimenters had the sense to avoid putting all samples from the same stage in a single processing block. If so, you can use limma with an additive model, probably something along the lines of ~block + stage in your design matrix, and specifying patient as the blocking factor in duplicateCorrelation.
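A minimal sketch of the additive model described above, assuming a sample sheet with columns named stage, block and patient. The data here are simulated and all object names are hypothetical; only the limma calls themselves (duplicateCorrelation, lmFit, eBayes, topTable) are real, and duplicateCorrelation additionally needs the statmod package to be installed.

```r
library(limma)

set.seed(1)

# Hypothetical sample sheet: every value below is made up for illustration.
n_samples <- 60
targets <- data.frame(
  stage   = factor(sample(0:4, n_samples, replace = TRUE)),
  block   = factor(rep(seq_len(6), each = 10)),          # 6 processing blocks of 10
  patient = factor(sample(seq_len(40), n_samples, replace = TRUE))  # some repeats
)

# Simulated log-expression matrix (200 genes) with a per-patient shift,
# so that samples from the same patient are correlated.
n_genes <- 200
patient_shift <- matrix(rnorm(n_genes * nlevels(targets$patient), sd = 0.7),
                        nrow = n_genes)
y <- matrix(rnorm(n_genes * n_samples), nrow = n_genes) +
  patient_shift[, as.integer(targets$patient)]

# Fixed effects: processing block plus disease stage.
design <- model.matrix(~ block + stage, data = targets)

# Patient enters as the blocking factor for the within-patient correlation.
corfit <- duplicateCorrelation(y, design, block = targets$patient)

fit <- lmFit(y, design, block = targets$patient,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# F-test across all stage coefficients (any difference between stages).
stage_coefs <- grep("^stage", colnames(design))
top <- topTable(fit, coef = stage_coefs, number = 5)
print(top)
```

Specific comparisons between stages would use the individual stage coefficients or contrasts rather than the overall F-test shown here.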

If most or all of the blocks consist of samples from the same stage... then it becomes more difficult, as the stage and block effects will be confounded in the above design. To get around this, I would only keep one sample from each patient, use ~stage in the design matrix and use block in duplicateCorrelation.
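The confounded case can be checked before fitting anything. The sketch below uses a simulated layout in which every block holds a single stage (all names and values are hypothetical), shows that ~block + stage is then rank deficient, and falls back to one sample per patient with ~stage and block in duplicateCorrelation, as suggested above. Keeping the first sample per patient is an arbitrary choice made only for the illustration.

```r
library(limma)

set.seed(2)

# Hypothetical layout: 10 blocks of 6 samples, each block containing one stage.
targets <- data.frame(
  stage   = factor(rep(0:4, each = 12)),
  block   = factor(rep(seq_len(10), each = 6)),
  patient = sample(c(seq_len(50), seq_len(10)))   # 10 patients have 2 samples
)

# Cross-tabulation: a block with a single non-zero entry is nested in one stage.
print(table(targets$block, targets$stage))

# With stage fully nested in block, the additive design is not of full rank.
full_design <- model.matrix(~ block + stage, data = targets)
cat("rank", qr(full_design)$rank, "of", ncol(full_design), "columns\n")

# Simulated log-expression matrix (200 genes) with a per-block shift.
n_genes <- 200
block_shift <- matrix(rnorm(n_genes * nlevels(targets$block), sd = 0.7),
                      nrow = n_genes)
y <- matrix(rnorm(n_genes * nrow(targets)), nrow = n_genes) +
  block_shift[, as.integer(targets$block)]

# Fallback: keep one sample per patient (here simply the first one listed).
keep <- !duplicated(targets$patient)
sub  <- droplevels(targets[keep, ])
y_sub <- y[, keep]

# Stage is the only fixed effect; block becomes the correlated grouping.
design <- model.matrix(~ stage, data = sub)
corfit <- duplicateCorrelation(y_sub, design, block = sub$block)
fit <- lmFit(y_sub, design, block = sub$block,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
```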

Written 6.9 years ago by Aaron Lun ★ 28k

0

Thanks a lot! My plan (in an ideal world) was to have blocks of size 10, with 2 samples from each of the 5 stages in every block, because I'm interested in comparing the stages and a balanced design would remove the block effect. That's why I was a bit worried about blocks where some stages are missing and others where a stage appears 3 times (paired samples from the same patient's 3 visits, which sometimes have the same stage). But since, as you said, limma can deal with such imbalance, it's not an issue.

So for the design I just described (ideally 2 samples per stage in each block of size 10, but with paired samples always placed in the same block), I intend to treat patient as a random effect with duplicateCorrelation and use ~stage + block in the design matrix. That should work, right? Or should I somehow block on both patient and block in duplicateCorrelation (but how)?
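One way such an allocation could be sketched in base R is a simple greedy heuristic: keep each patient's samples together, and place each patient in the block that currently holds the fewest samples of the same stage(s). This is only an illustration under made-up pairing data, not a method from this thread; the stage totals follow the table in the question, while the patient assignments and all object names are hypothetical.

```r
set.seed(3)

# Stage totals from the question; which samples are paired is invented here.
stage_counts <- c("0" = 43, "1" = 32, "2" = 44, "3" = 36, "4" = 52)
samples <- data.frame(
  id    = seq_len(sum(stage_counts)),
  stage = factor(rep(names(stage_counts), times = stage_counts))
)

# Hypothetical pairing: 25 patients contribute 3 or 2 samples (60 in total),
# every other sample belongs to its own patient.
multi_sizes <- c(rep(3, 10), rep(2, 15))
samples$patient <- NA_integer_
paired_ids <- sample(samples$id, sum(multi_sizes))
samples$patient[paired_ids] <- rep(seq_along(multi_sizes), times = multi_sizes)
singles <- is.na(samples$patient)
samples$patient[singles] <- length(multi_sizes) + seq_len(sum(singles))

block_size <- 10
n_blocks <- ceiling(nrow(samples) / block_size)

# Running count of samples per block (rows) and stage (columns).
counts <- matrix(0L, nrow = n_blocks, ncol = nlevels(samples$stage),
                 dimnames = list(NULL, levels(samples$stage)))
samples$block <- NA_integer_

# Place larger patient groups first, in random order within each group size.
groups <- split(samples$id, samples$patient)
groups <- groups[order(-lengths(groups), runif(length(groups)))]

for (ids in groups) {
  st   <- as.vector(table(samples$stage[ids]))   # stage counts for this patient
  room <- block_size - rowSums(counts)
  ok   <- which(room >= length(ids))             # blocks that can take the group
  # Cost: same-stage samples already in the block, weighted by the group's stages.
  cost <- as.vector(counts[ok, , drop = FALSE] %*% st)
  best <- ok[cost == min(cost)]
  b    <- best[sample.int(length(best), 1)]      # break ties at random
  samples$block[ids] <- b
  counts[b, ] <- counts[b, ] + st
}

# Inspect the resulting block-by-stage layout.
print(table(block = samples$block, stage = samples$stage))
```

The printed table shows how close each block comes to the ideal of 2 samples per stage; the remaining imbalance is then handled by the model rather than by discarding samples.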

Reply written 6.9 years ago by ML18 ▴ 10

1

~stage + block in the design with pt in duplicateCorrelation() sounds sensible to me.