Question

SVA for repeated measures design

sara.blocquiaux • 0

@sarablocquiaux-21717

Hi all,

I have RNA-seq data (5 subjects measured at 4 time points) and would like to run SVA first so that I can include potential confounders in the statistical model (DESeq2 pipeline).

I am having trouble defining the null and full models for SVA:

Full model: ~ TIME + SUBJECT.ID

Null model: ~ SUBJECT.ID or ~ 1

Should the subject ID be treated as a factor of interest or as a confounding factor?

Thanks in advance!

Best,

Sara

SVA deseq2 repeated measures • 1.7k views

ADD COMMENT • link updated 5.1 years ago by Robert Castelo ★ 3.4k • written 5.2 years ago by sara.blocquiaux • 0

score 1 · Answer 1 · 2019-10-02

You should probably use just an intercept for your null model. In general, if you have repeated measures (which I assume you do, given the subject ID), and the repeated measures are complete (you have measurements from each subject at each time point), then the subject-specific changes are orthogonal to the measure of interest, and blocking on subject is the way to go. It also makes your coefficients easier to interpret.

Put a different way, sva is intended to generate surrogate variables for unobserved variability. The subjects are by definition observed, so if you wanted to use the sva package to do something with them, you could consider them to be batch effects and use ComBat. (Note that I am not advocating this; I am only pointing out that sva is for things you don't observe and ComBat is for things you know about.)
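To make the two candidate null models from the question concrete, here is a minimal base R sketch. The sample sheet `pheno` is made up for illustration (5 subjects, 4 time points); only the formulas come from the thread. The columns that are in the full model but not in the null model are the ones SVA treats as variables of interest.

```r
# Hypothetical sample sheet: 5 subjects x 4 time points (illustrative only)
pheno <- expand.grid(TIME = factor(paste0("T", 1:4)),
                     SUBJECT.ID = factor(paste0("S", 1:5)))

# Full model from the question
mod <- model.matrix(~ TIME + SUBJECT.ID, data = pheno)

# The two candidate null models
mod0_subject   <- model.matrix(~ SUBJECT.ID, data = pheno)  # adjusts for subject
mod0_intercept <- model.matrix(~ 1, data = pheno)           # intercept only

# Columns of the full model that are absent from each null model
of_interest_subject   <- setdiff(colnames(mod), colnames(mod0_subject))
of_interest_intercept <- setdiff(colnames(mod), colnames(mod0_intercept))

print(of_interest_subject)    # TIME columns only
print(of_interest_intercept)  # TIME and SUBJECT.ID columns
```

With `~ SUBJECT.ID` as the null model only the TIME columns are protected, whereas with `~ 1` the SUBJECT.ID columns are protected as well.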

score 0 · Answer 2 · 2019-10-03

Robert Castelo ★ 3.4k

@rcastelo

Barcelona/Universitat Pompeu Fabra

Hi,

I would say the answer is to include SUBJECT.ID in the null model because, as argued by Jeff Leek, author of SVA, in this thread about a similar design case, SUBJECT.ID will be used in the ultimate linear model you intend to fit to test for the effect of your variable of interest.
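As a rough sketch of what this could look like in a DESeq2 pipeline, the following uses `svaseq()` with SUBJECT.ID in both models and then adds the surrogate variables to the DESeq2 design. The counts are simulated with `makeExampleDESeqDataSet()`, the sample sheet is hypothetical, and `n.sv = 2` is an arbitrary choice for illustration, not a recommendation.

```r
library(DESeq2)
library(sva)

set.seed(1)
# Simulated counts standing in for real data: 1000 genes x 20 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 20)
# Hypothetical sample sheet: 5 subjects x 4 time points
dds$SUBJECT.ID <- factor(rep(paste0("S", 1:5), each = 4))
dds$TIME <- factor(rep(paste0("T", 1:4), times = 5))

# Normalized counts, dropping very low-count genes before SVA
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# SUBJECT.ID is in both models, so only TIME is treated as of interest
pheno <- as.data.frame(colData(dds))
mod  <- model.matrix(~ SUBJECT.ID + TIME, data = pheno)
mod0 <- model.matrix(~ SUBJECT.ID, data = pheno)

# n.sv = 2 is an assumption made for this sketch
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the surrogate variables to the design alongside SUBJECT.ID
dds$SV1 <- svseq$sv[, 1]
dds$SV2 <- svseq$sv[, 2]
design(dds) <- ~ SUBJECT.ID + SV1 + SV2 + TIME

# Likelihood ratio test for any effect of TIME
dds <- DESeq(dds, test = "LRT", reduced = ~ SUBJECT.ID + SV1 + SV2)
res <- results(dds)
```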

cheers,

robert.

ADD COMMENT • link 5.1 years ago Robert Castelo ★ 3.4k

I agree, that was what I was thinking at first.

But SUBJECT.ID is not just a covariate; it is a random factor. So it is still not clear to me whether to include it in the null model or not. Not including it in the null model makes it a kind of variable of interest itself.

ADD REPLY • link 5.1 years ago sara.blocquiaux • 0

If SUBJECT.ID is a random factor, then you should not put it into the design matrix. Instead, use duplicateCorrelation() and then pass the correlation and block arguments in the call to lmFit(); see the section on multi-level experiments in the limma User's Guide. If you don't need surrogate variables, you can just follow that documentation.
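A minimal sketch of that limma approach, without surrogate variables, might look as follows. The expression matrix `y` and the sample sheet `pheno` are simulated here purely for illustration; with real RNA-seq counts you would typically transform them first (for example with voom()).

```r
library(limma)

set.seed(1)
# Hypothetical sample sheet: 5 subjects x 4 time points
pheno <- data.frame(SUBJECT.ID = factor(rep(paste0("S", 1:5), each = 4)),
                    TIME = factor(rep(paste0("T", 1:4), times = 5)))

# Simulated log-expression (500 genes x 20 samples) with a per-gene
# subject offset, so samples from the same subject are correlated
subj <- matrix(rnorm(500 * 5), nrow = 500)[, as.integer(pheno$SUBJECT.ID)]
y <- subj + matrix(rnorm(500 * 20), nrow = 500)

# SUBJECT.ID is left out of the design matrix
design <- model.matrix(~ TIME, data = pheno)

# Estimate the within-subject correlation, then pass it to lmFit()
corfit <- duplicateCorrelation(y, design, block = pheno$SUBJECT.ID)
fit <- lmFit(y, design, block = pheno$SUBJECT.ID,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Test the three TIME coefficients jointly
top <- topTable(fit, coef = 2:4)
print(top)
```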

The complication comes when you want to combine this with surrogate variables estimated with SVA. You can try a full model with TIME only and a null model with just the intercept. Then estimate the surrogate variables, add them to the design matrix, and proceed with duplicateCorrelation(), blocking on SUBJECT.ID. However, SVA may already have captured part of the SUBJECT.ID variability, and this may lead to problems with duplicateCorrelation(); see this thread about that possibility. So I'd suggest including SUBJECT.ID in both the full and null models that you give to SVA (alongside TIME in the full model), just to ensure that the SUBJECT.ID variability is not picked up by SVA. Then place TIME and the surrogate variables in a new design matrix, i.e., one without SUBJECT.ID, and proceed with duplicateCorrelation(), blocking on SUBJECT.ID.
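The suggested sequence could be sketched roughly as below. Everything about the data is an assumption made for illustration: `y` is simulated log-expression with a subject effect and one hidden per-sample covariate, `pheno` is a hypothetical sample sheet, and `n.sv = 1` is chosen arbitrarily.

```r
library(sva)
library(limma)

set.seed(1)
n_genes <- 500
# Hypothetical sample sheet: 5 subjects x 4 time points
pheno <- data.frame(SUBJECT.ID = factor(rep(paste0("S", 1:5), each = 4)),
                    TIME = factor(rep(paste0("T", 1:4), times = 5)))

# Simulated data: subject offsets + one hidden covariate + noise
subj <- matrix(rnorm(n_genes * 5), nrow = n_genes)[, as.integer(pheno$SUBJECT.ID)]
hidden <- rnorm(20)                                   # unobserved covariate
loading <- c(rnorm(100, sd = 2), rep(0, n_genes - 100))
y <- subj + outer(loading, hidden) + matrix(rnorm(n_genes * 20), nrow = n_genes)

# Step 1: SVA with SUBJECT.ID in both the full and the null model
mod  <- model.matrix(~ SUBJECT.ID + TIME, data = pheno)
mod0 <- model.matrix(~ SUBJECT.ID, data = pheno)
svobj <- sva(y, mod, mod0, n.sv = 1)   # n.sv = 1 is an assumption

# Step 2: new design with TIME and the surrogate variable(s), no SUBJECT.ID
sv <- as.matrix(svobj$sv)              # keep a matrix even when n.sv = 1
colnames(sv) <- paste0("SV", seq_len(ncol(sv)))
design <- cbind(model.matrix(~ TIME, data = pheno), sv)

# Step 3: block on SUBJECT.ID via duplicateCorrelation()
corfit <- duplicateCorrelation(y, design, block = pheno$SUBJECT.ID)
fit <- lmFit(y, design, block = pheno$SUBJECT.ID,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Test the three TIME coefficients jointly
top <- topTable(fit, coef = 2:4)
```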

ADD REPLY • link 5.1 years ago Robert Castelo ★ 3.4k