Question

Should I treat surrogate variables as batches?

MEME ▴ 10

@99e6c856

Last seen 13 months ago

Hong Kong

Hi there, I am investigating age-related gene expression changes across different tissue regions with RNA-seq data using DESeq2. Part of my coldata looks like this (I have 100 samples in total):

> coldata
  tissue     age    sex
1    PFC  3month  female
2    Amy  3month  male
3    PFC  6month  female
4    Amy  6month  male
5    PFC 20month  female
6    Amy 20month  male

To remove technical variation, I ran svaseq with n.sv=2 on all 100 samples. Having identified differentially expressed genes globally, I now want to analyse age-related differential expression for each tissue separately. I am wondering whether it would be appropriate to treat the surrogate variables as batches and run DESeq2 with design = ~ sex + age.

Thank you for your insights and guidance.

DESeq2 sva • 714 views

written 13 months ago by MEME ▴ 10 • updated 13 months ago by James W. MacDonald 68k

score 0 · Answer 1 · 2024-03-01

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

It's not clear what you mean by 'treating surrogate variables as batches' in this context. You generate surrogate variables explicitly to include as variables in your design matrix, so if you are asking 'should I add the surrogate variables to my design?', then the answer is yes. Please see the vignette for sva, in particular section 6.

I apologize for any confusion. I am seeking clarity regarding surrogate variables, which are continuous values derived to capture variance attributable to primary variables (in my case, age, sex, and tissue). When I split the data by tissue, is it still appropriate to use surrogate variables computed from the entire dataset? Thank you very much.

MEME ▴ 10 • 13 months ago

When you compute surrogate variables, the null model should also include any other model coefficients (except the one of interest), in which case the surrogate variables are based on the model you plan to use.

James W. MacDonald 68k • 13 months ago
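
For readers following along, here is a minimal sketch of what "the null model includes everything except the coefficient of interest" could look like for a single tissue subset. It is only one reading of the advice above, not code from the thread. It assumes the per-tissue design is ~ sex + age with age as the variable of interest, so the null model keeps only sex. The sample table and counts are simulated placeholders, and the object names (coldata_pfc, counts) are hypothetical. The sva and DESeq2 steps run only if both packages are installed.

```r
set.seed(1)

# Hypothetical sample table for one tissue (PFC); not the poster's real data.
coldata_pfc <- data.frame(
  age = factor(rep(c("3month", "6month", "20month"), each = 4),
               levels = c("3month", "6month", "20month")),
  sex = factor(rep(c("female", "male"), times = 6))
)

# Full model: every term in the design you plan to fit.
mod <- model.matrix(~ sex + age, data = coldata_pfc)
# Null model: the same terms minus the variable of interest (age).
mod0 <- model.matrix(~ sex, data = coldata_pfc)

if (requireNamespace("sva", quietly = TRUE) &&
    requireNamespace("DESeq2", quietly = TRUE)) {
  # Simulated counts standing in for the tissue-specific count matrix.
  counts <- matrix(rnbinom(1000 * 12, mu = 100, size = 5), nrow = 1000)
  rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
  colnames(counts) <- rownames(coldata_pfc)

  # Estimate surrogate variables on this subset, using the models above.
  svs <- sva::svaseq(counts, mod, mod0, n.sv = 2)
  coldata_pfc$SV1 <- svs$sv[, 1]
  coldata_pfc$SV2 <- svs$sv[, 2]

  # Add the surrogate variables to the design as ordinary covariates.
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts,
    colData   = coldata_pfc,
    design    = ~ SV1 + SV2 + sex + age
  )
  dds <- DESeq2::DESeq(dds)
  res <- DESeq2::results(dds, contrast = c("age", "20month", "3month"))
}
```

The surrogate variables enter the design as continuous covariates, not as a batch factor. Whether to estimate them per tissue or on all 100 samples depends on which model you actually intend to fit, since they are estimated relative to that model.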