Hi there, I am investigating age-related gene expression changes across different tissue regions with RNA-seq data using DESeq2. Part of my coldata looks like this (I have 100 samples in total):
> coldata
  tissue     age    sex
1    PFC  3month female
2    Amy  3month   male
3    PFC  6month female
4    Amy  6month   male
5    PFC 20month female
6    Amy 20month   male
To remove technical variation, I ran svaseq with n.sv=2 on all 100 samples. Having identified differentially expressed genes globally, I now want to look at age-related differential expression for each tissue separately. I am wondering whether it would be appropriate to treat the surrogate variables as batches and run DESeq2 with the design = ~ sex + age.
Thank you for your insights and guidance.
I apologize for any confusion. I am seeking clarity regarding surrogate variables, which are continuous values derived to capture variation not explained by the primary variables (in my case, age, sex, and tissue). However, when splitting the data by tissue, I wonder whether it is appropriate to use surrogate variables computed from the entire dataset. Thank you very much.
When you compute surrogate variables, the null model should also include any other model coefficients (except the one of interest), in which case the surrogate variables are based on the model you plan to use.
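One way to read this advice for the per-tissue analysis is sketched below. It is only an illustration under stated assumptions: the counts and coldata are simulated stand-ins (hypothetical, not the real 100 samples), age is taken as the coefficient of interest so the null model keeps only sex, and both sexes are assumed to be present within each tissue. The surrogate variables are re-estimated on the PFC subset with the same model that is then passed to DESeq2.

```r
library(DESeq2)
library(sva)

set.seed(1)

# Simulated stand-in data (hypothetical): 1000 genes x 24 samples.
coldata <- data.frame(
  tissue = factor(rep(c("PFC", "Amy"), each = 12)),
  age    = factor(rep(rep(c("3month", "6month", "20month"), each = 4), times = 2),
                  levels = c("3month", "6month", "20month")),
  sex    = factor(rep(c("female", "male"), times = 12))
)
gene_mu <- 2^runif(1000, 4, 10)  # per-gene mean, recycled down each column
counts <- matrix(rnbinom(1000 * 24, mu = gene_mu, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:24)))
rownames(coldata) <- colnames(counts)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ sex + age)

# Subset to one tissue, then estimate surrogate variables on that subset only.
dds_pfc <- dds[, dds$tissue == "PFC"]
dds_pfc$tissue <- droplevels(dds_pfc$tissue)
dds_pfc <- estimateSizeFactors(dds_pfc)

dat <- counts(dds_pfc, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model: everything in the planned design.
# Null model: the same terms minus the variable of interest (age).
mod  <- model.matrix(~ sex + age, colData(dds_pfc))
mod0 <- model.matrix(~ sex, colData(dds_pfc))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)  # n.sv = 2 as in the question

# Add the surrogate variables as covariates in the design.
dds_pfc$SV1 <- svseq$sv[, 1]
dds_pfc$SV2 <- svseq$sv[, 2]
design(dds_pfc) <- ~ SV1 + SV2 + sex + age

# Likelihood ratio test for any effect of age across the three time points.
dds_pfc <- DESeq(dds_pfc, test = "LRT", reduced = ~ SV1 + SV2 + sex)
res_pfc <- results(dds_pfc)
```

The same steps would be repeated for the Amy samples. Whether two surrogate variables are still a sensible choice within a single tissue is a judgment call that depends on the data.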