Question

Adjusting for unwanted variation in DESeq2

Nikolay Ivanov • 0

@nikolay-ivanov-23079

USA/New York City/Weill Cornell Medicine

I have a question regarding the best way to adjust for unwanted variation while using DESeq2.

Case 1: I have a dataset that came from one lab (so there are no known batch effects), and I wish to adjust for unwanted variation. I’m running svaseq on my count matrix, getting 17 SVs and adding them to my model.

dds = DESeqDataSetFromMatrix(countData = counts, colData = phenoData,
                             design = ~ SV_1 + … + SV_17 + covariate_of_interest)

Is that an appropriate thing to do? Is it OK to add this many SVs? Is there a better way to adjust for unwanted variation?

What if there are ~30 SVs, can you just add them into the model?

Case 2: I’m combining datasets generated by multiple labs, so now there are known batch effects. Should I include the known batch effects in my model in addition to the SVs estimated by svaseq?

Additional questions:

The instructions for using svaseq state that the input should be a “transformed data matrix”. Does that mean I can run svaseq on a count matrix, or does it have to be transformed in some way?
When you are fitting an interaction model and you also have SVs, can you set up your model like so:

dds = DESeqDataSetFromMatrix(countData = counts, colData = phenoData,
                             design = ~ SV_1 + … + SV_17 + genotype + condition + genotype:condition)

Thank you!

DESeq2 deseq2 sva rna-seq differential expression analysis

updated 5.0 years ago by James W. MacDonald • written 5.0 years ago by Nikolay Ivanov

score 0 · Answer 1 · 2020-03-11

You can hypothetically add SVs to the model until you have no more remaining degrees of freedom, but there is some point where you might consider it to be excessive and you might then want to do some better EDA to figure out what's up. If you have like 200 samples then 17 SVs is probably fine. If you have 20 samples, then that's probably too many?
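To put rough numbers on that, here is a minimal base-R sketch that counts the residual degrees of freedom left after fitting an intercept, the SVs, and a two-level covariate of interest. The helper name residual_df, the factor levels, and the random stand-in SVs are all made up for illustration; only the column count of the design matrix matters here.

```r
# Hypothetical helper (not part of DESeq2 or sva): residual degrees of
# freedom for a design with an intercept, n_sv surrogate variables, and
# one two-level covariate of interest.
residual_df <- function(n_samples, n_sv) {
  set.seed(1)
  # Stand-in SVs: random numeric columns, used only to count coefficients
  svs <- matrix(rnorm(n_samples * n_sv), nrow = n_samples,
                dimnames = list(NULL, paste0("SV_", seq_len(n_sv))))
  pheno <- data.frame(
    svs,
    covariate_of_interest = factor(rep(c("ctrl", "trt"), length.out = n_samples))
  )
  X <- model.matrix(~ ., data = pheno)
  # Samples minus the number of independent coefficients in the design
  n_samples - qr(X)$rank
}

residual_df(200, 17)  # 200 - 19 = 181
residual_df(20, 17)   # 20 - 19 = 1
```

With 20 samples and 17 SVs there is a single residual degree of freedom left per gene, which is why that setup looks excessive.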

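In Case 2 you should probably include the known batches in the mod argument (and in mod0, since they are adjustment variables rather than the variable of interest) when you run svaseq, and then fit them as part of the DESeq2 model alongside the SVs.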

You run svaseq on counts. I don't know what 'transformed data matrix' means in that context (the help says samples in columns and genes in rows, so maybe that should be 'transposed'?), but both the example and the code indicate it should be counts.
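For what it's worth, here is a rough sketch of how the pieces could fit together for Case 2, loosely following the pattern in the Bioconductor RNA-seq workflow (which passes normalized counts to svaseq). Everything about the data is an assumption for illustration: the counts are simulated with makeExampleDESeqDataSet, the batch factor and its lab1/lab2 labels are invented, and n.sv = 2 is picked arbitrarily rather than estimated.

```r
library(DESeq2)
library(sva)

set.seed(1)
# Simulated stand-in data: 2000 genes, 12 samples, a 'condition' factor (A/B)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
# Invented known batch (e.g. two labs), not confounded with condition
dds$batch <- factor(rep(c("lab1", "lab2"), times = 6))

# svaseq takes counts (here normalized counts) and log-transforms internally
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]  # drop very low-count genes

pheno <- as.data.frame(colData(dds))
mod  <- model.matrix(~ batch + condition, data = pheno)  # full model
mod0 <- model.matrix(~ batch, data = pheno)              # null model keeps batch
svseq <- svaseq(dat, mod, mod0, n.sv = 2)                # n.sv = 2 is arbitrary

# Add the SVs to colData and put them in the design with the known batch
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ batch + SV1 + SV2 + condition
ddssva <- DESeq(ddssva)
res <- results(ddssva)  # tests the last design term, condition
```

Leaving out n.sv lets svaseq estimate the number of SVs itself, which is presumably how the 17 in the question came about.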