Question

Accounting for technical variation in DESeq2

marco.trizzino83 ▴ 10

@marcotrizzino83-9987

I have a question about DESeq2 data normalization. I know that DESeq2 requires raw read counts and that the software normalizes for sequencing depth.

But what if I want to account for technical variation? Normally, I would quantile-normalize the data, but I understand that DESeq2 does not support quantile-normalized data, so how can I correct for this kind of variability?

Thanks in advance,

Marco

deseq2 normalization • 1.0k views

updated 6.5 years ago by Peter Langfelder ★ 3.0k • written 6.5 years ago by marco.trizzino83 ▴ 10

score 0 · Answer 1 · 2018-08-06

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Others may have more systematic answers, but here are my 2 cents regarding specifically quantile normalization. When I check quantiles in DESeq-normalized data (more precisely, normalized and variance-stabilized), the data always look "nearly quantile normalized", in that specific percentiles (I normally use 30%, 50%, 70%, 80%, 90%) vary within a very narrow range, certainly much less than the differences between the percentiles. In other words, in my experience DESeq normalization approximates quantile normalization very well.
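A minimal sketch of this kind of percentile check, written in Python on made-up counts rather than with DESeq2 itself. The function names `size_factors` and `percentile` are hypothetical, the size factors follow the median-of-ratios idea, and `log2(x + 1)` is only a simple stand-in for a variance-stabilizing transformation:

```python
import math
import statistics

def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is the count of gene g in sample s."""
    n_samples = len(counts[0])
    # Keep genes with nonzero counts in every sample so the log geometric mean is finite.
    log_rows = [[math.log(c) for c in row] for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for s in range(n_samples):
        log_ratios = [row[s] - m for row, m in zip(log_rows, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors

def percentile(values, p):
    """Percentile with linear interpolation between neighbouring order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

# Toy data (an assumption for illustration): sample 2 is sequenced at twice the
# depth of sample 1, sample 3 at half the depth, and the last gene is
# up-regulated in sample 3 only.
counts = [
    [10, 20, 5],
    [20, 40, 10],
    [40, 80, 20],
    [80, 160, 40],
    [160, 320, 80],
    [320, 640, 160],
    [1000, 2000, 2000],
]

factors = size_factors(counts)
n_samples = len(factors)

# Depth-normalized counts on a log2 scale, one list per sample.
log_norm = [
    [math.log2(row[s] / factors[s] + 1) for row in counts]
    for s in range(n_samples)
]

# For each percentile, how much does it vary across samples?
spreads = {}
for p in (30, 50, 70, 80, 90):
    per_sample = [percentile(sample, p) for sample in log_norm]
    spreads[p] = max(per_sample) - min(per_sample)
    print(f"{p}th percentile: spread across samples = {spreads[p]:.3f}")
```

If the spread at each percentile is small compared with the distance between percentiles, the depth-normalized data are already close to quantile normalized in the sense described above.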

6.5 years ago • Peter Langfelder ★ 3.0k

Thanks. So would you recommend performing variance stabilization before looking for DE genes (or differentially accessible ATAC-seq regions) if I am concerned about technical variation?

6.5 years ago • marco.trizzino83 ▴ 10

You don't need to apply VST before DE. In fact, you cannot supply transformed data to DESeq2, which uses the original counts and models the heteroskedasticity via the NB GLM.

Let me check in later with more links to software for technical variation that can be integrated with DESeq2 (RUV, cqn, sva, etc.).
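For reference, the NB GLM mentioned above can be written out as follows, using the notation of the DESeq2 paper: $K_{ij}$ is the raw count for gene $i$ in sample $j$, $s_j$ the size factor, $\alpha_i$ the gene-wise dispersion, and $x_{jr}$ the design matrix entries.

```latex
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i), \\
\mu_{ij} &= s_j \, q_{ij}, \\
\log_2 q_{ij} &= \sum_r x_{jr} \, \beta_{ir}, \\
\operatorname{Var}(K_{ij}) &= \mu_{ij} + \alpha_i \, \mu_{ij}^2 .
\end{aligned}
```

The last line is why no prior transformation is needed: the variance is allowed to grow with the mean inside the model, and sequencing depth enters through $s_j$ rather than through pre-scaled counts.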

6.5 years ago • Michael Love 43k

So, as I see it, there are two categories of methods for modeling extra technical variation. One is based on covariates, e.g. gene GC content and gene length:

cqn
EDASeq
etc.

and the other based on factor analysis:

RUVSeq
svaseq
etc.

We have examples of incorporating these in the vignette and workflow.

The covariate-based methods are useful if you have biased counts related to per-sample fluctuations in PCR or RNA degradation. If you use Salmon with --gcBias (and --posBias for positional bias) and then tximport, you don't need those methods to deal with that type of technical variation, as Salmon has already corrected for these biases during its estimation steps, and the correction is passed along to DESeq2 via tximport. You can assess GC bias and positional bias with MultiQC (using the FastQC modules; Salmon modules are coming soon).

The factor analysis methods are useful for removing additional technical variation regardless of the source, but if the bias is partially confounded with the biological covariates, it's possible to remove some signal. This doesn't happen with Salmon or the covariate-based methods because they work on a per-sample basis and only remove variation that can be explained by gene, transcript or cDNA fragment features.
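A toy illustration of the confounding concern, in Python on made-up numbers. This is not how RUVSeq or svaseq work internally; it only shows what happens when a factor that partly tracks the biological condition is regressed out of the data on its own. The names `group_difference` and `regress_out` are hypothetical:

```python
from statistics import mean

# Assumed toy design: 3 control and 3 treated samples.
condition = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
# An estimated "technical" factor that is partially confounded with condition.
factor = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
# Expression of one gene: a pure biological effect of size 2, no technical effect.
y = [2.0 * c for c in condition]

def group_difference(values, groups):
    """Mean of treated samples minus mean of control samples."""
    treated = [v for v, g in zip(values, groups) if g == 1.0]
    control = [v for v, g in zip(values, groups) if g == 0.0]
    return mean(treated) - mean(control)

def regress_out(values, covariate):
    """Residuals from a simple least-squares fit of values on covariate (with intercept)."""
    v_mean, c_mean = mean(values), mean(covariate)
    sxy = sum((c - c_mean) * (v - v_mean) for c, v in zip(covariate, values))
    sxx = sum((c - c_mean) ** 2 for c in covariate)
    slope = sxy / sxx
    return [v - v_mean - slope * (c - c_mean) for c, v in zip(covariate, values)]

before = group_difference(y, condition)
after = group_difference(regress_out(y, factor), condition)

print(f"condition effect before removing the factor: {before:.2f}")
print(f"condition effect after removing the factor:  {after:.2f}")
```

Because the factor is correlated with condition, removing it also removes part of the true condition difference, even though the gene has no technical effect at all in this example.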

6.5 years ago • Michael Love 43k

No. I recommend checking the normalized (and perhaps VST'd) data and, unless you find good reason to worry about quantile normalization, not worrying about it. As Michael says in his replies, if you feel you have inter-sample technical variation, you can look into SVA, RUVSeq and possibly other approaches that create covariates you can use within DESeq2 to account for inter-sample variation.

6.5 years ago • Peter Langfelder ★ 3.0k

Thank you both for the replies, I'll check what you suggested and let you know if I have more questions.

6.5 years ago • marco.trizzino83 ▴ 10