Question

Is the VST implemented in DESeq useful on time-series cell differentiation datasets?

jiab

The VSN method proposed in [1], which generalises the VST method proposed in [2], is notorious for assuming that the libraries in a dataset can be treated as technical replicates, i.e., that most genes are not differentially expressed between conditions. When analysing a cell differentiation time course, this is hardly an acceptable assumption.
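The following toy illustration shows why the replicate assumption matters. It is not taken from [1], and it is neither the VSN nor the DESeq estimator. The minimal Python sketch simulates hypothetical log-scale expression values for two conditions, which is a simplification of a multi-time-point course. It then compares two per-gene spread estimates: one that pools all samples as if they were replicates, and one computed within conditions. The number of genes, samples per condition, fold change and noise level are all arbitrary assumptions.

```python
import math
import random
import statistics

rng = random.Random(42)
N_GENES = 2000       # hypothetical number of genes
N_PER_GROUP = 4      # hypothetical number of samples per condition
SIGMA = 0.2          # assumed within-condition SD on the log scale

def simulate_gene(log_fold_change):
    """Log-scale values for one gene in two conditions separated by log_fold_change."""
    base = rng.uniform(2.0, 8.0)
    group_a = [base + rng.gauss(0.0, SIGMA) for _ in range(N_PER_GROUP)]
    group_b = [base + log_fold_change + rng.gauss(0.0, SIGMA) for _ in range(N_PER_GROUP)]
    return group_a, group_b

def spread_estimates(frac_de, lfc=1.5):
    """Median per-gene SD when (i) all samples are pooled as if they were
    replicates and (ii) the variance is computed within each condition."""
    naive, within = [], []
    for g in range(N_GENES):
        # the first frac_de * N_GENES genes are differentially expressed
        lfc_g = lfc if g < frac_de * N_GENES else 0.0
        a, b = simulate_gene(lfc_g)
        naive.append(statistics.stdev(a + b))
        # pooled within-condition variance (equal group sizes)
        within.append(math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2))
    return statistics.median(naive), statistics.median(within)

for frac in (0.05, 0.6):
    naive_sd, within_sd = spread_estimates(frac)
    print(f"{frac:.0%} DE genes: pooled SD={naive_sd:.3f}, within-condition SD={within_sd:.3f}")
```

The within-condition estimate stays near SIGMA in both scenarios. The pooled estimate also stays near SIGMA when only a few genes change, but once differential expression is widespread it absorbs the between-condition differences and becomes inflated.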

With regard to the VST method implemented in DESeq, I am trying to figure out whether it relies on the same assumption. It is well established that undesirable artefacts are introduced when the sequencing depths of the libraries differ wildly. However, I was not able to find a literature reference for the type of limitation described above. A forum post by Huber appears to suggest that this limitation does indeed exist [3].

Could someone please confirm or deny whether the VST method implemented in DESeq assumes that the samples can be treated as technical replicates?

Thank you.

[1] Huber et al., 2002

[2] Durbin et al., 2002

[3] http://seqanswers.com/forums/showpost.php?p=139198&postcount=8

Tags: deseq, deseq2, vst

score 3 · Answer 1 · 2017-04-13

Hi,

See the section of the DESeq2 vignette on transformations and blind dispersion estimation. I would recommend transforming with blind=FALSE. This means that the dispersions are estimated using the experimental design, and the global trend of dispersion over the mean is then used to calculate the VST.
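The last step, going from a dispersion trend to a VST, can be made concrete. For DESeq2's parametric dispersion fit, the trend has the form α(μ) = asymptDisp + extraPois/μ. As far as I understand the DESeq2 source, the VST is the closed-form integral of 1/sqrt(μ + α(μ)μ²), shifted and scaled so that it approaches log2 for large counts. The minimal Python sketch below reimplements that formula and uses the delta method to check that the approximate variance after transformation is the same at every mean. The function names and coefficient values are hypothetical. In DESeq2 the coefficients would come from the fitted dispersion trend, which with blind=FALSE is estimated using the design.

```python
import math

def dispersion_trend(mu, asympt_disp, extra_pois):
    """Parametric dispersion-mean trend: alpha(mu) = asympt_disp + extra_pois / mu."""
    return asympt_disp + extra_pois / mu

def parametric_vst(q, asympt_disp, extra_pois):
    """Closed-form VST for the parametric trend, scaled so it behaves like log2(q)
    for large q (assumed to mirror DESeq2's parametric-fit formula)."""
    a0, a1 = asympt_disp, extra_pois
    inner = 1 + a1 + 2 * a0 * q + 2 * math.sqrt(a0 * q * (1 + a1 + a0 * q))
    return math.log(inner / (4 * a0)) / math.log(2)

def delta_method_variance(mu, asympt_disp, extra_pois, rel_step=1e-4):
    """Approximate Var[vst(Y)] for a negative binomial Y with mean mu and
    dispersion alpha(mu), using vst'(mu)^2 * Var[Y]."""
    alpha = dispersion_trend(mu, asympt_disp, extra_pois)
    nb_var = mu + alpha * mu ** 2  # negative binomial variance
    h = rel_step * mu
    # central finite difference approximates the derivative of the transform
    deriv = (parametric_vst(mu + h, asympt_disp, extra_pois)
             - parametric_vst(mu - h, asympt_disp, extra_pois)) / (2 * h)
    return deriv ** 2 * nb_var

# Hypothetical trend coefficients, not fitted to any real dataset.
A0, A1 = 0.05, 2.0
for mu in (1, 10, 100, 1_000, 100_000):
    print(f"mu={mu:>7}: vst={parametric_vst(mu, A0, A1):7.3f}  "
          f"log2(mu)={math.log2(mu):7.3f}  approx Var={delta_method_variance(mu, A0, A1):.5f}")
```

Analytically, the approximate variance equals asymptDisp / ln(2)² at every mean. The transformed values therefore have roughly the same spread for low and high counts, while large counts are mapped close to log2.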

(I don't agree with your use of the term "technical replicates" here. When you have multiple samples in the same condition, I would call these "biological replicates". Technical replicates generally refer to the same cDNA library sequenced multiple times.)

score 2 · Answer 2 · 2017-04-13

In addition to Mike's answer, let me add, as a more general comment, that it is always helpful to distinguish between the route taken and the destination, i.e., between the assumptions made to come up with an algorithm for finding a data transformation, and the usefulness of the transformation that it produces for a dataset and scientific question at hand.

In other words: someone could write a paper saying that the logarithm is the appropriate VST for data with a constant coefficient of variation, under the assumption that all samples are replicates of each other. But this does not mean that you are then only allowed to use the logarithm on data confirmed to follow these assumptions. You can still use the logarithm for other data, as long as it "makes sense". That criterion is of course subjective and requires some experience and expertise.
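The logarithm example can be checked numerically. The minimal Python sketch below draws hypothetical data with a constant coefficient of variation. The data are gamma-distributed with shape 25, so CV = 0.2 at every mean; the distribution, sample size and means are assumptions chosen only for illustration. The output shows that the raw standard deviation grows in proportion to the mean, while the standard deviation on the log scale stays essentially constant.

```python
import math
import random
import statistics

rng = random.Random(1)
SHAPE = 25.0   # assumed gamma shape; CV = 1/sqrt(SHAPE) = 0.2 at every mean
N = 20_000     # draws per mean level (arbitrary)

def sample_constant_cv(mean):
    """Draw N gamma variates with the given mean and a fixed coefficient of variation."""
    return [rng.gammavariate(SHAPE, mean / SHAPE) for _ in range(N)]

results = {}
for mean in (10, 1_000, 100_000):
    xs = sample_constant_cv(mean)
    raw_sd = statistics.stdev(xs)
    log_sd = statistics.stdev([math.log(x) for x in xs])
    results[mean] = (raw_sd, log_sd)
    print(f"mean={mean:>7}: SD(raw)={raw_sd:10.2f}  SD(log)={log_sd:.4f}")
```

Whether this transformation "makes sense" for a given real dataset is, as noted above, a separate judgement from the assumptions used to derive it.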