Hi, they finally updated R on our cluster, so I am now using DESeq2_1.10.1 with R 3.2.2. However, I was surprised to see that the "fast" option in rlog is no longer available. Is there a reason why? Was the approximation not good or not correct? I tried to run rlog in v1.10.1 but it is taking ages (3 days and it is still not done). I have 683 samples, and I know the vignette says that with more than 100 samples it is better to use the vst function. However, the library sizes are very different, so I think rlog would be a better choice in my case. If the approximation was correct, I am thinking of taking the raw code from the old package and running just rlog with that on my dataset. Do you have other suggestions?
I would remove the samples with a very low size factor, e.g. the minimum one here. These are essentially failed experiments, which can be identified by sample QA (e.g. FastQC). These samples will throw off exploratory plots like PCA no matter how you transform the data.
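As a rough sketch of that filtering step (not a prescribed cutoff): the example below simulates a dataset with one failed, very shallow sample using makeExampleDESeqDataSet(), estimates size factors, and drops any sample whose size factor falls far below the median. The simulated data, the object names and the 0.2 relative threshold (`min_rel`) are assumptions for illustration only; on real data you would inspect the size factors and the QA reports before choosing what to remove.

```r
library(DESeq2)

set.seed(1)

# Simulated data: 11 samples at normal depth, 1 sample at ~5% depth
# (a stand-in for a failed library; purely illustrative).
dds <- makeExampleDESeqDataSet(n = 5000, m = 12,
                               sizeFactors = c(rep(1, 11), 0.05))

# Median-of-ratios size factors, as used by DESeq2 for normalization.
dds <- estimateSizeFactors(dds)
sf <- sizeFactors(dds)
print(round(sort(sf), 3))

# Hypothetical rule: drop samples whose size factor is below 20% of the
# median size factor. The cutoff is an assumption, not a DESeq2 default.
min_rel <- 0.2
keep <- sf >= min_rel * median(sf)

dds_clean <- dds[, keep]

# Re-estimate size factors on the retained samples only.
dds_clean <- estimateSizeFactors(dds_clean)
```

Looking at the sorted size factors before picking a cutoff is the important part; a sample that sits an order of magnitude below the rest is usually obvious.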
It's not very good for exploratory plots like PCA or for differential expression testing if a technical factor such as the sequencing depth is confounded with the tissue type. I know you can't help this after the fact but it's good information to pass along and keep in mind when designing experiments.
I would suggest using vst(), which is available in the current version of DESeq2 (1.12), after removing the sample with very low sequencing depth. You can compare the result with a simple log transform of normalized counts with a large pseudocount, e.g. 10, using the function normTransform().
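A minimal sketch of that comparison, assuming DESeq2 1.12 or later (where vst() is available) and using simulated counts from makeExampleDESeqDataSet() in place of your data. The object names are placeholders, and `blind = TRUE` is just one reasonable choice for exploratory plots:

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for a real DESeqDataSet (illustrative only).
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)
dds <- estimateSizeFactors(dds)

# Fast variance stabilizing transformation; fits the dispersion trend
# on a subset of genes, so it scales to hundreds of samples.
vsd <- vst(dds, blind = TRUE)

# Simple alternative: log2(normalized counts + 10).
ntd <- normTransform(dds, pc = 10)

# Both objects are DESeqTransform instances, so the same downstream
# functions apply; compare the two PCA plots side by side.
plotPCA(vsd, intgroup = "condition")
plotPCA(ntd, intgroup = "condition")
```

If the two PCA plots tell the same story, the choice of transformation is probably not driving what you see.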
I see that your system administrators updated to R 3.2.2, but this is no longer current. If I were you, I would ask them to offer the current version of R, which is 3.3.
Note that you can add a comment to an existing post using the Add Comment / Add Reply buttons. The Add Answer text box at the bottom is for posting an answer to the original post (your original question).