I didn't find an answer to this by searching the forums. I have RNA-seq samples with 5'/3' biases that are unevenly distributed among the samples: some of my conditions have more samples with the bias, some fewer. The reason for these biases is almost certainly different levels of RNA sample fragmentation or other differences in sample prep (PS: I realise that this is not a good start). As a result, DESeq2 calls DE that follows the distribution of the bias.
What is the best method for accounting for this variation in an objective way: using the RUV package, adding calculated 5'/3' bias ratios (e.g. from Picard) to the GLM, or using residuals? Any opinions would be greatly appreciated.
Many thanks, J
Thanks!
I've done exactly this: I computed TINs for each gene and performed the loess regression. I now have the raw log counts and the corrected log counts. I am not clear which values are best to use in a normFactor offset matrix, before normalising each row to a geometric mean of 1 as described in the vignette.
PS: I used % values because absolute differences can be negative, the matrix has to be positive, and the package authors explicitly warn against using log differences.
Looking at the documentation, I see that DESeq2 uses a matrix of "normalization factors" on the scale of the raw counts rather than a GLM offset matrix. The raw counts are divided by the normalization factors to get the normalized counts, so if normcounts = rawcounts / normfactors, then normfactors = rawcounts / normcounts. Compute that ratio, then normalize the geometric mean of each row to 1 as described in the DESeq2 manual, and store these norm factors in the DESeqDataSet object. Finally, you'll need to run estimateSizeFactors, since the TIN normalization only normalizes within samples and you still need to normalize between samples. After that, you should be able to run through your standard DESeq2 pipeline and have it use your TIN-derived normalization factors.
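To make the arithmetic concrete, here is a minimal sketch of the ratio and the row-wise rescaling, written in plain Python only so the numbers are easy to check. It is not DESeq2 code. The function name `norm_factors` and the toy matrices are made up for illustration, and it assumes both matrices are already on the count scale (back-transformed from logs) with no zeros; zero counts would need separate handling that is not shown here.

```python
import math

def norm_factors(raw_counts, corrected_counts):
    """Return factors with raw / factor proportional to corrected,
    rescaled so each row (gene) has a geometric mean of 1.

    Both inputs are lists of rows (genes) of strictly positive
    count-scale values, one column per sample (an assumption of
    this sketch).
    """
    factors = []
    for raw_row, corr_row in zip(raw_counts, corrected_counts):
        # normfactors = rawcounts / normcounts, element by element
        row = [r / c for r, c in zip(raw_row, corr_row)]
        # geometric mean of the row, computed via the mean of the logs
        geo_mean = math.exp(sum(math.log(f) for f in row) / len(row))
        # divide by the geometric mean so the row's geometric mean is 1
        factors.append([f / geo_mean for f in row])
    return factors

# Hypothetical toy data: 2 genes x 3 samples
raw = [[100.0, 200.0, 50.0], [10.0, 40.0, 20.0]]
corrected = [[120.0, 180.0, 55.0], [12.0, 30.0, 25.0]]

nf = norm_factors(raw, corrected)
for row in nf:
    print([round(f, 3) for f in row])
```

After the rescaling, raw / factor equals the corrected value times a constant that is the same for every sample within a gene, so the between-sample comparison for that gene is unchanged.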
(Mike, please correct me if I got anything wrong here.)
Sounds right.
If you want to correct for library size on top of normalization factors, pass the normFactors matrix (with row-wise geometric means around 1) to the normMatrix argument of estimateSizeFactors: