Dear List,
I have read a useful Biostars post on using DESeq with ERCC controls to normalize RNA-seq counts.
The page (https://www.biostars.org/p/81803/) contains this statement:
"Read in the count data, subset the resulting matrix such that it includes only the spike-ins, create a DESeqDataSet from that and then just estimateSizeFactors() on the results. The size factors can then be placed in the appropriate slot on the DESeqDataSet for the full count matrix."
However, with edgeR the process may not be as straightforward. DESeq has a sizeFactor slot in the CountDataSet object, whilst edgeR has lib.size and norm.factors columns in the samples table of a DGEList object. A library size and a size factor are different things. I could adjust the lib.size values using the size factors returned by estimateSizeFactors(), but is that valid? (I am assuming that norm.factors is filled in by the TMM normalization step.)
I understand that edgeR does a TMM normalization step in calcNormFactors(), so if the library sizes are changed manually, will the TMM normalization still be right?
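To make the difference between the two quantities concrete, here is a minimal base-R sketch on a made-up count matrix. It is only an illustration: the feature names (spike1, geneX, ...) and the helper median_of_ratios() are hypothetical, and the helper is a simplified median-of-ratios calculation in the spirit of estimateSizeFactors(), not the DESeq code itself.

```r
# Toy counts: rows are features, columns are samples (made-up numbers).
counts <- matrix(
  c(100, 200, 400,
     50, 100, 200,
     10,  20,  40,
    500, 300, 100,
    250, 400, 150),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("spike1", "spike2", "spike3", "geneX", "geneY"),
                  c("s1", "s2", "s3"))
)

# Hypothetical helper: median-of-ratios size factors.
# Each count is divided by its row's geometric mean across samples;
# the size factor of a sample is the median of those ratios.
median_of_ratios <- function(m) {
  log_geo_mean <- rowMeans(log(m))
  keep <- is.finite(log_geo_mean)  # drop rows containing a zero count
  apply(m[keep, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_mean[keep]))
  })
}

is_spike <- grepl("^spike", rownames(counts))

# Size factors estimated from the spike-in rows only.
sf <- median_of_ratios(counts[is_spike, , drop = FALSE])

# Library sizes of the remaining (non-spike) rows: plain column sums.
lib_size <- colSums(counts[!is_spike, , drop = FALSE])

print(sf)
print(lib_size)
```

The size factors are relative scalings centred around 1, whereas the library sizes are total counts per sample, so one cannot simply be dropped into the slot meant for the other.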
The code I was thinking of would look something like this:
library(DESeq)
cds = newCountDataSet( Just_ERCC_Bclass, group )
cds = estimateSizeFactors( cds )

library(edgeR)
my <- DGEList(counts=Not_ERCC, group=group)
my$samples$lib.size <- my$samples$lib.size / sizeFactors( cds )
my <- calcNormFactors(my)
.... and so on as in the manual.
What would be the right way to do this?
Thank you.
John.
Before you go down this path, you might consider what the SEQC/MAQC-III consortium has to say about using the ERCC controls for, like, anything. http://www.nature.com/nbt/journal/v32/n9/abs/nbt.2957.html
The short story is that they determined that the apparent amounts of the ERCC spike-ins varied widely between samples. This is most likely due (IMO) to the fact that you have to aliquot microliter amounts of the spike-in solution, and most people use vacuum aspirating pipettes for this step, which makes it almost impossible to do accurately.
In other words, the manual for the ERCC spike-ins says you should aliquot 1 µl of a 1:10 dilution. So most people will do something laughably inaccurate like putting 1 µl of the concentrated solution into 9 µl of RNase-free water, vortexing, and then aliquoting 1 µl out of that using their trusty Rainin pipette. You can see how that turns out by taking a look at the SEQC/MAQC-III paper referenced above.