Hello everybody,
I aim to use the QSEA R package for the analysis of my MeDIP data. My question concerns the TMM normalisation which is applied by the function qsea::addLibraryFactors. Here, the reference can either be chosen manually or, if e.g. ref=1 is set, the first sample in the sampleTable is used automatically.
My data might be quite variable, so I assume I need to choose my reference sample carefully. What are the important criteria for choosing an appropriate reference sample?
I had a look at the function edgeR::calcNormFactors, where the default option is that "the library whose upper quartile is closest to the mean upper quartile for all the libraries" is chosen as the reference. I would like to use this option as well, but I am struggling to integrate it into code for qsea::addLibraryFactors / qsea::estimateLibraryFactors.
Could somebody help me with this issue, either by explaining the criteria for choosing an appropriate reference sample or (even better) by telling me how to integrate the code from edgeR::calcNormFactors into qsea::addLibraryFactors?
Or am I getting something wrong, and I don't need to worry too much about the choice of the reference sample?
Thank you very much!
Personally, I have stopped using TMM normalisation in qsea. I don't think it is necessary, provided you are removing poorly mapped windows correctly. As far as I'm aware, it matters more for RNA-seq, where you have ~30k genes and read counts spanning 4 orders of magnitude, than for ~1M windows spanning 2 orders of magnitude (at least in my experience with MBD-Seq data). It doesn't make any difference for beta values, since the scaling factor cancels out for those.
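That said, if you do want to keep TMM and pick the reference the way edgeR does by default, here is a minimal base-R sketch of the criterion you quoted (the library whose upper quartile is closest to the mean upper quartile). The function name pick_reference and the toy matrix are made up for illustration, and this is my reading of the rule, not a copy of the edgeR code. Passing the result on to qsea is only hinted at in the comments: I am assuming that getCounts(qs) gives you a windows x samples count matrix and that addLibraryFactors forwards ref to estimateLibraryFactors, so please check both against your qsea version.

```r
# Sketch: choose a TMM reference sample by the "upper quartile closest to the
# mean upper quartile" rule. `counts` is a windows x samples matrix of raw
# read counts (hypothetical input, not a qsea object).
pick_reference <- function(counts, p = 0.75) {
  counts <- as.matrix(counts)
  # Drop windows that have zero reads in every sample
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]
  lib_size <- colSums(counts)
  # Upper quartile of each library, scaled by its library size
  uq <- apply(counts, 2, quantile, probs = p) / lib_size
  # Index of the sample whose scaled upper quartile is closest to the mean
  unname(which.min(abs(uq - mean(uq))))
}

# Toy example with simulated counts (not real MeDIP data)
set.seed(1)
toy <- matrix(rpois(4000, lambda = 10), ncol = 4,
              dimnames = list(NULL, c("s1", "s2", "s3", "s4")))
ref_idx <- pick_reference(toy)

# Assumed, untested use with a qseaSet `qs`:
# ref_idx <- pick_reference(getCounts(qs))
# qs <- addLibraryFactors(qs, ref = ref_idx)
```

One caveat: with sparse window counts the upper quartile can be zero in every sample, in which case all samples tie and the function simply returns the first one, so it is worth looking at the quartiles before trusting the choice.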