I have a dataset for analyzing global transcription shifts. The samples have been spiked with known but unequal amounts of ERCC spike-in, such that each sample has a constant ratio of biological RNA to spike-in. The RUVSeq vignette states:
Note that one can relax the negative control gene assumption by requiring instead the identification of a set of positive or negative controls, with a priori known expression fold-changes between samples, i.e., known β. One can then use the centered counts for these genes (log Y − Xβ) for normalization purposes.
If I understand correctly, to adjust for a sample B that has four times as much spike-in as sample A, I would use the equation:
Log2(Spike-in_mat) - MatX * Matbeta
**Matrix X**

| | Intercept | Ploidy |
|---|---|---|
| A | 1 | 1 |
| B | 3 | 2 |

**Matrix Beta**

| | Spike 1 | Spike 2 | ... |
|---|---|---|---|
| Intercept | 1 | 1 | |
| Ploidy | 1 | 1 | |

I would then pass the resulting centered matrix to RUVg as the spike-in controls.
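To make the arithmetic concrete, here is a minimal Python sketch of the centering step as I read it (the real analysis would be done in R with RUVSeq; this only illustrates the matrix arithmetic). The spike-in counts in `Y` are made-up numbers, and the helper names `matmul` and `centered_log_counts` are hypothetical. Only `X` and `beta` are taken from the tables above.

```python
import math

# Design matrix X from the post: rows are samples (A, B),
# columns are (Intercept, Ploidy).
X = [
    [1, 1],  # sample A
    [3, 2],  # sample B
]

# Beta from the post: rows are (Intercept, Ploidy), columns are spikes.
beta = [
    [1, 1],  # Intercept
    [1, 1],  # Ploidy
]

# Hypothetical raw spike-in counts (samples x spikes), invented for
# illustration: sample B has four times the counts of sample A.
Y = [
    [100, 200],  # sample A
    [400, 800],  # sample B
]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def centered_log_counts(counts, design, coef):
    """Return log2(counts) - design @ coef, element by element."""
    fitted = matmul(design, coef)
    return [
        [math.log2(counts[i][j]) - fitted[i][j] for j in range(len(counts[0]))]
        for i in range(len(counts))
    ]


fitted = matmul(X, beta)
centered = centered_log_counts(Y, X, beta)

for name, row in zip(["A", "B"], centered):
    print(name, [round(v, 4) for v in row])
```

With the X and Beta given above, the product X * Beta evaluates to 2 for sample A and 5 for sample B on every spike, and those are the values subtracted from the log2 counts.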
Is this correct?