I'm currently interested in the horizontal integration of two RNA-seq datasets generated on different platforms (S5 and Illumina), one containing normal samples and the other containing abnormal samples. Since DESeq2 takes a single raw count matrix as input, I'm trying to find a "conversion factor" based on the raw counts of the two datasets, to address the fact that the normal and abnormal samples come from different platforms. The strategy would be to use the housekeeping genes shared by the two datasets. After this step, I'd use estimateSizeFactors with controlGenes.
Do you think this would be a reliable strategy? Thanks in advance.
To be clearer: I have the two raw count matrices, would apply a conversion factor based on the shared housekeeping genes, and would finally combine the two count matrices as DESeq2 input. Some algorithms in the literature claim to work for this, but on closer reading, the datasets they integrate come from the same platform.
Linear scaling is no different from the default normalization that DESeq2 does anyway. Different technologies measure different sets of genes, have different dynamic ranges, and give different ratios of genes relative to a set of housekeepers, so a linear scaling will not do here. In any case, since the treatment condition seems to be nested within the technology here, you cannot do any integration anyway. It's fully confounded, and no stats magic will change that.
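To make both points concrete, here is a rough sketch in plain Python rather than DESeq2 itself. Everything in it is made up for illustration: the toy counts, the sample layout (two normal samples on one platform, two abnormal on the other), and the 0.4 "conversion factor". The `size_factors` function is a simplified median-of-ratios estimate in the spirit of what DESeq2 does by default, not its actual implementation. The first part shows that a per-platform linear factor is absorbed by the size factors, so between-sample fold changes do not move. The second part shows that when condition and platform coincide, the design matrix loses a rank, so the two effects cannot be separated.

```python
from fractions import Fraction
from math import exp, log
from statistics import median


def size_factors(counts):
    """Simplified median-of-ratios size factors; counts[g][s] is gene g, sample s."""
    n_samples = len(counts[0])
    # Use only genes with a positive count in every sample.
    usable = [row for row in counts if all(v > 0 for v in row)]
    log_geo_means = [sum(log(v) for v in row) / n_samples for row in usable]
    return [
        exp(median(log(row[s]) - lgm for row, lgm in zip(usable, log_geo_means)))
        for s in range(n_samples)
    ]


def fold_changes(counts):
    """Normalized count of every sample relative to sample 0, per gene."""
    sf = size_factors(counts)
    norm = [[v / f for v, f in zip(row, sf)] for row in counts]
    return [[v / row[0] for v in row] for row in norm]


def matrix_rank(rows):
    """Rank by exact Gaussian elimination (no external libraries needed)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Toy data (hypothetical): samples 0-1 = platform A / normal,
# samples 2-3 = platform B / abnormal.
raw = [
    [100, 120, 300, 280],
    [50, 60, 150, 160],
    [10, 12, 90, 80],
    [200, 210, 500, 520],
]

# Hypothetical housekeeping-derived "conversion factor" applied to platform B.
conversion = 0.4
converted = [row[:2] + [v * conversion for v in row[2:]] for row in raw]

fc_raw = fold_changes(raw)
fc_converted = fold_changes(converted)

# Design matrix columns: intercept, platform, condition.
confounded = [[1, 0, 0], [1, 0, 0], [1, 1, 1], [1, 1, 1]]
balanced = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]

print("rank, confounded design:", matrix_rank(confounded))
print("rank, balanced design:  ", matrix_rank(balanced))
```

In this toy setup the fold changes after normalization are the same with or without the conversion factor, and the confounded design has fewer independent columns than coefficients to estimate. A balanced layout, with both conditions measured on both platforms, is what would let a platform term be fitted alongside the condition.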