Is it statistically valid to make biological comparisons between two different TCGA RNA-seq datasets? For instance, I want to compare TCGA lung cancer with TCGA mesothelioma. Is it possible to perform differential expression analysis using limma or DESeq2, or do the datasets need to be generated under the same experimental settings? Can this be mitigated with batch correction tools?
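If these are two different batches, then batch and cancer type are fully confounded, which makes the comparison statistically invalid: every sample of one cancer type sits in one batch and every sample of the other sits in the other batch, so no model or batch correction tool can separate the technical difference from the biological one. Biologically, I find it at least questionable, since different cancers differ not only in their oncogenic processes but also in the underlying expression of the cell/tissue of origin, which here are wildly different. Always ask yourself whether an analysis makes sense, even if it is statistically "ok".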
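To make "fully confounded" concrete, here is a minimal Python sketch that checks whether any batch contains more than one condition. The sample labels and batch names are made up for illustration, and treating each TCGA project as a single batch is an assumption carried over from the "if" above, not something read from the TCGA metadata.

```python
from collections import defaultdict

def conditions_per_batch(batches, conditions):
    """Map each batch label to the set of conditions observed in it."""
    seen = defaultdict(set)
    for batch, condition in zip(batches, conditions):
        seen[batch].add(condition)
    return dict(seen)

def is_fully_confounded(batches, conditions):
    """True when no batch contains more than one condition.

    In that case the batch effect and the condition effect cannot be
    estimated separately, whatever tool is used downstream.
    """
    per_batch = conditions_per_batch(batches, conditions)
    return all(len(found) == 1 for found in per_batch.values())

# Hypothetical sample annotation: three samples per cancer type.
conditions = ["lung", "lung", "lung", "meso", "meso", "meso"]

# Assumed layout 1: each cancer type was processed as its own batch.
confounded_batches = ["A", "A", "A", "B", "B", "B"]

# Assumed layout 2: both cancer types appear in both batches.
crossed_batches = ["A", "B", "A", "B", "A", "B"]

print(is_fully_confounded(confounded_batches, conditions))  # True
print(is_fully_confounded(crossed_batches, conditions))     # False
```

Only in the second layout is there information to estimate a batch effect separately from the cancer-type effect, for example by including batch as a covariate in the design.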