Hi all,
I am working on a large dataset that combines several RNAseq and microarray studies from different labs and time periods. While we have a functional pipeline set up for this, I recently saw a post that mentioned using voomLmFit to counter issues that can stem from excess zeroes in the data.
Since we are combining RNAseq and microarray data for a joint analysis, we see some of these data-sparsity issues: a number of genes in the RNAseq dataset are simply not present in the microarray datasets, and not all of our microarray datasets share a full gene set either. Ideally, we would prefer not to simply remove the partially sparse genes, since doing so would drastically reduce the number of genes available for further analyses.
My question is therefore whether a voomLmFit pipeline could be used for both the RNAseq and microarray data. That is, is voom transformation of microarray data harmful, and if so, is there another way to account for these data-sparsity issues without having to cut down our gene sets drastically?
Thanks, Adam
Edit: For some more context, we are not pooling samples across studies into shared groups. Rather, we want to perform group-wise comparisons (using the original groups from each study), contrasting the changes between conditions. No condition comparison is repeated across studies.
Hi Gordon,
Thanks for the answer! This makes a lot of sense. Your suggested pipeline (separate processing of datasets -> gene set analyses -> comparison) actually seems to fit quite well with what we have discussed internally, so it's good to know we were not that far off originally.