I am trying to merge two datasets (not data frames!): the first is a bulk RNA-seq dataset with two conditions, and the second is a pseudobulk dataset with two conditions. I need some help with the pipeline I should follow in order to correctly obtain differential gene expression across all conditions.
I curated the pseudobulk dataset by subsetting the genes from a scRNA-seq dataset and sub-clustering for my condition of interest. I currently have gene-level counts for both the pseudobulk and the bulk RNA-seq datasets, so I have a common starting point for my analysis.
My plan was to normalise the data using DESeq2, then perform batch correction, and then normalise again. But I'm not too sure at which step in the downstream analysis I need to do the second normalisation, or whether it is required at all. There might be two options here:
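To make the "common starting point" concrete, here is a minimal sketch of how the two gene-level count tables could be combined: keep only the genes present in both, join the sample columns, and record which dataset each sample came from. The gene names, counts, sample names, and the `merge_counts` helper are all made up for illustration, and plain Python is used only to show the idea, not any particular package.

```python
# Hypothetical toy data: gene -> raw counts, one value per sample.
bulk_counts = {
    "GeneA": [120, 98, 143, 110],
    "GeneB": [15, 22, 9, 18],
    "GeneC": [300, 280, 310, 295],
}
pseudobulk_counts = {
    "GeneA": [40, 35, 52, 47],
    "GeneC": [90, 110, 85, 102],
    "GeneD": [5, 7, 3, 6],
}
bulk_samples = ["bulk_1", "bulk_2", "bulk_3", "bulk_4"]
pseudobulk_samples = ["pb_1", "pb_2", "pb_3", "pb_4"]


def merge_counts(a, b):
    """Keep genes found in both tables and concatenate their sample columns."""
    shared = sorted(set(a) & set(b))
    return {gene: a[gene] + b[gene] for gene in shared}


merged = merge_counts(bulk_counts, pseudobulk_counts)
samples = bulk_samples + pseudobulk_samples

# One label per sample recording its source dataset, so that the
# bulk/pseudobulk difference can be treated as a batch later on.
batch = ["bulk"] * len(bulk_samples) + ["pseudobulk"] * len(pseudobulk_samples)
```

In this toy case only `GeneA` and `GeneC` survive the merge, since `GeneB` and `GeneD` are each missing from one of the tables.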
Option 1
- Normalise bulk and pseudobulk individually (DESeq2)
- Batch Correct (limma)
- Continue with differential analysis until a table of DEGs is obtained, and then re-normalise (DESeq2)
- Downstream analysis
Option 2
- Normalise bulk and pseudobulk together
- Batch Correct (limma)
- Continue with DEG characterisation and downstream analysis.
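For reference, by "normalise" I mean DESeq2-style size-factor scaling, which as I understand it uses the median-of-ratios method by default. A toy sketch of that calculation on a combined count table is below. It is plain Python rather than DESeq2 itself, the `size_factors` function and the counts are hypothetical, and it only illustrates the arithmetic, not which of the two options is correct.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts (one per sample).
    Returns a list with one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean of each gene across samples; genes with a
    # zero count in any sample are left out of the reference.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's reference value.
        log_ratios = [
            math.log(counts[gene][j]) - ref
            for gene, ref in log_geo_means.items()
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up counts: sample 2 is sequenced twice as deeply as sample 1,
# sample 3 half as deeply, and sample 4 the same.
toy = {
    "GeneA": [100, 200, 50, 100],
    "GeneB": [10, 20, 5, 10],
    "GeneC": [30, 60, 15, 30],
}
sf = size_factors(toy)
normalised = {g: [c / s for c, s in zip(row, sf)] for g, row in toy.items()}
```

The point of the sketch is that the size factors depend on which samples are in the table, so normalising bulk and pseudobulk separately (Option 1) and together (Option 2) will generally give different scaling.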
If anyone has any experience with this or has come across any relevant publications, please help out!
Thanks very much, A novice informatician!