Question

How to combine a bulk and pseudobulk dataset?

informatician • 0

I am trying to merge two datasets (not data frames!): the first is a bulk RNA sequencing dataset with two conditions, and the second is a pseudobulk dataset with two conditions. I need some help with the pipeline I should follow in order to correctly get differential gene expression across all conditions.

I managed to curate the pseudobulk dataset by subsetting the genes from a scRNAseq dataset and subclustering for my condition of interest. I currently have gene-level counts for both the pseudobulk dataset and the bulk RNA-seq dataset, so I have a common starting point for my analysis.

My plan was to normalise the data using DESeq2, then perform batch correction, and then normalise again. But I'm not too sure at which step of the downstream analysis I need to do the second normalisation, or whether it is required at all. There might be two options here:

Option 1

1. Normalise bulk and pseudobulk individually (DESeq2)
2. Batch correct (limma)
3. Continue with differential analysis until a table of DEGs is obtained, and then re-normalise (DESeq2)
4. Downstream analysis

Option 2

1. Normalise bulk and pseudobulk together
2. Batch correct (limma)
3. Continue with DEG characterisation and downstream analysis

If anyone has any experience with this or has come across any relevant publications, please help out!

Thanks very much, A novice informatician!

DESeq2 limma RNASeq RNASeqData • 2.2k views

written 24 months ago by informatician

score 1 · Answer 1 · 2023-05-09

You don't provide much information about your study, but as a general rule you shouldn't do what you are trying to do. That holds particularly if any of the groups are nested completely within one of the datasets, in which case you cannot distinguish between technical and biological differences anyway.
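To make the nesting problem concrete, here is a minimal sketch in plain Python (not part of any Bioconductor package) that flags groups occurring in only one dataset. The function name, the dataset labels and the group labels A to D are all hypothetical, since the original post does not say which conditions each dataset contains.

```python
from collections import defaultdict


def groups_nested_in_one_dataset(samples):
    """Return the groups that occur in exactly one dataset.

    `samples` is an iterable of (dataset, group) pairs, one per sample.
    A group listed here is confounded with its dataset: a difference
    involving it could be biological or purely technical.
    """
    datasets_per_group = defaultdict(set)
    for dataset, group in samples:
        datasets_per_group[group].add(dataset)
    return sorted(g for g, ds in datasets_per_group.items() if len(ds) == 1)


# Hypothetical layout 1: each dataset brings its own two conditions.
nested_design = [
    ("bulk", "A"), ("bulk", "A"), ("bulk", "B"), ("bulk", "B"),
    ("pseudobulk", "C"), ("pseudobulk", "C"),
    ("pseudobulk", "D"), ("pseudobulk", "D"),
]

# Hypothetical layout 2: the same two conditions appear in both datasets.
crossed_design = [
    ("bulk", "A"), ("bulk", "A"), ("bulk", "B"), ("bulk", "B"),
    ("pseudobulk", "A"), ("pseudobulk", "A"),
    ("pseudobulk", "B"), ("pseudobulk", "B"),
]

print(groups_nested_in_one_dataset(nested_design))
print(groups_nested_in_one_dataset(crossed_design))
```

If the returned list is not empty, any comparison between a listed group and a group from the other dataset mixes the condition effect with the bulk versus pseudobulk effect, and no batch correction can separate the two.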

If the groups are not nested, you would likely be better off analyzing the data separately and then doing a meta-analysis (see e.g., the GeneMeta package).
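As a rough illustration of what "analyse separately, then combine" can mean, the sketch below pools one gene's log fold change from two independent analyses with a generic inverse-variance, fixed-effect estimate. This is an assumption chosen for illustration: it is not GeneMeta's interface and not necessarily the model that package fits, and the function name and the numbers are made up. It only makes sense when both datasets estimate the same contrast.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool per-study effect estimates by inverse-variance weighting.

    Returns (pooled_estimate, pooled_standard_error, two_sided_p_value),
    where the p-value uses a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Made-up log2 fold changes and standard errors for one gene, as they
# might come out of separate bulk and pseudobulk analyses.
log2fc = [1.2, 0.8]
se = [0.3, 0.4]

pooled, pooled_se, p_value = fixed_effect_meta(log2fc, se)
print(pooled, pooled_se, p_value)
```

Each dataset keeps its own normalisation and dispersion estimates in this setup, so no cross-dataset batch correction is needed; only the per-gene effect sizes and their uncertainties are combined.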