Question

Pseudo-bulk celltype comparison across conditions from multiple sources

0

EspressoKris ▴ 10

@84e617fb

Last seen 14 months ago

United Kingdom

Hi all,

I am aiming to merge my dataset with others available online. The data will be SCTv2-normalised, and integration for visualisation will be done with Harmony.

For DGE analysis, I would like to perform pseudo-bulk comparisons (via either edgeR or DESeq2). I have two questions:

1) Should I aggregate raw counts or PrepSCTFindMarkers counts?

2) What's the best way to account for batch effects across datasets?

Thanks

DESeq2 edgeR scRNAseq • 2.0k views

ADD COMMENT • link updated 22 months ago by Peter Hickey ▴ 740 • written 22 months ago by EspressoKris ▴ 10

0

FYI, a great resource for learning about this type of analysis is the relevant chapter of the 'Orchestrating Single-Cell Analysis with Bioconductor' book (OSCA): https://bioconductor.org/books/3.16/OSCA.multisample/multi-sample-comparisons.html

ADD REPLY • link 22 months ago Peter Hickey ▴ 740

score 1 · Answer 1 · 2023-03-10

1

zuljiamel1991 ▴ 10

@41691191

Last seen 18 months ago

Germany

Hi KuriGura

Regarding DESeq2:

1) In my opinion, the safest option is to use counts from the RNA assay as input for DESeq2. DESeq2 has robust normalization methods and will account for sequencing depth. I also think that going for SCTv2 corrected counts should not violate any underlying assumptions, but let's see whether the developers comment. For future reference, it is always useful to tag the software in question (DESeq2 and edgeR in this case; you only used the scRNAseq tag).

2) In DESeq2 you can account for batch with an appropriate design formula. For example:

`dds <- DESeqDataSetFromMatrix(bulk_clus_counts, colData = sample_meta, design = ~ batch + condition)`

where batch is the column containing your batch information and condition defines the groups you want to compare.
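To make the two points above concrete, here is a minimal sketch of how raw counts could be summed into pseudo-bulk samples and passed to the design shown above. It is only an illustration under assumptions: the count matrix is simulated, the sample names (`s1` to `s8`), batch labels (`dataset1`, `dataset2`) and condition labels are made up, and it covers a single cell type. With real data you would repeat the aggregation per cell type, starting from the raw counts of the RNA assay. Note that every batch needs to contain both conditions, otherwise batch and condition cannot be separated in the model.

```r
# Toy raw count matrix, genes x cells (simulated, for illustration only)
set.seed(1)
n_genes <- 50
n_cells <- 240
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Sample of origin for every cell (hypothetical: 8 samples, 30 cells each)
cell_sample <- rep(paste0("s", 1:8), each = 30)

# Sum raw counts over the cells of each sample.
# rowsum() groups rows, so transpose to cells x genes first, then back.
bulk_clus_counts <- t(rowsum(t(counts), group = cell_sample))

# One row of metadata per pseudo-bulk sample (hypothetical labels).
# Each dataset (batch) contains both conditions.
sample_meta <- data.frame(
  row.names = paste0("s", 1:8),
  batch     = factor(rep(c("dataset1", "dataset2"), each = 4)),
  condition = factor(rep(c("control", "treated"), times = 4))
)

# Make sure metadata rows are in the same order as the count columns
sample_meta <- sample_meta[colnames(bulk_clus_counts), , drop = FALSE]
stopifnot(identical(rownames(sample_meta), colnames(bulk_clus_counts)))

# Build the DESeq2 object only if the package is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    bulk_clus_counts,
    colData = sample_meta,
    design  = ~ batch + condition
  )
}
```

From here the usual next steps would be `DESeq(dds)` followed by `results()` for the condition contrast. They are left out because simulated Poisson counts like these are not a realistic test of the dispersion fit.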

ADD COMMENT • link 22 months ago zuljiamel1991 ▴ 10

1

Hi zuljiamel1991

Thanks for your prompt response! Just added the tag as advised.

I think what you say makes sense, since technically raw counts should be provided to DESeq2. Interestingly though, I previously had a go using SCTv2 corrected counts and the results made sense. It would be interesting to see how many DEGs would be shared between a corrected and an uncorrected run.

I wonder if Michael Love could comment further on this.

Anyway, many thanks!

ADD REPLY • link 22 months ago EspressoKris ▴ 10
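The comparison suggested above (how many DEGs are shared between a run on corrected counts and a run on raw counts) could be sketched roughly as follows. The two gene vectors are hypothetical placeholders; in practice they would be the genes passing your significance threshold in each run.

```r
# Hypothetical significant genes from the two runs (placeholders, not real results)
degs_raw <- c("GeneA", "GeneB", "GeneC", "GeneD")  # run on raw counts
degs_sct <- c("GeneB", "GeneC", "GeneE")           # run on SCT corrected counts

shared   <- intersect(degs_raw, degs_sct)  # called in both runs
only_raw <- setdiff(degs_raw, degs_sct)    # called only with raw counts
only_sct <- setdiff(degs_sct, degs_raw)    # called only with corrected counts

# Jaccard index: shared genes as a fraction of all genes called in either run
jaccard <- length(shared) / length(union(degs_raw, degs_sct))
```

A high overlap would not by itself show that corrected counts are a valid input; it only describes how similar the two gene lists are.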

2

"SCTv2 corrected counts should not violate any underlying assumptions"

Corrected counts would violate assumptions.

DESeq2 accounts for the same issues as SCTv2. You should always use raw counts as input.