
How to aggregate pseudobulks: Normalization & Log-Transformation


Tadeoye

I am working on a single-cell RNA-seq analysis project and need to aggregate the single-cell data into pseudobulks so they can be used as input to GSVA. GSVA only accepts a gene × subject matrix, so the cells must first be collapsed into pseudobulk profiles. I have come across two different approaches to this aggregation and I am unsure which one to use.
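As a minimal sketch of what aggregating into pseudobulks means on the raw-count route, assuming a cells × genes NumPy array of raw counts and one subject label per cell (both hypothetical inputs, not part of GSVA's API), the function below sums the counts of all cells from the same subject into one column of a gene × subject matrix. In practice it would be applied to one cell type at a time.

```python
import numpy as np

def pseudobulk_sum(counts, sample_ids):
    """Hypothetical helper: sum raw counts over cells from the same subject.

    counts     : cells x genes array of raw counts (assumed layout)
    sample_ids : one subject/sample label per cell
    Returns a genes x subjects matrix and the subject labels in column order.
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)  # sorted, unique subject labels
    # Each column is the per-gene sum over that subject's cells
    bulk = np.column_stack(
        [counts[sample_ids == s].sum(axis=0) for s in samples]
    )
    return bulk, samples

# Toy usage: 3 cells x 3 genes, two subjects
counts = np.array([[1, 0, 3],
                   [2, 1, 0],
                   [0, 4, 1]])
bulk, samples = pseudobulk_sum(counts, ["s1", "s1", "s2"])
print(samples)  # ['s1' 's2']
print(bulk)     # genes x subjects
```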

In a recent paper by Blanchard et al., the pseudobulk profiles were built after normalizing and log-transforming the data: using ACTIONet, the authors first computed averages of the normalized gene expression profiles, which gave them aggregated expression profiles for each individual and cell type. In contrast, a single-cell tutorial recommends aggregating the raw counts first and then normalizing and log-transforming the result. The log transformation matters for my use case because the Gaussian kernel I intend to use in GSVA expects continuous expression data on a logarithmic scale, such as RNA-seq log-CPMs, log-RPKMs, or log-TPMs.
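To make the difference between the two orderings concrete, here is a toy NumPy sketch that runs both on the same cells. It assumes a plain log2(CPM + 1) transform as a stand-in for both ACTIONet's normalization and the tutorial's log-CPM step; neither method's exact formula (nor edgeR's `cpm(log=TRUE)`) is reproduced here, and the function names are hypothetical.

```python
import numpy as np

def log_cpm(counts, axis, prior=1.0):
    """Simplified log2(counts-per-million + prior); library size summed along `axis`."""
    lib = counts.sum(axis=axis, keepdims=True)
    return np.log2(counts / lib * 1e6 + prior)

def normalize_then_aggregate(cell_counts):
    # Route 1 (stand-in for the Blanchard et al. ordering):
    # log-normalize each cell (cells x genes), then average over cells
    return log_cpm(cell_counts, axis=1).mean(axis=0)

def aggregate_then_normalize(cell_counts):
    # Route 2 (tutorial ordering): sum raw counts over cells,
    # then log-normalize the single pseudobulk profile
    summed = cell_counts.sum(axis=0)
    return log_cpm(summed, axis=0)

# Two cells with equal library sizes but opposite gene usage
cells = np.array([[9, 1],
                  [1, 9]])
a = normalize_then_aggregate(cells)
b = aggregate_then_normalize(cells)
print(a)  # per-gene averages of per-cell log values
print(b)  # per-gene log values of the summed profile
```

In this toy case the two routes disagree: averaging per-cell log values gives about 18.19 for each gene, while log-transforming the summed counts gives about 18.93, because the mean of the logs is not the log of the mean.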

I am unsure which approach to take. Should I normalize and log-transform the data before aggregation, or should I aggregate the raw counts first and normalize afterward? I would greatly appreciate any guidance or insights on this matter.

pseudobulk scRNAseq GSVA

Asked 13 months ago by Tadeoye