Question

Right way to pseudobulk donors across replicates in edgeR


Jack S. ▴ 50

@9aa6de71


United States

I have a 10x single-cell dataset with 6 replicates, each containing cells from the same 5 donors. For simplicity, let's assume I have only two clusters, perturbed and unperturbed. I'd like to run pseudobulk differential expression testing comparing the two clusters, but I want to pseudobulk by donor, not by replicate. The complication is that each donor appears in all replicates.

One way to do this is to first aggregate the replicates using cellranger aggr, which takes care of normalization across replicates. Then I'd pseudobulk the donors and run DE testing as below:

    # Pseudobulk by donor within each perturbation cluster
    y <- Seurat2PB(seurat_obj, sample = "donor", cluster = "perturbation_status")
    y <- normLibSizes(y)
    donor <- factor(y$samples$sample)
    cluster <- factor(y$samples$cluster)
    # Paired design: cluster effect adjusted for donor
    design <- model.matrix(~ cluster + donor)
    ...
    fit <- glmQLFit(y, design, robust = TRUE)
    qlf <- glmQLFTest(fit, contrast = contrast_matrix)

My question is: what is the correct way to do this on an integrated Seurat object (i.e., without aggregating the replicates)? It seems to me that pseudobulking the donors across replicates as above in an integrated Seurat object would be wrong because the library sizes differ between replicates.

Obviously, I can run the tests for each donor in each replicate separately. But that would reduce power because each test would use fewer cells. Also, I'd rather run just one test for each donor than 6.

Thank you!

pseudobulk edgeR DifferentialExpression • 1.5k views

updated 4 months ago by Reza Ghamsari • written 8 months ago by Jack S.


If your replicate samples were from different cells but the same biological samples, then you should probably group cells that share the same donor, the same replicate, and the same cluster. In your case, you would have 5 x 6 x 2 = 60 pseudo-bulk samples.

Yunshun Chen • 8 months ago
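
The grouping suggested above can be illustrated with a minimal base R sketch. This is not code from the thread: the count matrix is simulated, and the names `meta`, `counts`, `group` and `pb` are hypothetical. With a real Seurat object, one would presumably add a metadata column that combines donor and replicate and pass its name as the `sample` argument of `Seurat2PB`, but that is an assumption about the poster's data layout.

```r
# Simulated stand-in for a single-cell count matrix (genes x cells).
set.seed(1)
meta <- expand.grid(
  donor     = paste0("D", 1:5),
  replicate = paste0("R", 1:6),
  cluster   = c("unperturbed", "perturbed"),
  cell      = 1:20,  # 20 hypothetical cells per combination
  stringsAsFactors = FALSE
)
n_genes <- 100
counts <- matrix(rpois(n_genes * nrow(meta), lambda = 2),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# One pseudo-bulk sample per donor x replicate x cluster combination.
group <- paste(meta$donor, meta$replicate, meta$cluster, sep = "_")

# rowsum() sums rows by group, so transpose to sum over cells, then back.
pb <- t(rowsum(t(counts), group))

dim(pb)  # genes x pseudo-bulk samples
```

Summing raw counts within each group keeps every replicate's sequencing depth inside its own pseudo-bulk samples, so per-sample library sizes can be handled downstream by edgeR rather than being mixed across replicates.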


Can you please clarify what the replicates represent? Are you simply resequencing the same libraries so that they are purely technical replicates? Or are the replicates different cells from the same biological samples? Or are the replicates separate tissue samples? It's not at all obvious what the situation is.

Gordon Smyth • 8 months ago


Hi Gordon, all cells come from lab-grown cell cultures. Same cell line from 5 different donors... Each replicate contains a different set of cells from the same 5-donor mixture.

Jack S. • 8 months ago

score 1 · Answer 1 · 2024-03-06


Gordon Smyth 51k

@gordon-smyth


WEHI, Melbourne, Australia

I agree with Yunshun that you should pseudo-bulk by donor-replicate-cluster groups, i.e., 5 donors x 6 replicates x 2 clusters, to get 60 pseudo-bulk samples. Then you can run a DE analysis using voomLmFit with block=replicate and with model.matrix(~cluster+donor) as the design matrix.

Gordon Smyth • 8 months ago
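
A rough sketch of the analysis described in this answer, run on simulated pseudo-bulk counts. The data, the replicate effect, and the names `samples`, `pb`, `top` are all hypothetical placeholders. The `filterByExpr`, `eBayes` and `topTable` steps are not mentioned in the answer; they are assumed here as the usual limma/edgeR steps around `voomLmFit`.

```r
library(edgeR)  # also attaches limma

set.seed(1)
# Sample table for the 60 pseudo-bulk samples (5 donors x 6 replicates x 2 clusters).
samples <- expand.grid(
  donor     = paste0("D", 1:5),
  replicate = paste0("R", 1:6),
  cluster   = c("unperturbed", "perturbed"),
  stringsAsFactors = FALSE
)
samples$donor     <- factor(samples$donor)
samples$replicate <- factor(samples$replicate)
# Set the reference level explicitly so the coefficient is "clusterperturbed".
samples$cluster   <- factor(samples$cluster, levels = c("unperturbed", "perturbed"))

# Simulated counts with a gene-wise replicate effect (placeholder data only).
n_genes <- 500
rep_effect <- matrix(exp(rnorm(n_genes * 6, sd = 0.3)), nrow = n_genes)
mu <- 50 * rep_effect[, as.integer(samples$replicate)]
pb <- matrix(rnbinom(length(mu), mu = mu, size = 5), nrow = n_genes,
             dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

y <- DGEList(counts = pb)
design <- model.matrix(~ cluster + donor, data = samples)

# Remove lowly expressed genes, then compute normalization factors.
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- normLibSizes(y)

# Replicate enters as a blocking factor (correlated samples), not as a fixed effect.
fit <- voomLmFit(y, design, block = samples$replicate)
fit <- eBayes(fit)
top <- topTable(fit, coef = "clusterperturbed", number = 10)
```

Because the simulated data contain no true perturbation effect, `top` is only useful for checking that the pipeline runs; with real data the `clusterperturbed` coefficient gives the perturbed vs unperturbed log-fold change adjusted for donor.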

score 0 · Answer 2 · 2024-06-22

Hi Jack,

Sorry if this is a naive question, but I'm a bit lost here. If I understand correctly, you have six batches (runs) of 10x single-cell data, each containing a mixture of five cell lines (from five donors). You have two conditions (clusters), perturbed and unperturbed, likely with three batches each. This would result in six barcode suffixes (1-6) in your cellranger aggr output.

Is this correct? If so, how are you annotating each donor? Are you using clustering from the single-cell analysis, or genetic variation? And should you parameterize the effect of batch in your model?

Additionally, shouldn't you use raw data rather than normalised data for pseudobulk analysis? You could use the argument --normalize=none in cellranger aggr to avoid depth normalization across libraries.