Right way to pseudobulk donors across replicates in edgeR
Jack S. ▴ 50
@9aa6de71
United States

I have a 10x single-cell dataset with 6 replicates, each containing cells from the same 5 donors. For simplicity, let's assume I have only two clusters, perturbed and unperturbed. I'd like to run pseudobulk differential expression testing comparing the two clusters, but I want to pseudobulk by donor, not by replicate. The complication is that each donor appears in all replicates.

One way to do this is to first aggregate the replicates using cellranger aggr, which takes care of normalization across replicates. Then I'd pseudobulk the donors and run DE testing as below:

    library(edgeR)

    # Sum counts over cells sharing the same donor and perturbation status
    y <- Seurat2PB(seurat_obj, sample="donor", cluster="perturbation_status")
    y <- normLibSizes(y)
    donor <- factor(y$samples$sample)
    cluster <- factor(y$samples$cluster)
    # Additive design: cluster effect adjusted for donor
    design <- model.matrix(~cluster+donor)
    ...
    fit <- glmQLFit(y, design, robust = TRUE)
    qlf <- glmQLFTest(fit, contrast = contrast_matrix)

My question is: what is the correct way to do this on an integrated Seurat object (i.e., without aggregating the replicates)? It seems to me that pseudobulking the donors across replicates as above in an integrated Seurat object would be wrong because the library sizes differ between replicates.

Obviously, I can run the tests for each donor in each replicate separately. But that would reduce the power due to decreased cell counts in each test. Also, I'd rather run just one test for each donor than 6.

Thank you!

pseudobulk edgeR DifferentialExpression • 1.9k views

If your replicate samples were from different cells but the same biological samples, then you should probably group cells from the same donor, the same replicate, and the same cluster. In your case, you would have 5x6x2 = 60 pseudo-bulk samples.
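
To make the grouping concrete, here is a minimal base-R sketch of summing raw counts into donor-replicate-cluster pseudo-bulk samples. The data are simulated, and the names `meta`, `counts`, `group` and `pb` are hypothetical placeholders rather than anything from the original dataset; with a real Seurat object the per-cell annotation would come from its metadata instead.

```r
set.seed(1)
n_genes <- 50

# Simulated per-cell annotation: 20 cells for each of the 5 x 6 x 2 combinations
# (hypothetical stand-in for the real per-cell metadata)
meta <- expand.grid(
  donor     = paste0("D", 1:5),
  replicate = paste0("R", 1:6),
  cluster   = c("unperturbed", "perturbed"),
  cell      = 1:20
)
n_cells <- nrow(meta)

# Simulated raw counts, genes in rows and cells in columns
counts <- matrix(rpois(n_genes * n_cells, lambda = 2), nrow = n_genes,
                 dimnames = list(paste0("gene", 1:n_genes), NULL))

# One pseudo-bulk group per donor-replicate-cluster combination
group <- interaction(meta$donor, meta$replicate, meta$cluster, drop = TRUE)

# rowsum() sums rows by group, so transpose to cells x genes and back again
pb <- t(rowsum(t(counts), group))

dim(pb)  # genes x pseudo-bulk samples
```

Each column of `pb` is then one pseudo-bulk sample, so library-size differences between replicates stay attached to the samples they belong to instead of being mixed within a donor.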

Can you please clarify what the replicates represent? Are you simply resequencing the same libraries so that they are purely technical replicates? Or are the replicates different cells from the same biological samples? Or are the replicates separate tissue samples? It's not at all obvious what the situation is.

Hi Gordon, all cells come from lab-grown cell cultures. Same cell line from 5 different donors... Each replicate contains a different set of cells from the same 5-donor mixture.

@gordon-smyth
WEHI, Melbourne, Australia

I agree with Yunshun that you should pseudo-bulk by donor-replicate-cluster groups, i.e., 5 donors x 6 replicates x 2 clusters to get 60 pseudo-bulk samples. Then you can run a DE analysis using voomLmFit with block=replicate and with model.matrix(~cluster+donor) as the design matrix.
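
A rough sketch of what that analysis could look like is below. It runs on simulated counts, so the object names (`samples`, `counts`, `y`) and the simulated values are assumptions for illustration only; it also assumes an edgeR version recent enough to provide `normLibSizes()` and `voomLmFit()`. With real data, the 60 pseudo-bulk samples and their donor, replicate and cluster annotation would replace the simulated objects.

```r
library(edgeR)  # provides voomLmFit(); limma is attached as a dependency

set.seed(1)
# Simulated sample sheet for the 60 pseudo-bulk samples (hypothetical names)
samples <- expand.grid(
  donor     = paste0("D", 1:5),
  replicate = paste0("R", 1:6),
  cluster   = c("unperturbed", "perturbed")
)

# Simulated pseudo-bulk counts standing in for real data
counts <- matrix(rnbinom(200 * nrow(samples), mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))

y <- DGEList(counts = counts, samples = samples)
y <- normLibSizes(y)

# Fixed effects: cluster (the comparison of interest) adjusted for donor
design <- model.matrix(~ cluster + donor, data = y$samples)

# Replicate enters as a random block, not as a fixed effect
fit <- voomLmFit(y, design, block = y$samples$replicate)
fit <- eBayes(fit)

# Genes ranked by evidence for a perturbed vs unperturbed difference
topTable(fit, coef = "clusterperturbed")
```

Because "unperturbed" is the first factor level here, the coefficient `clusterperturbed` is the log2 fold change of perturbed over unperturbed; the coefficient name will differ if the levels are named or ordered differently.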

@849a6f74
Melbourne

Hi Jack,

Sorry if this is a naive question, but I'm a bit lost here. If I understand correctly, you have six batches (runs) of 10x single-cell data, each containing a mixture of five cell lines (from five donors). You have two conditions (clusters), perturbed and unperturbed, likely with three batches each. This would result in six barcode suffixes (1-6) in your cellranger aggr output.

Is this correct? If so, how are you annotating each donor? Are you using clustering based on single-cell data analysis, or genetic variation? And should you parameterize the effect of batch in your model?

Additionally, shouldn't you use raw data rather than normalised data for pseudobulk analysis? You could use the argument --normalize=none in cellranger aggr to avoid depth normalization across libraries.
