Custom design for scRNA DGE with pseudobulking
1
0
Entering edit mode
garcia • 0
@28115c4f
Last seen 10 months ago
Germany

Hello,

I have the problem to add a custom design (here: paired design) for pseudobulk scRNA.

My experiment looks like this:

> exp.design
# A tibble: 8 × 2
  Subject  Treatment
  <chr>    <chr>    
1 Animal_1 Control  
2 Animal_1 MFGE8    
3 Animal_2 Control  
4 Animal_2 MFGE8    
5 Animal_3 Control  
6 Animal_3 MFGE8    
7 Animal_4 Control  
8 Animal_4 MFGE8

This is a paired design. I want to compare two treatments on independent subjects. E.g I want to compare "Control" with "MFGE8" only for Animal_1. (see page 40: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf)

I have two SeuratObjects in a list. One for "Control" and one for "MFGE8". Both for Animal_1.

head(mare1_control_mfge8$eq_control@meta.data)
                              nCount_RNA nFeature_RNA  status assignment      group animal
eq_control_AAACCCAAGCAACTCT-1      60879         6735 singlet          2 eq_control      1
eq_control_AAACCCAGTCCTACAA-1      40519         6066 singlet          2 eq_control      1
eq_control_AAACCCATCCAGCTCT-1      43961         6788 singlet          2 eq_control      1
eq_control_AAACGAAAGGCTCTCG-1      56282         7325 singlet          2 eq_control      1
eq_control_AAACGCTGTTATAGAG-1      68322         7057 singlet          2 eq_control      1
eq_control_AAAGGGCAGGAGATAG-1      42782         5965 singlet          2 eq_control      1

head(mare1_control_mfge8$mfge8@meta.data)
                            nCount_RNA nFeature_RNA  status assignment    group animal
eq_MFGE8_AAACCCACACTGCATA-1      36450         5802 singlet          2 eq_MFGE8      1
eq_MFGE8_AAACCCAGTTGCATCA-1      24512         4879 singlet          2 eq_MFGE8      1
eq_MFGE8_AAAGAACGTGCCAAGA-1      34070         6334 singlet          2 eq_MFGE8      1
eq_MFGE8_AAAGGATAGTCCCGGT-1      17672         3450 singlet          2 eq_MFGE8      1
eq_MFGE8_AAAGGGCTCCATCTGC-1      44901         6081 singlet          2 eq_MFGE8      1
eq_MFGE8_AAAGTCCAGTGCACAG-1      26657         5543 singlet          2 eq_MFGE8      1

Both SeuratObjects only contain one animal and one treatment (here: group column)

> table(mare1_control_mfge8$eq_control$animal)

   1 
1028 
> table(mare1_control_mfge8$eq_control$group)

eq_control 
      1028 
> table(mare1_control_mfge8$mfge8$animal)

   1 
1161 
> table(mare1_control_mfge8$mfge8$group)

eq_MFGE8 
    1161

mare1_control_mfge8
$eq_control
An object of class Seurat 
21426 features across 1028 samples within 1 assay 
Active assay: RNA (21426 features, 0 variable features)

$mfge8
An object of class Seurat 
21426 features across 1161 samples within 1 assay 
Active assay: RNA (21426 features, 0 variable features)

Then, I perfomed the integration as explained here: https://satijalab.org/seurat/articles/integration_introduction.html because the function Seurat2PB() needs an integrated SeuratObject.

> eq.combined
An object of class Seurat 
23426 features across 2189 samples within 2 assays 
Active assay: integrated (2000 features, 2000 variable features)
 1 other assay present: RNA
 2 dimensional reductions calculated: pca, umap
> table(eq.combined$animal)

   1 
2189 
> table(eq.combined$group)

eq_control   eq_MFGE8 
      1028       1161 

head(eq.combined@meta.data %>% select(nCount_RNA,nFeature_RNA,status,assignment,group,animal,seurat_clusters))
                              nCount_RNA nFeature_RNA  status assignment      group animal seurat_clusters
eq_control_AAACCCAAGCAACTCT-1      60879         6735 singlet          2 eq_control      1               3
eq_control_AAACCCAGTCCTACAA-1      40519         6066 singlet          2 eq_control      1               0
eq_control_AAACCCATCCAGCTCT-1      43961         6788 singlet          2 eq_control      1               1
eq_control_AAACGAAAGGCTCTCG-1      56282         7325 singlet          2 eq_control      1               0
eq_control_AAACGCTGTTATAGAG-1      68322         7057 singlet          2 eq_control      1               0
eq_control_AAAGGGCAGGAGATAG-1      42782         5965 singlet          2 eq_control      1               2

Now I created the DGElist as followed:

y <- Seurat2PB(eq.combined, sample="group", cluster="seurat_clusters")
dim(y)
[1] 21426    12

head(y$samples, n=10L)
                    group lib.size norm.factors     sample cluster
eq_control_cluster0     1 22013050            1 eq_control       0
eq_control_cluster1     1  8237757            1 eq_control       1
eq_control_cluster2     1  5389188            1 eq_control       2
eq_control_cluster3     1  8365253            1 eq_control       3
eq_control_cluster4     1  3729499            1 eq_control       4
eq_control_cluster5     1  1909997            1 eq_control       5
eq_MFGE8_cluster0       1 12178747            1   eq_MFGE8       0
eq_MFGE8_cluster1       1 10343051            1   eq_MFGE8       1
eq_MFGE8_cluster2       1  9297681            1   eq_MFGE8       2
eq_MFGE8_cluster3       1  3883156            1   eq_MFGE8       3

str(y)
Formal class 'DGEList' [package "edgeR"] with 1 slot
  ..@ .Data:List of 3
  .. ..$ : num [1:21426, 1:12] 335 9 29313 42 3081 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:21426] "POLE2" "ENSECAG00000019402" "ENSECAG00000052423" "LRR1" ...
  .. .. .. ..$ : chr [1:12] "eq_control_cluster0" "eq_control_cluster1" "eq_control_cluster2" "eq_control_cluster3" ...
  .. ..$ :'data.frame': 12 obs. of  5 variables:
  .. .. ..$ group       : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ lib.size    : num [1:12] 22013050 8237757 5389188 8365253 3729499 ...
  .. .. ..$ norm.factors: num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ sample      : chr [1:12] "eq_control" "eq_control" "eq_control" "eq_control" ...
  .. .. ..$ cluster     : chr [1:12] "0" "1" "2" "3" ...
  .. ..$ :'data.frame': 21426 obs. of  1 variable:
  .. .. ..$ gene: chr [1:21426] "POLE2" "ENSECAG00000019402" "ENSECAG00000052423" "LRR1" ...
  ..$ names: chr [1:3] "counts" "samples" "genes"

Now in section "4.10.5 Design matrix" on page 127 of the edgeRUserGuide.pdf, one can create a design matrix.

For my paired design I want to do the following according to p.40 in the UserGuide.pdf

Subject <- factor(y$samples$animal) # Animal = Subject
Treat <- factor(y$samples$sample, levels = c("eq_control","eq_MFGE8")) # Treat = sample
design <- model.matrix(~Subject+Treat)

I can't create paired design, because the column "subject" is missing in y$samples.

str(y$samples)
'data.frame':   12 obs. of  5 variables:
 $ group       : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ lib.size    : num  22013050 8237757 5389188 8365253 3729499 ...
 $ norm.factors: num  1 1 1 1 1 1 1 1 1 1 ...
 $ sample      : chr  "eq_control" "eq_control" "eq_control" "eq_control" ...
 $ cluster     : chr  "0" "1" "2" "3" ...

The documentation tells me, the Seurat2PB(object, sample, cluster="seurat_clusters") can only use two columns of the SeuratObject.

Question: How can I create a design matrix with paired design for a pseudo-bulk DGEList?

On page 40 of the EdgeRUserGuide.pdf (section: 3.4.1 Paired samples), this is only shown for a simple bulk RNA dataset.

Thank you for any recommendations and tipps. Sorry for my long explanation of the problem.

scRNAseq scrna paireddesign edgeR pseudobulking • 1.3k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

You need to create a new column in eq.combined@meta.data that combines group and animal before running Seurat2PB.

ADD COMMENT
0
Entering edit mode

Thank you Gordon for your fast response.

I created an new column "group_animal".

eq.combined@meta.data %>% select(nCount_RNA,nFeature_RNA,group,animal,group_animal,RNA_snn_res.0.5,integrated_snn_res.0.5,seurat_clusters) %>% head()
                              nCount_RNA nFeature_RNA      group animal group_animal RNA_snn_res.0.5 integrated_snn_res.0.5 seurat_clusters
eq_control_AAACCCAAGCAACTCT-1      60879         6735 eq_control      1 eq_control_1               2                      4               4
eq_control_AAACCCAGTCCTACAA-1      40519         6066 eq_control      1 eq_control_1               0                      3               3
eq_control_AAACCCATCCAGCTCT-1      43961         6788 eq_control      1 eq_control_1               3                      0               0
eq_control_AAACGAAAGGCTCTCG-1      56282         7325 eq_control      1 eq_control_1               0                      3               3
eq_control_AAACGCTGTTATAGAG-1      68322         7057 eq_control      1 eq_control_1               0                      3               3
eq_control_AAAGGGCAGGAGATAG-1      42782         5965 eq_control      1 eq_control_1               1                      2               2

Then I got this after ´Seurat2PB()´

y <- Seurat2PB(eq.combined, sample="group_animal", cluster="seurat_clusters")
y <- normLibSizes(y)

y$samples
                      group lib.size norm.factors       sample cluster
eq_control_1_cluster0     1 10960426    1.1449109 eq_control_1       0
eq_control_1_cluster1     1  8596128    0.9737348 eq_control_1       1
eq_control_1_cluster2     1  5357138    0.8822733 eq_control_1       2
eq_control_1_cluster3     1 13928352    1.0477759 eq_control_1       3
eq_control_1_cluster4     1  8787091    0.8948006 eq_control_1       4
eq_control_1_cluster5     1  1974773    1.1456809 eq_control_1       5
eq_MFGE8_1_cluster0       1 11289268    1.1746136   eq_MFGE8_1       0
eq_MFGE8_1_cluster1       1  8317256    0.9724887   eq_MFGE8_1       1
eq_MFGE8_1_cluster2       1  9380191    0.8004246   eq_MFGE8_1       2
eq_MFGE8_1_cluster3       1  6679310    1.0338969   eq_MFGE8_1       3
eq_MFGE8_1_cluster4       1  4158076    0.9019585   eq_MFGE8_1       4
eq_MFGE8_1_cluster5       1   983813    1.1101006   eq_MFGE8_1       5

After normalization, I tried to create the design matrix, but it failed. Maybe, because I have only one factor in "group" ?

Subject <- factor(y$samples$group)
Subject
 [1] 1 1 1 1 1 1 1 1 1 1 1 1
Treat <- factor(y$samples$sample, levels = c("eq_control_1","eq_MFGE8_1"))
Treat
 [1] eq_control_1 eq_control_1 eq_control_1 eq_control_1 eq_control_1 eq_control_1 eq_MFGE8_1   eq_MFGE8_1   eq_MFGE8_1   eq_MFGE8_1   eq_MFGE8_1   eq_MFGE8_1  
Levels: eq_control_1 eq_MFGE8_1

design <- model.matrix(~Subject+Treat)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Do I have to modify the "group" column of y$samples ?

Maybe it should look like this?

y$samples
                      group lib.size norm.factors       sample cluster
eq_control_1_cluster0     1 10960426    1.1449109 eq_control_1       0
eq_control_1_cluster1     1  8596128    0.9737348 eq_control_1       1
eq_control_1_cluster2     1  5357138    0.8822733 eq_control_1       2
eq_control_1_cluster3     1 13928352    1.0477759 eq_control_1       3
eq_control_1_cluster4     1  8787091    0.8948006 eq_control_1       4
eq_control_1_cluster5     1  1974773    1.1456809 eq_control_1       5
eq_MFGE8_1_cluster0       2 11289268    1.1746136   eq_MFGE8_1       0
eq_MFGE8_1_cluster1       2  8317256    0.9724887   eq_MFGE8_1       1
eq_MFGE8_1_cluster2       2  9380191    0.8004246   eq_MFGE8_1       2
eq_MFGE8_1_cluster3       2  6679310    1.0338969   eq_MFGE8_1       3
eq_MFGE8_1_cluster4       2  4158076    0.9019585   eq_MFGE8_1       4
eq_MFGE8_1_cluster5       2   983813    1.1101006   eq_MFGE8_1       5

The group column was created by ´Seurat2PB()´ itself. Which information did ´Seurat2PB()´ use in order to create group?

According to my experimental design, I want compare "Control" and "MFGE8" from "Animal_1". There is only one animal. Later I want to analyse Animal_2, Animal_3 etc.

Thank you for your help so far.

ADD REPLY
3
Entering edit mode

Pseudo-bulk (PB) is a method of analysing biological replicates. You cannot do any PB analysis on just one animal because there would be no biological replication. To do a PB analysis you need to

  1. Integrate all 8 samples (4 animals x 2 treatments) into one Seurat object.
  2. Cluster the cells using the integrated data so that the clusters have the same interpretation across all samples.
  3. Run Seurat2PB using a sample factor that identifies the 8 samples.

Having done that, the downstream edgeR analysis is exactly as for bulk RNA-seq.

ADD REPLY

Login before adding your answer.

Traffic: 773 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6