Hello,
I have the problem to add a custom design (here: paired design) for pseudobulk scRNA.
My experiment looks like this:
> exp.design
# A tibble: 8 × 2
Subject Treatment
<chr> <chr>
1 Animal_1 Control
2 Animal_1 MFGE8
3 Animal_2 Control
4 Animal_2 MFGE8
5 Animal_3 Control
6 Animal_3 MFGE8
7 Animal_4 Control
8 Animal_4 MFGE8
This is a paired design. I want to compare two treatments on independent subjects. E.g I want to compare "Control" with "MFGE8" only for Animal_1. (see page 40: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf)
I have two SeuratObjects in a list. One for "Control" and one for "MFGE8". Both for Animal_1.
head(mare1_control_mfge8$eq_control@meta.data)
nCount_RNA nFeature_RNA status assignment group animal
eq_control_AAACCCAAGCAACTCT-1 60879 6735 singlet 2 eq_control 1
eq_control_AAACCCAGTCCTACAA-1 40519 6066 singlet 2 eq_control 1
eq_control_AAACCCATCCAGCTCT-1 43961 6788 singlet 2 eq_control 1
eq_control_AAACGAAAGGCTCTCG-1 56282 7325 singlet 2 eq_control 1
eq_control_AAACGCTGTTATAGAG-1 68322 7057 singlet 2 eq_control 1
eq_control_AAAGGGCAGGAGATAG-1 42782 5965 singlet 2 eq_control 1
head(mare1_control_mfge8$mfge8@meta.data)
nCount_RNA nFeature_RNA status assignment group animal
eq_MFGE8_AAACCCACACTGCATA-1 36450 5802 singlet 2 eq_MFGE8 1
eq_MFGE8_AAACCCAGTTGCATCA-1 24512 4879 singlet 2 eq_MFGE8 1
eq_MFGE8_AAAGAACGTGCCAAGA-1 34070 6334 singlet 2 eq_MFGE8 1
eq_MFGE8_AAAGGATAGTCCCGGT-1 17672 3450 singlet 2 eq_MFGE8 1
eq_MFGE8_AAAGGGCTCCATCTGC-1 44901 6081 singlet 2 eq_MFGE8 1
eq_MFGE8_AAAGTCCAGTGCACAG-1 26657 5543 singlet 2 eq_MFGE8 1
Both SeuratObjects only contain one animal and one treatment (here: group column)
> table(mare1_control_mfge8$eq_control$animal)
1
1028
> table(mare1_control_mfge8$eq_control$group)
eq_control
1028
> table(mare1_control_mfge8$mfge8$animal)
1
1161
> table(mare1_control_mfge8$mfge8$group)
eq_MFGE8
1161
mare1_control_mfge8
$eq_control
An object of class Seurat
21426 features across 1028 samples within 1 assay
Active assay: RNA (21426 features, 0 variable features)
$mfge8
An object of class Seurat
21426 features across 1161 samples within 1 assay
Active assay: RNA (21426 features, 0 variable features)
Then, I perfomed the integration as explained here: https://satijalab.org/seurat/articles/integration_introduction.html
because the function Seurat2PB()
needs an integrated SeuratObject.
> eq.combined
An object of class Seurat
23426 features across 2189 samples within 2 assays
Active assay: integrated (2000 features, 2000 variable features)
1 other assay present: RNA
2 dimensional reductions calculated: pca, umap
> table(eq.combined$animal)
1
2189
> table(eq.combined$group)
eq_control eq_MFGE8
1028 1161
head(eq.combined@meta.data %>% select(nCount_RNA,nFeature_RNA,status,assignment,group,animal,seurat_clusters))
nCount_RNA nFeature_RNA status assignment group animal seurat_clusters
eq_control_AAACCCAAGCAACTCT-1 60879 6735 singlet 2 eq_control 1 3
eq_control_AAACCCAGTCCTACAA-1 40519 6066 singlet 2 eq_control 1 0
eq_control_AAACCCATCCAGCTCT-1 43961 6788 singlet 2 eq_control 1 1
eq_control_AAACGAAAGGCTCTCG-1 56282 7325 singlet 2 eq_control 1 0
eq_control_AAACGCTGTTATAGAG-1 68322 7057 singlet 2 eq_control 1 0
eq_control_AAAGGGCAGGAGATAG-1 42782 5965 singlet 2 eq_control 1 2
Now I created the DGElist as followed:
y <- Seurat2PB(eq.combined, sample="group", cluster="seurat_clusters")
dim(y)
[1] 21426 12
head(y$samples, n=10L)
group lib.size norm.factors sample cluster
eq_control_cluster0 1 22013050 1 eq_control 0
eq_control_cluster1 1 8237757 1 eq_control 1
eq_control_cluster2 1 5389188 1 eq_control 2
eq_control_cluster3 1 8365253 1 eq_control 3
eq_control_cluster4 1 3729499 1 eq_control 4
eq_control_cluster5 1 1909997 1 eq_control 5
eq_MFGE8_cluster0 1 12178747 1 eq_MFGE8 0
eq_MFGE8_cluster1 1 10343051 1 eq_MFGE8 1
eq_MFGE8_cluster2 1 9297681 1 eq_MFGE8 2
eq_MFGE8_cluster3 1 3883156 1 eq_MFGE8 3
str(y)
Formal class 'DGEList' [package "edgeR"] with 1 slot
..@ .Data:List of 3
.. ..$ : num [1:21426, 1:12] 335 9 29313 42 3081 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:21426] "POLE2" "ENSECAG00000019402" "ENSECAG00000052423" "LRR1" ...
.. .. .. ..$ : chr [1:12] "eq_control_cluster0" "eq_control_cluster1" "eq_control_cluster2" "eq_control_cluster3" ...
.. ..$ :'data.frame': 12 obs. of 5 variables:
.. .. ..$ group : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..$ lib.size : num [1:12] 22013050 8237757 5389188 8365253 3729499 ...
.. .. ..$ norm.factors: num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..$ sample : chr [1:12] "eq_control" "eq_control" "eq_control" "eq_control" ...
.. .. ..$ cluster : chr [1:12] "0" "1" "2" "3" ...
.. ..$ :'data.frame': 21426 obs. of 1 variable:
.. .. ..$ gene: chr [1:21426] "POLE2" "ENSECAG00000019402" "ENSECAG00000052423" "LRR1" ...
..$ names: chr [1:3] "counts" "samples" "genes"
Now in section "4.10.5 Design matrix" on page 127 of the edgeRUserGuide.pdf, one can create a design matrix.
For my paired design I want to do the following according to p.40 in the UserGuide.pdf
Subject <- factor(y$samples$animal) # Animal = Subject
Treat <- factor(y$samples$sample, levels = c("eq_control","eq_MFGE8")) # Treat = sample
design <- model.matrix(~Subject+Treat)
I can't create paired design, because the column "subject" is missing in y$samples
.
str(y$samples)
'data.frame': 12 obs. of 5 variables:
$ group : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
$ lib.size : num 22013050 8237757 5389188 8365253 3729499 ...
$ norm.factors: num 1 1 1 1 1 1 1 1 1 1 ...
$ sample : chr "eq_control" "eq_control" "eq_control" "eq_control" ...
$ cluster : chr "0" "1" "2" "3" ...
The documentation tells me, the Seurat2PB(object, sample, cluster="seurat_clusters")
can only use two columns of the SeuratObject.
Question: How can I create a design matrix with paired design for a pseudo-bulk DGEList?
On page 40 of the EdgeRUserGuide.pdf (section: 3.4.1 Paired samples), this is only shown for a simple bulk RNA dataset.
Thank you for any recommendations and tipps. Sorry for my long explanation of the problem.
Thank you Gordon for your fast response.
I created an new column "group_animal".
Then I got this after ´Seurat2PB()´
After normalization, I tried to create the design matrix, but it failed. Maybe, because I have only one factor in "group" ?
Do I have to modify the "group" column of
y$samples
?Maybe it should look like this?
The
group
column was created by ´Seurat2PB()´ itself. Which information did ´Seurat2PB()´ use in order to creategroup
?According to my experimental design, I want compare "Control" and "MFGE8" from "Animal_1". There is only one animal. Later I want to analyse Animal_2, Animal_3 etc.
Thank you for your help so far.
Pseudo-bulk (PB) is a method of analysing biological replicates. You cannot do any PB analysis on just one animal because there would be no biological replication. To do a PB analysis you need to
Seurat2PB
using a sample factor that identifies the 8 samples.Having done that, the downstream edgeR analysis is exactly as for bulk RNA-seq.