edgeR design between and within subjects
1
0
Entering edit mode
b.nota ▴ 370
@bnota-7379
Last seen 4.3 years ago
Netherlands

Hello,

I am trying to do edgeR analysis, such as described in 3.5 of the manual. My experiment is very alike, but I don't exactly understand how to get all the contrasts of my interests. Here some code:

> data.frame(Genotype, Cell, Patient)
   Genotype Cell Patient
1        WT  TCM       1
2        WT  TEM       1
3        WT  TRM       1
4        WT  TCM       2
5        WT  TEM       2
6        WT  TRM       2
7        WT  TCM       3
8        WT  TEM       3
9        WT  TRM       3
10      MUT  TCM       1
11      MUT  TEM       1
12      MUT  TRM       1
13      MUT  TCM       2
14      MUT  TEM       2
15      MUT  TRM       2
16      MUT  TCM       3
17      MUT  TEM       3
18      MUT  TRM       3
> design <- model.matrix(~Genotype+Genotype:Cell+Genotype:Patient)
> colnames(design)
[1] "(Intercept)"         "GenotypeMUT"         "GenotypeWT:CellTEM"
[4] "GenotypeMUT:CellTEM" "GenotypeWT:CellTRM"  "GenotypeMUT:CellTRM"
[7] "GenotypeWT:Patient"  "GenotypeMUT:Patient"

Now to get some genes of interest:

# Genes respond different between TEM vs TCM in WT
lrt <- glmLRT(fit, coef="GenotypeWT:CellTEM")
topTags(lrt)

# Genes respond different between TEM vs TCM in MUT
lrt <- glmLRT(fit, coef="GenotypeMUT:CellTEM")
topTags(lrt)
# Genes respond different between TRM vs TCM in MUT
lrt <- glmLRT(fit, coef="GenotypeMUT:CellTRM")
topTags(lrt)

# Genes respond different between TRM in MUT vs TRM in WT
lrt <- glmLRT(fit, contrast=c(0,0,0,0,-1,1,0,0))
topTags(lrt)

I hope I understood this correctly.

First Question: Is it possible with this design to get genes that respond different between TRM vs TEM in MUT? Or do I have to make another design first for this specific contrast?

Second question: Is it possible to get genes that respond different between TCM in MUT vs TCM in WT?

 

edgeR design and contrast matrix • 1.6k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 4 hours ago
The city by the bay

Fiddling around with interaction terms is confusing. Make your life easier, and do this instead:

grouping <- paste0(Genotype, ".", Cell)
patient <- paste0(Genotype, Patient)
# ... assuming WT/Mut patients are different people.
# also ensures 'patient' is a factor, not a covariate.

design <- model.matrix(~0 + patient + grouping)
design <- design[,!grepl("TCM$", colnames(design))] # get to full rank

Your first 6 terms are the patient-specific blocking terms, which are not scientifically interesting. The next 2 terms represent the log-fold change of mutant TEM/TRM cells over mutant TCM cells. The last 2 terms represent the log-fold change of wild-type TEM/TRM cells over wild-type TCM cells. To compare cell types within each genotype, you can drop each of the last four terms individually to compare cell types to TCM, or use makeContrasts to set up the desired comparison between TEM and TRM:

con <- makeContrasts(groupingMUT.TEM - groupingMUT.TRM, levels=design)

However, you cannot use edgeR to directly compare between mutant and wild-type cell types, regardless of how you parametrize the design matrix. This is because the differences in genotypes are confounded with the patient factor when both are present as fixed effects in the design matrix. If you want to compare between genotypes, you can only do so "indirectly". For example, you could test whether the differences between TEM and TCM are the same between mutant and wild-type genotypes:

con <- makeContrasts(groupingMUT.TEM - groupingWT.TEM, levels=design)

... keeping in mind that grouping*.TEM already represents the log-fold change between TEM and TCM. If you want to directly compare cell types between genotypes, you will need to use limma and voom with duplicateCorrelation.

ADD COMMENT
0
Entering edit mode

Thanks for your explanation. So for the last part, which is only possible as indirect comparison (MUT.TEM vs WT.TEM). What will be the meaning of the fold changes in the topTags results? So I understand that the patient blocking does not make sense for such comparison (since WT patient 1 is not the same as MUT patient 1). Alternatively, do you think I can safely make a new design without patient blocking for this comparison? I don't wanna go to limma, since this is count data from ATAC-seq and not 'normal' RNAseq.

ADD REPLY
1
Entering edit mode

I assume you're referring to MUT.TEM vs WT.TEM. The log-fold changes for an indirect comparison will be the TEM/TCM log-fold change in mutant, minus the TEM/TCM log-fold change in the wild-types.

If you want to compare TCM in MUT against TCM in WT, I would suggest performing an edgeR analysis with only the TCM samples, using a single factor for the genotype. This avoids any problems due to unmodelled correlations between samples from the same patient, as each patient now only contributes one sample to the data subset.

However, if you want to analyze all cell types together, it is unlikely that you can avoid blocking on patient. Individual-specific effects are particularly prominent in human patient data, you will have to deal with it somehow.

ADD REPLY

Login before adding your answer.

Traffic: 637 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6