Question

edgeR design between and within subjects

0

Entering edit mode

b.nota ▴ 370

@bnota-7379

Last seen 4.6 years ago

Netherlands

Hello,

I am trying to do edgeR analysis, such as described in 3.5 of the manual. My experiment is very alike, but I don't exactly understand how to get all the contrasts of my interests. Here some code:

> data.frame(Genotype, Cell, Patient)
   Genotype Cell Patient
1        WT  TCM       1
2        WT  TEM       1
3        WT  TRM       1
4        WT  TCM       2
5        WT  TEM       2
6        WT  TRM       2
7        WT  TCM       3
8        WT  TEM       3
9        WT  TRM       3
10      MUT  TCM       1
11      MUT  TEM       1
12      MUT  TRM       1
13      MUT  TCM       2
14      MUT  TEM       2
15      MUT  TRM       2
16      MUT  TCM       3
17      MUT  TEM       3
18      MUT  TRM       3
> design <- model.matrix(~Genotype+Genotype:Cell+Genotype:Patient)
> colnames(design)
[1] "(Intercept)"         "GenotypeMUT"         "GenotypeWT:CellTEM"
[4] "GenotypeMUT:CellTEM" "GenotypeWT:CellTRM"  "GenotypeMUT:CellTRM"
[7] "GenotypeWT:Patient"  "GenotypeMUT:Patient"

Now to get some genes of interest:

# Genes respond different between TEM vs TCM in WT
lrt <- glmLRT(fit, coef="GenotypeWT:CellTEM")
topTags(lrt)

# Genes respond different between TEM vs TCM in MUT
lrt <- glmLRT(fit, coef="GenotypeMUT:CellTEM")
topTags(lrt)

# Genes respond different between TRM vs TCM in MUT
lrt <- glmLRT(fit, coef="GenotypeMUT:CellTRM")
topTags(lrt)

# Genes respond different between TRM in MUT vs TRM in WT
lrt <- glmLRT(fit, contrast=c(0,0,0,0,-1,1,0,0))
topTags(lrt)

I hope I understood this correctly.

First Question: Is it possible with this design to get genes that respond different between TRM vs TEM in MUT? Or do I have to make another design first for this specific contrast?

Second question: Is it possible to get genes that respond different between TCM in MUT vs TCM in WT?

edgeR design and contrast matrix • 1.8k views

ADD COMMENT • link updated 5 weeks ago by wuwenhui2 • 0 • written 7.7 years ago by b.nota ▴ 370

score 3 · Answer 1 · 2017-08-22

3

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 1 hour ago

The city by the bay

Fiddling around with interaction terms is confusing. Make your life easier, and do this instead:

grouping <- paste0(Genotype, ".", Cell)
patient <- paste0(Genotype, Patient)
# ... assuming WT/Mut patients are different people.
# also ensures 'patient' is a factor, not a covariate.

design <- model.matrix(~0 + patient + grouping)
design <- design[,!grepl("TCM$", colnames(design))] # get to full rank

Your first 6 terms are the patient-specific blocking terms, which are not scientifically interesting. The next 2 terms represent the log-fold change of mutant TEM/TRM cells over mutant TCM cells. The last 2 terms represent the log-fold change of wild-type TEM/TRM cells over wild-type TCM cells. To compare cell types within each genotype, you can drop each of the last four terms individually to compare cell types to TCM, or use makeContrasts to set up the desired comparison between TEM and TRM:

con <- makeContrasts(groupingMUT.TEM - groupingMUT.TRM, levels=design)

However, you cannot use edgeR to directly compare between mutant and wild-type cell types, regardless of how you parametrize the design matrix. This is because the differences in genotypes are confounded with the patient factor when both are present as fixed effects in the design matrix. If you want to compare between genotypes, you can only do so "indirectly". For example, you could test whether the differences between TEM and TCM are the same between mutant and wild-type genotypes:

con <- makeContrasts(groupingMUT.TEM - groupingWT.TEM, levels=design)

... keeping in mind that grouping*.TEM already represents the log-fold change between TEM and TCM. If you want to directly compare cell types between genotypes, you will need to use limma and voom with duplicateCorrelation.

ADD COMMENT • link 7.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks for your explanation. So for the last part, which is only possible as indirect comparison (MUT.TEM vs WT.TEM). What will be the meaning of the fold changes in the topTags results? So I understand that the patient blocking does not make sense for such comparison (since WT patient 1 is not the same as MUT patient 1). Alternatively, do you think I can safely make a new design without patient blocking for this comparison? I don't wanna go to limma, since this is count data from ATAC-seq and not 'normal' RNAseq.

ADD REPLY • link 7.7 years ago b.nota ▴ 370

1

Entering edit mode

I assume you're referring to MUT.TEM vs WT.TEM. The log-fold changes for an indirect comparison will be the TEM/TCM log-fold change in mutant, minus the TEM/TCM log-fold change in the wild-types.

If you want to compare TCM in MUT against TCM in WT, I would suggest performing an edgeR analysis with only the TCM samples, using a single factor for the genotype. This avoids any problems due to unmodelled correlations between samples from the same patient, as each patient now only contributes one sample to the data subset.

However, if you want to analyze all cell types together, it is unlikely that you can avoid blocking on patient. Individual-specific effects are particularly prominent in human patient data, you will have to deal with it somehow.

ADD REPLY • link 7.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Aaron Lun Hi Aaron,

Your explanation is very helpful. If having three time points instead of three cell types measured for each patient, using the design matrix you recommended, how to set up contrast to extract genes that have different expression profiles (different at any time point) between the two genotypes?

Thank you very much! Look forward to your reply.

ADD REPLY • link 5 weeks ago wuwenhui2 • 0