Question

EdgeR RNAseq -- best way to deal with random effects

0

Entering edit mode

emb13 • 0

@emb13-18397

Last seen 6.0 years ago

Hello,

I am working on an RNAseq project in EdgeR and I have a random effect variable that I am not sure how to best deal with since EdgeR doesn't use mixed models. I have three main questions: 1) how have people dealt with random effects in EdgeR, or do you use different programs to accomplish this? 2) How are blocking and batch effects different in EdgeR (the manual suggests using the same additive model for both), and 3) is it possible to block for a variable that is unreplicated (see below)?

Set up: I have a control and three dose groups (low, medium, high) each with five replicates. Within each treatment group, samples came from one of twelve different mesocosms, with each mesocosm representing a single dose. Thus, there are three mesocosms contributing between one and three individuals to a treatment group. For example, the five high dose individuals had three individuals from the S2h mesocosm and one each from the S3h and C1h mesocosms. After reading the EdgeR manual, I tried blocking for mesocosm, but because I don't have replicates for each mesocosm, I got an error message due to my design matrix not having a full rank:

> mfit <- glmQLFit(e2, design4)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 groupH groupL groupM

Here is the code that I was using to set this up:

group <- factor(c("C","H","L","M","H","C","L","M","C","M","H","M","L","H","C","L","H","C","M","L"))
mesocosm <- factor(c("C1c", "C1h", "S2l", "S2m", "S2h", "S2c", "S2l", "S2m", "S2c", "S2m", "S2h", "C1m", "S2l", "S2h", "S3c", "S3l", "S3h", "S3c", "S3m", "S3l")) 
design4 <- model.matrix(~mesocosm+group)
mfit <- glmQLFit(e2, design4)
mqlf <- glmQLFTest(fit, coef=4:6)

Thank you for your help!

edger rnaseq • 3.3k views

ADD COMMENT • link updated 5.4 years ago by gil.hornung • 0 • written 6.4 years ago by emb13 • 0

0

Entering edit mode

Would I be correct to guess that each "mesocosm" in your experiment is a separate physical field station with its own environmental setup? Does any physical station contribute individuals with more than one dose? In other words, do you really have 12 different physical environments or only 3? I would guess that it would usually be possible to apply more than one dose within the same physical environmental.

ADD REPLY • link 6.4 years ago Gordon Smyth 52k

0

Entering edit mode

Hi Gordon,

Thank you for your reply. All mesocosms are at the same physical site, but the site is broken down into fenced in "pads" which contain the mesocosms (which are individual cattle tanks). Thus, the S1 mesocosms (S1h, S1m, S1l, S1c) are within one fenced pad. The pads are very close to one another and any sort of environmental variation (i.e. shading from sun) is just as likely across mesocosms as it is across pads.

ADD REPLY • link 6.4 years ago emb13 • 0

score 1 · Answer 1 · 2018-11-20

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 54 minutes ago

The city by the bay

1. edgeR doesn't use GLMMs so there's no way to handle random effects directly. You can either switch to voom with duplicateCorrelation, or you can subset the data so that only one sample is present from each level of the random effect blocking factor. I don't think this is relevant here, though (see my response to 3).

2. I think of the batch effect as the change in expression between batches, while blocking terms are the analytical constructs (i.e., coefficients in the design matrix) that enable estimation of the batch effect. So when you "block on batch", you're adding a blocking term for each batch to your design matrix, which allows estimation of the batch effect as the fitted values for the corresponding GLM coefficients.

3. Not really. As it stands, mesocosm is fully nested within group, so you can't block on mesocosm if you have group in your model. But I would say that you've formulated mesocosm somewhat incorrectly, because it seems to combine things like "C1", "S2" and "S3" (presumably environmental levels?) with the group levels. This suggests two choices for further analysis. The first is to use the additive model like what you're already doing:

actual.mesocosm <- sub("[lhmc]$", "", mesocosm)
design5 <- model.matrix(~group + actual.mesocosm)

... which assumes that the group and actual mesocosm effects are additive. Alternatively, you could just use mesocosm alone:

# Using no-intercept for ease of interpretation.
design6 <- model.matrix(~ 0 + mesocosm)

... which is effectively an interaction model for group/mesocosm. For the latter, you can test for the average effect of each group by comparing the grand averages across mesocosm levels, e.g.:

con <- makeContrasts((C1h + S2h + S3h)/3 - (C1m + S2m + S3m)/3, levels=design6)

ADD COMMENT • link 6.4 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron,

Thank you so much for your reply. It is correct that dose is completely nested in mesocosm (see my comment to Gordon above). Because of this I have decided to try the third option as this best represents the interaction between group and mesocosm which is apparent in my MDS plot. Is setting the contrasts in this way this the same as taking the average expression for each MSTRG for each mesocosm (average C1 individuals, S3 individuals, and S2 individuals for each dose) reducing my sample size to three instead of five?

ADD REPLY • link 6.4 years ago emb13 • 0

0

Entering edit mode

Given the clarification about your design, I suspect that the interaction model is no longer appropriate. It seems like the different mesocosms represent a level of replication rather than being factors of interest in their own right. To wit; if someone were to repeat the experiment, does it even make sense to talk about re-using the same mesocosms? Would their "S1" mesocosm correspond to your "S1" mesocosm, or are these just arbitrary labels you used to distinguish the different mesocosms? If the latter, there's not much point studying the interaction effect as it's not something that's reproducible. Only the effect of scientific interest (i.e., group) would be something that could be replicated, in which case you should use the additive design. Any variation between mesocosms will be absorbed into the dispersion, which is a good thing, as this accurately represents the variation that you'd expect to see if someone were to replicate your experiment. (Otherwise you'd get findings that were specific to your mesocosms.)

ADD REPLY • link 6.4 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you again for the feedback, Aaron!

ADD REPLY • link 6.4 years ago emb13 • 0

score 0 · Answer 2 · 2019-11-20

0

Entering edit mode

gil.hornung • 0

@gilhornung-9503

Last seen 5.4 years ago

There is also another package, called dream which enables the use of random effects in RNA-seq data analysis. I haven't used it personally.

https://www.biorxiv.org/content/biorxiv/early/2018/10/03/432567.full.pdf

https://bioconductor.riken.jp/packages/devel/bioc/vignettes/variancePartition/inst/doc/dream.html

ADD COMMENT • link 5.4 years ago gil.hornung • 0