understanding edgeR glmLRT
2
0
Entering edit mode
arubio • 0
@arubio-12877
Last seen 5.0 years ago

Hi, I have some transcriptomics data from a microorganism collected at different locations. I've been trying to perform an LRT using edgeR to test the location effect on transcription. What I want is to make a full model with all the coefficients, and compare it with a null model in which location plays no role. I thought I had the correct formulas but after the test absolutely all the features/tags are significant (with absurdly small FDR), so I must be doing something wrong. A mockup of my code is as follows:

x <- normalized_counts
site <- factor(c("a","a","b","b","c","c","d","d","e","e"))
design <- model.matrix(~ 0 + site)

y <- DGEList(counts=x,group=site)
y <- calcNormFactors(y)
y <- estimateDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit, coef=c(1:ncol(fit$design)))

In case anyone knows the sleuth package, I previously used the lrt implemented there with the following configuration:

so <- sleuth_prep(s2c
                  , ~ site
                  , target_mapping = t2g
                  , aggregation_column = "ens_gene"
                  , transformation_function = function(x) log2(x + 0.5)
)
so <- sleuth_fit(so)
so <- sleuth_fit(so
                 , ~1
                 , 'reduced')
so <- sleuth_lrt(so, 'reduced', 'full')

And wanted to make a similar test using edgeR. I would greately appreciate any insights.

edger lrt • 5.1k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

Your model matrix is computing the mean of each group, and then the call to glmLRT is doing an ANOVA-like test to see if any of the coefficients is different from zero. Which by definition will be true for most genes (e.g., the mean expression for most of the genes will be arguably different from zero). But that's not what you care about, but instead you care about differences between groups, in particular if there are differences in expression due to the location.

The model in sleuth appears to be fitting a factor effects model, where there is a baseline and all the other coefficients are differences between a given sample type and the baseline, and then the test is a LRT between the full and reduced model.

If you want to fit a cell means model in edgeR, you then have to define a contrast (or contrasts) between different coefficients that you want to test. This is all covered in the edgeR User's Guide, and if you haven't read that (like probably several times), you shouldn't be trying to do stuff with edgeR.

ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia

Following on from James' answer, you actually need this:

site <- factor(c("a","a","b","b","c","c","d","d","e","e"))
design <- model.matrix(~site)

y <- DGEList(counts=x,group=site)
i <- filterByExpr(y)
y <- y[i, , keep.lib.size=FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit, coef=2:4)

but note that edgeR assumes x to be raw counts, not "normalized counts" (whatever they are!).

ADD COMMENT

Login before adding your answer.

Traffic: 453 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6