Question

edgeR::estimateDisp gives Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1) (minimal working example included)

0

Entering edit mode

endrebak85 ▴ 40

@endrebak85-10660

Last seen 5.6 years ago

github.com/endrebak/

require(edgeR) # edgeR_3.12.1

counts = "Sample1 Sample2 Sample3 Sample4
1 14 19 38 39
2 18 22 38 37
3 23 29 35 41
4 24 34 37 39
5 22 37 40 41"

design = "Name Sample
1 Sample1 1
2 Sample2 1
3 Sample3 0
4 Sample4 0"

totals = c(9490840, 9491499, 9483062, 9488195)

counts.df = read.table(text=counts, sep=" ")
design.df = read.table(text=design, sep=" ")

print(counts.df)
print(dim(counts.df))
print(rownames(counts.df))
## print(samples.df)

y <- DGEList(counts.df, lib.size=totals)

y <- estimateDisp(y, design.df)

If you copy paste the code above you should get the error:

Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)

Calls: estimateDisp ... glmFit -> glmFit.default -> nonEstimable -> qr -> qr.default
In addition: Warning message:
In qr.default(x) : NAs introduced by coercion
Execution halted

Why does this happen? What can I do to fix it?

Ps. the traceback is

> traceback()
6: qr.default(x)
5: qr(x)
4: nonEstimable(design)
3: glmFit.default(y$counts[sel, ], design, offset = offset[sel,
       ], dispersion = 0.05, prior.count = 0)
2: glmFit(y$counts[sel, ], design, offset = offset[sel, ], dispersion = 0.05,
       prior.count = 0)
1: estimateDisp(y, design.df)

edgeR estimatedisp • 5.4k views

ADD COMMENT • link 8.2 years ago endrebak85 ▴ 40

score 0 · Answer 1 · 2017-02-17

0

Entering edit mode

endrebak85 ▴ 40

@endrebak85-10660

Last seen 5.6 years ago

github.com/endrebak/

My design matrix is wrong. The first column should be rownames, not a data column.

ADD COMMENT • link 8.2 years ago endrebak85 ▴ 40

1

Entering edit mode

Even so, you should not be using Sample alone as a design matrix. If you did, you would get:

design <- design.df[,"Sample",drop=FALSE]

##   Sample
## 1      1
## 2      1
## 3      0
## 4      0

Now, in a log-link GLM, the fitted value is equal to the exponential of the offset plus the a linear sum of coefficients. However, all your predictors for samples 3 and 4 are zero, meaning that the fitted value is just the exponential of the offset, which is equal to the library size. In other words, no matter what the gene is, your design matrix is saying that the mean expression in samples 3 and 4 should be equal to the total number of reads in the library. This is surely nonsensical. What you should be doing is something like this:

sample <- factor(design.df$Sample)
design <- model.matrix(~sample)

##   (Intercept) sample1
## 1           1       1
## 2           1       1
## 3           1       0
## 4           1       0
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$sample
## [1] "contr.treatment"

... which gives you an intercept term to account for gene-specific expression in all samples.