Entering edit mode
require(edgeR) # edgeR_3.12.1 counts = "Sample1 Sample2 Sample3 Sample4 1 14 19 38 39 2 18 22 38 37 3 23 29 35 41 4 24 34 37 39 5 22 37 40 41" design = "Name Sample 1 Sample1 1 2 Sample2 1 3 Sample3 0 4 Sample4 0" totals = c(9490840, 9491499, 9483062, 9488195) counts.df = read.table(text=counts, sep=" ") design.df = read.table(text=design, sep=" ") print(counts.df) print(dim(counts.df)) print(rownames(counts.df)) ## print(samples.df) y <- DGEList(counts.df, lib.size=totals) y <- estimateDisp(y, design.df)
If you copy paste the code above you should get the error:
Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)
Calls: estimateDisp ... glmFit -> glmFit.default -> nonEstimable -> qr -> qr.default In addition: Warning message: In qr.default(x) : NAs introduced by coercion Execution halted
Why does this happen? What can I do to fix it?
Ps. the traceback is
> traceback() 6: qr.default(x) 5: qr(x) 4: nonEstimable(design) 3: glmFit.default(y$counts[sel, ], design, offset = offset[sel, ], dispersion = 0.05, prior.count = 0) 2: glmFit(y$counts[sel, ], design, offset = offset[sel, ], dispersion = 0.05, prior.count = 0) 1: estimateDisp(y, design.df)
Even so, you should not be using
Sample
alone as a design matrix. If you did, you would get:Now, in a log-link GLM, the fitted value is equal to the exponential of the offset plus the a linear sum of coefficients. However, all your predictors for samples 3 and 4 are zero, meaning that the fitted value is just the exponential of the offset, which is equal to the library size. In other words, no matter what the gene is, your design matrix is saying that the mean expression in samples 3 and 4 should be equal to the total number of reads in the library. This is surely nonsensical. What you should be doing is something like this:
... which gives you an intercept term to account for gene-specific expression in all samples.
Thanks for the reply. I did need an intercept :)