How to look at the output of the Poisson data model

0

Entering edit mode

Milena Gongora ▴ 80

@milena-gongora-2035

Last seen 10.2 years ago

Dear Mark (and edgeR users), Thanks so much for your feedback. That solved the problem and also was good advice regarding the low tag counts. I am facing another obstacle: I tried fitting the Poisson model instead of the negative binomial. The MAplot and boxplots from this method look good. Now my problem is, how do I get a table with the resulting differentially expressed tags of out of the Poisson model? topTags() does not work on the object resulting from exactTestNB(), because it is not a deDGEList... My full code is below. Thanks again! #----------------------- d <- new("DGEList", list(data=as.matrix(mydata), group=grouping, lib.size=lib_size)) d head(d$data) A.REP1 A.REP2 B.REP1 B.REP2 100168004_100169791_- 2 4 3 1 100175732_100176961_- 0 3 2 3 100471575_100982094_- 0 0 0 4 101147427_101152326_- 1 0 0 4 101147433_101152326_- 3 1 1 2 101152469_101153215_- 4 21 8 46 qA <- quantileAdjust(d, r.init=1000, n.iter=1) par(mfrow=c(1,2)) boxplot(as.data.frame(sqrt(d$data))) boxplot(as.data.frame(sqrt(qA$pseudo))) f <- exactTestNB(qA$pseudo, d$group, qA$N * qA$ps$p, r=1000) #------------------------ Mark Robinson wrote: > Hi Milena. > > Small oversight on my part. If you use 'as.matrix(mydata)' in place of > 'mydata', you should have no problems. This is imposed on the data matrix > in the next version (which might take a day or so to come online). > > A couple of other small points: > > 1. You may want to use the new constructor (instead of creating the list > yourself): > > d<-DGEList(data=y,group=grouping,lib.size=lib_size) > > 2. In terms of analysis of differential expression, there may not be much > value in the rows of your table that have VERY low counts (e.g. your > 100134814_100136947). In the past, I've noticed some instabilities in the > NB calculations with very low counts and I've generally filtered out the > rows with (say) 3 or less total counts. > > Cheers, > Mark > > > > >> Dear edgeR developers and users, >> >> Just started using edgeR with next generation sequence (count) data. >> When calculating alpha, I am getting the following error: >> >> alpha <- alpha.approxeb(d) >> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1, >> nrow(y1)), ncol = 1) %*% lib.size1 : >> non-numeric argument to binary operator >> >> I can't work out why I am getting this... any ideas? >> Thanks! >> >> The full code is below: >> -------------- >> mydata <- read.table("myFile.txt", header=TRUE, row.names="ID") >> head(mydata) >> A.REP1 A.REP2 B.REP1 B.REP2 >> 100011872_100012727_- 0 0 0 2 >> 100017569_100017878_- 1 0 0 0 >> 100134814_100136947_- 0 0 0 0 >> 100134931_100136947_- 0 2 0 0 >> 100137054_100138100_- 1 0 0 1 >> 100137831_100138100_- 1 0 0 0 >> >> grouping <- c(1,1,2,2) >> lib_size <- as.numeric(apply(mydata,2,sum)) >> lib_size >> [1] 352812 573571 401573 719698 >> >> d <- new("DGEList", list(data=mydata, group=grouping, lib.size=lib_size)) >> alpha <- alpha.approxeb(d) >> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1, >> nrow(y1)), ncol = 1) %*% lib.size1 : >> non-numeric argument to binary operator >> --------------------- >> >> -- >> Milena >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor- >> Milena

edgeR edgeR • 1.0k views

ADD COMMENT • link updated 16.1 years ago by Mark Robinson ★ 1.1k • written 16.2 years ago by Milena Gongora ▴ 80

0

Entering edit mode

Mark Robinson ★ 1.1k

@mark-robinson-2171

Last seen 10.2 years ago

Hi Milena. Thank you for your comments. I had originally added the Poisson special case as somewhat of an afterthought. But, it appears that it is worthwhile to have it covered by the 'deDGE' command itself. So, ... I've made the following changes: 1. In the 'deDGE' command, there is now an argument 'doPoisson', which when set to TRUE skips the dispersion moderation step, but still calculates the 'exact' test on the quantile-adjusted data. So, the Poisson analysis returns an object of the correct type for 'topTags'. 2. I've added a 'plotMA' method for 'deDGEList' objects, similar to limma's ... it saves some of the mucking about forming M and A that was previously done by hand. 3. Vignette updated. Sorry I hadn't thought of doing this sooner! Cheers, Mark > Dear Mark (and edgeR users), > > Thanks so much for your feedback. That solved the problem and also was > good advice regarding the low tag counts. > > I am facing another obstacle: > I tried fitting the Poisson model instead of the negative binomial. The > MAplot and boxplots from this method look good. > Now my problem is, how do I get a table with the resulting > differentially expressed tags of out of the Poisson model? > topTags() does not work on the object resulting from exactTestNB(), > because it is not a deDGEList... > > My full code is below. > Thanks again! > > #----------------------- > > d <- new("DGEList", list(data=as.matrix(mydata), group=grouping, > lib.size=lib_size)) > d > head(d$data) > A.REP1 A.REP2 B.REP1 B.REP2 > 100168004_100169791_- 2 4 3 1 > 100175732_100176961_- 0 3 2 3 > 100471575_100982094_- 0 0 0 4 > 101147427_101152326_- 1 0 0 4 > 101147433_101152326_- 3 1 1 2 > 101152469_101153215_- 4 21 8 46 > > qA <- quantileAdjust(d, r.init=1000, n.iter=1) > > par(mfrow=c(1,2)) > boxplot(as.data.frame(sqrt(d$data))) > boxplot(as.data.frame(sqrt(qA$pseudo))) > > f <- exactTestNB(qA$pseudo, d$group, qA$N * qA$ps$p, r=1000) > > #------------------------ > > Mark Robinson wrote: >> Hi Milena. >> >> Small oversight on my part. If you use 'as.matrix(mydata)' in place of >> 'mydata', you should have no problems. This is imposed on the data >> matrix >> in the next version (which might take a day or so to come online). >> >> A couple of other small points: >> >> 1. You may want to use the new constructor (instead of creating the list >> yourself): >> >> d<-DGEList(data=y,group=grouping,lib.size=lib_size) >> >> 2. In terms of analysis of differential expression, there may not be >> much >> value in the rows of your table that have VERY low counts (e.g. your >> 100134814_100136947). In the past, I've noticed some instabilities in >> the >> NB calculations with very low counts and I've generally filtered out the >> rows with (say) 3 or less total counts. >> >> Cheers, >> Mark >> >> >> >> >>> Dear edgeR developers and users, >>> >>> Just started using edgeR with next generation sequence (count) data. >>> When calculating alpha, I am getting the following error: >>> >>> alpha <- alpha.approxeb(d) >>> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1, >>> nrow(y1)), ncol = 1) %*% lib.size1 : >>> non-numeric argument to binary operator >>> >>> I can't work out why I am getting this... any ideas? >>> Thanks! >>> >>> The full code is below: >>> -------------- >>> mydata <- read.table("myFile.txt", header=TRUE, row.names="ID") >>> head(mydata) >>> A.REP1 A.REP2 B.REP1 B.REP2 >>> 100011872_100012727_- 0 0 0 2 >>> 100017569_100017878_- 1 0 0 0 >>> 100134814_100136947_- 0 0 0 0 >>> 100134931_100136947_- 0 2 0 0 >>> 100137054_100138100_- 1 0 0 1 >>> 100137831_100138100_- 1 0 0 0 >>> >>> grouping <- c(1,1,2,2) >>> lib_size <- as.numeric(apply(mydata,2,sum)) >>> lib_size >>> [1] 352812 573571 401573 719698 >>> >>> d <- new("DGEList", list(data=mydata, group=grouping, >>> lib.size=lib_size)) >>> alpha <- alpha.approxeb(d) >>> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1, >>> nrow(y1)), ncol = 1) %*% lib.size1 : >>> non-numeric argument to binary operator >>> --------------------- >>> >>> -- >>> Milena >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor- >>> > Milena > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 16.1 years ago Mark Robinson ★ 1.1k

0

Entering edit mode

Hello Mark, Excellent news. Thanks a lot for implementing it so quickly! I used the Poisson model because I suspected that it might fit the data better, as the negative binomial has given me somewhat unexpected results. Maybe I can ask for your advice again.... When fitting the moderated negative binomial model, if alpha is very small, does it indicate that the model is not suitable? With my data, I got alpha=1.632020e-08. Continued with the process and the result of topTags() is that all tags come up with a P.Value=0 and adj.P.Val=0. Additionally, when plotting an MAplot, blue dots fall around the middle of the scatter, around an M value of 0, rather than on the edges of the scatter. (see code below) Would this reflect that the negative binomial is not the right model? or more likely this would be an artifact or me doing something wrong? In contrary when I fit the Poisson model and subsequently do an MAplot, it looks more like what I would have expected from an array, where the highlighted tags locate to the edges of the scatter, away from an M value of 0. Would this mean that the Poisson distribution is a better fit? Many thanks, Milena #-------------------- grouping <- c(1,1,2,2) lib_size <- as.numeric(apply(mydata,2,sum)) lib_size 324031 527758 370766 658864 d <- new("DGEList", list(data=as.matrix(mydata), group=grouping, lib.size=lib_size)) head(d$data) A.REP1 A.REP2 B.REP1 B.REP2 100168004_100169791_- 2 4 3 1 100175732_100176961_- 0 3 2 3 100471575_100982094_- 0 0 0 4 101147427_101152326_- 1 0 0 4 101147433_101152326_- 3 1 1 2 101152469_101153215_- 4 21 8 46 alpha <- alpha.approxeb(d) alpha EBList: alpha=1.632020e-08 ms <- deDGE(d, alpha=alpha$alpha) ms adj.p <- p.adjust(ms$exact, "fdr") k <- (adj.p < 0.05) plot((log2(ms$ps$p1) + log2(ms$ps$p2))/2, log2(ms$ps$p2/ms$ps$p1), pch=19, xlab="A", ylab="M", col=c("black", "blue")[k+1]) boxplot(as.data.frame(sqrt(d$data))) topTags(ms) A M P.Value adj.P.Val 100471575_100982094_- -59.08348 31.88121 0 0 101943878_101944171_- -41.62854 -33.39554 0 0 102221513_102227940_- -42.43538 -32.58870 0 0 103547901_103548588_- -58.27195 33.50425 0 0 103859430_103859667_- -42.18712 -32.83696 0 0 104130431_104130871_+ -58.42573 33.19671 0 0 104146565_104146648_+ -58.67857 32.69102 0 0 105193815_105194263_+ -42.33242 -32.69166 0 0 111619777_111620478_- -42.56783 -32.45625 0 0 115429727_115447242_+ -42.28483 -32.73925 0 0 #-------------------- Mark Robinson wrote: > Hi Milena. > > Thank you for your comments. I had originally added the Poisson special > case as somewhat of an afterthought. But, it appears that it is > worthwhile to have it covered by the 'deDGE' command itself. So, ... > > I've made the following changes: > > 1. In the 'deDGE' command, there is now an argument 'doPoisson', which > when set to TRUE skips the dispersion moderation step, but still > calculates the 'exact' test on the quantile-adjusted data. So, the > Poisson analysis returns an object of the correct type for 'topTags'. > > 2. I've added a 'plotMA' method for 'deDGEList' objects, similar to > limma's ... it saves some of the mucking about forming M and A that was > previously done by hand. > > 3. Vignette updated. > > Sorry I hadn't thought of doing this sooner! > > Cheers, > Mark > > > > > >> Dear Mark (and edgeR users), >> >> Thanks so much for your feedback. That solved the problem and also was >> good advice regarding the low tag counts. >> >> I am facing another obstacle: >> I tried fitting the Poisson model instead of the negative binomial. The >> MAplot and boxplots from this method look good. >> Now my problem is, how do I get a table with the resulting >> differentially expressed tags of out of the Poisson model? >> topTags() does not work on the object resulting from exactTestNB(), >> because it is not a deDGEList... >> >> My full code is below. >> Thanks again! >> >> #----------------------- >> >> d <- new("DGEList", list(data=as.matrix(mydata), group=grouping, >> lib.size=lib_size)) >> d >> head(d$data) >> A.REP1 A.REP2 B.REP1 B.REP2 >> 100168004_100169791_- 2 4 3 1 >> 100175732_100176961_- 0 3 2 3 >> 100471575_100982094_- 0 0 0 4 >> 101147427_101152326_- 1 0 0 4 >> 101147433_101152326_- 3 1 1 2 >> 101152469_101153215_- 4 21 8 46 >> >> qA <- quantileAdjust(d, r.init=1000, n.iter=1) >> >> par(mfrow=c(1,2)) >> boxplot(as.data.frame(sqrt(d$data))) >> boxplot(as.data.frame(sqrt(qA$pseudo))) >> >> f <- exactTestNB(qA$pseudo, d$group, qA$N * qA$ps$p, r=1000) >> >> #------------------------ >> >> Mark Robinson wrote: >> >>> Hi Milena. >>> >>> Small oversight on my part. If you use 'as.matrix(mydata)' in place of >>> 'mydata', you should have no problems. This is imposed on the data >>> matrix >>> in the next version (which might take a day or so to come online). >>> >>> A couple of other small points: >>> >>> 1. You may want to use the new constructor (instead of creating the list >>> yourself): >>> >>> d<-DGEList(data=y,group=grouping,lib.size=lib_size) >>> >>> 2. In terms of analysis of differential expression, there may not be >>> much >>> value in the rows of your table that have VERY low counts (e.g. your >>> 100134814_100136947). In the past, I've noticed some instabilities in >>> the >>> NB calculations with very low counts and I've generally filtered out the >>> rows with (say) 3 or less total counts. >>> >>> Cheers, >>> Mark >>> >>> >>> >>> >>> >>>> Dear edgeR developers and users, >>>> >>>> Just started using edgeR with next generation sequence (count) data. >>>> When calculating alpha, I am getting the following error: >>>> >>>> alpha <- alpha.approxeb(d) >>>> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1, >>>> nrow(y1)), ncol = 1) %*% lib.size1 : >>>> non-numeric argument to binary operator >>>> >>>> I can't work out why I am getting this... any ideas? >>>> Thanks! >>>> >>>> The full code is below: >>>> -------------- >>>> mydata <- read.table("myFile.txt", header=TRUE, row.names="ID") >>>> head(mydata) >>>> A.REP1 A.REP2 B.REP1 B.REP2 >>>> 100011872_100012727_- 0 0 0 2 >>>> 100017569_100017878_- 1 0 0 0 >>>> 100134814_100136947_- 0 0 0 0 >>>> 100134931_100136947_- 0 2 0 0 >>>> 100137054_100138100_- 1 0 0 1 >>>> 100137831_100138100_- 1 0 0 0 >>>> >>>> grouping <- c(1,1,2,2) >>>> lib_size <- as.numeric(apply(mydata,2,sum)) >>>> lib_size >>>> [1] 352812 573571 401573 719698 >>>> >>>> d <- new("DGEList", list(data=mydata, group=grouping, >>>> lib.size=lib_size)) >>>> alpha <- alpha.approxeb(d) >>>> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1, >>>> nrow(y1)), ncol = 1) %*% lib.size1 : >>>> non-numeric argument to binary operator >>>> --------------------- >>>> >>>> -- >>>> Milena >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor- >>>> >>>> >> Milena >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor -- Milena

ADD REPLY • link 16.1 years ago Milena Gongora ▴ 80

0

Entering edit mode

Hi Milena. See comments below. > When fitting the moderated negative binomial model, if alpha is very > small, does it indicate that the model is not suitable? > > With my data, I got alpha=1.632020e-08. Not at all. alpha is analogous to the prior degrees of freedom in a limma analysis (but not on such a nice scale). Basically, alpha controls how much moderation is done. There is a lot more to say about this, but its best to read the Bioinformatics paper and ask me if you have any questions. What might be of interest is the smoothed dispersion estimates. For example: 1/ms$r[1:100] # first 100 dispersion estimates If these are all close to 0, then thats a good indication that the NB is basically just Poisson ( .... since Poisson is the special case where the dispersion=0). > Continued with the process and the result of topTags() is that all > tags come up with a P.Value=0 and adj.P.Val=0. > Additionally, when plotting an MAplot, blue dots fall around the > middle of the scatter, around an M value of 0, rather than on the > edges of the scatter. > (see code below) I bet these are all very low total counts. It appears that this is a bug, but its not something I've come across in my own tests, so I don't know what is causing it. Do you mind making your data available for me? Feel free to add dummy rownames/colnames so I don't know what the data is. Email me offline if you can. > Would this reflect that the negative binomial is not the right > model? or more likely this would be an artifact or me doing > something wrong? > > In contrary when I fit the Poisson model and subsequently do an > MAplot, it looks more like what I would have expected from an array, > where the highlighted tags locate to the edges of the scatter, away > from an M value of 0. Would this mean that the Poisson distribution > is a better fit? Not necessarily. Poisson versus NB is a question of whether there is 'extra-Poisson' variation? i.e. more variability than explained by the model. You won't be able to tell that from alpha values or MA plots. There are goodness of fit tests for this but I don't know how well they'll work in small samples ... perhaps a mean-variance plot of the pseudo data would be a good starting point. My observation has been that for SAGE data with *biological* replicates there is definitely more variation than explained by Poisson (i.e. go with NB, moderation of dispersion is helpful), but next gen sequencing on technical replicates looks Poisson. I'd be happy to hear other people's impressions. Cheers, Mark ------------------------------ Mark Robinson Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robinson at garvan.org.au e: mrobinson at wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852

ADD REPLY • link 16.1 years ago Mark Robinson ★ 1.1k

Login before adding your answer.