Hi Bioconductor !
I have a time course RNAseq data from 8 time-points with 3 replicates per sample. These data were aligned with STAR, I get the featureCount output. With these file, I wan't to use MasigPro package to evaluate which genes are moving during this process. But I've got a problem with the notion of negative binomial model. I read a lot of post to understand that the RNAseq libraries follow this model due to differences between biological replicates inducing that variance differs from mean.
I've got a table with samples in columns and genes in rows, and effectively when I make a plot with log(row mean) vs log(var mean) I can see the deviation from the line mean=var.
But in this package, they said "θ must be specified and it can be computed by using available methods as edgeR" I understand that Theta is a parameter for the glm (seem to be the "shape" of the glm). But I don't know how to determine it. I read that MASS package can do this, I follow this tutoriel to better understand: https://stats.stackexchange.com/questions/10419/what-is-theta-in-a-negative-binomial-regression-fitted-with-r
but It not seem to work if i make
mean <- apply(data, 1, mean) var <- apply(data, 1, var) mod.NB<-glm.nb(mean~var) Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, : NA/NaN/Inf dans 'x' De plus : There were 50 or more warnings (use warnings() to see the first 50)
So I switch to the edgeR notice to understand how to calculate it. I find that "In this article, we call φg the dispersion ", so It seem to be the same theta as MasigPro defines it. But I cannot find the function to calculate it ...
Does-anybody can help me to understand this problem? Maybe I miss something, I'm new in bioinformatics and statistics
Thanks a lot nicolas
delete miss-localisation