Dear list,
I have a large collection of RNA-seq data (300 samples from different cells types, e.g. 2 samples from Basophils, 3 samples from cultured Mast cells etc; the number of replicates per cell type varies from 2 to 5). I would like to analyze the counts or the TPM data and find correlated RefSeq genes. I have estimated Kendall correlations of the TPM data. For comparison reasons, I would also like to model the association between the gene counts data by the GLM model of negative binomial distribution. In my data, I expect that non-parametric correlations and glm model will give quite different results for several gene pairs. For example,
gene1<-rnbinom(300,mu=50,size=5) gene2<-rnbinom(300,mu=50,size=5) plot(gene1,gene2) cor.test(gene1,gene2,method="kendall") summary(glm.nb(gene2 ~ gene1))
I understand that the above glm model is over-simplified (for example it does not take into account the replicates information) but at this
point I am not worried about that yet. My question is whether it is possible to incorporate the offset (library size * normalization
factor) to the glm model e.g. the way that edgeR works. Looking at edgeR source code at "mglmLS.R" function (a similar expression also exist at "mglmLevenberg.R"): mu <- exp(beta %*% t(X) + offset)
In my case, would it be correct if I estimated the offset by
offset <- getOffset()
and then I estimated the gene / gene associations by glm using
something like mu <- exp(beta %*% t(X-offset) + offset) ? if yes, is
there any reason for this model to follow the edgeR dispersion
estimation method or simply glm.nb would do what I want?
Thank you in advance,
Makis
Hi.
I have used this code to correlate the expression of 4 genes (each separately) to the the rest of genes. I've found that in some cases the correlations goes above 1 (logFC column) ¿How is this possible?
Just as control I've seen that the correlation is always 1 when comparing any of the 4 genes agains themself, meaning that genes with correlation above 1 are other genes (not the initial 4).
The only difference in my code is that my counts are RSEM not raw counts, but in theory should work as I've read somewhere else.
Thanks!
How to visualize this result (lrt table with lfc pvalues etc ) to be able to see the expression of GeneX correlated to the other differentially expressed genes in R?