Entering edit mode
Hi all,
I am trying to use maSigPro to conduct time-course analyis with only one condition. However, it is not feasible. I have checked the guideline or other blogs, but there is no answer. I wonder how to realize this. Thank you!
I use the example data here for practice.
library(maSigPro)
data(data.abiotic)
data(edesign.abiotic)
edesign.abiotic <- as.data.frame(edesign.abiotic)
design.n <- edesign.abiotic[1:9,1:3]
data.n <- data.abiotic[,rownames(design.n)]
colnames(design.n) <-c("Time","Replicate","Group")
design <- make.design.matrix(design.n)
design$groups.vector
# [1] "Group" "Group"
head(design.n)
# Time Replicate Group
#Control_3H_1 3 1 1
#Control_3H_2 3 1 1
#Control_3H_3 3 1 1
#Control_9H_1 9 2 1
#Control_9H_2 9 2 1
#Control_9H_3 9 2 1
head(data.n)
# Control_3H_1 Control_3H_2 Control_3H_3 Control_9H_1 Control_9H_2
#STMDF90 0.13735714 -0.36530651 -0.15329448 0.44754535 0.287476796
#STMCJ24 NA NA NA NA NA
#STMJH42 0.07864449 0.10023285 -0.17365488 -0.25279484 0.184855409
#STMDE66 0.22976991 0.47409748 0.46930716 0.37101059 -0.004992029
#STMIX74 0.14407618 -0.48018637 -0.07847999 0.05692331 0.013045420
#STMCL34 0.06494078 0.04472986 0.04280790 0.16128091 0.088048892
fit <- p.vector(data.n,
design,
Q = 0.05,
MT.adjust = "BH",
min.obs = 20)
# Error in dat[i, ] : subscript out of bounds
Yes, you are right! Great thanks! BTW, according to your working experience on maSigPro, do you think it is suitable for differential analysis of time-series RNA-seq data with only one condition or group design? I'm a beginner with this package so I'm not sure if this is appropriate. Thank you again!
Yeah, it's fine. What
maSigPro
does in this case is to reformulate the data. Your design has columns that are not useful, andmaSigPro
is smart enough to ignore them. If you run it under the debugger, you can see what happens to your design matrix:And then for each gene it fits a model using the reformulated data, which if you look at
design$dis
is fitting a linear and a quadratic term (time + time^2) to account for any possible curvature.And the fitted model looks like this, where you can see that it's estimating an intercept, a time and a time^ component:
It then fits a reduced model
That doesn't contain any time coefficients, and then uses the
anova
function to do a Chi-square test comparing the full and reduced models, which in effect asks if the model that includes time fits the data better than a model with just an intercept (in this case a flat line at 0.088).And you can see that this gene is significant before adjusting for multiplicity, but will not be significant after the adjustment.
This test is a likelihood ratio test (LRT), and is not that common for a gaussian fit - people normally use t-tests for conventional linear regression, like this:
Where you can see that the p-values for both the linear and quadratic term are significant. But
maSigPro
allows you to use count data as well, in which case you would use either the negative binomial or Poisson distributions, and in which case one could argue that the LRT is a better statistic than the Wald. In addition, using an LRT provides one p-value rather than two, so it's easier to say if a gene is significant or not based on a single p-value.Technically it's fitting an analysis of deviance (ANODEV) test, but the principle is the same - deviance, like the likelihood, measures goodness of fit for a linear regression. The ratio of the deviance (or likelihood) tells you how different the models fit the data, and under the null is distributed Chi-square, so you can test for a difference in the model fit by comparing the ratio to a Chi-square distribution.