Hello, I'm very sorry for posting again a question regarding RNA-seq with no replicates. I've read most of the posts and still don't know what's the best way to proceed. My experimental design was: two industrial fermenters, one as control, the other as the treatment and we took samples every 24 hours for a total of 4 samples. Due to a fail on the fermenter I have replicates only for the first time points. So My first approach was to use the last two time points as one (replicates). I've used edge R, and make contrast for each pairwise comparison I want to test. My BCV is really high (2.06) and I've used glmQLFit. So I got the FC for each of the contrasts. But my main problem is that the replicates are very very different from each other so my supervisor suggest to redo the analysis by individual fermenter run (which means without replicates). Does this seems sensible? If so, I've been checking edgeR on how to proceed but as soon as I get to calculate the dispersion it fails, I get this error: "Warning message:
In estimateGLMCommonDisp.default(y = y$counts, design = design, :
No residual df: setting dispersion to NA".
My code:
count_cols <- c("Gene","GLU24H","GLU48H","GLU72H","GLU.FA24H","GLU.FA48H","GLU.FA72H")
x<-read.delim('~/Desktop/PhDProject/3rd_Year_RESULTS/Counts/counts_run1.csv',skip=0, sep=",", check.names=FALSE, colClasses='character', na.strings=c())
x[,count_cols] <- apply(x[,count_cols], 2, function(v) as.numeric(v))
counts <- x[, count_cols]
keepMin <- apply(counts, 1, max) >= 1
keepCpm <- rowSums(cpm(counts)> 1) >= 1
keep <- keepMin & keepCpm
x <- x[keep,]
counts <- counts[keep,]
counts
group<-factor(c("GLU24H","GLU48H","GLU72H","GLU.FA24H","GLU.FA48H","GLU.FA72H"))
y <- DGEList(counts=counts, group = group)
y <- calcNormFactors(y, method="TMM")
y$samples
genes<-as.numeric(x$GENE)
typeof(genes)
PGRA_rpkm<-rpkm(counts, log=FALSE, gene.length = genes)
PGRA_rpkm2<-cbind(x$Gene,PGRA_rpkm)
plotMD(cpm(PGRA_rpkm, log=TRUE), column=6) > abline(h=0, col="red", lty=2, lwd=2)
design <- model.matrix(~0+group)
y <- estimateGLMCommonDisp(y,design)
Any help will be really useful as I'm currently stuck at this.
Thanks in advance!!
Reply to answers with "Add comment" rather than adding your own answer.
If you want to compare specific time points with
designL
, you can do something like:... in
makeContrasts
(note that I've replaced the colon with an underscore). This basically asks whether the value of the line at timeX
is the same between fermenters. Personally, though, I don't find this very interesting; it would be more meaningful to test for differences in the gradients to determine if the effect of time is different between fermenters. For example, it is possible for the gradients to go in the opposite direction w.r.t. time, but if the lines cross atX
, you would never detect DE atX
.Regarding the extra data, if you want to compare it to the two fermenter runs in your current data set, you'll have to put it in the same model. There will be clear batch effects because the two data sets were generated at different times, but if you're willing to assume that the batch effect is additive, then you can still safely compare the gradients between fermenters. However, you won't be able to directly compare expression values between fermenters, because this is confounded with the batch.
Ok, just to be sure I'm on the same page as you, you mean to compare between time and condition isn't it? that's what you mean by gradients? to determine wether, let's say, at 24H there's a difference between fermenter Glu and fermenter Glu.FA. Is that what you mean? That' what I'm looking for as well, clearly I didn't express myself right I'm sorry about that. Because I've already done that using the two fermentation runs and for all the comparisons I did I got no DEG, none!! this could be due, as you just said, the nature of the batch effect on the samples. That's why we decided to go back and use the data by run of fermentation and repeat the analysis.
So, getting back to my questions, I'm not sure what you mean by the gradients I'm sorry Aaron, could you explain to me again please? I do like your suggestion. Getting this data was really expensive and due to the failure our experimental design has suffered a lot...plus my lack of experience had make difficult the analysis.
No, we're not on the same page. I'm referring to the rate of change in expression over time (i.e., the fermenter-specific gradient of the linear regression). You can easily test whether the gradient is the same between fermenters, just do:
... in
makeContrasts
. However, if you want to test for differences in expression between fermenters at specific time points, you'll have to use the code in my first comment above. Of course, if you have batch effects between fermenters, there's not much point in doing such a comparison, as differences between fermenters would be confounded with the batch.I'm not sure what you've actually done to try to find DEGs, but failing to find any is not particularly strange; it happens all the time if you don't have enough power in your design, or there are low-quality samples, or there's just no DE in your system.
Finally, if you're having trouble, I would suggest consulting a local statistician or bioinformatician. The support site is good for dealing with specific problems, but more general questions require someone who understands the biology of the system, and that's hard to do over the Internet. Presumably you didn't sequence a whole bunch of stuff without having someone to analyze it.
Thank you very much Aaron, I'll try your suggestions now. The quality of the sequences are good and I've double check the pipeline I've used to mapped and to count the transcripts... I think the problem is the experimental design... anyway. Thank you very much! I've never asked anything before on this site and was kind of scared to do so... so very grateful with your help!!!
Hi Aaron, I have another question. So regarding the model you suggested, thank you it does lower the BCV and the average log expression looks much much better. The thing is I'm a bit confused about how to write the contrast (I know you gave me the code but I definitely most be doing something wrong as I get an error. My code below:
> GLU.FA24vrsGLU24<-makeContrasts(fermenterGLU.FA_time.as.numeric*X + fermenterGLU.FA-(fermenterGLU_time.as.numeric*24 + fermenterGLU ), levels = designL)
Error in makeContrasts(fermenterGLU.FA_time.as.numeric * X + fermenterGLU.FA - :
The levels must by syntactically valid names in R, see help(make.names). Non-valid names: fermenterGLU:time.as.numeric,fermenterGLU.FA:time.as.numeric
I've also tried:
> GLU.FA-GLU_wholeset<-makeContrasts(fermenterGLU_time.as.numeric - fermenterGLU.FA_time.as.numeric, levels=designL)
Error in makeContrasts(fermenterGLU_time.as.numeric - fermenterGLU.FA_time.as.numeric, :
The levels must by syntactically valid names in R, see help(make.names). Non-valid names: fermenterGLU:time.as.numeric,fermenterGLU.FA:time.as.numeric
I'm trying to get a comparison between each fermenter and the corresponding time point and between fermenters, can you spot what's wrong on the code? what am I missing or doing wrong?
Thank you in advance,
Stefany
You need to replace the colons in the column names of
design
with something else, as colons are special characters when writing formulae. I've assumed that colons are replaced by underscores; check outgsub
.