Hello, I have some RNA-Seq samples. These samples have no replicates, but I have lots of samples. What I am trying to do is find a trend based on a continuous variable (let's say age). I am basically using edgeR's cpm function with log as TRUE and prior count of 3. The first question is about Normalization. Is there a better way? Should I try the following? Find genes with the lowest variance, and use that to normalize the data. Which method would be better? Edger mentions this in the no-replicate section.
My main question here is how to use the ns() function when there is a batch effect. I also want to adjust for gender. Here is what I am doing, and I am wondering if there is a better way to do this. For example, can I use voom here? For the ns() function, I am using just one covariate.
X<-ns(x_ordered,df=5)
design2<-model.matrix(~X+gender_ordered+batch_ordered)
fit2<-lmFit(dat_ordered,design2)
fit2<-eBayes(fit2,trend=TRUE)
fitted_spline<-topTable(fit2,coef=2:6,n=Inf)
I am also trying a linear model as follows:
design=model.matrix(~x_ordered+gender_ordered+batch_ordered)
fit<-lmFit(dat_ordered,design)
fit<-eBayes(fit,trend = TRUE)
fitted_lm<-topTable(fit,coef=2,n=Inf)
Do I need to use the p-adjusted here? I am seeing nothing based on p-adjusted, but many based on just the p-values. This applies to both spline and linear model.
Also, using R's lm function, I am basically getting the same data.
I must say though that when I try a linear mixed effects using linear model, I am getting better fit in terms of the p-adjusted. Here, I am treating the batch effect as a random effect and everything else as fixed.
Then I am using lmertest to find significance in terms of the p-value. Finally, I am adjusting the p-value using the p.adjust() function. I am doing this without Limma, but I am wondering if there is a way to do this using Limma or similar packages.
model<-lmer(y~x+gender+(1|batches),data=new_dat)
The problem with above is that it is a linear model. Is there a way to combine the above with spline.
Hello Gordon, Thank you for your help.
For no replicates, how should I normalize the data? I thought TMM does not work without grouping? I decided to do cpm(log plus 1) to find an overall trend. My data is age data. Certainly, we have grouped the data, and did the normalization that way. We can group by, let's say, 10-20 yrs as one group, 20-30 as another group. We have generated data for that, and compared groups that way.
But what if I want to find the overall trend? Let's say, what gene trends up or down as you age?
Normalization is completely independent of whether you have replicates and it does not use groupings. And you do have replicates anyway. Your data all seems quite standard to me.
Hello Gordon, Thank you for the answer. In that case, this should not be very difficult. Nirad