Dear limma/edgeR users,
I have 2 treatment groups with three biological replicates each. I also have 2 extra samples, each a pool of one treatment group.
I am comparing a "vanilla" analysis of the biological replicates with an analysis of the pooled samples, i.e. 1 vs. 1 sample.
The edgeR manual gives 4 clear examples of how to do an analysis without biological replicates. This has been very useful and works without any problems.
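For reference, one of those no-replicate options is to assume a biological coefficient of variation (BCV) instead of estimating it; the edgeR User's Guide suggests values such as 0.4 for human data. Under edgeR's negative binomial model the dispersion is BCV², so a count with mean mu has variance mu + BCV²·mu². The Python sketch below is not edgeR code (the function names are hypothetical); it only evaluates that relationship to show how strongly the assumed BCV drives the variance once counts are moderately large.

```python
import math

def nb_variance(mu: float, bcv: float) -> float:
    """Negative binomial variance for mean mu with dispersion phi = bcv**2."""
    phi = bcv ** 2
    # Poisson (technical) part plus biological part
    return mu + phi * mu ** 2

def total_cv(mu: float, bcv: float) -> float:
    """Total coefficient of variation, equal to sqrt(1/mu + bcv**2)."""
    return math.sqrt(nb_variance(mu, bcv)) / mu

if __name__ == "__main__":
    # Assumed BCV of 0.4; compare low, medium and high counts.
    for mu in (10, 100, 10_000):
        print(mu, nb_variance(mu, 0.4), round(total_cv(mu, 0.4), 3))
```

At high counts the total CV approaches the assumed BCV, so for well-expressed genes the results depend heavily on the BCV you choose.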
However, reading about different methods for RNA-seq differential gene expression suggests that voom-limma is a more robust approach, e.g. it is less susceptible to problems arising from the mean-variance relationship.
http://peterhickey.org/blog/2011/11/23/bioinf-seminar-gordon-smyth.html
In addition, a recent publication also promotes voom-limma over other methods because of its control of false positive rates.
http://biorxiv.org/content/early/2015/06/11/020784
Bearing that in mind, I want to make the same comparison with voom-limma (biological replicates vs. the pooled samples alone) as I am able to do with edgeR.
Is there a way for voom-limma to "learn" the variance/dispersion/weights etc from the biological replicates I have, and then use them with the pooled samples alone?
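As a rough illustration of the idea (learn the noise level from the replicates, then reuse it for the 1 vs. 1 pooled comparison), here is a Python sketch. It is not what edgeR or voom do internally: it swaps in a simple method-of-moments estimate of the negative binomial dispersion, phi = (var - mean) / mean², plus the delta-method approximation var(log y) ≈ 1/mu + phi for the pooled comparison. The function names are hypothetical, and it assumes that the dispersion measured on individual replicates also applies to pooled libraries, which may not hold because pooling averages out part of the biological variation.

```python
import numpy as np

def moment_dispersion(counts, groups):
    """Per-gene NB dispersion from replicates by the method of moments.

    counts: genes x samples array of raw counts.
    groups: one group label per sample (column).
    Simplified stand-in; not edgeR's estimateDisp.
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    lib = counts.sum(axis=0)
    # Scale every library to the mean library size so columns are comparable.
    scaled = counts * (lib.mean() / lib)
    estimates = []
    for g in np.unique(groups):
        block = scaled[:, groups == g]
        m = block.mean(axis=1)
        v = block.var(axis=1, ddof=1)
        # phi = (var - mean) / mean^2; genes with zero mean give NaN.
        with np.errstate(divide="ignore", invalid="ignore"):
            estimates.append((v - m) / m ** 2)
    # Average the within-group estimates; clip negatives (no extra-Poisson noise) to 0.
    phi = np.nanmean(np.vstack(estimates), axis=0)
    return np.clip(phi, 0.0, None)

def pooled_z(y1, n1, y2, n2, phi):
    """Approximate z statistic for one pooled library vs. another.

    y1, y2: counts (must be > 0); n1, n2: library sizes; phi: dispersion.
    Uses var(log y) ~ 1/mu + phi for each NB count (delta method).
    """
    log_ratio = np.log((y2 / n2) / (y1 / n1))
    se = np.sqrt(1.0 / y1 + 1.0 / y2 + 2.0 * phi)
    return log_ratio / se
```

With only three replicates per group, a single common value such as `np.median(moment_dispersion(counts, groups))` is likely to be more stable than the per-gene estimates; edgeR itself shares information across genes for the same reason.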
Thank you very much.
Thank you for your answer and for a solution to this.
Yes, you are obviously right: in this situation we don't need to look at the pooled samples.
The reason we are looking at this approach, as is often the case, is money. We have many conditions to look at, and doing three biological replicates for all of them would be very costly (our budget is limited). So one idea has been to learn the dispersions and variation in the data (i.e. build an error model) and then to pool three biological replicates for each subsequent condition, so that we can look at more conditions.
What is your view of this approach?
Well, you can do that using the method I have described.
In your situation, however, I always advise my collaborators to barcode the biological samples before pooling them and to multiplex them onto the same Illumina sequencing lane. That costs almost the same as sequencing the pool alone, but gives a separate FASTQ file for each biological replicate.
See the related discussion in an earlier post: EdgeR: replicated pools, yes or not?
Thank you very much for that.
It really is a good idea, which gives a balance between saving money and still having biological replicates, albeit fewer reads per rep. I have passed this idea on to the people who make the decisions.
I have a similar situation, with only one replicate per sample. I can still run voom if I do not specify a design, optionally with plot=TRUE. This plots the variability across samples (square-root standard deviation) against the mean expression of each miRNA.
I am now trying to calculate the variation separately: if a miRNA varies by more than 2-fold (or a similar cutoff) relative to the weight given by voom, I select it as a candidate. Of course, without replicates no significance testing can be done.
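To make the fold-change idea concrete, here is a Python sketch (not limma code; the function names are hypothetical) that computes log2 counts per million with the same offsets voom uses, log2((count + 0.5) / (library size + 1) × 10⁶), and flags miRNAs whose log2 fold change between two single samples is at least 1, i.e. 2-fold. It only ranks candidates; there is no error estimate behind it. Note also that voom without a design fits an intercept-only model, so the trend it plots treats all samples as replicates and counts the treatment differences as part of the "noise".

```python
import numpy as np

def voom_style_logcpm(counts):
    """log2-CPM with voom's offsets: 0.5 added to counts, 1 to library sizes.

    counts: features x samples array of raw counts.
    """
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)
    # (lib + 1.0) has one entry per sample and broadcasts across the rows.
    return np.log2((counts + 0.5) / (lib + 1.0) * 1e6)

def fold_change_candidates(counts, col_a, col_b, min_abs_lfc=1.0):
    """Return log2 fold changes (col_b vs col_a) and a mask of |LFC| >= min_abs_lfc."""
    logcpm = voom_style_logcpm(counts)
    lfc = logcpm[:, col_b] - logcpm[:, col_a]
    return lfc, np.abs(lfc) >= min_abs_lfc
```

The 2-fold threshold (`min_abs_lfc=1.0`) is just the cutoff mentioned above; any value can be passed in.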
Any suggestions for improving my approach are welcome.
My suggestion to improve your approach is to use replicates ;-P