Dear All,
I would like to ask about using the voom-limma workflow on RNA-seq data. Usually, when I run the workflow, the samples have a similar number of reads (and therefore similar total counts per sample). This time, however, I have received data for 60 samples: most have 4-5.5M reads each, but 2 samples have approximately 26M reads each.
My question is whether the voom-limma workflow can handle such large differences in sequencing depth, or whether they would skew the results. If they would, what would you suggest doing so that I can still use these 2 samples?
Thank you very much!
Below is the general code I use to prepare the data for the limma differential expression analysis (originally taken from the limma guide). I am not providing code specific to this situation with 60 samples, since I see my question as a more general one.
Thank you! Any advice will be greatly appreciated!
Regards,
Anna
library(limma)
library(edgeR)
# dataset: a genes x samples matrix of raw counts
dge <- DGEList(counts = dataset)
# Example design: 4 groups with 3 replicates each, no intercept
group <- factor(rep(1:4, each = 3))
design <- model.matrix(~0 + group)
# Drop genes with too few counts, then recompute library sizes
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
# TMM scaling factors, then voom precision weights
dge <- calcNormFactors(dge)
va <- voom(dge, design, plot = TRUE)
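To make the question concrete, here is a minimal base-R sketch of the situation on toy data. Everything in it is an assumption for illustration only: the counts are simulated (not my real data), the names `depth`, `mu`, `lib_size` and `log_cpm` are hypothetical, and the depth ratio of 5 only roughly mimics 26M versus 5M reads. It computes the per-sample library sizes and the log2-CPM values using the same offsets voom applies (0.5 added to the counts, 1 added to the library size), but without TMM factors or precision weights.

```r
# Toy data (assumed): 1000 genes x 12 samples, the last two sequenced ~5x deeper
set.seed(1)
n_genes <- 1000
depth <- c(rep(1, 10), 5, 5)             # relative sequencing depth per sample
mu <- rexp(n_genes, rate = 1 / 50)       # baseline expected count per gene
counts <- sapply(depth, function(d) rpois(n_genes, lambda = mu * d))
colnames(counts) <- paste0("S", 1:12)

# Library sizes and their ratio to the median library size
lib_size <- colSums(counts)
print(round(lib_size / median(lib_size), 2))

# log2 counts-per-million with the offsets used by voom;
# t() lets each sample's counts be divided by that sample's library size
log_cpm <- t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))

# Per-sample medians on the log2-CPM scale, for comparing deep and shallow samples
print(round(apply(log_cpm, 2, median), 2))
```

In this toy setup the library size is divided out on the log2-CPM scale, so what I am unsure about is mainly whether the much higher precision of the deep samples is handled properly, not the scaling itself.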
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.24.3 limma_3.38.3
loaded via a namespace (and not attached):
[1] compiler_3.5.2 Rcpp_1.0.1 grid_3.5.2 locfit_1.5-9.1
[5] lattice_0.20-38
Thank you very much for your answer and the examples, Dr. Smyth! I am very glad that I will not have to either lose or modify the extreme files I have.
Thank you!
Regards, Anna