Hi,
I am trying to apply voom to expected counts from RSEM (I don't have raw counts). My next step is a differential expression test with limma.
```
# matrix I am trying to apply voom to has 1 row and 8021 columns:
> dim(tmp)
[1]    1 8021

# first five columns
> tmp[,1:5]
                   SRR1068687 SRR1068788 SRR1068808 SRR1068832 SRR1068855
ENSG00000149294.16    2528.61     756.53         36     158.95    1652.77

# no NAs in the data
> which(is.na(tmp))
integer(0)

# this is my design. shows that I have replicates.
> rbind(head(design),tail(design))
                                     studyGTEx studyTARGET
SRR1068687                                   1           0
SRR1068788                                   1           0
SRR1068808                                   1           0
SRR1068832                                   1           0
SRR1068855                                   1           0
SRR1068880                                   1           0
f3b75630.42da.4b66.96c5.94e9a8142261         0           1
f62f350c.f8b5.4715.ace1.387f7e59cb91         0           1
f637ca92.407d.432e.ba08.2bebe05b96f4         0           1
f835cce2.1ed3.4c59.95a2.4bfef161082b         0           1
fb8c3046.eb74.43ff.ae1d.327453221555         0           1
fdee8bef.5ce2.4ca6.b6b9.3423219a1ea4         0           1

# I get an error when I try to apply voom:
> voom(tmp, design = design)
Error in approxfun(l, rule = 2) : need at least two non-NA values to interpolate
```
I searched for the error and found that it is usually encountered when there are no replicates. In my case, however, I have plenty of replicates: studyGTEx has about 7000 samples and the rest are studyTARGET. What could be the problem?
NOTE: I have already tried adding more rows (to make nrow(tmp)>1) and I still get the same error.
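The error text itself comes from base R's `approxfun()`, which `voom()` uses to interpolate its mean-variance trend. That trend is fitted across genes, one point per row, so a single-row matrix leaves only one point to interpolate. The sketch below reproduces only that interpolation step with made-up numbers (`one_gene` and `many_genes` are hypothetical); it is not voom's internal code, and it does not account for the multi-row attempt in the note above.

```r
# voom() fits a lowess trend of sqrt(residual SD) against mean log-count,
# one point per gene, then interpolates it with approxfun(..., rule = 2).
# Only that interpolation step is reproduced here, on hypothetical values.

# A single gene gives a single (x, y) point: linear interpolation is impossible.
one_gene <- list(x = 10.2, y = 1.1)
msg <- tryCatch(
  {
    approxfun(one_gene, rule = 2)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # the same message as in the question

# Several genes give several points, and the trend can be interpolated.
many_genes <- list(x = c(2, 5, 8, 11), y = c(1.6, 1.2, 0.9, 0.7))
trend <- approxfun(many_genes, rule = 2)
print(trend(c(1, 6.5, 20)))  # rule = 2 returns the end values outside [2, 11]
```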
Crossposted: https://www.biostars.org/p/211073/
Yes. I cannot have all the genes in one matrix because I run into memory problems: if I load and process all the data at once, I have ~60000 rows and 8000 columns. Can I not use voom on expected counts (not FPKM) from RSEM?
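A back-of-envelope estimate shows why the full matrix is heavy. This assumes the data are held as an ordinary dense R matrix of doubles (8 bytes per value); a data.frame, or extra copies made while processing, would add to this.

```r
# Memory for one dense double matrix of the size mentioned above.
n_genes   <- 60000
n_samples <- 8000
bytes_per_matrix <- n_genes * n_samples * 8   # 8 bytes per double
gib_per_matrix   <- bytes_per_matrix / 1024^3
print(round(gib_per_matrix, 2))

# voom() returns an EList holding a log-CPM matrix (E) and a precision-weight
# matrix (weights), each with the same dimensions as the input, so input plus
# output already means at least three matrices of this size.
print(round(3 * gib_per_matrix, 2))
```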
If you have 8000 observations, it's highly unlikely that you need anything beyond a regular t-test or ANOVA or whatever. With that number of observations the central limit theorem will have fully (fully!) kicked in, and any differences in variability between groups will have been reduced to irrelevance by the sheer number of observations.
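For a single gene, the suggestion above could look like the following sketch. The data are simulated (negative-binomial counts with arbitrary means), the group sizes are chosen only to match the ~7000 GTEx samples and 8021 columns in the question, and the `log2(x + 1)` transform is an assumption rather than something the thread specifies. Because each gene is tested on its own, this kind of test can be run one row at a time.

```r
set.seed(1)

# Hypothetical data: expected counts for ONE gene in two unequal groups.
group  <- factor(rep(c("GTEx", "TARGET"), times = c(7000, 1021)))
counts <- rnbinom(length(group), mu = ifelse(group == "GTEx", 500, 650), size = 2)

# Log scale (assumed here; the thread does not specify a transform).
logexpr <- log2(counts + 1)

# Welch two-sample t-test (does not assume equal variances).
tt <- t.test(logexpr ~ group)
print(tt$p.value)

# One-way ANOVA via a linear model (assumes equal variances).
fit <- lm(logexpr ~ group)
print(anova(fit))
```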
Things like edgeR and voom are intended for comparisons with small numbers of replicates, where the underlying distributions of your observations still actually matter. At 8000 observations, those concerns are well behind you.
The 8000 observations are broken down by tissue type, and I compare each tissue type with every other. So there could be 5 replicates in one tissue type and 1000 in another. My biggest concern is whether I can really apply voom to the RSEM expected counts I have.
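Comparing every tissue with every other amounts to all pairwise contrasts between tissue levels. The base-R sketch below builds them for hypothetical tissue names and group sizes, including a 5-sample group next to a 1000-sample group as in the scenario above. In limma, such strings would typically be passed to `makeContrasts(contrasts = ..., levels = design)`; that call is only mentioned in a comment, not run.

```r
# Hypothetical tissue labels and group sizes; real ones come from the sample annotation.
tissue <- factor(rep(c("Blood", "Brain", "Liver", "Lung"), times = c(5, 1000, 300, 40)))
print(table(tissue))

# Cell-means design: one column per tissue, no intercept.
design <- model.matrix(~ 0 + tissue)
colnames(design) <- levels(tissue)

# Every unordered pair of tissues: choose(k, 2) contrasts in total.
pairs <- combn(levels(tissue), 2)
contrast_strings <- paste(pairs[1, ], pairs[2, ], sep = " - ")
print(contrast_strings)
# These strings use the design's column names, so they could be handed to
# limma::makeContrasts(contrasts = contrast_strings, levels = design).
```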