Question

Voom: need at least two non-NA values to interpolate

1

komal.rathi ▴ 120

@komalrathi-9163

Last seen 18 months ago

United States

Hi,

I am trying to apply voom to expected counts from RSEM (I don't have raw counts). My next step is to run a differential expression test using limma.

# matrix I am trying to apply voom to has 1 row and 8021 columns:
> dim(tmp)
[1]    1 8021

# first five columns
> tmp[,1:5]
                    SRR1068687 SRR1068788 SRR1068808 SRR1068832 SRR1068855
ENSG00000149294.16     2528.61     756.53         36     158.95    1652.77

# no NAs in the data
> which(is.na(tmp))
integer(0)

# this is my design. shows that I have replicates.
> rbind(head(design),tail(design))

                                     studyGTEx studyTARGET
SRR1068687                                   1           0
SRR1068788                                   1           0
SRR1068808                                   1           0
SRR1068832                                   1           0
SRR1068855                                   1           0
SRR1068880                                   1           0
f3b75630.42da.4b66.96c5.94e9a8142261         0           1
f62f350c.f8b5.4715.ace1.387f7e59cb91         0           1
f637ca92.407d.432e.ba08.2bebe05b96f4         0           1
f835cce2.1ed3.4c59.95a2.4bfef161082b         0           1
fb8c3046.eb74.43ff.ae1d.327453221555         0           1
fdee8bef.5ce2.4ca6.b6b9.3423219a1ea4         0           1

# I get an error when I try to apply voom:
> voom(tmp, design = design)
Error in approxfun(l, rule = 2) : 
  need at least two non-NA values to interpolate

I searched for the error and found that it is usually encountered when you have no replicates. However, in my case I have enough replicates: studyGTEx has about 7000 samples and the rest are studyTARGET. What could be the problem?

NOTE: I have already tried adding more rows (to make nrow(tmp)>1) and I still get the same error.

Crossposted: https://www.biostars.org/p/211073/

limma voom voom rsem fpkm • 6.6k views

ADD COMMENT • link 8.6 years ago komal.rathi ▴ 120

2

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 2.1 years ago

United States

Does your matrix only have one row and 8021 columns because you are only interested in the differential expression of one gene across 8021 samples? If so, you don't really want to use the data in this way. These differential expression methods (limma, edgeR, DESeq2, etc) borrow information across genes in order to do their magic, and you're not giving it a chance to do that here.

So, you'll want to build an expression matrix that has all of your genes across these samples at first, then remove genes that are very lowly expressed, then voom (or whatever) them.

If FPKM values are all you have, then instead of using voom, you'll want to do an "ordinary" limma analysis on log2(fpkm + 0.25) and use eBayes(fit, trend=TRUE). See the following thread for more information:

A: Differential expression of RNA-seq data using limma and voom()
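
As a rough illustration of that workflow (filter lowly expressed genes, log-transform, then fit with a trended prior), here is a minimal sketch. It assumes the limma package is installed. The FPKM matrix is simulated, and the group labels, the filtering threshold, and the names fpkm, logfpkm and keep are all hypothetical, not taken from the original data.

```r
library(limma)

set.seed(1)
# Hypothetical FPKM matrix: 200 genes x 12 samples (simulated for illustration)
n_genes <- 200
group <- factor(rep(c("GTEx", "TARGET"), each = 6))
fpkm <- matrix(rexp(n_genes * 12, rate = 0.1), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", 1:12)))

# Remove very lowly expressed genes (this cutoff is an arbitrary assumption)
keep <- rowMeans(fpkm) > 1
logfpkm <- log2(fpkm[keep, , drop = FALSE] + 0.25)

# Ordinary limma fit on the log values, with a mean-variance trend in eBayes
design <- model.matrix(~ group)
fit <- lmFit(logfpkm, design)
fit <- eBayes(fit, trend = TRUE)

# Coefficient 2 is the TARGET vs GTEx difference under this design
top <- topTable(fit, coef = 2, number = 10)
```

Note that the matrix holds all genes at once, which is what lets eBayes() borrow information across genes.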

One last thing to note is that I think you should be careful about combining your "studyTARGET" data with GTEx. I'm not really sure exactly what you're doing, but if you find differential expression in studyTARGET vs GTEx, how will you convince yourself that your finding is a biological phenomenon instead of a purely technical one?

ADD COMMENT • link 8.6 years ago Steve Lianoglou ★ 13k

0

I am trying to use voom on expected counts (not FPKM) from RSEM.

ADD REPLY • link 8.6 years ago komal.rathi ▴ 120

0

I understand that, and I am telling you that you should not use voom but rather the alternative method that I've outlined and linked to.

Further, you need to fix your one-gene-at-a-time problem.

ADD REPLY • link 8.6 years ago Steve Lianoglou ★ 13k

1

Wait, sorry -- I was confused by the difference between your original post, where you said "expected counts from RSEM FPKM", and what you are saying now, which is that you are just using expected counts.

If you're using expected counts, voom should work OK. Read more here:

A: proper transformation for differential expression in normalized log expression d

ADD REPLY • link 8.6 years ago Steve Lianoglou ★ 13k

0

That was a typo - removed FPKM from my question. Thanks!

ADD REPLY • link 8.6 years ago komal.rathi ▴ 120

score 1 · Accepted Answer · 2016-09-08

1

Gordon Smyth 52k

@gordon-smyth

Last seen 5 hours ago

WEHI, Melbourne, Australia

You say

NOTE: I have already tried adding more rows (to make nrow(tmp)>1) and I still get the same error.

With all respect, I don't think that can be true. The error message from voom() is caused by the fact that you gave voom() data for only one gene. That's what the error message is telling you. How could voom() possibly fit a mean-variance trend line through one point? If you included data for even as few as two genes, then the error message would disappear.
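
The mechanism can be reproduced in base R without limma. This is only a sketch: it assumes, as the error message suggests, that the trend fit is a lowess() curve passed to approxfun(), with one point per gene. The x and y values below are made up.

```r
# One gene contributes a single (mean, variability) point to the trend fit
one_point <- lowess(x = 7.3, y = 0.9)
msg_one <- tryCatch(
  {
    approxfun(one_point, rule = 2)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg_one)

# Two genes give two points, which is enough to build an interpolating function
two_points <- lowess(x = c(7.3, 9.1), y = c(0.9, 0.6))
trend <- approxfun(two_points, rule = 2)
print(is.function(trend))
```

With a single point, approxfun() has nothing to interpolate between and stops with the message quoted in the question.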

However you should not be using voom() with FPKM values anyway, as already covered by Steve's answer.

Why are you trying to analyze a single gene in isolation? Is it because you are running out of computer memory?

Edit: The question has since been edited to refer to RSEM expected counts instead of FPKM. Yes, voom() can run on expected counts.

A trick for reducing voom's memory footprint is to supply unit weights as part of the voom call:

w <- array(1, dim(y))
v <- voom(y, design, weights=w)

This won't change the results, but will induce voom to run the linear model for each gene separately instead of all together. I have just run voom() in this way myself on a 60000 by 8000 matrix without any problems. It took a few minutes.

ADD COMMENT • link 8.6 years ago Gordon Smyth 52k

0

Yes. I cannot have all the genes in one matrix because I am facing a memory issue: I have ~60000 rows and 8000 columns if I get all the data and process it. Can I not use voom on expected counts (not FPKM) from RSEM?

ADD REPLY • link 8.6 years ago komal.rathi ▴ 120

0

If you have 8000 observations, it's highly unlikely that you need to do anything other than fit a regular t-test or ANOVA or whatever. With that number of observations the central limit theorem will have fully (fully!) kicked in, and any differences in variability between groups will have been reduced to irrelevance by the sheer number of observations.

Things like edgeR and voom are intended for comparisons with small numbers of replicates, where the underlying distributions of your observations still actually matter. At 8000 observations, those concerns are well behind you.
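
For what it's worth, a per-gene test of that kind could look something like the sketch below. It uses only base R on simulated log-expression values; the group sizes, the matrix logexpr, and the shift added to the first gene are all invented for illustration.

```r
set.seed(42)
# Hypothetical unbalanced groups: 1000 samples in A, 50 in B
group <- factor(rep(c("A", "B"), times = c(1000, 50)))

# Simulated log-expression for 3 genes (rows) across all samples (columns)
logexpr <- matrix(rnorm(3 * length(group), mean = 5), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), NULL))
# Shift gene1 upward in group B so that it is truly different
logexpr[1, group == "B"] <- logexpr[1, group == "B"] + 2

# Welch t-test per gene (t.test does not assume equal variances by default)
pvals <- apply(logexpr, 1, function(y) t.test(y ~ group)$p.value)

# Adjust for multiple testing across genes (Benjamini-Hochberg)
padj <- p.adjust(pvals, method = "BH")
```

Each gene is tested on its own here, so nothing is borrowed across genes.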

ADD REPLY • link 8.6 years ago James W. MacDonald 68k

0

The 8000 observations are broken down by tissue type, and I compare each tissue type with every other. So there could be 5 replicates in one tissue type and 1000 in another. My biggest concern is whether I can really voom the expected counts that I have from RSEM or not.

ADD REPLY • link 8.6 years ago komal.rathi ▴ 120