Question

tximport's parameter for countsFromAbundance doesn't work

0

Entering edit mode

tim.ivanov.92 • 0

@timivanov92-14799

Last seen 6.2 years ago

I am trying to perform differential gene expression analysis, but i'm having trouble integrating data to edgeR library.

here is the code i'm running

dir<-"/Users/timofei.ivanov/mice_project/results"

setwd(dir)
samples=read.table(file.path(dir, "INDEX_FILE"), header = TRUE)
files=file.path(samples$results_name)
names(files)=samples$sample
print(files)
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE,countsFromAbundance = "lengthScaledTPM")
library(edgeR)

cts <- txi.rsem$counts
normMat <- txi.rsem$length
normMat <- normMat/exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))

and at the last line i got an error

Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

I've googled the problem, and it points to this:

Do not do this: Do not manually pass txi$counts alone to downstream methods, if you have left countsFromAbundance="no". This is simply passing the summed estimated counts, and does not correct for potential differential isoform usage, which is the point of the tximport methods and package (Soneson, Love, and Robinson 2015). Passing uncorrected counts alone is not recommended by the tximport package authors.

But when i try to change the countsFromAbundance parameter of function tximport, it just won't change, yet raise no error.

What am i doing wrong?

tximport rsem edger • 3.0k views

ADD COMMENT • link 7.2 years ago tim.ivanov.92 • 0

score 0 · Answer 1 · 2018-02-07

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

You should either

use the original counts (countsFromAbundance="no") and the offset (e.g. the code producing 'o' above), or
use countsFromAbundance="lengthScaledTPM" or "scaledTPM" and do not add an offset.

You are doing both here. You have countsFromAbundance="lengthScaledTPM" and then you try to compute an offset. Note that this would create a bias because you've removed a bias and then you're trying to remove it again, so you would re-introduce the bias in the other direction. You should just follow the example in the vignette.

ADD COMMENT • link 7.2 years ago Michael Love 43k

0

Entering edit mode

I guess I will edit the above section of the vignette to say explicitly what I mean by "alone" (passing original counts without an offset).

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

I have update this section to read:

Do not do this: Do not manually pass the original counts to downstream methods without an offset. The original counts are in txi$countswhen tximport was run with countsFromAbundance="no". This is simply passing the summed estimated counts, and does not correct for potential differential isoform usage (the offset), which is the point of the tximport methods and package (Soneson, Love, and Robinson 2015). Passing uncorrected counts without an offset is not recommended by the tximport package authors. The two methods we provide here are: “original counts and offset” or “bias corrected counts without an offset”. Passing txi to DESeqDataSetFromTximport as outlined below is correct: the function creates the appropriate offset for you.

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

But this is the problem - when i use the countsFromAbundance="NO", i get this error:

> o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

And if i change it to any of other values - it doesn't change!

txi.rsem <- tximport(files, type = "rsem", countsFromAbundance = "scaledTPM", txIn = FALSE, txOut = FALSE)

> txi.rsem$countsFromAbundance
[1] "no"

ADD REPLY • link 7.2 years ago tim.ivanov.92 • 0

0

Entering edit mode

ok, i've been able to work around this issue

Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

by changing 0 lengths to 1

txi.rsem$length[txi.rsem$length == 0] <- 1

Is it ok to do this?

ADD REPLY • link 7.2 years ago tim.ivanov.92 • 0

0

Entering edit mode

Sure this is ok. I'd argue a transcript shouldn’t have a 0 length as output from a software, but anyway this works as a workaround.

ADD REPLY • link 7.2 years ago Michael Love 43k

1

Entering edit mode

This has come up a few times, so I just added some code inside of tximport that will threshold the RSEM transcript lengths at 1 bp.

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

in case you wondering - tried updating tximport library and still got this error

> o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

and DEseq2 doesn't like this too:

> dds <- DESeqDataSetFromTximport(txi.rsem, sampleTable, ~condition)
using counts and average transcript lengths from tximport
Error: all(lengths > 0) is not TRUE

ADD REPLY • link 7.2 years ago tim.ivanov.92 • 0

1

Entering edit mode

Unless you are using the development branch of R you don’t have access to the devel Bioconductor packages until April. Sorry I didn’t make that clear in my earlier post.

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

Can you find where the NA are coming from? Are they in the counts or the length matrix or both? You can use the is.na() function to look for them.

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

> which(is.na(txi.rsem$counts))
integer(0)
> which(is.na(txi.rsem$length))
integer(0)

...Which is weird, because i've found NaN values in workspace manually in the length matrix

ADD REPLY • link 7.2 years ago tim.ivanov.92 • 0

score 0 · Answer 2 · 2018-02-07

Also, can you help me on different issue with tximport?

I'm trying to get transcripts id's from *isoforms.results files, that look like this:

transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct posterior_mean_count posterior_standard_deviation_of_count pme_TPM pme_FPKM IsoPct_from_pme_TPM TPM_ci_lower_boundTPM_ci_upper_bound FPKM_ci_lower_bound FPKM_ci_upper_bound

ENSMUST00000000001 ENSMUSG00000000001 3262 3213.00 635.00 34.94 25.37 100.00 635.00 0.00 32.81 24.98 100.00 30.2126 35.3257 22.9831 26.8735

ENSMUST00000000003 ENSMUSG00000000003 902 853.00 0.00 0.00 0.00 0.00 0.00 0.00 0.19 0.15 43.17 8.132e-07 0.579365 6.19837e-07 0.440877

ENSMUST00000114041 ENSMUSG00000000003 697 648.00 0.00 0.00 0.00 0.00 0.00 0.00 0.2

I'm doing the following lines to obtain them:

> names(files)=samples$sample
> print(files)
                              U32                               U34 
"mouse_D17-1481.isoforms.results" "mouse_D17-1482.isoforms.results" 
                              U35                              U36r 
"mouse_D17-1483.isoforms.results" "mouse_D17-1494.isoforms.results" 
                             u33r                               U42 
"mouse_D17-1495.isoforms.results" "mouse_D17-1484.isoforms.results" 
                              U43                              U45r 
"mouse_D17-1485.isoforms.results" "mouse_D17-1493.isoforms.results" 
> txi.rsem <- tximport(files, type = "rsem", countsFromAbundance = "no", txIn = TRUE, txOut = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 
> head(txi.rsem$counts)
                      U32 U34    U35 U36r u33r    U42    U43   U45r
ENSMUSG00000000001 543.00 519 557.00  793  635 444.00 581.00 745.00
ENSMUSG00000000003   0.00   0   0.00    0    0   0.00   0.00   0.00
ENSMUSG00000000003   0.00   0   0.00    0    0   0.00   0.00   0.00
ENSMUSG00000000028   6.89   0   5.47    0    0   0.00   8.17  14.84
ENSMUSG00000000028   0.00   2   0.00    3    8  21.27  14.83  16.16
ENSMUSG00000000028   0.11   0   2.53    0    0   1.73   0.00   0.00

And as you can see, i'm still getting genes id's, instead of transcript id's.