I am trying to perform differential gene expression analysis, but i'm having trouble integrating data to edgeR library.
here is the code i'm running
dir<-"/Users/timofei.ivanov/mice_project/results"
setwd(dir)
samples=read.table(file.path(dir, "INDEX_FILE"), header = TRUE)
files=file.path(samples$results_name)
names(files)=samples$sample
print(files)
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE,countsFromAbundance = "lengthScaledTPM")
library(edgeR)
cts <- txi.rsem$counts
normMat <- txi.rsem$length
normMat <- normMat/exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
and at the last line i got an error
Error in calcNormFactors.default(cts/normMat) : NA counts not permitted
I've googled the problem, and it points to this:
But when i try to change the countsFromAbundance parameter of function tximport, it just won't change, yet raise no error.
What am i doing wrong?
I guess I will edit the above section of the vignette to say explicitly what I mean by "alone" (passing original counts without an offset).
I have update this section to read:
Do not do this: Do not manually pass the original counts to downstream methods without an offset. The original counts are in
txi$counts
whentximport
was run withcountsFromAbundance="no"
. This is simply passing the summed estimated counts, and does not correct for potential differential isoform usage (the offset), which is the point of the tximport methods and package (Soneson, Love, and Robinson 2015). Passing uncorrected counts without an offset is not recommended by the tximport package authors. The two methods we provide here are: “original counts and offset” or “bias corrected counts without an offset”. Passingtxi
toDESeqDataSetFromTximport
as outlined below is correct: the function creates the appropriate offset for you.But this is the problem - when i use the countsFromAbundance="NO", i get this error:
And if i change it to any of other values - it doesn't change!
ok, i've been able to work around this issue
by changing 0 lengths to 1
Is it ok to do this?
Sure this is ok. I'd argue a transcript shouldn’t have a 0 length as output from a software, but anyway this works as a workaround.
This has come up a few times, so I just added some code inside of tximport that will threshold the RSEM transcript lengths at 1 bp.
in case you wondering - tried updating tximport library and still got this error
and DEseq2 doesn't like this too:
Unless you are using the development branch of R you don’t have access to the devel Bioconductor packages until April. Sorry I didn’t make that clear in my earlier post.
Can you find where the NA are coming from? Are they in the counts or the length matrix or both? You can use the is.na() function to look for them.
...Which is weird, because i've found NaN values in workspace manually in the length matrix