Dear all,
I am a newbie in this field and I am trying to solve the following issue: when I run tximport, I cannot get rid of the version suffix on the IDs in my counts (e.g. ENSMUSG00000000001.5). Here is the code I am running:
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("gencode.vM27.annotation.gtf.gz")
keytypes(txdb)
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")
> head(tx2gene)
TXNAME GENEID
1 ENSMUST00000193812.2 ENSMUSG00000102693.2
2 ENSMUST00000082908.3 ENSMUSG00000064842.3
3 ENSMUST00000192857.2 ENSMUSG00000102851.2
4 ENSMUST00000161581.2 ENSMUSG00000089699.2
5 ENSMUST00000192183.2 ENSMUSG00000103147.2
6 ENSMUST00000193244.2 ENSMUSG00000102348.2
library(tximport)
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)
> head(txi$counts)
21L006446 21L006447 21L006449 21L006450 21L006451 21L006452
ENSMUSG00000000001.5 315.138 334.248 294.000 319.226 254.199 434.647
ENSMUSG00000000003.16 0.000 0.000 0.000 0.000 0.000 0.000
ENSMUSG00000000028.16 87.004 110.044 134.774 66.000 46.017 182.001
ENSMUSG00000000031.17 7870.144 49568.878 130934.525 4780.285 3096.264 133446.737
ENSMUSG00000000037.18 1.000 0.000 1.000 0.000 1.000 1.000
ENSMUSG00000000049.12 10.000 7.000 9.000 11.000 6.000 6.000
# I still get the versioned IDs, so I tried this call instead
library(tximport)
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE)
# but I get this error message
> txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Error in .local(object, ...) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
Example IDs (file): [ENSMUST00000193812, ENSMUST00000082908, ENSMUST00000162897, ...]
Example IDs (tx2gene): [ENSMUST00000193812.2, ENSMUST00000082908.3, ENSMUST00000192857.2, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.
# I also checked my input files (quant.sf), and they contain versioned IDs such as ENSMUST00000193812.2
Do you have any suggestions? Or is there a mistake I do not see? Thanks a lot.
Agree with ATpoint.
If you want to remove the version information, do that after tximport. If you are heading towards building a DESeqDataSet or DGEList, etc., you should wait until after you've built that object, and then use the code ATpoint provided on the rows of the object.
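As a minimal sketch of that step (not necessarily the exact code ATpoint posted), something along these lines strips the version suffix from the row names. The object `x` here is a small stand-in matrix built from the first counts shown in the question; in practice it would be your own object, and the regular expression and the duplicate check are assumptions for illustration:

```r
# Stand-in for the real object (e.g. txi$counts or a DESeqDataSet);
# values copied from the first two samples in the question
x <- matrix(c(315.138, 0.000, 87.004, 334.248, 0.000, 110.044),
            nrow = 3,
            dimnames = list(c("ENSMUSG00000000001.5",
                              "ENSMUSG00000000003.16",
                              "ENSMUSG00000000028.16"),
                            c("21L006446", "21L006447")))

# Drop the first period and everything after it from each row name
new_ids <- gsub("\\..*", "", rownames(x))

# Stop if stripping the version made two IDs collide
stopifnot(!anyDuplicated(new_ids))

rownames(x) <- new_ids
rownames(x)
```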
...this will replace "<period><any other characters>" with "" (empty string). You then take that new string and assign it to the rownames of the object x, e.g. that could be your dds.
Thank you both! Now it is much clearer. I'll keep the version information for my DESeqDataSet and change it later on.