Use local Txome database to avoid downloading GTF from FTP server
1
0
Entering edit mode
Hasher • 0
@3d3bd645
Last seen 2.1 years ago
Canada

Hi,

I am using linkedTxome function in tximeta in order to use a local transcriptome database. However, in some random runs, the GTF file is being downloaded from a FTP server which causes errors when the device is not connected to the internet.

Here is the code I use:

  s <- data.frame(files = quant, names = sample, label = label)

  # setting BiocFileCache directory for accessing and saving TxDb sqlite files
  BFCdir <- file.path(dirname(spath), "BFC")
  setTximetaBFC(BFCdir)

  makeLinkedTxome(indexDir = index_dir,
          source = "gencode",
          organism = "human",
          release = "v31",
          genome = "GRCh38",
          fasta = fasta_file,
          gtf = gtf_path,
          write = FALSE)

  txi.methname <- tximeta(s, type = ftype, useHub = FALSE)

And the error message:

NOTE: linkedTxome with source='GENCODE', set useHub=FALSE in tximeta
to avoid download of reference txome from AnnotationHub.
alternatively use a different string for source argument
linkedTxome was different than one in bfc, replacing
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10
found matching transcriptome:
[ GENCODE - Homo sapiens - release 31 ]
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.annotation.gtf.gz'
Error in download.file(resource(con), destfile) :
  cannot open URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.annotation.gtf.gz'
Calls: getTxiResults2 ... import -> import -> import -> .local -> download.file
In addition: Warning message:
In download.file(resource(con), destfile) :
  cannot open URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.annotation.gtf.gz': FTP status was '421 Service not available, closing control connection'
Execution halted

I was wondering if there is a way to ensure the local GTF file is going to be used instead of attempting to download it from a FTP server.

Thanks in advance.

tximeta • 2.2k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

The very first place you should check, before coming here to ask, is the help page for the function you are using. Which says

Arguments:

indexDir: the local path to the Salmon index

  source: the source of transcriptome (e.g. "de-novo"). Note: if you
          specify "GENCODE" or "Ensembl", this will trigger behavior by
          tximeta that may not be desired: e.g. attempts to download
          canonical transcriptome data from AnnotationHub (unless
          useHub=FALSE when running tximeta) and parsing of Ensembl GTF
          using ensembldb (which may fail if the GTF file has been
          modified). For transcriptomes that are defined by local GTF
          files, it is recommended to use the terms "LocalGENCODE" or
          "LocalEnsembl"

Which I believe clearly answers your question.

ADD COMMENT
0
Entering edit mode

I have tried "localEnsembl"; however, the transcript ids in FASTA file (and salmon quantification files) have versions while the transcript ids in the GTF file don't have the version. This causes an error in tximeta:

Error in checkAssays2Txps(assays, txps) :
  none of the transcripts in the quantification files are in the GTF

I pass ignoreTxVersion = TRUE but it looks like it is not used in tximeta and only being passed to tximport. How can I fix this?

ADD REPLY
0
Entering edit mode

The argument is used in tximeta as well. I’ve used it successfully to deal with this issue.

Can you copy paste the code that gives the error?

ADD REPLY
0
Entering edit mode

Sure, here is the code:

  s <- data.frame(files = quant, names = sample, label = label)

  # setting BiocFileCache directory for accessing and saving TxDb sqlite files
  BFCdir <- file.path(dirname(spath), "BFC")
  setTximetaBFC(BFCdir)

  makeLinkedTxome(indexDir = index_dir,
          source = "LocalEnsembl",
          organism = "human",
          release = "v31",
          genome = "GRCh38",
          fasta = fasta_file,
          gtf = gtf_path,
          write = FALSE)

tximeta(s, type = "salmon", useHub = FALSE, ignoreTxVersion = TRUE)

BTW, tximeta version is 1.12.4.

ADD REPLY
0
Entering edit mode

What ignoreTxVersion does is to remove anything after a . in the quantification files, when it is looking up the transcript in the GTF. It is just:

txId <- sub("\\..*", "", txId)

Then txId is matched to the transcript_id in the GTF. Can you show an example of a transcript ID in the quantification file and one in the GTF (e.g. show a transcript line from the GTF)?

ADD REPLY
0
Entering edit mode

I searched for the first transcript (from quantification files) in the GTF file, but I removed the version in the search:

$ head -2 quant.sf
Name    Length  EffectiveLength TPM     NumReads
ENST00000415118.1       8       1.000   0.000000        0.000
$ grep ENST00000415118 gene_annotations.gtf
14      havana  transcript      22438547        22438554        .       +       .       gene_id "ENSG00000223997"; gene_version "1"; transcript_id "ENST00000415118"; transcript_version "1"; gene_name "TRDD1"; gene_source "havana"; gene_biotype "TR_D_gene"; transcript_name "TRDD1-201"; transcript_source "havana"; transcript_biotype "TR_D_gene"; tag "cds_end_NF"; tag "cds_start_NF"; tag "mRNA_end_NF"; tag "mRNA_start_NF"; tag "basic"; transcript_support_level "NA";
14      havana  exon    22438547        22438554        .       +       .       gene_id "ENSG00000223997"; gene_version "1"; transcript_id "ENST00000415118"; transcript_version "1"; exon_number "1"; gene_name "TRDD1"; gene_source "havana"; gene_biotype "TR_D_gene"; transcript_name "TRDD1-201"; transcript_source "havana"; transcript_biotype "TR_D_gene"; exon_id "ENSE00001661486"; exon_version "1"; tag "cds_end_NF"; tag "cds_start_NF"; tag "mRNA_end_NF"; tag "mRNA_start_NF"; tag "basic"; transcript_support_level "NA";
14      havana  CDS     22438547        22438554        .       +       0       gene_id "ENSG00000223997"; gene_version "1"; transcript_id "ENST00000415118"; transcript_version "1"; exon_number "1"; gene_name "TRDD1"; gene_source "havana"; gene_biotype "TR_D_gene"; transcript_name "TRDD1-201"; transcript_source "havana"; transcript_biotype "TR_D_gene"; protein_id "ENSP00000451042"; protein_version "1"; tag "cds_end_NF"; tag "cds_start_NF"; tag "mRNA_end_NF"; tag "mRNA_start_NF"; tag "basic"; transcript_support_level "NA";
ADD REPLY
0
Entering edit mode

Can you try with the latest:

install_github(“mikelove/tximeta”)

ADD REPLY
0
Entering edit mode

I get the same error with the latest (tximeta 1.15.3).

ADD REPLY
1
Entering edit mode

This seems to be a bug I hadn't covered, can you try again (v1.15.4 on GitHub).

Thanks for the report by the way!

ADD REPLY
0
Entering edit mode

Thank you for the bug fix. It worked this time with tximeta v1.15.4.

When is this new version going to available on bioconductor?

ADD REPLY
0
Entering edit mode

In about a week:

http://bioconductor.org/developers/release-schedule/

For others, I changed the code so that source="LocalEnsembl" will behave the same way as "Ensembl", in terms of internally fixing the issue that transcript IDs in the FASTA and quantification files have release versions that are not present in the GTF.

ADD REPLY

Login before adding your answer.

Traffic: 485 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6