Question

Problems with oarfish + tximport + swish

0

Entering edit mode

brown.annaleigh • 0

@brownannaleigh-22443

Last seen 1 hour ago

United Kingdom

I've aligned long read data with oarfish and now I wanted to try out doing differential isoform usage with fishpond + swish

I've aligned to the gencode v42 human fasta file, which has the following header conformation

 head /SAN/vyplab/vyplab_reference_genomes/sequence/human/gencode/gencode.v42.transcripts.fa
>ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
TCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGA
TGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTG

I've got a dataframe of the file paths of the data and the names.

This is fine

k = tximeta(coldata = data.frame(files = m, names = names(m)), type = 'oarfish')
y <- scaleInfReps(k) # scales counts
y <- labelKeep(y) # labels features to keep
set.seed(120)
y <- swish(y, x="condition") # simplest Swish case

The errors start appearing here:

# run on the transcript-level dataset
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE

Ok that's fine, perhaps tximeta wasn't able to properly infer the right species/version, but I should be able to manually make that

tx_2_gene <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    separate(gene,sep = "\\|",into = c("transcript","geneCol",NA,NA,"transcript_id"
                                       ,"symbol",'tx_leng','biotype'),
             remove = FALSE) %>% 
    select(gene,geneCol)

And let's see if that works

updated_m <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    left_join(tx_2_gene) %>% 
    column_to_rownames('gene') 

mcols(y) <- DataFrame(updated_m)
# run on the transcript-level dataset
iso <- isoformProportions(y)
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
> "geneCol" %in% names(mcols(y)) 
[1] TRUE

Let's try the summarizeToGene gene function

gse <- summarizeToGene(k,tx2gene = tx_2_gene)
Error in missingMetadata(object, summarize = TRUE) : 
  use of this function requires transcriptome metadata which is missing.
  either: (1) the object was not produced by tximeta, or
  (2) tximeta could not recognize the digest of the transcriptome.
  If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
  or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)

Okay following the linkedTxome path

fastaFTP <- c("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.transcripts.fa.gz")
> gtfPath <- file.path("/Users/annaleigh/Downloads/gencode.v42.annotation.gtf.gz")
> makeLinkedTxome(indexDir='/Users/annaleigh/cluster/vyplab/first_weeks/recalled_nanopore/oarfish/', 
+                 source="localgencode", 
+                 organism="Homo sapiens",
+                 release="42", 
+                 genome="GRCH38", 
+                 fasta=fastaFTP, 
+                 gtf=gtfPath, 
+                 write=FALSE,
+                 jsonFile = "provaLinkedTxome.json")
Error: lexical error: invalid char in json text.
                                      /Users/annaleigh/cluster/vyplab/
                     (right here) ------^

So - what am I doing wrong here?

Sessioninfo

> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Ventura 13.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.20.0         fishpond_2.12.0             lubridate_1.9.4             forcats_1.0.0               purrr_1.0.2                
 [6] tidyr_1.3.1                 tibble_3.2.1                tidyverse_2.0.0             tximeta_1.24.0              IsoformSwitchAnalyzeR_2.6.0
[11] pfamAnalyzeR_1.6.0          dplyr_1.1.4                 stringr_1.5.1               readr_2.1.5                 sva_3.54.0                 
[16] genefilter_1.88.0           mgcv_1.9-1                  nlme_3.1-166                satuRn_1.14.0               DEXSeq_1.52.0              
[21] RColorBrewer_1.1-3          AnnotationDbi_1.68.0        DESeq2_1.46.0               SummarizedExperiment_1.36.0 GenomicRanges_1.58.0       
[26] GenomeInfoDb_1.42.1         IRanges_2.40.1              S4Vectors_0.44.0            MatrixGenerics_1.18.1       matrixStats_1.5.0          
[31] Biobase_2.66.0              BiocGenerics_0.52.0         BiocParallel_1.40.0         limma_3.62.2                ggplot2_3.5.1              
[36] data.table_1.16.4

fishpond tximport tximeta • 197 views

ADD COMMENT • link 1 day ago • updated 1 hour ago brown.annaleigh • 0

score 0 · Answer 1 · 2025-01-15

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 hour ago

United States

This is fine

Did tximeta() print that a matching transcriptome was found?

ADD COMMENT • link 22 hours ago Michael Love 43k

0

Entering edit mode

I'm not certain that this will work with Oarfish, the documentation only mentions Salmon and piscem.

I'd use tximport for now.

ADD REPLY • link 22 hours ago Michael Love 43k

0

Entering edit mode

Success

k = tximport(file = m,type = 'oarfish',
         txOut = TRUE,
         tx2gene = tx2gene)

Failure

    y <- scaleInfReps(k) # scales counts
Error: unable to find an inherited method for function 'metadata' for signature 'x = "list"'




head(tx2gene)
                                                                                                                                        txid           geneCol
                                                                                                                                      <char>            <char>
1:                                               ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA| ENSG00000290825.1
2: ENST00000450305.2|ENSG00000223972.6|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene| ENSG00000223972.6
3:              ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene| ENSG00000227232.5
4:                                                                 ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA| ENSG00000278267.1
5:                     ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|MIR1302-2HG-202|MIR1302-2HG|712|lncRNA| ENSG00000243485.5
6:                     ENST00000469289.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002841.2|MIR1302-2HG-201|MIR1302-2HG|535|lncRNA| ENSG00000243485.5

Thank you for any advice

ADD REPLY • link 8 hours ago brown.annaleigh • 0

0

Entering edit mode

Lol I'm not smart

This solution didn't work because....the column name was wrong

updated_m <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('txid') %>% 
    left_join(tx2gene) %>% 
    column_to_rownames('txid') 
mcols(y) <- DataFrame(updated_m)
iso <- isoformProportions(y,geneCol = 'geneCol')

however, tximport does not produce a format that the fishpond data can read out of the gate

ADD REPLY • link 5 hours ago brown.annaleigh • 0

0

Entering edit mode

That's true. We haven't yet developed the Oarfish to tximeta pipeline yet.

Can you say what functionality you need? Do you want to do testing on isoform proportions (DTU)?

You could use tximeta with skipMeta=TRUE which would produce the correctly shaped SummarizedExperiment object, and then manually add the gene ID.

ADD REPLY • link 3 hours ago Michael Love 43k

0

Entering edit mode

Trying to do differential gene expression(for other reasons) and also DTU in the same bucket. I will try the skipMeta=TRUE and update with code for fellow lost Googlers

ADD REPLY • link 2 hours ago brown.annaleigh • 0

0

Entering edit mode

Thanks Annaleigh! I'll make sure to also reply here once we've worked out Oarfish to tximeta with metadata, or any other analysis pipeline updates wrt Oarfish and Bioc.

ADD REPLY • link 1 hour ago Michael Love 43k

0

Entering edit mode

More failures encountered

k_skip = tximeta(coldata = data.frame(files = m, names = names(m)), 
                 type = 'oarfish',skipMeta = TRUE)
    > updated_m <- mcols(k_skip) %>% 
    +     as.data.frame() %>% 
    +     rownames_to_column('txid') %>% 
    +     left_join(tx2gene) %>% 
    +     column_to_rownames('txid') 
    Joining with `by = join_by(txid)`
    > mcols(k_skip) <- DataFrame(updated_m)
    > gse <- tximport::summarizeToGene(k_skip,tx2gene = tx2gene)
    Error in missingMetadata(object, summarize = TRUE) : 
      use of this function requires transcriptome metadata which is missing.
      either: (1) the object was not produced by tximeta, or
      (2) tximeta could not recognize the digest of the transcriptome.
      If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
      or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)

ADD REPLY • link 1 hour ago brown.annaleigh • 0