Question

Problems with oarfish + tximport + swish

0

Entering edit mode

brown.annaleigh • 0

@brownannaleigh-22443

Last seen 5 hours ago

United Kingdom

I've aligned long read data with oarfish and now I wanted to try out doing differential isoform usage with fishpond + swish

I've aligned to the gencode v42 human fasta file, which has the following header conformation

 head /SAN/vyplab/vyplab_reference_genomes/sequence/human/gencode/gencode.v42.transcripts.fa
>ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
TCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGA
TGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTG

I've got a dataframe of the file paths of the data and the names.

This is fine

k = tximeta(coldata = data.frame(files = m, names = names(m)), type = 'oarfish')
y <- scaleInfReps(k) # scales counts
y <- labelKeep(y) # labels features to keep
set.seed(120)
y <- swish(y, x="condition") # simplest Swish case

The errors start appearing here:

# run on the transcript-level dataset
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE

Ok that's fine, perhaps tximeta wasn't able to properly infer the right species/version, but I should be able to manually make that

tx_2_gene <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    separate(gene,sep = "\\|",into = c("transcript","geneCol",NA,NA,"transcript_id"
                                       ,"symbol",'tx_leng','biotype'),
             remove = FALSE) %>% 
    select(gene,geneCol)

And let's see if that works

updated_m <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    left_join(tx_2_gene) %>% 
    column_to_rownames('gene') 

mcols(y) <- DataFrame(updated_m)
# run on the transcript-level dataset
iso <- isoformProportions(y)
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
> "geneCol" %in% names(mcols(y)) 
[1] TRUE

Let's try the summarizeToGene gene function

gse <- summarizeToGene(k,tx2gene = tx_2_gene)
Error in missingMetadata(object, summarize = TRUE) : 
  use of this function requires transcriptome metadata which is missing.
  either: (1) the object was not produced by tximeta, or
  (2) tximeta could not recognize the digest of the transcriptome.
  If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
  or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)

Okay following the linkedTxome path

fastaFTP <- c("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.transcripts.fa.gz")
> gtfPath <- file.path("/Users/annaleigh/Downloads/gencode.v42.annotation.gtf.gz")
> makeLinkedTxome(indexDir='/Users/annaleigh/cluster/vyplab/first_weeks/recalled_nanopore/oarfish/', 
+                 source="localgencode", 
+                 organism="Homo sapiens",
+                 release="42", 
+                 genome="GRCH38", 
+                 fasta=fastaFTP, 
+                 gtf=gtfPath, 
+                 write=FALSE,
+                 jsonFile = "provaLinkedTxome.json")
Error: lexical error: invalid char in json text.
                                      /Users/annaleigh/cluster/vyplab/
                     (right here) ------^

So - what am I doing wrong here?

Sessioninfo

> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Ventura 13.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.20.0         fishpond_2.12.0             lubridate_1.9.4             forcats_1.0.0               purrr_1.0.2                
 [6] tidyr_1.3.1                 tibble_3.2.1                tidyverse_2.0.0             tximeta_1.24.0              IsoformSwitchAnalyzeR_2.6.0
[11] pfamAnalyzeR_1.6.0          dplyr_1.1.4                 stringr_1.5.1               readr_2.1.5                 sva_3.54.0                 
[16] genefilter_1.88.0           mgcv_1.9-1                  nlme_3.1-166                satuRn_1.14.0               DEXSeq_1.52.0              
[21] RColorBrewer_1.1-3          AnnotationDbi_1.68.0        DESeq2_1.46.0               SummarizedExperiment_1.36.0 GenomicRanges_1.58.0       
[26] GenomeInfoDb_1.42.1         IRanges_2.40.1              S4Vectors_0.44.0            MatrixGenerics_1.18.1       matrixStats_1.5.0          
[31] Biobase_2.66.0              BiocGenerics_0.52.0         BiocParallel_1.40.0         limma_3.62.2                ggplot2_3.5.1              
[36] data.table_1.16.4

fishpond tximport tximeta • 46 views

ADD COMMENT • link updated 2 hours ago by Michael Love 43k • written 6 hours ago by brown.annaleigh • 0

score 0 · Answer 1 · 2025-01-15

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 hour ago

United States

This is fine

Did tximeta() print that a matching transcriptome was found?

ADD COMMENT • link 2 hours ago Michael Love 43k

0

Entering edit mode

I'm not certain that this will work with Oarfish, the documentation only mentions Salmon and piscem.

I'd use tximport for now.

ADD REPLY • link 2 hours ago Michael Love 43k