Problems with oarfish + tximport + swish
1
0
Entering edit mode
@brownannaleigh-22443
Last seen 5 hours ago
United Kingdom

I've aligned long read data with oarfish and now I wanted to try out doing differential isoform usage with fishpond + swish

I've aligned to the gencode v42 human fasta file, which has the following header conformation

 head /SAN/vyplab/vyplab_reference_genomes/sequence/human/gencode/gencode.v42.transcripts.fa
>ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
TCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGA
TGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTG

I've got a dataframe of the file paths of the data and the names.

This is fine

k = tximeta(coldata = data.frame(files = m, names = names(m)), type = 'oarfish')
y <- scaleInfReps(k) # scales counts
y <- labelKeep(y) # labels features to keep
set.seed(120)
y <- swish(y, x="condition") # simplest Swish case

The errors start appearing here:

# run on the transcript-level dataset
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE

Ok that's fine, perhaps tximeta wasn't able to properly infer the right species/version, but I should be able to manually make that

tx_2_gene <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    separate(gene,sep = "\\|",into = c("transcript","geneCol",NA,NA,"transcript_id"
                                       ,"symbol",'tx_leng','biotype'),
             remove = FALSE) %>% 
    select(gene,geneCol)

And let's see if that works

updated_m <- mcols(y) %>% 
    as.data.frame() %>% 
    rownames_to_column('gene') %>% 
    left_join(tx_2_gene) %>% 
    column_to_rownames('gene') 

mcols(y) <- DataFrame(updated_m)
# run on the transcript-level dataset
iso <- isoformProportions(y)
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
> "geneCol" %in% names(mcols(y)) 
[1] TRUE

Let's try the summarizeToGene gene function

gse <- summarizeToGene(k,tx2gene = tx_2_gene)
Error in missingMetadata(object, summarize = TRUE) : 
  use of this function requires transcriptome metadata which is missing.
  either: (1) the object was not produced by tximeta, or
  (2) tximeta could not recognize the digest of the transcriptome.
  If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
  or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)

Okay following the linkedTxome path

fastaFTP <- c("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.transcripts.fa.gz")
> gtfPath <- file.path("/Users/annaleigh/Downloads/gencode.v42.annotation.gtf.gz")
> makeLinkedTxome(indexDir='/Users/annaleigh/cluster/vyplab/first_weeks/recalled_nanopore/oarfish/', 
+                 source="localgencode", 
+                 organism="Homo sapiens",
+                 release="42", 
+                 genome="GRCH38", 
+                 fasta=fastaFTP, 
+                 gtf=gtfPath, 
+                 write=FALSE,
+                 jsonFile = "provaLinkedTxome.json")
Error: lexical error: invalid char in json text.
                                      /Users/annaleigh/cluster/vyplab/
                     (right here) ------^

So - what am I doing wrong here?

Sessioninfo

> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Ventura 13.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.20.0         fishpond_2.12.0             lubridate_1.9.4             forcats_1.0.0               purrr_1.0.2                
 [6] tidyr_1.3.1                 tibble_3.2.1                tidyverse_2.0.0             tximeta_1.24.0              IsoformSwitchAnalyzeR_2.6.0
[11] pfamAnalyzeR_1.6.0          dplyr_1.1.4                 stringr_1.5.1               readr_2.1.5                 sva_3.54.0                 
[16] genefilter_1.88.0           mgcv_1.9-1                  nlme_3.1-166                satuRn_1.14.0               DEXSeq_1.52.0              
[21] RColorBrewer_1.1-3          AnnotationDbi_1.68.0        DESeq2_1.46.0               SummarizedExperiment_1.36.0 GenomicRanges_1.58.0       
[26] GenomeInfoDb_1.42.1         IRanges_2.40.1              S4Vectors_0.44.0            MatrixGenerics_1.18.1       matrixStats_1.5.0          
[31] Biobase_2.66.0              BiocGenerics_0.52.0         BiocParallel_1.40.0         limma_3.62.2                ggplot2_3.5.1              
[36] data.table_1.16.4
fishpond tximport tximeta • 46 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

This is fine

Did tximeta() print that a matching transcriptome was found?

ADD COMMENT
0
Entering edit mode

I'm not certain that this will work with Oarfish, the documentation only mentions Salmon and piscem.

I'd use tximport for now.

ADD REPLY

Login before adding your answer.

Traffic: 623 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6