I've aligned long read data with oarfish and now I wanted to try out doing differential isoform usage with fishpond + swish
I've aligned to the gencode v42 human fasta file, which has the following header conformation
head /SAN/vyplab/vyplab_reference_genomes/sequence/human/gencode/gencode.v42.transcripts.fa
>ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
TCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGA
TGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTG
I've got a dataframe of the file paths of the data and the names.
This is fine
k = tximeta(coldata = data.frame(files = m, names = names(m)), type = 'oarfish')
y <- scaleInfReps(k) # scales counts
y <- labelKeep(y) # labels features to keep
set.seed(120)
y <- swish(y, x="condition") # simplest Swish case
The errors start appearing here:
# run on the transcript-level dataset
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
Ok that's fine, perhaps tximeta wasn't able to properly infer the right species/version, but I should be able to manually make that
tx_2_gene <- mcols(y) %>%
as.data.frame() %>%
rownames_to_column('gene') %>%
separate(gene,sep = "\\|",into = c("transcript","geneCol",NA,NA,"transcript_id"
,"symbol",'tx_leng','biotype'),
remove = FALSE) %>%
select(gene,geneCol)
And let's see if that works
updated_m <- mcols(y) %>%
as.data.frame() %>%
rownames_to_column('gene') %>%
left_join(tx_2_gene) %>%
column_to_rownames('gene')
mcols(y) <- DataFrame(updated_m)
# run on the transcript-level dataset
iso <- isoformProportions(y)
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
> "geneCol" %in% names(mcols(y))
[1] TRUE
Let's try the summarizeToGene gene function
gse <- summarizeToGene(k,tx2gene = tx_2_gene)
Error in missingMetadata(object, summarize = TRUE) :
use of this function requires transcriptome metadata which is missing.
either: (1) the object was not produced by tximeta, or
(2) tximeta could not recognize the digest of the transcriptome.
If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)
Okay following the linkedTxome path
fastaFTP <- c("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.transcripts.fa.gz")
> gtfPath <- file.path("/Users/annaleigh/Downloads/gencode.v42.annotation.gtf.gz")
> makeLinkedTxome(indexDir='/Users/annaleigh/cluster/vyplab/first_weeks/recalled_nanopore/oarfish/',
+ source="localgencode",
+ organism="Homo sapiens",
+ release="42",
+ genome="GRCH38",
+ fasta=fastaFTP,
+ gtf=gtfPath,
+ write=FALSE,
+ jsonFile = "provaLinkedTxome.json")
Error: lexical error: invalid char in json text.
/Users/annaleigh/cluster/vyplab/
(right here) ------^
So - what am I doing wrong here?
Sessioninfo
> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Ventura 13.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_3.20.0 fishpond_2.12.0 lubridate_1.9.4 forcats_1.0.0 purrr_1.0.2
[6] tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0 tximeta_1.24.0 IsoformSwitchAnalyzeR_2.6.0
[11] pfamAnalyzeR_1.6.0 dplyr_1.1.4 stringr_1.5.1 readr_2.1.5 sva_3.54.0
[16] genefilter_1.88.0 mgcv_1.9-1 nlme_3.1-166 satuRn_1.14.0 DEXSeq_1.52.0
[21] RColorBrewer_1.1-3 AnnotationDbi_1.68.0 DESeq2_1.46.0 SummarizedExperiment_1.36.0 GenomicRanges_1.58.0
[26] GenomeInfoDb_1.42.1 IRanges_2.40.1 S4Vectors_0.44.0 MatrixGenerics_1.18.1 matrixStats_1.5.0
[31] Biobase_2.66.0 BiocGenerics_0.52.0 BiocParallel_1.40.0 limma_3.62.2 ggplot2_3.5.1
[36] data.table_1.16.4
I'm not certain that this will work with Oarfish, the documentation only mentions Salmon and piscem.
I'd use tximport for now.
Success
Failure
Thank you for any advice
Lol I'm not smart
This solution didn't work because....the column name was wrong
however, tximport does not produce a format that the fishpond data can read out of the gate