I've aligned long read data with oarfish and now I wanted to try out doing differential isoform usage with fishpond + swish
I've aligned to the gencode v42 human fasta file, which has the following header conformation
head /SAN/vyplab/vyplab_reference_genomes/sequence/human/gencode/gencode.v42.transcripts.fa
>ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA|
GTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTC
TCTTAGCCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGA
TGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTG
I've got a dataframe of the file paths of the data and the names.
This is fine
k = tximeta(coldata = data.frame(files = m, names = names(m)), type = 'oarfish')
y <- scaleInfReps(k) # scales counts
y <- labelKeep(y) # labels features to keep
set.seed(120)
y <- swish(y, x="condition") # simplest Swish case
The errors start appearing here:
# run on the transcript-level dataset
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
Ok that's fine, perhaps tximeta wasn't able to properly infer the right species/version, but I should be able to manually make that
tx_2_gene <- mcols(y) %>%
as.data.frame() %>%
rownames_to_column('gene') %>%
separate(gene,sep = "\\|",into = c("transcript","geneCol",NA,NA,"transcript_id"
,"symbol",'tx_leng','biotype'),
remove = FALSE) %>%
select(gene,geneCol)
And let's see if that works
updated_m <- mcols(y) %>%
as.data.frame() %>%
rownames_to_column('gene') %>%
left_join(tx_2_gene) %>%
column_to_rownames('gene')
mcols(y) <- DataFrame(updated_m)
# run on the transcript-level dataset
iso <- isoformProportions(y)
iso <- isoformProportions(y)
Error in isoformProportions(y) : geneCol %in% names(mcols(y)) is not TRUE
> "geneCol" %in% names(mcols(y))
[1] TRUE
Let's try the summarizeToGene gene function
gse <- summarizeToGene(k,tx2gene = tx_2_gene)
Error in missingMetadata(object, summarize = TRUE) :
use of this function requires transcriptome metadata which is missing.
either: (1) the object was not produced by tximeta, or
(2) tximeta could not recognize the digest of the transcriptome.
If (2), use a linkedTxome to provide the missing metadata and rerun tximeta
or use tx2gene, txOut=FALSE (and skipMeta=TRUE if Salmon/piscem)
Okay following the linkedTxome path
fastaFTP <- c("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.transcripts.fa.gz")
> gtfPath <- file.path("/Users/annaleigh/Downloads/gencode.v42.annotation.gtf.gz")
> makeLinkedTxome(indexDir='/Users/annaleigh/cluster/vyplab/first_weeks/recalled_nanopore/oarfish/',
+ source="localgencode",
+ organism="Homo sapiens",
+ release="42",
+ genome="GRCH38",
+ fasta=fastaFTP,
+ gtf=gtfPath,
+ write=FALSE,
+ jsonFile = "provaLinkedTxome.json")
Error: lexical error: invalid char in json text.
/Users/annaleigh/cluster/vyplab/
(right here) ------^
So - what am I doing wrong here?
Sessioninfo
> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Ventura 13.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_3.20.0 fishpond_2.12.0 lubridate_1.9.4 forcats_1.0.0 purrr_1.0.2
[6] tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0 tximeta_1.24.0 IsoformSwitchAnalyzeR_2.6.0
[11] pfamAnalyzeR_1.6.0 dplyr_1.1.4 stringr_1.5.1 readr_2.1.5 sva_3.54.0
[16] genefilter_1.88.0 mgcv_1.9-1 nlme_3.1-166 satuRn_1.14.0 DEXSeq_1.52.0
[21] RColorBrewer_1.1-3 AnnotationDbi_1.68.0 DESeq2_1.46.0 SummarizedExperiment_1.36.0 GenomicRanges_1.58.0
[26] GenomeInfoDb_1.42.1 IRanges_2.40.1 S4Vectors_0.44.0 MatrixGenerics_1.18.1 matrixStats_1.5.0
[31] Biobase_2.66.0 BiocGenerics_0.52.0 BiocParallel_1.40.0 limma_3.62.2 ggplot2_3.5.1
[36] data.table_1.16.4
I'm not certain that this will work with Oarfish, the documentation only mentions Salmon and piscem.
I'd use tximport for now.