Hi,
I am attempting to work through the workflow described in "Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification." I am running into an error message when I try to make the counts dataframe for DRIMseq: Error in data.frame(gene_id = txdf$GENEID, feature_id = txdf$TXNAME, cts) : arguments imply differing number of rows: 35367, 28234
I assume this means there are transcripts in the counts object (cts) I generated from salmon that are not present in the txdb. I'm not sure if this is related to some warning messages I got during the construction of the txdb (see below) or something else, but I'm not sure what I would need to do to fix the problem. Any help would be much appreciated! Thanks.
> sample_information=read.csv("sample_information.csv")
> files <- file.path("salmon/transcripts_quant", sample_information$sample_id, "quant.sf")
> names(files) <- sample_information$sample_id
> txi <- tximport(files, type="salmon", txOut=TRUE,
+ countsFromAbundance="scaledTPM")
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> cts <- txi$counts
> cts <- cts[rowSums(cts) > 0,]
> library(GenomicFeatures)
> Dmel_r6.38_gtf <- "dmel-all-r6.38.gtf.gz"
> txdb.filename <- "Dmel_r6.38_gtf.sqlite"
> txdb <- makeTxDbFromGFF(Dmel_r6.38_gtf)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was
ignored.
2: In makeTxDbFromGRanges(gr, metadata = metadata) :
The following transcripts were dropped because their exon ranks could not be inferred (either because the
exons are not on the same chromosome/strand or because they are not separated by introns): FBtr0084079,
FBtr0084080, FBtr0084081, FBtr0084082, FBtr0084083, FBtr0084084, FBtr0084085, FBtr0307759, FBtr0307760
3: In .reject_transcripts(bad_tx, because) :
The following transcripts were rejected because they have stop codons that cannot be mapped to an exon:
FBtr0100857, FBtr0100863, FBtr0433500, FBtr0433501
> saveDb(txdb, txdb.filename)
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: dmel-all-r6.38.gtf.gz
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 35367
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-04-05 12:20:51 -0600 (Mon, 05 Apr 2021)
# GenomicFeatures version at creation time: 1.42.3
# RSQLite version at creation time: 2.2.5
# DBSCHEMAVERSION: 1.2
> txdf <- select(txdb, keys(txdb, "GENEID"), "TXNAME", "GENEID")
'select()' returned 1:many mapping between keys and columns
> tab <- table(txdf$GENEID)
> txdf$ntx <- tab[match(txdf$GENEID, names(tab))]
> dtu_counts <- data.frame(gene_id=txdf$GENEID,
+ feature_id=txdf$TXNAME,
+ cts)
Error in data.frame(gene_id = txdf$GENEID, feature_id = txdf$TXNAME, cts) :
arguments imply differing number of rows: 35367, 28234
# please also include the results of running the following in an R session
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximport_1.18.0 IsoformSwitchAnalyzeR_1.12.0 ggplot2_3.3.3 DEXSeq_1.36.0
[5] RColorBrewer_1.1-2 DESeq2_1.30.1 BiocParallel_1.24.1 GenomicFeatures_1.42.3
[9] GenomicAlignments_1.26.0 Rsamtools_2.6.0 Biostrings_2.58.0 XVector_0.30.0
[13] SummarizedExperiment_1.20.0 MatrixGenerics_1.2.1 matrixStats_0.58.0 GenomicRanges_1.42.0
[17] GenomeInfoDb_1.26.4 GenomeInfoDbData_1.2.4 ASpli_2.0.0 AnnotationDbi_1.52.0
[21] IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0 BiocGenerics_0.36.0
[25] Rsubread_2.4.3 edgeR_3.32.1 limma_3.46.0
loaded via a namespace (and not attached):
[1] backports_1.2.1 AnnotationHub_2.22.0 Hmisc_4.5-0
[4] DRIMSeq_1.18.0 BiocFileCache_1.14.0 plyr_1.8.6
[7] igraph_1.2.6 lazyeval_0.2.2 tximeta_1.8.4
[10] splines_4.0.4 digest_0.6.27 ensembldb_2.14.0
[13] htmltools_0.5.1.1 fansi_0.4.2 magrittr_2.0.1
[16] checkmate_2.0.0 memoise_2.0.0 BSgenome_1.58.0
[19] cluster_2.1.1 readr_1.4.0 annotate_1.68.0
[22] askpass_1.1 prettyunits_1.1.1 jpeg_0.1-8.1
[25] colorspace_2.0-0 blob_1.2.1 rappdirs_0.3.3
[28] xfun_0.22 dplyr_1.0.5 jsonlite_1.7.2
[31] crayon_1.4.1 RCurl_1.98-1.3 genefilter_1.72.1
[34] survival_3.2-10 VariantAnnotation_1.36.0 glue_1.4.2
[37] gtable_0.3.0 zlibbioc_1.36.0 UpSetR_1.4.0
[40] DelayedArray_0.16.3 scales_1.1.1 futile.options_1.0.1
[43] DBI_1.1.1 Rcpp_1.0.6 xtable_1.8-4
[46] progress_1.2.2 htmlTable_2.1.0 foreign_0.8-81
[49] bit_4.0.4 Formula_1.2-4 DT_0.17
[52] htmlwidgets_1.5.3 httr_1.4.2 ellipsis_0.3.1
[55] pkgconfig_2.0.3 XML_3.99-0.6 Gviz_1.34.1
[58] nnet_7.3-15 dbplyr_2.1.0 locfit_1.5-9.4
[61] utf8_1.2.1 later_1.1.0.1 tidyselect_1.1.0
[64] rlang_0.4.10 reshape2_1.4.4 BiocVersion_3.12.0
[67] munsell_0.5.0 tools_4.0.4 cachem_1.0.4
[70] generics_0.1.0 RSQLite_2.2.5 evaluate_0.14
[73] stringr_1.4.0 fastmap_1.1.0 yaml_2.2.1
[76] knitr_1.31 bit64_4.0.5 purrr_0.3.4
[79] AnnotationFilter_1.14.0 mime_0.10 formatR_1.8
[82] xml2_1.3.2 biomaRt_2.46.3 BiocStyle_2.18.1
[85] compiler_4.0.4 rstudioapi_0.13 interactiveDisplayBase_1.28.0
[88] curl_4.3 png_0.1-7 tibble_3.1.0
[91] statmod_1.4.35 geneplotter_1.68.0 stringi_1.5.3
[94] futile.logger_1.4.3 lattice_0.20-41 ProtGenerics_1.22.0
[97] Matrix_1.3-2 vctrs_0.3.7 pillar_1.5.1
[100] lifecycle_1.0.0 BiocManager_1.30.12 data.table_1.14.0
[103] bitops_1.0-6 httpuv_1.5.5 rtracklayer_1.50.0
[106] R6_2.5.0 latticeExtra_0.6-29 hwriter_1.3.2
[109] promises_1.2.0.1 gridExtra_2.3 lambda.r_1.2.4
[112] dichromat_2.0-0 MASS_7.3-53.1 assertthat_0.2.1
[115] openssl_1.4.3 withr_2.4.1 hms_1.0.0
[118] VennDiagram_1.6.20 grid_4.0.4 rpart_4.1-15
[121] tidyr_1.1.3 rmarkdown_2.7 biovizBase_1.38.0
[124] shiny_1.6.0 base64enc_0.1-3 tinytex_0.31