Error generating counts df for use with DRIMSeq/DEXseq
0
0
Entering edit mode
jbono ▴ 10
@jbono-7682
Last seen 11 months ago
United States

Hi,

I am attempting to work through the workflow described in "Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification." I am running into an error message when I try to make the counts dataframe for DRIMseq: Error in data.frame(gene_id = txdf$GENEID, feature_id = txdf$TXNAME, cts) : arguments imply differing number of rows: 35367, 28234

I assume this means there are transcripts in the counts object (cts) I generated from salmon that are not present in the txdb. I'm not sure if this is related to some warning messages I got during the construction of the txdb (see below) or something else, but I'm not sure what I would need to do to fix the problem. Any help would be much appreciated! Thanks.

> sample_information=read.csv("sample_information.csv")
> files <- file.path("salmon/transcripts_quant", sample_information$sample_id, "quant.sf")
> names(files) <- sample_information$sample_id
> txi <- tximport(files, type="salmon", txOut=TRUE,
+                 countsFromAbundance="scaledTPM")
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> cts <- txi$counts
> cts <- cts[rowSums(cts) > 0,]
> library(GenomicFeatures)
> Dmel_r6.38_gtf <- "dmel-all-r6.38.gtf.gz"
> txdb.filename <- "Dmel_r6.38_gtf.sqlite"
> txdb <- makeTxDbFromGFF(Dmel_r6.38_gtf)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was
  ignored.
2: In makeTxDbFromGRanges(gr, metadata = metadata) :
  The following transcripts were dropped because their exon ranks could not be inferred (either because the
  exons are not on the same chromosome/strand or because they are not separated by introns): FBtr0084079,
  FBtr0084080, FBtr0084081, FBtr0084082, FBtr0084083, FBtr0084084, FBtr0084085, FBtr0307759, FBtr0307760
3: In .reject_transcripts(bad_tx, because) :
  The following transcripts were rejected because they have stop codons that cannot be mapped to an exon:
  FBtr0100857, FBtr0100863, FBtr0433500, FBtr0433501
> saveDb(txdb, txdb.filename)
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: dmel-all-r6.38.gtf.gz
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 35367
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-04-05 12:20:51 -0600 (Mon, 05 Apr 2021)
# GenomicFeatures version at creation time: 1.42.3
# RSQLite version at creation time: 2.2.5
# DBSCHEMAVERSION: 1.2
> txdf <- select(txdb, keys(txdb, "GENEID"), "TXNAME", "GENEID")
'select()' returned 1:many mapping between keys and columns
> tab <- table(txdf$GENEID)
> txdf$ntx <- tab[match(txdf$GENEID, names(tab))]
> dtu_counts <- data.frame(gene_id=txdf$GENEID,
+                          feature_id=txdf$TXNAME,
+                          cts)
Error in data.frame(gene_id = txdf$GENEID, feature_id = txdf$TXNAME, cts) : 
  arguments imply differing number of rows: 35367, 28234

# please also include the results of running the following in an R session 

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximport_1.18.0              IsoformSwitchAnalyzeR_1.12.0 ggplot2_3.3.3                DEXSeq_1.36.0               
 [5] RColorBrewer_1.1-2           DESeq2_1.30.1                BiocParallel_1.24.1          GenomicFeatures_1.42.3      
 [9] GenomicAlignments_1.26.0     Rsamtools_2.6.0              Biostrings_2.58.0            XVector_0.30.0              
[13] SummarizedExperiment_1.20.0  MatrixGenerics_1.2.1         matrixStats_0.58.0           GenomicRanges_1.42.0        
[17] GenomeInfoDb_1.26.4          GenomeInfoDbData_1.2.4       ASpli_2.0.0                  AnnotationDbi_1.52.0        
[21] IRanges_2.24.1               S4Vectors_0.28.1             Biobase_2.50.0               BiocGenerics_0.36.0         
[25] Rsubread_2.4.3               edgeR_3.32.1                 limma_3.46.0                

loaded via a namespace (and not attached):
  [1] backports_1.2.1               AnnotationHub_2.22.0          Hmisc_4.5-0                  
  [4] DRIMSeq_1.18.0                BiocFileCache_1.14.0          plyr_1.8.6                   
  [7] igraph_1.2.6                  lazyeval_0.2.2                tximeta_1.8.4                
 [10] splines_4.0.4                 digest_0.6.27                 ensembldb_2.14.0             
 [13] htmltools_0.5.1.1             fansi_0.4.2                   magrittr_2.0.1               
 [16] checkmate_2.0.0               memoise_2.0.0                 BSgenome_1.58.0              
 [19] cluster_2.1.1                 readr_1.4.0                   annotate_1.68.0              
 [22] askpass_1.1                   prettyunits_1.1.1             jpeg_0.1-8.1                 
 [25] colorspace_2.0-0              blob_1.2.1                    rappdirs_0.3.3               
 [28] xfun_0.22                     dplyr_1.0.5                   jsonlite_1.7.2               
 [31] crayon_1.4.1                  RCurl_1.98-1.3                genefilter_1.72.1            
 [34] survival_3.2-10               VariantAnnotation_1.36.0      glue_1.4.2                   
 [37] gtable_0.3.0                  zlibbioc_1.36.0               UpSetR_1.4.0                 
 [40] DelayedArray_0.16.3           scales_1.1.1                  futile.options_1.0.1         
 [43] DBI_1.1.1                     Rcpp_1.0.6                    xtable_1.8-4                 
 [46] progress_1.2.2                htmlTable_2.1.0               foreign_0.8-81               
 [49] bit_4.0.4                     Formula_1.2-4                 DT_0.17                      
 [52] htmlwidgets_1.5.3             httr_1.4.2                    ellipsis_0.3.1               
 [55] pkgconfig_2.0.3               XML_3.99-0.6                  Gviz_1.34.1                  
 [58] nnet_7.3-15                   dbplyr_2.1.0                  locfit_1.5-9.4               
 [61] utf8_1.2.1                    later_1.1.0.1                 tidyselect_1.1.0             
 [64] rlang_0.4.10                  reshape2_1.4.4                BiocVersion_3.12.0           
 [67] munsell_0.5.0                 tools_4.0.4                   cachem_1.0.4                 
 [70] generics_0.1.0                RSQLite_2.2.5                 evaluate_0.14                
 [73] stringr_1.4.0                 fastmap_1.1.0                 yaml_2.2.1                   
 [76] knitr_1.31                    bit64_4.0.5                   purrr_0.3.4                  
 [79] AnnotationFilter_1.14.0       mime_0.10                     formatR_1.8                  
 [82] xml2_1.3.2                    biomaRt_2.46.3                BiocStyle_2.18.1             
 [85] compiler_4.0.4                rstudioapi_0.13               interactiveDisplayBase_1.28.0
 [88] curl_4.3                      png_0.1-7                     tibble_3.1.0                 
 [91] statmod_1.4.35                geneplotter_1.68.0            stringi_1.5.3                
 [94] futile.logger_1.4.3           lattice_0.20-41               ProtGenerics_1.22.0          
 [97] Matrix_1.3-2                  vctrs_0.3.7                   pillar_1.5.1                 
[100] lifecycle_1.0.0               BiocManager_1.30.12           data.table_1.14.0            
[103] bitops_1.0-6                  httpuv_1.5.5                  rtracklayer_1.50.0           
[106] R6_2.5.0                      latticeExtra_0.6-29           hwriter_1.3.2                
[109] promises_1.2.0.1              gridExtra_2.3                 lambda.r_1.2.4               
[112] dichromat_2.0-0               MASS_7.3-53.1                 assertthat_0.2.1             
[115] openssl_1.4.3                 withr_2.4.1                   hms_1.0.0                    
[118] VennDiagram_1.6.20            grid_4.0.4                    rpart_4.1-15                 
[121] tidyr_1.1.3                   rmarkdown_2.7                 biovizBase_1.38.0            
[124] shiny_1.6.0                   base64enc_0.1-3               tinytex_0.31
DRIMSeq DEXSeq • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 912 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6