Entering edit mode
I need help with a method/operation to rank each intron by transcript -- based on the granges start/end. Specifically, I'm looking for a way to generate a unique identifier corresponding to: 1st intron, 2nd intron, ... last intron for each transcript, and place them in a granges mcols() Thanks
tx<-txdbmaker::makeTxDbFromGFF("genomic.gff",format="auto",chrominfo=s,dataSource="GCA_007735645.1",organism="Venturia.effusa",taxonomyId=50376,dbxrefTag="GENEID")
int<-GenomicFeatures::intronsByTranscript(tx,use.names=TRUE)
tail(int)
GRangesList object of length 6:
$`rna-gnl|PRJNA551043|FKW77_001900-T1_mrna`
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] CP042204.1 460547-460594 -
-------
seqinfo: 21 sequences from ASM773564v1 genome
$`rna-gnl|PRJNA551043|FKW77_001938-T1_mrna`
GRanges object with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] CP042204.1 466517-467022 -
[2] CP042204.1 467072-467209 -
[3] CP042204.1 467341-468214 -
[4] CP042204.1 468703-468750 -
[5] CP042204.1 469072-469125 -
[6] CP042204.1 469246-469709 -
-------
seqinfo: 21 sequences from ASM773564v1 genome
$`rna-gnl|PRJNA551043|FKW77_001959-T1_mrna`
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] CP042204.1 471047-471094 -
[2] CP042204.1 471781-471826 -
-------
seqinfo: 21 sequences from ASM773564v1 genome
...
<3 more elements>
sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Vinaequalis.NCBI.ASM976944v1_1.0.0
[2] BSgenome.Vnashicola.NCBI.ASM452265v2_1.0.0
[3] BSgenome.Veffusa.NCBI.ASM773564v1_1.0.0
[4] shape_1.4.6.1
[5] RMariaDB_1.3.2
[6] modeest_2.4.0
[7] seqLogo_1.70.0
[8] GenomicDistributions_1.12.0
[9] MotifDb_1.46.0
[10] rlist_0.4.6.2
[11] msaR_0.6.0
[12] deepredeff_0.1.1
[13] seqinr_4.2-36
[14] ape_5.8
[15] msa_1.36.1
[16] OrganismDbi_1.46.0
[17] BSgenomeForge_1.4.0
[18] BSgenome_1.72.0
[19] BiocIO_1.14.0
[20] karyoploteR_1.30.0
[21] regioneR_1.36.0
[22] biovizBase_1.52.0
[23] Gviz_1.48.0
[24] AnnotationForge_1.46.0
[25] AnnotationHub_3.12.0
[26] BiocFileCache_2.12.0
[27] dbplyr_2.5.0
[28] UniProt.ws_2.44.0
[29] RSQLite_2.3.7
[30] gridExtra_2.3
[31] ggbio_1.52.0
[32] Rsubread_2.18.0
[33] ShortRead_1.62.0
[34] BiocParallel_1.38.0
[35] Rhisat2_1.20.0
[36] trackViewer_1.40.0
[37] rtracklayer_1.64.0
[38] R.utils_2.12.3
[39] R.oo_1.26.0
[40] R.methodsS3_1.8.2
[41] RColorBrewer_1.1-3
[42] scatterplot3d_0.3-44
[43] ggplot2_3.5.1
[44] gplots_3.1.3.1
[45] Cairo_1.6-2
[46] motifStack_1.48.0
[47] openxlsx_4.2.6.1
[48] GenomicFeatures_1.56.0
[49] AnnotationDbi_1.66.0
[50] GenomicAlignments_1.40.0
[51] Rsamtools_2.20.0
[52] Biostrings_2.72.1
[53] XVector_0.44.0
[54] SummarizedExperiment_1.34.0
[55] Biobase_2.64.0
[56] MatrixGenerics_1.16.0
[57] matrixStats_1.3.0
[58] GenomicRanges_1.56.1
[59] GenomeInfoDb_1.40.1
[60] IRanges_2.38.1
[61] S4Vectors_0.42.1
[62] BiocGenerics_0.50.0
[63] devtools_2.4.5
[64] usethis_3.0.0
loaded via a namespace (and not attached):
[1] fs_1.6.4 ProtGenerics_1.36.0
[3] bitops_1.0-8 DirichletMultinomial_1.46.0
[5] TFBSTools_1.42.0 httr_1.4.7
[7] InteractionSet_1.32.0 profvis_0.3.8
[9] tools_4.4.1 backports_1.5.0
[11] utf8_1.2.4 R6_2.5.1
[13] lazyeval_0.2.2 rhdf5filters_1.16.0
[15] urlchecker_1.0.1 withr_3.0.1
[17] prettyunits_1.2.0 GGally_2.2.1
[19] cli_3.6.3 SGSeq_1.38.0
[21] grImport_0.9-7 readr_2.1.5
[23] txdbmaker_1.0.1 foreign_0.8-87
[25] dichromat_2.0-0.1 sessioninfo_1.2.2
[27] plotrix_3.8-4 rstudioapi_0.16.0
[29] generics_0.1.3 hwriter_1.3.2.1
[31] gtools_3.9.5 dplyr_1.1.4
[33] zip_2.3.1 GO.db_3.19.1
[35] Matrix_1.7-0 interp_1.1-6
[37] fansi_1.0.6 abind_1.4-5
[39] lifecycle_1.0.4 yaml_2.3.10
[41] rhdf5_2.48.0 SparseArray_1.4.8
[43] blob_1.2.4 promises_1.3.0
[45] crayon_1.5.3 pwalign_1.0.0
[47] miniUI_0.1.1.1 lattice_0.22-6
[49] annotate_1.82.0 KEGGREST_1.44.1
[51] pillar_1.9.0 knitr_1.48
[53] statip_0.2.3 rjson_0.2.21
[55] codetools_0.2-20 strawr_0.0.92
[57] glue_1.7.0 data.table_1.15.4
[59] remotes_2.5.0 vctrs_0.6.5
[61] png_0.1-8 gtable_0.3.5
[63] poweRlaw_0.80.0 cachem_1.1.0
[65] xfun_0.46 S4Arrays_1.4.1
[67] mime_0.12 pracma_2.4.4
[69] timeDate_4032.109 ellipsis_0.3.2
[71] nlme_3.1-165 bit64_4.0.5
[73] progress_1.2.3 filelock_1.0.3
[75] fBasics_4032.96 KernSmooth_2.23-24
[77] rpart_4.1.23 splitstackshape_1.4.8
[79] colorspace_2.1-1 DBI_1.2.3
[81] Hmisc_5.1-3 nnet_7.3-19
[83] ade4_1.7-22 tidyselect_1.2.1
[85] timeSeries_4032.109 bit_4.0.5
[87] compiler_4.4.1 curl_5.2.1
[89] httr2_1.0.2 graph_1.82.0
[91] rjsoncons_1.3.1 htmlTable_2.4.3
[93] bezier_1.1.2 xml2_1.3.6
[95] DelayedArray_0.30.1 checkmate_2.3.2
[97] scales_1.3.0 caTools_1.18.2
[99] spatial_7.3-17 RBGL_1.80.0
[101] rappdirs_0.3.3 stringr_1.5.1
[103] digest_0.6.36 rmarkdown_2.28
[105] htmltools_0.5.8.1 pkgconfig_2.0.3
[107] jpeg_0.1-10 base64enc_0.1-3
[109] stabledist_0.7-2 fastmap_1.2.0
[111] ensembldb_2.28.0 rlang_1.1.4
[113] htmlwidgets_1.6.4 UCSC.utils_1.0.0
[115] shiny_1.9.1 jsonlite_1.8.8
[117] VariantAnnotation_1.50.0 RCurl_1.98-1.16
[119] magrittr_2.0.3 Formula_1.2-5
[121] GenomeInfoDbData_1.2.12 Rhdf5lib_1.26.0
[123] munsell_0.5.1 Rcpp_1.0.13
[125] reticulate_1.38.0 bamsignals_1.36.0
[127] stringi_1.8.4 stable_1.1.6
[129] zlibbioc_1.50.0 MASS_7.3-61
[131] plyr_1.8.9 pkgbuild_1.4.4
[133] ggstats_0.6.0 parallel_4.4.1
[135] deldir_2.0-4 CNEr_1.40.0
[137] hms_1.1.3 igraph_2.0.3
[139] RUnit_0.4.33 rmutil_1.1.10
[141] reshape2_1.4.4 biomaRt_2.60.1
[143] pkgload_1.4.0 TFMPvalue_0.0.9
[145] BiocVersion_3.19.1 XML_3.99-0.17
[147] evaluate_0.24.0 latticeExtra_0.6-30
[149] BiocManager_1.30.23 tzdb_0.4.0
[151] httpuv_1.6.15 tidyr_1.3.1
[153] purrr_1.0.2 clue_0.3-65
[155] BiocBaseUtils_1.6.0 xtable_1.8-4
[157] restfulr_0.0.15 AnnotationFilter_1.28.0
[159] later_1.3.2 tibble_3.2.1
[161] memoise_2.0.1 cluster_2.1.6
There's an error in the last line. Note that
sapply(lengths(z), seq_len)
will return alist
because the lengths vary. We need a vector, so have to convert that list to a vector. There are (at least) two ways to do that, using eitherunlist
or (my preference)do.call
, which is (probably) more likely to do what you expect.Hey, In revisiting this question, the following code works for the '+' strand. I am having issues with demarcating intron rank by transcript on '-' strand introns. Negative strand intron ranks by transcript are in the reverse orientation....(e.g. the output is 1-2-3-4-5 when it should be 5-4-3-2-1) as granges 'end' and 'start' should be inverted for the negative strand intron ranking convention. Is there a subsetting operation that can address this?