Hi, I am trying to split a TxDB or GRanges by chromsome.
The TxDB object is made from gencode (https://www.gencodegenes.org/human/) using:
txdb <- makeTxDbFromGFF()
Looking at the manual, to just use chr1 data for example, I would just use:
seqlevels(txdb) <- "chr1"
However, when I run:
seqlevels(txdb)
I get an error:
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'seqlevels': invalid DB file
Ok fine, whatever, lets try to use the GRanges object I made from this TxDB object instead:
#Exons by transcript
Exons <- exonsBy(Gencode, by = "tx", use.names=TRUE)
Where running seqlevels(Exons)
outputs the names of 22 chromosomes, chrX, chrY and chrM. Perfect. However, since this is now not a txDB
object, I cant use seqlevels(txdb) <- "chr1"
Looking at the web I saw that people recommended using split()
:
ExonsSplitByChr <- split(Exons, seqlevels(Exons))
While I now have a list of lists for each chromosome, the slits each have ~4100 entries. I doubt that the genome has a perfectly matching amount of transcripts per chromosome....
Thank you for the help!
``` #Session Info:
R version 4.0.5 (2021-03-31) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 10.16
Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] GenomicFeatures_1.42.3 AnnotationDbi_1.52.0 Biobase_2.50.0
[4] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 IRanges_2.24.1
[7] S4Vectors_0.28.1 BiocGenerics_0.36.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 here_1.0.1
[3] lattice_0.20-45 png_0.1-7
[5] prettyunits_1.1.1 Rsamtools_2.6.0
[7] Biostrings_2.58.0 rprojroot_2.0.2
[9] assertthat_0.2.1 utf8_1.2.2
[11] BiocFileCache_1.14.0 R6_2.5.1
[13] RSQLite_2.2.9 httr_1.4.2
[15] pillar_1.6.4 zlibbioc_1.36.0
[17] rlang_0.4.12 progress_1.2.2
[19] curl_4.3.2 rstudioapi_0.13
[21] blob_1.2.2 Matrix_1.3-4
[23] reticulate_1.22 BiocParallel_1.24.1
[25] stringr_1.4.0 RCurl_1.98-1.5
[27] bit_4.0.4 biomaRt_2.46.3
[29] tinytex_0.35 DelayedArray_0.16.3
[31] compiler_4.0.5 rtracklayer_1.50.0
[33] xfun_0.28 pkgconfig_2.0.3
[35] askpass_1.1 openssl_1.4.5
[37] tidyselect_1.1.1 SummarizedExperiment_1.20.0
[39] tibble_3.1.6 GenomeInfoDbData_1.2.4
[41] matrixStats_0.61.0 XML_3.99-0.8
[43] fansi_0.5.0 crayon_1.4.2
[45] dplyr_1.0.7 dbplyr_2.1.1
[47] GenomicAlignments_1.26.0 bitops_1.0-7
[49] rappdirs_0.3.3 grid_4.0.5
[51] jsonlite_1.7.2 lifecycle_1.0.1
[53] DBI_1.1.1 magrittr_2.0.1
[55] stringi_1.7.6 cachem_1.0.6
[57] XVector_0.30.0 xml2_1.3.3
[59] ellipsis_0.3.2 generics_0.1.1
[61] vctrs_0.3.8 tools_4.0.5
[63] bit64_4.0.5 glue_1.5.1
[65] purrr_0.3.4 hms_1.1.1
[67] MatrixGenerics_1.2.1 fastmap_1.1.0
[69] memoise_2.0.1
```
Exons
is a GRangesList object and callingsplit()
on a GRangesList object is not a good idea. I don't even know if that works properly or what that is supposed to return.If all you want to do is mask all transcripts that are not on chr1, then
seqlevels(txdb) <- "chr1"
is definitely the way to go. So we should probably focus on the first problem you had and try to figure out whyseqlevels(txdb)
does not work on your TxDb object. Care to share the details of how you gottxdb
in the first place? Seeing the exact code you used to generate that object would help. Thanksyes definitely!
So i download the basic gene model from here: https://www.gencodegenes.org/human/
then I
and I use it from there.
No problem here:
Please show your
sessionInfo()
. Here is mine.sessionInfo():
Ok so it was because I used GTF and not the GFF3. The vinaigrettes said the command
makeTxDbFromGFF()
worked for both GTF and GFF3 so I did not think it mattered but I guess the command cant pick upcertain things if it is a GTF instead of a GFF3