Number of IRanges elements in TCGA HiSeqV2 dataset
@biomiha-11346
Last seen 5 months ago
UK/Cambridge
I am trying to analyse certain aspects of the TCGA RNA-seq dataset. I downloaded a RangedSummarizedExperiment file (.Rdata) from the recount2 website: https://jhubiostatistics.shinyapps.io/recount/
I am now trying to subset the dataset to include only protein-coding regions. If I look at `rowData(rse_gene)$symbol`, I get an IRanges object with 58037 elements corresponding to the transcripts, but if I unlist this object it suddenly becomes a vector of length 58716. Subsetting is then impossible because I get indexes that are out of bounds.
Can anyone clarify this discrepancy please?
r
tcga
recount
iranges
@lcolladotor
Last seen 16 days ago
United States
Hi,
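The `symbol` column is a CharacterList, not a plain vector, with one list element per gene. Some genes have more than one symbol, so unlisting it returns more entries than there are genes, as shown below.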
Best,
Leonardo
> library(recount)
> rowData(rse_gene_SRP009615)$symbol
CharacterList of length 58037
[["ENSG00000000003"]] TSPAN6
[["ENSG00000000005"]] TNMD
[["ENSG00000000419"]] DPM1
[["ENSG00000000457"]] SCYL3
[["ENSG00000000460"]] C1orf112
[["ENSG00000000938"]] FGR
[["ENSG00000000971"]] CFH
[["ENSG00000001036"]] FUCA2
[["ENSG00000001084"]] GCLC
[["ENSG00000001167"]] NFYA
...
<58027 more elements>
> table(elementNROWS(rowData(rse_gene_SRP009615)$symbol))

    1     2     3     4     5     6     7     8 
57460   517    44     4     4     3     4     1 
> sum(table(elementNROWS(rowData(rse_gene_SRP009615)$symbol)) * 1:8)
[1] 58716
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] recount_1.2.1 SummarizedExperiment_1.6.3 DelayedArray_0.2.7
[4] matrixStats_0.52.2 Biobase_2.36.2 GenomicRanges_1.28.3
[7] GenomeInfoDb_1.12.1 IRanges_2.10.2 S4Vectors_0.14.3
[10] BiocGenerics_0.22.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 jsonlite_1.5 splines_3.4.0 foreach_1.4.3
[5] GenomicFiles_1.12.0 Formula_1.2-1 bumphunter_1.17.2 latticeExtra_0.6-28
[9] doRNG_1.6.6 derfinder_1.10.4 BSgenome_1.44.0 GenomeInfoDbData_0.99.0
[13] Rsamtools_1.28.0 RSQLite_1.1-2 backports_1.1.0 lattice_0.20-35
[17] downloader_0.4 digest_0.6.12 RColorBrewer_1.1-2 XVector_0.16.0
[21] checkmate_1.8.2 qvalue_2.8.0 colorspace_1.3-2 htmltools_0.3.6
[25] Matrix_1.2-10 plyr_1.8.4 GEOquery_2.42.0 XML_3.98-1.7
[29] biomaRt_2.32.0 zlibbioc_1.22.0 xtable_1.8-2 scales_0.4.1
[33] BiocParallel_1.10.1 htmlTable_1.9 tibble_1.3.3 pkgmaker_0.22
[37] ggplot2_2.2.1 GenomicFeatures_1.28.2 nnet_7.3-12 lazyeval_0.2.0
[41] survival_2.41-3 magrittr_1.5 memoise_1.1.0 foreign_0.8-68
[45] tools_3.4.0 registry_0.3 data.table_1.10.4 stringr_1.2.0
[49] munsell_0.4.3 locfit_1.5-9.1 cluster_2.0.6 rngtools_1.2.4
[53] AnnotationDbi_1.38.1 Biostrings_2.44.1 compiler_3.4.0 rlang_0.1.1
[57] grid_3.4.0 RCurl_1.95-4.8 iterators_1.0.8 VariantAnnotation_1.22.1
[61] htmlwidgets_0.8 bitops_1.0-6 base64enc_0.1-3 rentrez_1.1.0
[65] derfinderHelper_1.10.0 gtable_0.2.0 codetools_0.2-15 DBI_0.6-1
[69] reshape2_1.4.2 R6_2.2.1 GenomicAlignments_1.12.1 gridExtra_2.2.1
[73] knitr_1.16 rtracklayer_1.36.3 Hmisc_4.0-3 stringi_1.1.5
[77] Rcpp_0.12.11 rpart_4.1-11 acepack_1.4.1
>
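To keep indexes in line with the genes, one option is to collapse the CharacterList to a single value per gene before subsetting. The following is only a minimal sketch on a toy CharacterList: the gene names and symbols are made up, and the object names `first_symbol`, `all_symbols` and `keep` are hypothetical. It assumes the IRanges package is installed.

```r
library(IRanges)  # provides CharacterList; attaches S4Vectors for elementNROWS()

# Toy stand-in for rowData(rse_gene)$symbol; gene names and symbols are made up.
symbol <- CharacterList(
    geneA = "SYM1",
    geneB = c("SYM2", "SYM3"),
    geneC = "SYM4"
)

length(symbol)          # one list element per gene
elementNROWS(symbol)    # number of symbols attached to each gene
length(unlist(symbol))  # one entry per symbol, so it can exceed the gene count

# Option 1: keep only the first symbol of each gene.
first_symbol <- vapply(as.list(symbol), function(s) s[1], character(1))

# Option 2: paste all symbols of a gene into a single string.
all_symbols <- unstrsplit(symbol, sep = ";")

# Both vectors have one entry per gene, so a logical index built from them
# has the same length as the original list and cannot run out of bounds.
keep <- first_symbol %in% c("SYM1", "SYM2")
symbol[keep]
```

On the real object, the same kind of per-gene logical vector should be usable to subset the rows, e.g. `rse_gene[keep, ]`, since it has one entry per row.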
Awesome! Thank you sir.